Other Posts

Posts not listed under Friday Fun Session and JLTi Code Jam

  1. Dissecting Dates in the Context of C# MVC and Kendo Grid
  2. MS SQL Server Data (Table) Storage
  3. MS SQL Server Nvarchar Issues
  4. MS SQL Server Recovery Model
  5. Heartbleed Bug
  6. Cashier’s Order from DBS Online Banking
  7. FILESTREAM – Considerations, Restrictions and Limitations
  8. FILESTREAM – Setup
  9. FILESTREAM – Hardware/OS Consideration
  10. FILESTREAM – What and When
  11. Optimizing Inserts
  12. Replacing Subqueries With Join Might Drastically Boost Query Performance
  13. Are You Blindly Trusting Plans Generated by MS SQL Server?
  14. Searching Just One Record Taking Several Seconds?


Transactions in Nested SP

34th Friday Fun Session (Part 2) – 22nd Sep 2017

When one (MS SQL Server) Stored Procedure (SP), say, A, calls another SP, say, B, inside it, we say the SPs are nested. It is not always the case that A calls B; B can be called directly as well. Now if they are required to execute within a transaction, in both cases – when executed individually or nested – we face some interesting situations. Here we discuss how to manage transactions for those cases, when the scope is limited to a single database.

Requirements

We usually face two kinds of requirements.

  1. Full Commit: either all operations in both parent (A) and child (B) succeed, or all should fail.
  2. Partial Commit: if one or more calls to the child fail, let them fail; but persist/commit the changes done by the parent and by the successful calls to the child.

Nesting does not need to be limited to two levels (parent and child) as stated here. For example, B can further call another SP, C; or A can call either B or C; and so on. The requirements can vary in other ways as well. However, if we solve the above two cases, the same mechanism/logic can be used/extended to handle all other situations.

Note that we are also considering the situation where the child, B, is called directly; in that situation, we are making sure that it executes within its own transaction.

Full commit

In the SPs A and B as written below, A calls B 6 times; however, B fails on the last call. A inserts data into a table called OuterTable, and B does the same for InnerTable.

In the all-or-nothing case, we want to make sure that when we execute A, if any call to B fails, no data should be inserted into any of the tables; all should fail. Obviously, if all the calls pass, then all data changed by them should persist.

It is also required that if B is executed directly, everything happening inside it must happen within a transaction.

Code for SP A:

CREATE Procedure [A]
AS
BEGIN
  BEGIN TRANSACTION;

  BEGIN TRY
    DECLARE @Success AS INT
    DECLARE @Counter AS INT

    SET @Success = 0
    SET @Counter = 0

    INSERT INTO [OuterTable] ([Name]) VALUES(@Counter)

  WHILE @Counter < 6
    BEGIN
      EXEC @Success = [B] @Counter;

      IF @Success = 0
        THROW 50001, 'B Failed! Rollback all!', 1

      SET @Counter = @Counter + 1;
    END

    IF @@TRANCOUNT > 0
      COMMIT TRANSACTION;
  END TRY

  BEGIN CATCH
    SET @Success = 0

    IF @@TRANCOUNT > 0
      ROLLBACK TRANSACTION;
  END CATCH;

  RETURN @Success
END

Code for SP B:

CREATE Procedure [B] (@CallCounter INT)
AS
BEGIN
  BEGIN TRANSACTION;

  BEGIN TRY
    DECLARE @Success AS INT
    SET @Success = 0;  -- semicolon required: the statement before THROW must be terminated

    IF @CallCounter = 5
      THROW 50001, 'Last call cannot be executed! Rollback all!', 1

    INSERT INTO [InnerTable] ([Name]) VALUES(@CallCounter)

    SET @Success = 1

    IF @@TRANCOUNT > 0
      COMMIT TRANSACTION;
  END TRY

  BEGIN CATCH
    SET @Success = 0

    IF @@TRANCOUNT > 0
      ROLLBACK TRANSACTION;
  END CATCH;

  RETURN @Success
END

When A gets called, it starts a transaction. It then calls B, which succeeds 5 times and subsequently fails on the last call.

@@TRANCOUNT is important

When B starts, it begins a transaction, unconditionally. Each time A calls B, @@TRANCOUNT gets incremented by 1 and reaches 2. Each time B succeeds, it executes a commit, reducing @@TRANCOUNT back to 1, corresponding to the transaction started by A.

However, the last call to B fails. It throws an exception, which makes the execution flow jump into the catch block, where it checks @@TRANCOUNT, now 2. It then rolls back the transaction. A rollback discards all transactions in the nested call and resets @@TRANCOUNT to 0. Before a commit or rollback, it is safe practice to make sure we are still inside a transaction, by checking that the @@TRANCOUNT value is greater than 0.

In this situation, B returns 0 to indicate failure (though the usual practice is to use 0 to indicate success). A gets to know that B has failed. A throws an exception and ends up in its catch block, where it checks @@TRANCOUNT, which is already 0. No further rollback is called, and at the end 0 is returned from A to mean that it has failed.

Everything that A and B have done so far is rolled back. After all, no matter how many transactions are started, a single rollback can discard them all. At the end, no data changes happen to either of the tables they inserted data into.

If B is executed directly, depending on the parameter, data would either persist or the call would fail. Either way, everything inside it happens within a transaction.
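A quick way to verify the behavior – a sketch, assuming both tables exist and start empty:

DECLARE @Result INT;

EXEC @Result = [A];                 -- the 6th call to B throws
PRINT @Result;                      -- 0: A reports failure

SELECT COUNT(*) FROM [OuterTable];  -- 0: A's insert was rolled back
SELECT COUNT(*) FROM [InnerTable];  -- 0: B's inserts were rolled back

EXEC @Result = [B] 0;               -- direct call: commits in its own transaction
EXEC @Result = [B] 5;               -- direct call: fails, rolls back its own transaction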

Partial Commit

In this case, we want to persist the changes made by 5 successful calls to B, along with what A has done. The only thing that should get discarded is the failed 6th call to B.

Code for SP A:

CREATE Procedure [A]
AS
BEGIN
  BEGIN TRANSACTION;

  BEGIN TRY
    DECLARE @Success AS INT
    DECLARE @Counter AS INT

    SET @Success = 0
    SET @Counter = 0

    INSERT INTO [OuterTable] ([Name]) VALUES(@Counter)

  WHILE @Counter < 6
    BEGIN
      EXEC @Success = [B] @Counter;

      IF @Success = 0
        PRINT 'I know B Failed! But I am just ignoring that!'

      IF @Success = 2
        THROW 50001, 'Something terribly wrong! Rollback all!', 1

      SET @Counter = @Counter + 1;
    END

    IF @@TRANCOUNT > 0
      COMMIT TRANSACTION;
  END TRY

  BEGIN CATCH
    SET @Success = 0

    IF @@TRANCOUNT > 0
      ROLLBACK TRANSACTION;
  END CATCH;

  RETURN @Success
END

Code for SP B:

CREATE Procedure [B] (@CallCounter INT)
AS
BEGIN
  DECLARE @ProcedureSave NVARCHAR(MAX)
  DECLARE @TranCounter INT;

  SET @ProcedureSave = 'B-' + CONVERT(CHAR(23), GETDATE(), 121)
  SET @TranCounter = @@TRANCOUNT;

  IF @TranCounter > 0
    SAVE TRANSACTION @ProcedureSave;
  ELSE
    BEGIN TRANSACTION;

  BEGIN TRY
    DECLARE @Success AS INT
    SET @Success = 0;  -- semicolon required: the statement before THROW must be terminated

    IF @CallCounter = 5
      THROW 50001, 'Last call cannot be executed! Rollback all!', 1

    INSERT INTO [InnerTable] ([Name]) VALUES(@CallCounter)

    SET @Success = 1

    IF @TranCounter = 0 AND @@TRANCOUNT > 0
      COMMIT TRANSACTION;
  END TRY

  BEGIN CATCH
    SET @Success = 0

    IF @TranCounter = 0
      ROLLBACK TRANSACTION;
    ELSE
      IF XACT_STATE() <> -1
        ROLLBACK TRANSACTION @ProcedureSave;
      ELSE
        SET @Success = 2
  END CATCH;

  RETURN @Success
END

Savepoint

B starts a transaction only if it is not already inside one. Meaning, if it is called directly, it starts a transaction of its own; however, if it is called by A, it does not do so. In that case, it creates a savepoint instead. The savepoint name is created in such a way that it remains unique for each instance of the SP's execution. A savepoint is a point that execution can come back to by executing a partial rollback; in such a situation, all changes between the savepoint and the place where the partial rollback is called get discarded.

That is precisely what happens when B fails and an exception is thrown. In the catch block, B checks whether it started the transaction on its own. If so, it simply calls a (full) rollback. However, if it was called by A – which also means it created a savepoint – it executes a partial rollback using the savepoint. A partial rollback discards only the changes done by B itself.

XACT_STATE() is a safe choice

However, prior to executing the partial rollback, it should check whether it is safe to do so. If XACT_STATE() returns -1, something went wrong – something that should rarely happen. In this situation, a partial rollback is no longer possible; only a full rollback can be done. B returns the special failure code 2 to indicate such a colossal failure.

A remains mostly the same. The only thing that changes is what it does when it knows B has failed. Earlier, it would have thrown an exception. This time, if the returned failure code is 0, it simply ignores that B has failed. If the returned failure code is 2, it knows that B failed to execute a partial rollback – something really bad – and A executes a (full) rollback.

In the above case, we see that the outer table gets one new row, and the inner table gets 5 new rows, notwithstanding the fact that the 6th call to B fails.
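A quick check of the end state – a sketch, again assuming both tables start empty:

DECLARE @Result INT;

EXEC @Result = [A];                 -- the 6th call to B still fails

SELECT COUNT(*) FROM [OuterTable];  -- 1: A's insert persists
SELECT COUNT(*) FROM [InnerTable];  -- 5: the successful calls to B persist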

When B is called directly – not from A – it manages its own transaction; savepoint is never used.


Executing SP using EF

34th Friday Fun Session (Part 1) – 22nd Sep 2017

We often use Entity Framework (EF), Microsoft's recommended data access technology for an application, to execute (MS SQL Server) Stored Procedures (SPs) and retrieve the results they emit. Here we discuss the different kinds of output a SP can produce, and how we can retrieve each of them using EF.

SP output

A SP typically provides the following kinds of results:

  1. Return code (single integer)
  2. Output parameters (one or more, any data type)
  3. Result set
    1. Single result set
    2. Multiple result sets
      1. All having the same schema
      2. Having different schemas

Return code

A SP can return a single integer return code. A return statement without any value would automatically return 0. It is mostly used to exit execution of a SP when a certain condition is met.

CREATE Procedure SpReturnCode
AS
BEGIN
  RETURN 1;
END

Using T-SQL, we can execute the SP as below.

DECLARE @Success AS INT
EXEC @Success = [SpReturnCode];
PRINT @Success

Output parameter

A SP can return one or more values, each having its own data type.

CREATE Procedure SpOutputParameter (@InputValue INT, @OutputValue INT OUTPUT)
AS
BEGIN
  SET @OutputValue = @InputValue + 1;
  RETURN 0;
END

Using T-SQL, we can execute the SP as below.

DECLARE @ReturnValue INT;
EXECUTE [SpOutputParameter] 2, @ReturnValue OUTPUT;
PRINT @ReturnValue

Single result set

This returns a result set having 0 or more rows of a certain schema. The following SP returns a result set with a single column named Success.

CREATE Procedure SpSingleResultSet
AS
BEGIN
  SELECT 3 AS Success
  RETURN 0;
END

Multiple result sets, same schema

The following SP returns the result set for Employee schema twice.

CREATE Procedure SpMultipleResultSetSameSchema
AS
BEGIN
  SELECT * FROM [Employee]
  SELECT * FROM [Employee] WHERE [EmployeeId] > 10

  RETURN 0;
END

The following SP returns, multiple times, a result set that is not associated with any database entity.

CREATE Procedure SpMultipleResultSetNonDbContextEntity
AS
BEGIN
  DECLARE @Loop AS INT
  SET @Loop = 0

  WHILE @Loop < 10
  BEGIN
    EXEC SpSingleResultSet
    SET @Loop = @Loop + 1
  END

  RETURN 0;
END

Multiple result sets, multiple schemas

The following SP returns two different result sets: one for Company and another for Employee.

CREATE Procedure SpMultipleResultSetMultipleSchema
AS
BEGIN
  SELECT * FROM [Company]
  SELECT * FROM [Employee]

  RETURN 0;
END

Executing SP using EF

We will use the following different ways in EF to read different kinds of SP output as described earlier:

  1. ExecuteSqlCommand
  2. SqlQuery
  3. CreateCommand/ExecuteReader

ExecuteSqlCommand

This executes a given DDL/DML command. It can be used when no result set needs to be returned.

Return code

A return code does not require any explicit output parameter in the SP. However, when calling from EF, it should be treated as an output parameter by specifying the direction of the parameter.

SqlParameter returnCode = new SqlParameter("@ReturnCode", SqlDbType.Int);
returnCode.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("exec @ReturnCode = [SpReturnCode] ", returnCode);
var returnCodeValue = (int)returnCode.Value;

Output Parameter

For each of the output parameters, we need to declare an output parameter in EF matching the appropriate data type.

SqlParameter inputParam = new SqlParameter("@InputValue", SqlDbType.Int);
inputParam.Direction = ParameterDirection.Input;

SqlParameter outputParam = new SqlParameter("@OutputValue", SqlDbType.Int);
outputParam.Direction = ParameterDirection.Output;

Db.Database.ExecuteSqlCommand("[SpOutputParameter] @InputValue, @OutputValue OUT", inputParam, outputParam);
var returnValue = (int)outputParam.Value;

SqlQuery

SqlQuery is usually used when a SP returns a single result set. It can return any type, including primitive types – not necessarily an entity type. If the SP returns multiple result sets, SqlQuery will only get the first one; however, the complete execution of the entire SP does happen.

public class SuccessSet
{
  public int Success { get; set; }
}

var result = Db.Database.SqlQuery<SuccessSet>("[SpSingleResultSet]").ToList();
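As noted, SqlQuery is not limited to entity types; a small sketch returning a primitive type (assuming the Employee table used elsewhere in this post):

var ids = Db.Database.SqlQuery<int>("SELECT [EmployeeId] FROM [Employee]").ToList();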

CreateCommand/ExecuteReader

This method can be used when multiple result sets are to be returned. We need to use the IObjectContextAdapter interface, which makes use of explicit interface implementation.

Db.Database.Initialize(force: false);
var cmd = Db.Database.Connection.CreateCommand();
cmd.CommandText = "[SpMultipleResultSetSameSchema]";

Db.Database.Connection.Open();

var reader = cmd.ExecuteReader();
var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

foreach (var employee in employees)
  Console.WriteLine(employee.Name);

while (reader.NextResult())
{
  employees =
    ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
    .ObjectContext
    .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

  foreach (var employee in employees)
    Console.WriteLine(employee.Name);
}

Db.Database.Connection.Close();

When the result set is not associated with any DbContext entity, a slight modification is required: we call Translate without the entity set name.

var successSets =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<SuccessSet>(reader);

When different schemas are used, we can still do the same; we just need to use the right type while translating each result set.

var companies =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Company>(reader, "Company", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);

reader.NextResult();

var employees =
  ((System.Data.Entity.Infrastructure.IObjectContextAdapter)Db)
  .ObjectContext
  .Translate<Employee>(reader, "Employee", System.Data.Entity.Core.Objects.MergeOption.AppendOnly);


Synchronizing Web System

22nd Friday Fun Session (Part 3) – 16th Jun 2017

When we have to implement an idempotent operation in a web application, where the respective method execution is not idempotent if executed simultaneously, it is essential that we use concurrency control. Here we focus on .NET applications deployed in IIS.

Idempotent operation

Suppose a user requests approval of a certain transaction, say, transaction Id 100; hence the operation is data specific. We need to make sure this is an idempotent operation, meaning that no matter how many times the request comes, simultaneously or serially, the end result should be the same.

Concurrency and thread synchronization

If more than one request comes in, from the same user or from multiple users simultaneously, we have seen that IIS might launch multiple threads, one for each request. It is possible that they are executed simultaneously.

Let's take a closer look. Suppose two requests were initiated at 12:00:00:000 PM. The requests end up in IIS at 12:00:01:000 PM, and two threads start processing them at the same time, at 12:00:01:001 PM. Both threads find that the transaction has not been approved yet, and proceed to do the same things: send an email, make some database entries, etc. At the end, both mark the transaction as approved.

So we see that what was supposed to be an idempotent operation has ended up sending two emails and so on, clearly failing to be one. If we don't have any concurrency control in place, nothing stops the two threads from sending two emails. After all, both threads are executing simultaneously – checking, finding and doing the same thing at the exact same time.

Concurrency control is essential here. That means, at the beginning, we need to place a gate through which only one thread can enter at a time. Once it enters, it should mark that it is approving a certain transaction. Then all other threads that enter the gate later, serially (one after another), can detect that somebody else is processing that exact same request, and quit gracefully.

Intra-process synchronization

We will walk through some options, even though they do not qualify as the desired solution. We do so to explain the issues around these options, so that we can rationalize the final solution(s).

Let us start with lock, provided by C#. lock can implement a critical section, meaning it can make a certain portion of the code executable by only one thread at a time. So the transaction processing method can be wrapped in a lock. That will make sure only one thread of a process executes the method at any given point of time.
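A minimal sketch of that wrapping (IsApproved, Approve and SendEmail are hypothetical helpers):

private static readonly object ApprovalLock = new object();

public void ApproveTransaction(int tranId)
{
  lock (ApprovalLock) // only one thread of this process can be inside at a time
  {
    if (IsApproved(tranId))
      return; // already processed; quit gracefully

    SendEmail(tranId);
    Approve(tranId);
  }
}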

However, we have two issues here:

First, we are locking the whole method. We wanted to make sure only transaction Id 100 gets approved only by one thread at any point of time. But we end up blocking approval for all other transactions, say, transaction Id 99 or 101, when approval for transaction Id 100 is going on.

We can solve this issue by implementing a named lock, meaning the lock will have a name, say, TranApproval_100. That way, only threads executing approval requests for the same transaction will be serialized. Instead of lock, we can achieve the same using an interned string, a named mutex, etc. as well.

Second, the scope of the lock is within the process. Nobody outside the process knows about it. However, we know that in a web garden configuration, there might be more than one process running for the same web application, and the two threads can come from two different processes. In that case, the thread in the first process would not know that the thread in the second process is holding a lock; hence both threads would happily execute the same method simultaneously. We can solve this problem by using a named mutex, an inter-process synchronization mechanism.

Inter-process synchronization

We see that using a named mutex – say, one named TranApproval_100 – we can make sure we don't block approval for other transactions. Meaning, only the thread approving transaction Id 100 is serialized, without blocking approval for, say, transaction Id 99 or 101.

We also see that between the two threads running in two processes in a web garden configuration, only one would do the approval and the other would quit. This is because the named mutex is visible to all processes throughout the operating system.
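A sketch of such a named mutex (the name embeds the transaction Id; IsApproved, Approve and SendEmail are again hypothetical helpers; requires System.Threading):

using (var mutex = new Mutex(false, "TranApproval_" + tranId))
{
  // Do not wait: if the mutex is taken, somebody else is processing this transaction
  if (!mutex.WaitOne(TimeSpan.Zero))
    return;

  try
  {
    if (!IsApproved(tranId))
    {
      SendEmail(tranId);
      Approve(tranId);
    }
  }
  finally
  {
    mutex.ReleaseMutex();
  }
}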

However, we also know that in a web farm configuration, the two processes executing the two threads, both intending to serve the approval processing for transaction Id 100, might be running on two different web servers. In such a case, the inter-process synchronization mechanism we just explored would not work.

External and Common

So we see that we have to use something that is both external and common to the web servers. If we have one common database used by all the processes, then we can use something in the database to make sure the threads process the requests one by one.

Using transaction within application

Using the concurrency control support provided by the database, we can make sure only one method is doing the database operations at any point of time. If we are using an ORM, say Entity Framework, then we can use the transaction support provided from version 6 onwards (Database.BeginTransaction(), Commit, Rollback etc.). Since we want the transactions to execute serially, we have to use the serializable isolation level, System.Data.IsolationLevel.Serializable; we can begin the transaction with this isolation level parameter.
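A minimal sketch with EF6 (assuming Db is a DbContext instance):

using (var transaction = Db.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
{
  try
  {
    // database operations for the approval go here
    Db.SaveChanges();

    transaction.Commit();
  }
  catch
  {
    transaction.Rollback();
    throw;
  }
}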

There are two problems associated with this. First, we are again serializing the whole transaction approval process, meaning blocking approval for transaction Id 101, while processing the same for transaction Id 100.

Second, we cannot stop non-database operations like sending email etc.

Named synchronization using database

The first problem can be solved, as we have seen previously, by using a named synchronization mechanism. The second problem can also be solved using the same, but we need to make sure we use the synchronization mechanism to serialize the whole method, not just the database transaction.

Named lock table

We first create the following table, which keeps track of the transactions presently under processing.

CREATE TABLE [Web].[LockInfo]
(
  [LockInfoId] [int] IDENTITY(1,1) NOT NULL, 
  [LockName] [nvarchar](256) NULL, 
  [DurationInSeconds] [int] NULL, 
  [CreatedOn] [datetime] NOT NULL, 
  CONSTRAINT [PK_LockInfoId] PRIMARY KEY CLUSTERED 
  ( 
    [LockInfoId] ASC
  )
  WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, 
  IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) 
  ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [Web].[LockInfo] ADD DEFAULT ((60)) FOR [DurationInSeconds]

Lock name

LockName is the name of the lock, like TranApproval_100.

Duration

We should not hold the lock forever. If things go wrong after acquiring the lock – for example, the caller somehow fails to unlock it, or there is a crash – the resource could never be locked again by any thread. The created date, along with the default duration of 1 minute, makes sure that after a minute the lock becomes invalid. This is based on the assumption that approval processing is done within a minute.

Using stored procedure

We create the following stored procedure.

CREATE Procedure [Web].[usp_Lock]
  (@Lock BIT, @LockName NVARCHAR(256), @LockDuration INT = NULL)
AS
BEGIN
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

  BEGIN TRY
    BEGIN TRANSACTION

    DECLARE @Success AS BIT
    SET @Success = 0

    IF(@Lock IS NULL OR @LockName IS NULL)
    BEGIN
      SELECT @Success AS Success;
      ROLLBACK TRANSACTION
      RETURN;
    END

    IF(@Lock = 1) -- LOCK
    BEGIN
      -- Discard this lock first, if it has expired
      DELETE FROM [Web].[LockInfo]
      WHERE @LockName = [LockName] AND
        DATEADD(SECOND, [DurationInSeconds], [CreatedOn]) < GETUTCDATE();

      IF NOT EXISTS (SELECT * FROM [Web].[LockInfo]
                     WHERE @LockName = [LockName])
      BEGIN
        IF(@LockDuration IS NULL)
          INSERT INTO [Web].[LockInfo] ([LockName], [CreatedOn])
          VALUES(@LockName, GETUTCDATE())
        ELSE
          INSERT INTO [Web].[LockInfo] ([LockName], [DurationInSeconds], [CreatedOn])
          VALUES(@LockName, @LockDuration, GETUTCDATE())

        SET @Success = 1
      END
    END
    ELSE -- UNLOCK
    BEGIN
      DELETE FROM [Web].[LockInfo]
      WHERE @LockName = [LockName]

      SET @Success = 1
    END

    COMMIT TRANSACTION
  END TRY
  BEGIN CATCH
    SET @Success = 0

    IF(@@TRANCOUNT > 0)
      ROLLBACK TRANSACTION
  END CATCH;

  SELECT @Success AS Success;
END

Lock and unlock

This stored procedure can be used for both the lock and unlock operations on a named resource. The lock can be acquired for a given interval. In the absence of a duration parameter, the default duration of 1 minute is used.
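A sketch of how an application might wrap the whole method with it, using EF (assuming Db is an EF DbContext; tranId and the approval logic are hypothetical):

var lockName = "TranApproval_" + tranId;

// Try to acquire the named lock; usp_Lock returns Success as a BIT
var locked = Db.Database
  .SqlQuery<bool>("EXEC [Web].[usp_Lock] @Lock = 1, @LockName = {0}", lockName)
  .Single();

if (!locked)
  return; // somebody else is processing this very transaction; quit gracefully

try
{
  // the whole approval processing: check, send email, database changes etc.
}
finally
{
  // Release the named lock
  Db.Database
    .SqlQuery<bool>("EXEC [Web].[usp_Lock] @Lock = 0, @LockName = {0}", lockName)
    .Single();
}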

Leveraging isolation level

We make sure that this stored procedure is executed serially, thus making sure only one transaction executes at any point of time.

Here we focus on making the lock/unlock function fast, compromising on concurrency by using the highest isolation level.

Leveraging consistency

We could instead put a unique constraint on the lock name column of the Web.LockInfo table. In that case, we could use MS SQL Server's default READ COMMITTED isolation level in the stored procedure, increasing concurrency. If a second thread wants to take the lock on the same resource, it tries to insert a row with the same lock name, and fails due to the unique constraint check.
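A sketch of that alternative (the constraint name is mine):

ALTER TABLE [Web].[LockInfo]
  ADD CONSTRAINT [UQ_LockInfo_LockName] UNIQUE ([LockName]);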

Performance

The two have different performance implications that we explain using an example.

Suppose two threads start executing simultaneously. The faster, serializable stored procedure takes one millisecond to execute. The first thread takes one millisecond to lock and then 100 milliseconds to execute the main approval processing – a total of 101 milliseconds.

The second thread would wait one millisecond for the first call to finish, and then take one more millisecond to find that the resource is already locked; hence it takes a total of 2 milliseconds to quit the processing.

On the other hand, suppose the stored procedure using the default isolation level and a unique constraint takes 3 milliseconds to lock. The first thread would take 103 milliseconds to execute.

The second thread that starts at the same time would take 3 milliseconds to fail while acquiring the lock and quit.

Based on the load and usage pattern, an appropriate mechanism can be chosen.

Usage for general purpose locking

This mechanism can be used not only for synchronizing transaction approval but for all other cases that use named resources. Based on the load, more than one lock table (and/or stored procedure) can be used, each supporting certain modules.

GitHub: Named Resource Lock


Interpreting IIS Internals

22nd Friday Fun Session (Part 2) – 16th Jun 2017

We are trying to understand how an HTTP request is processed by a .NET web application hosted in IIS in various scenarios, with a focus on synchronization of the processing of that request. To be precise, we are interested in the thread and process contexts involved in serving an HTTP request.

Client request

Multiple clients across the globe, using their respective browsers, send HTTP requests to the web server.

Web server

All these HTTP requests end up in the IIS web server hosting the application.

IIS kernel mode

In the web server, an HTTP listener – a kernel-mode device driver, part of the network subsystem and of the kernel mode of IIS to be precise, called the HTTP protocol stack (HTTP.sys) – listens for HTTP requests.

HTTP.sys, as a forwarder, might directly pass the request to the right worker process; or, as a request queuer, it queues the request until a worker process picks it up. Once the response of that request reaches it, it returns that back to the client browser. Also, as a kernel-level cacher, it does some kernel-level caching and, if possible, returns the cached output directly, without involving any user-level processing.

Worker process

The worker process, w3wp.exe – an executable, a process to the OS – runs inside IIS user mode and is a little different from other processes in the operating system (OS), in the sense that it can contain multiple application domains.

Application domain

An application domain, represented by an AppDomain object – a .NET concept – is a virtual process within a process. As said, it runs within a worker process. One application domain does not share its static variables etc. with another application domain running within the same worker process.

Usually, all static variables etc. of a process are visible to all code within the process. That is not the case for a worker process. This is how the worker process is a special process: one or more application domains run inside it, as if each of them were a separate process, providing isolation. So what we are usually used to thinking of as per-process is, inside an IIS worker process, actually per application domain.

Why application domains, you might ask. Well, a web server can host many applications. Creating one worker process for each of them would end up creating many processes, which is quite expensive. Creating an application domain for each of them and putting them together inside a single process is much cheaper.

Note that, one of the application domains can be stopped without affecting other application domains inside a worker process.

Worker thread

When the worker process receives a request, it uses a worker thread to process that request. If two users send two requests to the same web application, both can be simultaneously executed by two worker threads and served.

At any point of time, a worker thread can only execute within a single application domain. An application domain can have multiple worker threads, but these worker threads are not confined to a single application domain. Rather, they belong to the worker process. When a thread is done serving a request for a particular application domain, it is freed. At a later point of time, the same thread can be used to serve another request, belonging to a different application domain.

Web application

We develop web applications, and we are interested to know how they end up running in the IIS environment. Application domains are associated with web applications. One web application typically has one application domain running inside a worker process. One worker process might run many application domains, each supporting a separate web application.

Application pool

An application pool is the container for (web) applications. Every application has to be assigned to a single application pool. A number of web applications can be assigned to a single application pool, but as mentioned earlier, a single application cannot be assigned to multiple application pools. All applications belonging to an application pool share the same configuration.

Worker process runs inside the application pool.

An application pool can be recycled or restarted. Applications belonging to other application pools are not affected by this. Thus the application pool provides isolation.

So we see that a number of applications are contained within an application pool. A worker process running inside the application pool is, in turn, running a number of application domains, each application domain serving a different web application. Thus, we have two different levels of isolation.

Web garden

How many worker processes can there be inside an application pool? Well, it is configurable. By default, only one worker process runs inside an application pool, supporting multiple web applications by creating one application domain for each of the applications. All these application domains run as if they were separate processes within that single worker process.

However, an application pool can be configured to run more than one worker process; in that case, it is called a web garden. In this situation, multiple worker processes can be running in a single application pool, each of these worker processes once again running multiple application domains, each belonging to one application.

In this scenario, each of the worker processes can have its own application domain for the same application. In other words, for a certain web application, we can have multiple application domains, each running in a separate worker process, all in the same application pool. To be precise, one application or web site can have multiple instances running in a single web server, if web garden is enabled.

This is important, as it renders the use of static variables, application-wide variables etc. problematic in web applications.
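A small illustration of the pitfall (hypothetical class; requires System.Collections.Generic):

public static class ApprovalTracker
{
  // Not one per application, but one per application domain, per worker process:
  // in a web garden, two requests for the same application may see different sets.
  public static readonly HashSet<int> InProgress = new HashSet<int>();
}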

Web farm

When one web server is not enough to serve the clients' requests, we need more of them to host the same application/web site. We call this a web farm.

A load balancer sits in front of the web servers, and its IP is exposed to the external world. HTTP requests come to the load balancer first, and it then distributes the load across the different web servers.

The individual web servers can share the same database, or use a replicated/partitioned database.

In a nutshell

Single server, application pool running one worker process

So we see that multiple HTTP requests for the same web application can be simultaneously served by multiple threads. Those threads can execute within a single application domain belonging to a single worker process. This happens when only one worker process is set to run for an application pool.

[Image: Simple.png]

In the above image, we see IIS having two parts: kernel mode and user mode. HTTP.sys is in kernel mode, forwarding an HTTP request to the 3rd application pool, belonging to application X. We further see that a single worker process inside that 3rd application pool is running two application domains, X and Y. Two threads within application domain X – Thread 1 and Thread 2 – are serving the two requests, respectively. The response goes back to the client browser through HTTP.sys.

Single web server, application pool running more than one worker process, called web garden

Or the threads can come from different application domains associated with the same web application or web site, running inside different worker processes, all contained within the same application pool. This can happen in a web garden configuration, where multiple worker processes are allowed to execute within a single application pool. We can understand that any locking mechanism that works within a single process would not work in this setup; we would need to implement an inter-process synchronization mechanism if our application is deployed in a web garden.

[Image: Web Garden]

In the above image, showing a web garden, two requests are being served by two worker threads, belonging to two application domains (both associated with the same web application), each running in a separate worker process, both worker processes contained within the same application pool.

Multiple web servers behind load balancer, called web farm

Or the threads can come from different physical web servers. This can happen in a web farm scenario, where multiple web servers sit behind a load balancer. We can understand that an inter-process synchronization mechanism, which works across the processes within one OS, would not work here. Since we have multiple web servers, each running its own OS, an inter-process synchronization mechanism would not work for application-wide synchronization.

[Image: Web Farm.png]

In the above image, showing web farm, two requests are being served by two worker threads, each running in a separate web server.


Incision into Isolation Levels

22nd Friday Fun Session (Part 1) – 16th Jun 2017

We are trying to see how an isolation level – serializable, to be precise – can help us implement a synchronization mechanism for a web application.

Let us start with ACID

ACID stands for Atomicity, Consistency, Isolation and Durability. It is detailed in the ISO standard. Database systems implement it so that a sequence of operations, called a transaction, can be perceived as a single logical operation.

Atomicity

All operations of a transaction are either all done or none is done. Logging with undo capability can be used to achieve this.

Consistency

Given that all database constraints (foreign key, unique etc.) are valid at the beginning, the same should be maintained, at the end of the transaction as well.

Durability

All changes done by a committed transaction must go to storage even if database system crashes in the middle. Logging with redo capability can be used to achieve this.

Why Logging?

We talked about logging and then redo/undo in the previous sections. Why logging? Well, when a transaction changes data, the changes are not immediately written to disk. Rather, those pages are marked as dirty, and lazy writing flushes them to disk later. Instant writing to disk is expensive; instead, logging the operation – where the log is written to disk immediately – is much cheaper.

However, performance, while important, is not a must. Logging is essential to ensure atomicity and durability. Any modification must be written to the log before being applied to the actual database. This is known as write-ahead logging (WAL). It makes sure that in case of a crash (say, 2 out of 5 operations of a transaction are written to database storage and then the system crashes), the system can come back, read the log, and figure out what was done and what was left undone. By redoing and undoing the necessary operations, durability and atomicity are ensured.

Focus on the I of ACID

Today we focus on the I of ACID: isolation. When we write a transaction, we write the operations inside it thinking nobody else is doing anything to the data we are dealing with. The isolation property defines such an environment, and database systems implement it.

So, why do we need such an environment? Well, without it, in a highly concurrent transaction execution environment, our understanding of the data we are working with would not hold true, as others would change the data simultaneously. This happens largely due to three problems: dirty read, non-repeatable read, and phantom read.

However, creating such an isolated environment is expensive in terms of performance. Hence, a number of other isolation levels are introduced, giving various degrees of isolation rather than a complete isolation.

The ISO standard defines the following isolation levels, which we will describe in terms of two transactions, T1 and T2, that execute in parallel.

Read Uncommitted

Transaction 1 (T1) updates salary for Joe
Transaction 2 (T2) reads updated salary for Joe
T1 aborts transaction

We see that T2 read dirty data (T1 did not commit the updated salary) and went ahead with the decisions/operations inside it based on that data, which was of course wrong.

As the name implies, Read Uncommitted reads uncommitted data, also called dirty data, which is wrong. So we see this isolation level does not guarantee the isolation property, and it is an example of a weaker isolation level. Note that along with the dirty read problem, it also has the non-repeatable read and phantom read problems.

Read Committed

The next better isolation level, as the name Read Committed implies, reads only committed data and solves the dirty read problem encountered in Read Uncommitted. Let us see this through an example, with T1 now running at the Read Committed isolation level.

T1 reads the salary of Joe
T2 updates the salary of Joe and commits
T1 reads the salary of Joe

So we see T1 reads the salary of Joe twice, and it is different in the two reads. In the second read, it sees the data that was modified and committed by T2. No more dirty reads by T1. Good.

But the isolation property expects each transaction to happen in complete isolation, meaning it can assume it is the only transaction taking place. Joe's salary was not updated by T1, so why should T1 see different data when it reads a second time?

So we see T1 could not repeat a read (get the same salary for Joe). Hence, this problem is called non-repeatable read. Read Committed, like Read Uncommitted, is another weaker isolation level. Again, note that along with non-repeatable read, it also has the phantom read problem.

Repeatable Read

To solve the non-repeatable read problem, the Repeatable Read isolation level comes into the picture. Once T1 reads the salary of Joe, no other transaction should be able to modify Joe's data if we run T1 at the Repeatable Read isolation level.

If we rerun the earlier transactions, we would see T2 wait for T1 to finish first, because T1 takes the right locks on the rows it reads so that others cannot delete/modify them. Repeatable Read solves the non-repeatable read problem, as the name implies.

However, that won't stop new data insertion. After all, Repeatable Read puts the necessary locks only on the data it has read, not on future data. Hence, we will see ghost/phantom data. Let's see an example.

T1 reads 4 rows in employee table
T2 inserts one record in employee table and commits
T1 reads 5 rows in employee table

We see that T1 sees a phantom row (the row newly inserted by T2) in its second read of the employee table. Repeatable Read is, once again, another weaker isolation level.

Serializable

So far, we have seen isolation levels providing various degrees of isolation, but not what the I of ACID really defines as isolation. We also know that the weaker isolation levels exist to avoid the performance penalty of executing transactions in complete isolation. But at times, it becomes an absolute necessity to execute a transaction in full isolation, and Serializable comes into the picture to implement that complete isolation. At the serializable isolation level, it is ensured that we get the effect as if all transactions happened one after another, in the order they started.

So if we rerun the earlier two transactions, we would see T2 waiting for T1 to complete first. Hence both reads of T1 would see 4 rows. Only after T1 is done would T2 insert a new row.

This solves all the three problems: dirty read, non-repeatable read and phantom reads.

At this point, it should be mentioned that the ISO standard expects serializable, not serial, execution. The end result of a serializable execution must be equivalent to executing the transactions one after another. Serializable does not necessarily mean executing transactions one after another – just that the end result is the same as if they had executed serially.

MS SQL Server implementation

With Serializable, we are done with the 4 ISO transaction isolation levels. MS SQL Server implements all of them. In addition, it implements a fifth one, called Snapshot.

Snapshot

It is an isolation level that solves all three problems, just like Serializable. So, why do we have two isolation levels doing the same thing? What special thing is Snapshot doing?

If we closely observe the earlier serializable isolation level, implemented using locks, we see that it is too pessimistic. T2 has to wait for T1 to finish, even though T2 could be executed simultaneously – after all, T1 is only reading, not modifying any data.

Snapshot comes into the picture with optimistic concurrency control. It uses multiversion concurrency control (MVCC) to implement this. Every transaction starts with the latest committed copy it sees and keeps on executing the operations inside it.

So, for the last example, under Snapshot, T1 would read 4 rows in both reads – after all, it has its own private copy. On the other hand, T2 would start with its own copy and add a row in the middle. At the end, it would see there was no conflict, because no other transaction (T1 in this case) did anything conflicting. So two transactions are simultaneously executed without violating the requirements of a serializable isolation level.

What would happen if both T1 and T2 modified the same data, creating a conflict? Well, both transactions start with their own copies, hoping that at the end there will be no conflict – hence it is called optimistic. But if there is a conflict, the one that commits first wins; the other fails and rolls back.

By the way, even though it behaves much like serializable, the write skew anomaly is still present in Snapshot, and hence it cannot be called serializable by the ISO definition.

The database needs to be configured to use this isolation level, which at times takes a while. Again, since each transaction uses its own private copy, it is resource intensive.

At a glance

The problems each isolation level leaves open, as discussed above:

Isolation level     Dirty read   Non-repeatable read   Phantom read
Read Uncommitted    Yes          Yes                   Yes
Read Committed      No           Yes                   Yes
Repeatable Read     No           No                    Yes
Serializable        No           No                    No
Snapshot            No           No                    No

Default transaction isolation level for MS SQL Server

Read Committed is the default isolation level for MS SQL Server, chosen with performance in mind. So if all along you had thought that, by default, you were getting the I of ACID from SQL Server, you were wrong. You are living with non-repeatable reads and phantom reads unless you have explicitly changed the isolation level or used locks.

How to set isolation level in MS SQL Server?

We can set one isolation level at a time using the following command:

SET TRANSACTION ISOLATION LEVEL 
   { READ UNCOMMITTED 
   | READ COMMITTED 
   | REPEATABLE READ 
   | SNAPSHOT 
   | SERIALIZABLE 
   } 
[ ; ]

As mentioned earlier, to set the snapshot isolation level, some database-specific configuration is required before executing the above command.
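A minimal sketch of that configuration, assuming a database named MyDb:

ALTER DATABASE [MyDb] SET ALLOW_SNAPSHOT_ISOLATION ON;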

How long isolation level remains active?

Once set, it lasts for the session, the duration of which is largely controlled by the component that creates it. When another session starts, it starts with the default, Read Committed.

Transaction

It is obvious, yet important to remember, that isolation levels work on transactions – after all, it is called the transaction isolation level. If we want to isolate (using an isolation level) the execution of a set of operations as a single logical operation, then they have to be wrapped with BEGIN and COMMIT TRANSACTION.
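A skeleton of such a wrapping:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
  -- the set of operations to be perceived as a single logical operation
COMMIT TRANSACTION;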


Dissecting Dates in the Context of C# MVC and Kendo Grid

The date in server

Recently, I wanted to send a date from the server to the client, and have a Kendo grid on the client show it as is, without applying the local time zone.

Let us try to understand dates on the server side, in the world of C#. I have created a date using the line below. It is 1st April 2015, tagged as a local date. When I say local date, I mean the local date of the computer where this code is executing – the application server's date. Obviously, the application server and the browser can be located in two different time zones.

[Image: image001]

[Image: image002]

I could also say that the date is not local but UTC.

[Image: image003]

[Image: image004]
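Since the code lives in the images above, here is a sketch of what the two cases presumably look like (variable names are mine):

var localDate = new DateTime(2015, 4, 1, 0, 0, 0, DateTimeKind.Local);
var utcDate = new DateTime(2015, 4, 1, 0, 0, 0, DateTimeKind.Utc);

// localDate.Ticks == utcDate.Ticks; only the Kind differs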

If we look at the two dates in detail, we see that both say it is 1st April 2015. We also see an important thing: Ticks. This is the number of ticks since 0:00:00 UTC on 1st Jan 0001; 1 millisecond = 10,000 ticks.

This means that no matter whether we say UTC or local, the number of ticks remains the same for 1st April 2015; what differs is a separate attribute of the C# DateTime object, called Kind, indicating what kind of date it is.

Apart from UTC and Local, there is a third Kind, called Unspecified. If you read a date from, say, a database, where the date does not carry any extra property for Kind, it will be marked as Unspecified. Json would treat it as the Local Kind while creating a string. If you know that the date was indeed saved as UTC, and you want to mark it as a UTC date, then the function below would do that without applying any time zone offset.

[Image: image005]
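Presumably the function shown is along these lines (a sketch, assuming DateTime.SpecifyKind is what is being referred to):

DateTime utcDate = DateTime.SpecifyKind(dateFromDb, DateTimeKind.Utc);
// same Ticks, no time zone offset applied; only the Kind becomes Utc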

The date in client

Now that we have a C# date object with its Kind property, we would like to pass it all the way to the client as a JSON string. Let us use the Json function.

[Image: image006]

On the client side, this is what we get for the first case, where the Kind property is Local.

[Image: image007]

It returns the number of milliseconds since 0:00:00 UTC on 1st Jan 1970 (the epoch), also called ticks in JavaScript. What is important to note is that the string has no trace of Kind. What does that mean? It means that the Kind is hardwired here – and it is UTC.

By the way, did you notice two important things? First, we lost the precision beyond milliseconds. Second, dates earlier than 1970 are supported by using negative values. The numbers (Ticks and ticks) are different on the server and on the client, because both the starting time and the unit are different.

Now that we have the number of milliseconds elapsed since the epoch, we can easily get the date, both in UTC and in local time, on the client side. How it is rendered depends on how we do things on the client side.

If we have the freedom to convert the time, then we can show it however we want. Let's first create the JavaScript Date object from the string.

[Image: image008]

A JavaScript Date can support both UTC and local at once. The signature/value used to create the Date object determines how it interprets the input. For example, in the above case, since we have initialized the date with a number, it adds that many milliseconds to the epoch (UTC) and gives a date. That date can then be interpreted however we want.

[Image: image009]

toString prints it in local time, and toUTCString prints it in UTC.

[Image: image010]

Note that the UTC date is no longer 1st April 2015: it is reduced by 8 hours and goes back to 31st Mar 2015. This is because from the server we sent 1st April 2015 with the Local Kind (the server is in the Singapore time zone, UTC+8).

We can as well use Kendo functions, and the output will be like below. toLocaleString gives 12:00:00 AM; there is nothing to worry about here. If the time component is absent, the function assumes 12 AM midnight – essentially the first moment of that day.

[Image: image011]

You might wonder why the local date of the server and the local date at the client are the same. Well, this is because both server and client are in the same time zone (Singapore, UTC+8).

Let us move the browser to a different time zone – say, Moscow (UTC+4), in between UTC and Singapore (UTC+8). See the difference in the number of milliseconds and in the local date: the server date is reduced by 4 hours and moves to the previous day, 8 PM – 4 hours short of server/Singapore local time.

[Image: image012]

[Image: image013]

The UTC time is still the same. That makes sense, since our application server is still in Singapore and is still passing 1st April 2015 Local. But why is toLocaleString() giving 7 PM? Should it not give 8 PM? Yes, it should, and Internet Explorer does give 8 PM. I did run a new instance of Chrome after changing the time zone to Moscow, and why it says 7 PM, I have no clue. Both were using Kendo version 2014.1.318. Let's not get distracted here; just say oops, and move on.

Internet Explorer shows the below for this case. In my understanding, that is correct.

[Image: image014]

Had we sent the same date with the UTC Kind, would this still be the same? The answer is no. The number of milliseconds since the epoch (UTC) would have been different for the same date with the UTC Kind. They would have looked like below:

[Image: image015]

[Image: image016]

Note that the number of milliseconds since epoch is different and the date in UTC shows 1st April 2015.

Showing the date in Kendo grid in client

Let us assume that we have some columns and some rows to show in a Kendo grid, including some date columns. Some of those date columns should show dates in UTC, while the rest should be in the local time zone.

So we send all this data as a JSON string to the client side and assign that data source to the Kendo grid. In Kendo, we have defined some date columns in the data source model using code like:

type: "date"

By default, the Kendo grid shows dates in the local time zone. That means it is going to apply the time zone offset to the string representing the UTC date. Apparently, there is no way to simply indicate in the column definition that the date should be shown in UTC.

That leaves us with only one option: nullify the adjustment. If the browser is in UTC−01:00, then 1st Apr 2015 00:00:00 is going to be shown as 31st Mar 2015 23:00:00 – the date is reduced by 1 hour. So to nullify this effect, we add 1 hour.

[Image: image017]

[Image: image018]

So the end result is that whatever date we send from the server is shown as is, no matter where the browser is located.

Sending the date back to server

The same thing happens when the date comes back. The date the user picks is treated as a local (browser) date. When we use, say, kendo.stringify (which converts a JavaScript object to JSON) to serialize it before sending it to the server, the date is converted to UTC.

Suppose the user picks 1st April 2015 from Moscow; it looks like below:

[Image: image019]

When it goes through kendo.stringify, it looks like below (4 hours are subtracted to make it UTC):

“2015-03-31T20:00:00.000Z”

So what we get in the server is 31st Mar 2015, not 1st April 2015. We have lost where in the world this date came from (well, we can find out by some means, but we are not talking about that here), so we cannot get back what the user really chose.

Hence it is important that we nullify the serialization effect from the client itself. We do this by adding 4 hours – well, we actually subtract the offset, which is −4 hours for Moscow. So when kendo.stringify converts the date to UTC during serialization, it still remains 1st April 2015, and that is what we get in the server.

“2015-04-01T00:00:00.000Z”

Is adjusting the offset enough?

Let us move our browser to Samoa, which is 14 hours ahead of UTC (+14) during local daylight saving time, and UTC+13 when daylight saving time ends (on 5th April). 1st April 2015 would look like below:

[Image: image020]

If we get the time zone offset using today's date (say, 18th April 2015), as done above, it would be −13 hours. If we do the same with 1st April 2015, we would get −14 hours.

kendo.stringify uses the specific day's offset, not today's, to make it UTC. Hence we have to get the offset of the day the user chose. If we use today's offset, the adjustment won't be correct, and in the server we will get the wrong date/time.

At this moment, you might be wondering whether it was correct to use today's offset to adjust the UTC date that we got from the server before passing it to Kendo.

The answer is no: we have to use the offset of that date. However, that is a little bit tricky. Why? Well, how would we know the date in local time? On the client we got the UTC date – that is the right date. We have to use the JavaScript UTC functions of Date, like getUTCFullYear() etc., to extract the year, month and day; then create a local JavaScript date from those components; get the time zone offset from that date; and use that to adjust.

That is too much!

Can we get rid of day light saving time issue altogether?

We can either adjust in the way described above, or ignore the issue altogether. How? When the user chooses 1st April 2015, let us add 12 hours to it, making it 1st April 2015 12:00:00 PM, and take today's time zone offset to adjust. That way we ignore daylight saving time.

When the date ends up in the server in UTC, we outright set the time to 12 noon again and save that. It is the same story when the date makes its journey back to the client: there, too, we use only today's offset. Since we have a 12-hour buffer, a few hours' fluctuation might change the time, but not the date.
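On the server side, that normalization could be as simple as this sketch (incoming is the deserialized UTC date; the name is mine):

var noon = new DateTime(incoming.Year, incoming.Month, incoming.Day, 12, 0, 0, DateTimeKind.Utc);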

You might be interested to know about a little issue here: if the user manually types the date into the Kendo date picker with a date-based format, the time is lost. The time is retained if the calendar is used to input the same date.

A few words on JSON date

We have seen two kinds of JSON dates:

[Image: image007]

“2015-04-01T00:00:00.000Z”

They are Microsoft's ASP.NET built-in JSON format and the ISO 8601 format, respectively. Both are treated the same way by Kendo; however, the former is compatible with more browsers, especially the old ones.

Is this the right approach?

Well, adjusting the date on the client side may not look good; it is not bad either. This is what Kendo suggests: adjust it in the requestEnd function.

One way to handle this is to send strings to the client instead of dates. However, when Kendo shows them and we try to sort them, the sorting will be based on the strings, not dates.

Another approach is to adjust the time zone offset on the server side: when the client requests data, let it pass its offset to the server.

The best option would have been a flag in the Kendo column definition – especially in our case, where the columns are decided dynamically, some of which are dates, among them a few in UTC and the rest in local time.
