MS SQL Server Recovery Model

Recovery model

A recovery model is a database property that controls how transactions are logged. Backup and restore operations occur within the context of the recovery model of the database.

Why log transactions?

To guarantee the ACID properties across crashes and hardware failures, every action is logged to disk. In case of a system failure, SQL Server reads the log and redoes/undoes the logged actions to bring the database back to a consistent state.

Where is it logged?

MS SQL Server maintains mainly two kinds of files: data files and log files. The log can map to one or more physical files. Also note that the transaction log is different from other logs such as the setup log, error log, profiler log, agent log, and event log.
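
For example, you can see how the current database's data and log files map to physical files via the sys.database_files catalog view:

       -- type_desc is ROWS for data files and LOG for log files.
       SELECT name, type_desc, physical_name
       FROM sys.database_files;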

Types of recovery model

Depending on how extensively MS SQL Server logs, there are three types of recovery models:

  1. Simple recovery model
  2. Full recovery model
  3. Bulk-logged recovery model

Simple recovery model

Suppose every day you take a backup of the database. If there is a failure, you lose at most one day's data, and you can afford that. In that case, you can choose the simple recovery model. Here, every transaction is still logged. However, as soon as those transactions are written to disk (checkpoint), the log space is reclaimed.
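
For example, assuming a hypothetical database name, you can switch a database to the simple recovery model like this:

       ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;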

Here you don’t have the provision to take log backups; you only back up data. And as said before, in case of failure, you can recover the data you backed up last time. You lose everything else done since that data backup.

You get rid of the administrative task of taking log backups, and you avoid the risk of having to manage very large log files.

However you lose the following features:

Log shipping – This is a disaster recovery technique where the transaction log is copied from a primary (production) database on a primary server (the production MS SQL Server instance) to one or more secondary databases (warm standby copies of the primary database) on separate secondary server instances. This feature requires the full (or bulk-logged) recovery model, discussed a bit later.

AlwaysOn or Database mirroring – These features create a failover environment using redundancy. If the primary database fails, the secondary database (not a backup) takes over. These features also require the full recovery model.

Media recovery without data loss – If media (tape/disk) fails, recovery without data loss is not possible. It needs the full recovery model.

Point-in-time restores – You cannot restore the database to an arbitrary point defined by a date and time. For this to work, you need the full recovery model.

Full recovery model

By this time it is clear that for the features above to work you need the full recovery model. It logs all transactions and keeps them in the log file even after those transactions have been written from memory to disk (checkpoint). So the log will keep growing, which is why there is a provision to take log backups. You have to take log backups regularly, or you may run out of disk space on the production server: once you back up the log, the backed-up portion is truncated from the database log file.
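
A minimal sketch of a regular log backup, assuming hypothetical database and backup path names:

       BACKUP LOG MyDatabase
       TO DISK = 'D:\Backups\MyDatabase_Log.trn';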

Armed with the full recovery model (and the pain of administering the transaction log), you won’t lose any data due to a lost or damaged data file, and you can restore the database to any point in time.
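
A point-in-time restore, sketched with hypothetical names, paths, and timestamp, restores the full backup without recovery and then replays the log up to the desired moment:

       RESTORE DATABASE MyDatabase
       FROM DISK = 'D:\Backups\MyDatabase_Full.bak'
       WITH NORECOVERY;

       RESTORE LOG MyDatabase
       FROM DISK = 'D:\Backups\MyDatabase_Log.trn'
       WITH STOPAT = '2015-06-01T12:30:00', RECOVERY;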

Unless the tail of the transaction log is damaged, all is good. Otherwise, you lose the changes made since the last log backup.

Bulk-logged recovery model

This is the same as the full recovery model, except that bulk operations (to be precise, the set of operations that can be minimally logged) are not logged in full. This model allows bulk copy to work with high performance by not logging every row.

Point-in-time recovery is not supported; recovery is possible only to the end of any (data or log) backup.

If the log is damaged, or bulk-logged operations occurred since the most recent log backup, the changes since that last backup are lost. Otherwise, no data is lost.

Why is the log size too big?

This is a typical problem people face.

You can check the log_reuse_wait_desc column in sys.databases; it will tell you why.
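
For example:

       -- Shows each database's recovery model and why its log space
       -- cannot currently be reused.
       SELECT name, recovery_model_desc, log_reuse_wait_desc
       FROM sys.databases;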

However, most likely the database is using the default full recovery model without your being aware of it (especially if the database was created by SharePoint), and you forgot to take regular log backups.


FILESTREAM – Considerations, Restrictions and Limitations

There are considerations, restrictions, and limitations when using FILESTREAM. They are listed below:

Works as-is or with some considerations

  1. SQL Server Integration Services (SSIS) – FILESTREAM data uses the DT_IMAGE SSIS data type; the Import Column and Export Column transformations can be used
  2. Log Shipping – supported; both primary and secondary servers must run MS SQL Server 2008 or later
  3. SQL Server Express – supported
  4. Full Text Indexing – works if there is an additional column in the table holding the filename extension for each BLOB
  5. Failover clustering – the FILESTREAM filegroup must be on a shared disk, and FILESTREAM must be enabled on each node hosting the FILESTREAM instance

Works with limitations

  1. Replication – use NEWSEQUENTIALID(); Merge Replication synchronization over HTTPS using Web Synchronization has a 50 MB limit
  2. Distributed Queries and Linked Servers – cannot use PathName() as part of a four-part name
  3. Contained Database – requires extra configuration outside the database, hence the database is not fully contained

Does not work

  1. Transparent Data Encryption (TDE) – does not work
  2. Database Snapshots – do not work with FILESTREAM filegroups
  3. Database Mirroring – not supported


FILESTREAM – Setup

To use the FILESTREAM feature, both the Windows administrator and the SQL Server administrator have to enable it. While enabling it from Windows, there are three options to choose from:

  1. Select the Enable FILESTREAM for Transact-SQL Access check box.
  2. Select Enable FILESTREAM for file I/O streaming access. This is to read and write FILESTREAM data from Windows. Enter the name of the Windows share in the Windows Share Name box.
  3. Select Allow remote clients to have streaming access to FILESTREAM data. This is to let remote clients access the FILESTREAM data stored on this share.

For the SQL Server instance, there are three access levels to choose from:

  1. 0 – disable FILESTREAM support for this instance
  2. 1 – enable FILESTREAM for Transact-SQL access only
  3. 2 – enable FILESTREAM for Transact-SQL and Win32 streaming access
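
These access levels map to the filestream access level option of sp_configure. For example, to enable both Transact-SQL and Win32 streaming access:

       EXEC sp_configure 'filestream access level', 2;
       RECONFIGURE;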

Note that for full reads/writes of files of 1 MB or larger, FILESTREAM accessed through Transact-SQL alone (without Win32 streaming) can perform worse than a regular non-FILESTREAM BLOB.

Now that FILESTREAM is enabled, we can go ahead and do the rest of the setup.

First, create a filegroup to keep the FILESTREAM data. There is a 1:1 mapping between data containers and filegroups.

       ALTER DATABASE MyDatabase
       ADD FILEGROUP MyFileStreamGroup1 CONTAINS FILESTREAM;

Second, define the data container. In the example below, “D:\Production” must already exist; “D:\Production\FileStreamDataContainerRoot” is going to be the root of the data container.

       ALTER DATABASE MyDatabase ADD FILE (
       NAME = FileStreamGroup1File,
       FILENAME = 'D:\Production\FileStreamDataContainerRoot')
       TO FILEGROUP MyFileStreamGroup1;

Third, create a table with a FILESTREAM column. All FILESTREAM columns of a table must go to a single FILESTREAM filegroup, which can be specified explicitly and otherwise defaults to the default FILESTREAM filegroup. The table must have a column of the non-null uniqueidentifier data type with the ROWGUIDCOL attribute and either a UNIQUE or PRIMARY KEY single-column constraint. The FILESTREAM data container will have a directory for the table, with a subdirectory corresponding to the FILESTREAM column of the table.

       USE MyDatabase ;
       GO

       CREATE TABLE TableWithFileStreamColumn (
       DocumentID INT IDENTITY PRIMARY KEY,
       Document VARBINARY (MAX) FILESTREAM NULL,
       DocGUID UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID ())
       FILESTREAM_ON MyFileStreamGroup1;

Now it is time to use FILESTREAM, either through standard Transact-SQL or through the Win32 APIs.
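
For instance, a plain INSERT stores the BLOB in the data container; the document content below is just a hypothetical placeholder:

       -- DocumentID (IDENTITY) and DocGUID (DEFAULT NEWID()) are
       -- filled in automatically.
       INSERT INTO TableWithFileStreamColumn (Document)
       VALUES (CAST('A small test document' AS VARBINARY(MAX)));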


FILESTREAM – Hardware/OS Considerations

There are some considerations regarding hardware selection/configuration and OS configuration:

  • Hard disk – SCSI is better for reliability and performance
  • RAID level – RAID 10 is recommended, even though at least 50% of the space is redundant depending on configuration
  • Storage layout – all data in one volume may not be the right choice if the load is high
  • Cluster size – use 64 KB or larger to reduce fragmentation, and also defragment periodically
  • Compression – cannot be enabled if the cluster size is larger than 4 KB, may not be effective for all kinds of data, and comes at the expense of CPU
  • Anti-virus configuration – AV needs to be set up properly so that it does not delete FILESTREAM files
  • Disable 8.3 naming for NTFS – if the number of files is high, NTFS has to work harder to generate a unique short name
  • Turn off last access time updates – since the files are accessed by SQL Server rather than by Explorer, turning off this feature is OK and saves a lot of time
  • Security – SQL Server needs to be configured to use integrated security (for the Win32 file APIs to work, since they use the Windows token associated with the client’s thread when accessing files)


FILESTREAM – What and When

If you have to store files (BLOBs) in an MS SQL Server database, you might consider using FILESTREAM (introduced in MS SQL Server 2008). Essentially, this means enabling the FILESTREAM attribute on a varbinary(max) column. Some other setup is necessary before that.

FILESTREAM combines MS SQL Server with the NTFS file system. A database is good at reading/writing/processing small pieces of information (rows and columns) scattered around the database files, while a file system is good at reading/writing files as a whole. The file system starts performing better when the files are larger than 1 MB.

By combining the two you get the best of both worlds: getting past the 2 GB size limit of a database BLOB and using the maximum NTFS file size instead, the good streaming performance of the file system, integration with structured data (transactional consistency), better fragmentation management, manageability (backups etc. can be done together), and so on.

However, if you are doing frequent small updates, FILESTREAM performs poorly. This is because it has to do more work: guarantee transactional consistency as well as write to the file system, which essentially creates/writes a new copy of the file and deletes the old one. Backups also take more time.

FILESTREAM data can be accessed in two ways: through standard Transact-SQL and by using the Win32 APIs. If you want the best file system performance, you need to use the Win32 APIs. There are provisions for the Win32 APIs to work within the context of a database transaction.
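
As a rough sketch (reusing the hypothetical table from the setup section), Transact-SQL supplies the file path and a transaction context, which a client then passes to the Win32 OpenSqlFilestream API:

       BEGIN TRANSACTION;

       -- PathName() returns the logical path to the BLOB; the transaction
       -- context token scopes Win32 file access to this transaction.
       SELECT Document.PathName() AS FilePath,
              GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
       FROM TableWithFileStreamColumn
       WHERE DocumentID = 1;

       COMMIT TRANSACTION;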

FILESTREAM BLOBs use the file system cache and do not use the database buffer pool. This leaves more memory for query processing and improves overall database performance.

However, do note that certain features, like Transparent Data Encryption (TDE), are not supported with FILESTREAM.

To summarize: if you have to save files of 1 MB or larger in the database, those files are read a lot but receive few small updates, and a middle tier is available where you can use the Win32 APIs to access them, then FILESTREAM is a good choice.


Optimizing Inserts

If you are inserting a lot of rows into a big table one by one, it will take a long time, given that the table has a number of indexes.

Try to rewrite the code so that you can do a bulk insert. That will be faster.

For example, if you are inserting 2,000 rows into a heavily indexed table, it might take 20 minutes. If you can change the logic and get it done in a single bulk insert, you might do it in, say, 5 seconds.

If you cannot do so, then create a temp table equivalent to the one where the data should have been inserted. Once all the data has been inserted into the temp table one by one, you can move it to the original table in one go at the end. You might need to use a transaction and add some more logic depending on the situation.
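
A minimal sketch of this staging pattern, with hypothetical table and column names:

-- Stage rows in a temp table (cheap single-row inserts), then move
-- them to the heavily indexed table in one statement.
CREATE TABLE #StagedOrders (OrderID INT, Amount DECIMAL(10, 2));

-- ... many single-row inserts into #StagedOrders happen here ...

BEGIN TRANSACTION;

INSERT INTO Orders (OrderID, Amount)
SELECT OrderID, Amount FROM #StagedOrders;

COMMIT TRANSACTION;

DROP TABLE #StagedOrders;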

Another thing you can do, along with the above or alone, is to temporarily disable the (non-clustered) indexes.

When you insert rows into a table, it is not only the data that MS SQL Server writes to disk but also the indexes on the table, and the latter can take up a lot of time. So disable the indexes before you start inserting any data and rebuild them at the end of all insertions. That way the indexes are built in one go.

ALTER INDEX [Index_Name] ON Table DISABLE

ALTER INDEX [Index_Name] ON Table REBUILD


Replacing Subqueries With Join Might Drastically Boost Query Performance

Let’s have a look at the following query:

SELECT TransactionNumber
FROM TransactionTable
WHERE TransactionNumber NOT IN
(SELECT TransactionNumber
FROM ClosedTransactionTable)

The query above can be rewritten using a JOIN in the following way:

SELECT TransactionNumber
FROM TransactionTable
LEFT JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionTable.TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

Writing subqueries often makes the query easier to understand, and at times they work better than joins. However, at times a query using JOIN can execute in a few seconds instead of a few minutes compared to the same query using IN.

Suppose the outer query is a JOIN of several tables, including some large ones, that produces several thousand or more rows, and the subquery would produce at most several thousand rows every time you execute it. You know this because you have domain knowledge of the business. In that case, all the rows of the subquery can be produced by executing it just once, building a hash table in memory, and then performing a HASH JOIN against it, which is quite fast. You can achieve this by giving the query a hint to use a HASH JOIN. The above query would then look like this:

SELECT TransactionNumber
FROM TransactionTable
LEFT HASH JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionTable.TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

This query would execute really fast. The same technique applies in general to other kinds of subqueries in MS SQL Server.
