MS SQL Server Recovery Model

Recovery model

A recovery model is a database property that controls how transactions are logged. Backup and restore operations occur within the context of the recovery model of the database.

Why log transactions?

To guarantee the ACID properties across crashes or hardware failures, every action is logged to disk. In case of a system failure, SQL Server reads the log and redoes/undoes the logged actions to bring the database back to a consistent state.

Where is it logged?

MS SQL Server maintains mainly two kinds of files: data files and log files. The log file can map to one or more physical files. Also note that the transaction log is different from other logs like the setup log, error log, profiler log, agent log, event log etc.

Types of recovery model

Depending on how extensively MS SQL Server logs, there are three types of recovery models:

  1. Simple recovery model
  2. Full recovery model
  3. Bulk-logged recovery model

Simple recovery model

Suppose you take a backup of the database every day. If there is a failure, you lose at most one day's data, and you can afford that. In that case, you can choose the simple recovery model. Here, every transaction is still logged. However, as soon as those transactions are written to disk (checkpoint), the log space is reclaimed.

Here you don’t have the provision to take log backups; you only back up data. And as said before, in case of a failure, you can recover the data that you backed up last time. You lose everything else done since that data backup.

You get rid of the administrative task of taking log backups, and of the risk of having to manage very large log files.
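
Switching a database to this model is a single statement; a minimal sketch, assuming a hypothetical database named MyDatabase:

       -- MyDatabase is a hypothetical name
       ALTER DATABASE MyDatabase SET RECOVERY SIMPLE;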

However you lose the following features:

Log shipping – This is a disaster recovery technique where the transaction log is copied from a primary (production) database on a primary server (the production MS SQL Server instance) to one or more secondary databases (warm backups of the primary database) on separate secondary server instances (where the warm standby copies are stored). This feature relies on the full recovery model, which will be discussed a bit later.

AlwaysOn or Database mirroring – These features create a fail-over environment using redundancy. If the primary database fails, the secondary database (not a backup) takes over. These features also require the full recovery model.

Media recovery without data loss – If media (tape/disk) fails, recovery without data loss is not possible. It needs the full recovery model.

Point-in-time restores – You cannot restore the database to an arbitrary point defined by a date and time. For this to work, you need the full recovery model.

Full recovery model

By this time, it is clear that you need the full recovery model for the features above to work. It logs all transactions and retains them even after those transactions have been written from memory to disk (checkpoint). So, obviously, the log can grow very big. That is why there is provision to take log backups, and you have to take them regularly, or else you might run out of production server disk space. Once you back up the log, the backed-up portion is removed from the database log file.
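
A minimal sketch of the corresponding routine, with a hypothetical database name and backup paths:

       ALTER DATABASE MyDatabase SET RECOVERY FULL;

       -- A full database backup must exist before log backups can be taken
       BACKUP DATABASE MyDatabase TO DISK = 'D:\Backup\MyDatabase.bak';

       -- Take log backups regularly; each one removes the backed-up
       -- portion from the database log file
       BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_log.trn';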

Armed with the full recovery model (and the pain of administering the transaction log), you won't lose any data due to a lost or damaged data file. You can restore the database to any point in time.

Unless the tail of the transaction log file is damaged, all is good. Otherwise, you lose the data changed since the last time you took a log backup.
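
If the data files are lost but the log itself is intact, a tail-log backup captures the records not yet backed up; a minimal sketch (path hypothetical):

       BACKUP LOG MyDatabase TO DISK = 'D:\Backup\MyDatabase_tail.trn'
       WITH NO_TRUNCATE;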

Bulk-logged recovery model

This is the same as the full recovery model, except that it does not log everything for bulk operations (to be precise, the set of operations that can be minimally logged). This model allows bulk copy to work with high performance by not fully logging those transactions.

Point-in-time recovery is not supported. Recovery is possible to the end of any (data or log) backup.

If the log is damaged, or bulk-logged operations occurred since the most recent log backup, changes since that last backup will be lost. Otherwise, no data is lost.
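
Switching to this model follows the same pattern as the others; a sketch with the same hypothetical database:

       ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;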

Why is the log size too big?

This is a typical problem people face.

You can check the log_reuse_wait_desc column in sys.databases. It will tell you why.

However, most likely the database is using the default full recovery model without your being aware of it (especially if the database was created by SharePoint), and you forgot to take regular log backups.
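
A minimal diagnostic query combining both checks:

       SELECT name, recovery_model_desc, log_reuse_wait_desc
       FROM sys.databases;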


Heartbleed Bug

Let us start with HTTP

When we browse the internet by typing a web address (URL/server address) starting with http into a browser (client), we are using the HTTP protocol. It functions as a request-response protocol in a client-server model: the client (say, a browser) sends a request and the server sends back a response. Note that a browser is just one type of client; a client does not necessarily have to be a browser.

HTTP is built over TCP

The HTTP protocol is implemented on top of a reliable transport layer protocol, mostly TCP.

HTTP has a security problem

However, an attacker can sit on the line between the client and the server and read what transpires between the two. Not only that, an attacker can impersonate the server, i.e., act as if it is the server, and collect client information that might be too sensitive to reveal.

There comes TLS/SSL to build HTTPS

Hence it is important to make sure that the client is talking to the real server. It is also important that all communication between the client and the server is kept secret between the two. This is where TLS/SSL comes into the picture: it implements transport layer security to protect the HTTP line, and HTTP over it is called HTTPS. When you type a URL starting with https, you are using a secured HTTP line.

TLS/SSL uses asymmetric encryption

Asymmetric encryption, also known as public-key cryptography, uses a key pair: a private key and a public key. The private key remains known only to the server, while the public key is open/known to all. Data can be encrypted with either of the two keys, and then only the other key can decrypt it.

There are two things in public-key cryptography:

·      Public key encryption – public key encrypts, private key decrypts

·      Digital signature – private key encrypts, public key decrypts

When a server gets a certificate (X.509) from a certificate authority (CA), it essentially generates/gets a key pair. The certificate then contains, among other identity information, the public key of the website and a signature of the issuing CA (i.e., signed by the private key of the CA). When a browser talks to a website, the website first has to provide its certificate.

The browser has a list of trusted CAs and their public keys. The browser can use the public key of the right CA (as mentioned in the certificate) to verify the certificate (signature).

And symmetric encryption

Now that the client trusts the website, it uses the public key of the website to encrypt a key that it would like to use as the symmetric key for further communication. Well, in reality there can be a number of steps here that decide how the symmetric key is generated/negotiated/used. Typically a disposable session key is generated from a master secret (established at the beginning of secure line creation) to implement forward secrecy. That means even if a session key is compromised at some point, only the data transferred using that session key is compromised; the attacker won't be able to figure out the next session key that will be generated after a while.

Unlike asymmetric encryption, where either of the private and public keys can encrypt data that only the other key can decrypt, symmetric encryption uses a single key that does both the encryption and the decryption. Symmetric encryption is better for performance and key distribution, and it helps to implement forward secrecy by continuously changing session keys.

Here comes OpenSSL

So if you want a server that supports TLS/SSL, you need a component that implements it. OpenSSL is one such open-source implementation. There are many other implementations, like GnuTLS.

But TCP has an issue

So now we have an HTTPS connection built over TCP, where the client sends a request and the server gives back a response. But how long should the client wait for the server to respond? At times we do need to wait in a blocking call till the response comes back. But at times TCP can fail to detect whether the other side is still alive, especially if the other side closes the connection without following the normal tear-down process.

Also, at times firewalls sitting on the line close idle TCP connections.

Here comes heartbeat

To make sure the client is not left waiting for a response from a server sitting on the other side of a closed connection, and to make sure firewalls do not close the connection when it is idle, a packet is periodically sent to the server, and the server sends back a response.

OpenSSL’s heartbeat implementation went wrong

OpenSSL’s heartbeat implementation went wrong, and the bug was introduced in 2012. The buggy code could read up to around 64 KB of the server process's memory and send it back to the client in each heartbeat response. The client could construct the message in a way that, to some extent, dictated where and how much it wants to read from the server. That would in turn allow the client to steal sensitive information to exploit.

How to exploit the Heartbleed bug?

A client, as said, need not be a browser. One could write one's own HTTPS client and construct the heartbeat message oneself. By using multiple, especially simultaneous, heartbeat messages, one can accumulate a large section of the memory of the server process. After analysing the data, one can possibly even get the server's private key. If the server belongs to an e-commerce site, one could possibly get the credit card information, user IDs and passwords of the users who were using that site at the time.

Affected areas

So far we have mostly focused on web servers, but they are not the only things affected. OpenSSL (the heartbeat extension) is also used in email servers, VPNs and other TLS/SSL-secured client-server systems. And it is not only servers but clients as well that can be affected.

Solution

There are a few sites, like ssllabs, where you can paste a URL and check whether it is vulnerable. OpenSSL has already released a fixed version. If you own a server that is vulnerable, it is better to assume that it has been compromised; it is then best to change the certificate. If you have an ID and password on an affected server, it is better to change your password.


FILESTREAM – Considerations, Restrictions and Limitations

There are considerations, restrictions and limitations when using FILESTREAM. They are listed below:

Works as it is or with some considerations

  1. SQL Server Integration Services (SSIS) – uses the DT_IMAGE SSIS data type; can use the Import Column and Export Column transformations
  2. Log Shipping – supported; both primary and secondary servers must run MS SQL Server 2008 or above
  3. SQL Server Express – supported
  4. Full-Text Indexing – works if there is an additional column in the table holding the filename extension for each BLOB
  5. Failover clustering – the FILESTREAM filegroup must be on a shared disk; enable FILESTREAM on each node hosting the FILESTREAM instance

Works with limitations

  1. Replication – use NEWSEQUENTIALID() for the ROWGUIDCOL default (see the sketch after this list); Merge Replication synchronization over HTTPS using Web Synchronization has a 50 MB limit
  2. Distributed Queries and Linked Servers – cannot use PathName() in four-part names
  3. Contained Databases – require extra configuration outside the database, hence not fully contained
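
A minimal sketch of the replication point above, using a hypothetical table whose ROWGUIDCOL default is NEWSEQUENTIALID() instead of NEWID():

       -- ReplicatedDocs is a hypothetical table name
       CREATE TABLE ReplicatedDocs (
       DocID INT IDENTITY PRIMARY KEY,
       Doc VARBINARY (MAX) FILESTREAM NULL,
       DocGUID UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL
              UNIQUE DEFAULT NEWSEQUENTIALID());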

Does not work

  1. Transparent Data Encryption (TDE) – does not work
  2. Database Snapshots – does not work with FILESTREAM filegroups
  3. Database Mirroring – not supported


Cashier’s Order from DBS online banking

Recently I needed to pay some fees to some authorities. In Singapore, you typically pay somebody by cash, ATM transfer, online transfer, NETS etc. However, there are cases when you need to pay by cheque, and this was such a case. But if you have a savings account with a bank, it is unlikely to give you a cheque book. A cashier's order (CO) comes in handy as an alternative to a cheque, especially if you want to avoid all the hassles and fees (small though they are) of opening a current account that provides you with a cheque book.

You can go down to a branch, fill out a form, pay a fee and collect a CO. However, some banks like DBS allow you to apply online for a CO. If you do so, you still need to go to a branch to collect it (only certain branches – Jurong Point is one of them; you can choose one while applying online), but you save the fee. Last time I did not need to wait in a queue; a front desk lady collected it from inside and gave it to me. You can also choose to have the bank send it to you by post.

When I tried to apply for a CO online, I found I could only do so for up to SGD 10,000. I needed more. Not finding a way, I went down to the branch and was informed that I could issue as many as needed, and that if the payee is the same they would call me and issue an aggregate one (I needed a single one). Had they mentioned this on the online application page, it would have been better.

By this time I had become more careful. While issuing the COs one by one, I kept track of the references manually, as there was no report or anything to figure out how many I had issued so far. Well, on logging out, they do indeed show all the activities performed in that session.

Overall, DBS provides a nice online experience. I especially like the e-statement, which saves paper.

PS. Last time, I bought a CO of S$35 but could not find the time to collect it. They called me to collect it. When I said I would not require it anymore, they canceled it, free of cost!


FILESTREAM – Setup

To use the FILESTREAM feature, both the Windows administrator and the SQL Server administrator have to enable it. While enabling from Windows, there are three options to choose from:

  1. Enable FILESTREAM for Transact-SQL access.
  2. Enable FILESTREAM for file I/O streaming access. This is to read and write FILESTREAM data from Windows. Enter the name of the Windows share in the Windows Share Name box.
  3. Allow remote clients to have streaming access to FILESTREAM data. This is to let remote clients access the FILESTREAM data stored on this share.

For the SQL Server instance, there are three access levels to choose from (a sketch follows the list):

  1. 0 – Disable FILESTREAM support for this instance
  2. 1 – Enable FILESTREAM for Transact-SQL access only
  3. 2 – Enable FILESTREAM for Transact-SQL and Win32 streaming access
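
These levels map to the instance's filestream access level configuration option; a minimal sketch choosing full access (level 2):

       EXEC sp_configure filestream_access_level, 2;
       RECONFIGURE;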

Note that if only Transact-SQL access is chosen (and Win32 streaming is not), FILESTREAM performs worse for full reads/writes of files of 1 MB and above than a plain (non-FILESTREAM) BLOB.

Now that FILESTREAM is enabled, we can go ahead and do the rest of the setup.

First, create a filegroup to keep the FILESTREAM data. There is a 1:1 mapping between data containers and filegroups.

       ALTER DATABASE MyDatabase ADD
       FILEGROUP MyFileStreamGroup1 CONTAINS FILESTREAM;

Second, define the data container. In the example below, “D:\Production” must already exist, and “D:\Production\FileStreamDataContainerRoot” is going to be the root of the data container.

       ALTER DATABASE MyDatabase ADD FILE (
       NAME = FileStreamGroup1File,
       FILENAME = 'D:\Production\FileStreamDataContainerRoot')
       TO FILEGROUP MyFileStreamGroup1 ;

Third, create a table with a FILESTREAM column. All FILESTREAM columns of a table must go to a single FILESTREAM filegroup, which can be specified explicitly and otherwise defaults to the default FILESTREAM filegroup. The table must have a column of the non-null uniqueidentifier data type with the ROWGUIDCOL attribute and either a UNIQUE or PRIMARY KEY single-column constraint. The FILESTREAM data container will have a directory for the table, with a subdirectory corresponding to each FILESTREAM column of the table.

       USE MyDatabase ;
       GO

       CREATE TABLE TableWithFileStreamColumn (
       DocumentID INT IDENTITY PRIMARY KEY,
       Document VARBINARY (MAX) FILESTREAM NULL,
       DocGUID UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID ())
       FILESTREAM_ON MyFileStreamGroup1;

Now it is time to use FILESTREAM, through standard Transact-SQL or the Win32 APIs.
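
For example, a plain INSERT stores a document through FILESTREAM; a minimal sketch against the table created above, with hypothetical content:

       -- DocGUID is filled in by its NEWID() default
       INSERT INTO TableWithFileStreamColumn (Document)
       VALUES (CAST('A sample document body' AS VARBINARY (MAX)));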


FILESTREAM – Hardware/OS Consideration

There are some considerations regarding hardware selection/configuration and OS configuration, such as:

  • Hard disk – SCSI is better for reliability and performance
  • RAID level – RAID 10 is recommended, even though at least 50% of the space is redundant depending on the configuration
  • Storage layout – all data in one volume may not be the right choice if the load is high
  • Cluster size – 64 KB or above to reduce fragmentation; also defragment periodically
  • Compression – cannot be enabled if the cluster size is larger than 4 KB, may not be effective for all kinds of data, and comes at the expense of CPU
  • Anti-virus configuration – AV needs to be set up properly so that it does not delete FILESTREAM files
  • Disable 8.3 naming for NTFS – if the number of files is high, NTFS has to work harder to generate a unique short name
  • Turn off last access time update – since the files are accessed by SQL Server rather than through Explorer, turning off this feature is safe and saves a lot of time
  • Security – SQL Server needs to be configured to use integrated security (for the Win32 file APIs to work, since they use the Windows token associated with the client thread accessing the files)


FILESTREAM – What and When

If you have to store files (BLOBs) in an MS SQL Server database, you might consider using FILESTREAM (available since MS SQL Server 2008). Essentially this means enabling the FILESTREAM attribute of a varbinary(max) column. Some other setup is necessary prior to that.

FILESTREAM combines MS SQL Server with the NTFS file system. A database is good at reading/writing/processing small pieces (rows and columns) of information scattered around the database files, while a file system is good at reading/writing files as a whole. The file system starts performing better when the files are larger than 1 MB.

By combining the two you get the best of both worlds: getting past the low maximum size of a database BLOB and using the maximum NTFS file size instead, the good streaming performance of the file system, integration with structured data (transactional consistency), better fragmentation management, manageability (backups etc. can be done together) and so on.

However, if you are doing frequent small updates, FILESTREAM performs poorly. This is because it has to work more: guarantee transactional consistency as well as write to the file system, which essentially creates/writes a new copy and deletes the old one. It also takes more time to take backups.

FILESTREAM data can be accessed in two ways: standard Transact-SQL and the Win32 APIs. If you want the best performance of the file system, you need to use the Win32 APIs. There are provisions for the Win32 APIs to work within the context of a database transaction.
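
A minimal sketch of that provision, reusing the hypothetical table from the setup example earlier; the query returns a file path and a transaction context, which client code would pass to the Win32 OpenSqlFilestream API:

       BEGIN TRANSACTION;

       -- Path and context for Win32 streaming access to this BLOB
       SELECT Document.PathName() AS FilePath,
              GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
       FROM TableWithFileStreamColumn
       WHERE DocumentID = 1;

       -- The client streams via OpenSqlFilestream(), closes the handle,
       -- and only then is the transaction committed
       COMMIT TRANSACTION;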

FILESTREAM BLOBs use the file system cache and do not use the database buffer pool. This leaves more of the buffer pool for query processing and improves overall database performance.

However, do note that certain features, like Transparent Data Encryption (TDE), are not supported with FILESTREAM.

To summarize: if you need to store files of 1 MB or larger in the database, those files will be read a lot but updated rarely and in small ways, and a middle tier is available where you can use the Win32 APIs for accessing them, then FILESTREAM is a good choice.


Optimizing Inserts

If you are inserting a lot of rows into a big table one by one, it will take a long time when the table has a number of indexes.

Try to rewrite the code so that you can do a bulk insert instead; that will be faster.

For example, inserting 2,000 rows into a heavily indexed table one by one might take 20 minutes. If you can change the logic and get it done in a single bulk insert, you might do it in, say, 5 seconds.

If you cannot do so, then create a temp table equivalent to the one where the data should have been inserted. Once all the data has been inserted into the temp table one by one, you can move it to the original table in one go at the end. You might need to use a transaction and put in some more logic depending on the situation.
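
A minimal sketch of the temp table approach, with a hypothetical target table and columns:

-- Stage the rows one by one in a temp table (no indexes to maintain)
CREATE TABLE #Staging (Col1 INT, Col2 VARCHAR(100));

-- ... the many individual INSERTs go into #Staging here ...

-- Then move everything to the real table in a single set-based insert
BEGIN TRANSACTION;
INSERT INTO TargetTable (Col1, Col2)
SELECT Col1, Col2 FROM #Staging;
COMMIT TRANSACTION;

DROP TABLE #Staging;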

Another thing you can do, along with the above or on its own, is to temporarily disable the (non-clustered) indexes.

When you insert rows into a table, it is not only the data that MS SQL Server writes to disk but also the indexes on the table. The latter can take up a lot of time. So disable the indexes before you start inserting any data and rebuild them after all the insertions; that way you build the indexes in one go.

-- Disable the index before the inserts begin
ALTER INDEX [Index_Name] ON [Table] DISABLE

-- Rebuild it after all the inserts are done (this also re-enables it)
ALTER INDEX [Index_Name] ON [Table] REBUILD


Replacing Subqueries With Join Might Drastically Boost Query Performance

Let’s have a look at the following query:

SELECT TransactionNumber
FROM TransactionTable
WHERE TransactionNumber NOT IN
(SELECT TransactionNumber
FROM ClosedTransactionTable)

The above query can be rewritten using a JOIN in the following way:

SELECT TransactionNumber
FROM TransactionTable
LEFT JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

Subqueries are often easier to understand and at times work better than joins. However, at times a query using a JOIN can execute in a few seconds instead of a few minutes compared to the same query using IN.

Suppose the outer query is a JOIN of several tables, including some large ones, that produces several thousand or more rows, while the subquery produces at most several thousand rows every time you execute it (you know this because you have domain knowledge of the business). In that case, you can produce all the rows of the subquery by executing it just once and build a hash table from them in memory; a HASH JOIN against that hash table is then quite fast. You can achieve this by giving the query a hint to use a HASH JOIN. The above query would then look like the one below:

SELECT TransactionNumber
FROM TransactionTable
LEFT HASH JOIN 
(SELECT DISTINCT TransactionNumber AS DistinctTransactionNumber 
FROM ClosedTransactionTable) AS T2
ON TransactionNumber = T2.DistinctTransactionNumber
WHERE T2.DistinctTransactionNumber IS NULL

This query would execute really fast. The same approach applies in general to other kinds of subqueries in MS SQL Server.
