MS SQL Server Nvarchar Issues

Do you need to store non-English characters, like Chinese characters?

Then you need UNICODE characters. nvarchar, ntext and nchar are your options. The n stands for the ISO synonym national. These types use the Unicode UCS-2 character set, 2 bytes per character, double that of char/varchar/text. This is the price you pay for multilingual support.
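
A quick way to see the size difference is to compare DATALENGTH for a varchar and an nvarchar value (a minimal sketch; the string is just an example):

-- varchar stores 1 byte per character for this ASCII string, nvarchar stores 2
SELECT DATALENGTH('hello')  AS varchar_bytes,   -- 5
       DATALENGTH(N'hello') AS nvarchar_bytes   -- 10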

nchar or nvarchar/ntext?

If you know the data length is always going to be around a fixed size, say, 20 characters, then nchar; else nvarchar/ntext.

nvarchar (fixed size) or nvarchar (max)/ntext?

If you know your data is always going to fall within a fixed range, say, 1 to 100 characters, and would never exceed 100, then nvarchar (100). It makes sense, looks good. Use fixed-size nvarchar as long as you know the size will be less than 8000 bytes (nvarchar (4000) at most, since each character takes 2 bytes). More than 8K means the data would be saved out-row; fixed-size nvarchar can’t help there.

If your data size can be anything from NULL up to 2 GB, then you have no option but nvarchar (max)/ntext.
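
As a rough illustration (table and column names are made up), the two choices might sit side by side like this:

CREATE TABLE MyArticles
(
    ArticleId INT IDENTITY PRIMARY KEY,
    Title     NVARCHAR(100) NOT NULL,  -- known upper bound, stays in-row
    Body      NVARCHAR(MAX) NULL       -- can grow up to 2 GB, may go out-row
)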

nvarchar (max) or ntext?

Go for nvarchar (max), not ntext. ntext is deprecated and will not be supported in future versions, and nvarchar (max) performs better than ntext.

Is nvarchar (max) bad for performance?

-- Check the large-value storage option for each table
SELECT 
[name] AS tablename, 
large_value_types_out_of_row 
FROM sys.tables

If large_value_types_out_of_row for the table that contains your nvarchar (max) column is 0 (= OFF), then MS SQL Server will try to save the data in-row, provided the row fits in the 8K page size. If the row does not fit in 8K, then the nvarchar (max) data goes out-row, taking one or more extra pages depending on the size of the nvarchar (max) column data.

If MS SQL Server has to read extra pages for a row, performance would suffer. But what to do? The data is big, right?

If large_value_types_out_of_row is 1 (= ON), then no matter how small your nvarchar (max) column data is, it will always be saved out-row.

In-row or out-row?

If the select statements you run against the table exclude the nvarchar (max) columns, then use out-row. When you read the non-nvarchar (max) columns, you are not reading the extra pages used to store the long nvarchar (max) values. Not only that: since all that big data is not in the same page, a single page can now fit more rows of the small data. You will be doing well here.

But if you have to read the whole row (all columns) in one go, then in-row is better.

How to change in-row to out-row and vice versa?

You can change it at any time.

sp_tableoption N'MyTableName', 'large value types out of row', 'ON'

However, doing so won’t immediately change the existing data layout. Newly inserted data follows the new layout. Existing data, if updated, is saved in the new layout. So you can always read the existing data and update it back with the same value; that way, all your data ends up in the new layout.
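
A minimal sketch of that rewrite trick (table and column names are hypothetical):

-- Re-saving each value makes existing rows follow the new layout
UPDATE MyTableName SET MyNvarcharMaxColumn = MyNvarcharMaxColumn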

What about ntext?

Well, we already decided not to use ntext. But since we are discussing this, let’s see the respective issues.

This is how you can know the existing layout.

-- Check the text-in-row setting for each table
SELECT 
[name] AS tablename, 
text_in_row_limit 
FROM sys.tables

text_in_row_limit = 0 means the option is not set. If it is non-zero, say, 256, then that is the number of bytes that will be saved in-row.

This is how to change it.

sp_tableoption N'MyTableName', 'text in row', 'ON'

text_in_row_limit would then be 256, the default when the option is simply turned ON.

sp_tableoption N'MyTableName', 'text in row', '1000'

Use that if you want the limit to be 1000 bytes. And yes, just like nvarchar (max), changing the setting itself won’t change the existing data layout.

How full are the pages?

Remember, your pages should be as full as possible. Otherwise you will be reading a whole page but getting very little data out of it. You will be doing badly.

SELECT 
page_count, 
avg_page_space_used_in_percent 
FROM sys.dm_db_index_physical_stats(DB_ID(N'MyDatabaseName'), 
                                    OBJECT_ID(N'MyTableName'), 
                                    NULL, NULL, 'DETAILED')

But ignore small tables having less than 1000 pages (page_count).

Are the rows fitting in 8K?

-- Record sizes per index; the index name comes from sys.indexes
-- (OBJECT_NAME cannot resolve an index_id)
SELECT 
OBJECT_NAME(ps.[object_id]) AS TableName, 
i.[name] AS IndexName, 
ps.min_record_size_in_bytes, 
ps.max_record_size_in_bytes, 
ps.avg_record_size_in_bytes 
FROM sys.dm_db_index_physical_stats(DB_ID(N'MyDatabaseName'), 
                                    OBJECT_ID(N'MyTableName'), 
                                    NULL, NULL, 'DETAILED') AS ps 
JOIN sys.indexes AS i 
  ON i.[object_id] = ps.[object_id] AND i.index_id = ps.index_id

It is better to design tables in such a way that the data of n (n >= 1) rows fits in 8K, a page.

Is there any other limitation for nvarchar (max)?

You cannot create an index on an nvarchar (max) column; it cannot be an index key column, though it can be an included column.

Also avoid using an nvarchar (max) column in a where clause.
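
As a hedged illustration (index, table and column names are made up), an nvarchar (max) column can still be carried in an index as an included column:

-- Not allowed: nvarchar (max) as an index key column
-- CREATE INDEX IX_MyTable_Body ON MyTableName (MyNvarcharMaxColumn)

-- Allowed: include it as a non-key column
CREATE INDEX IX_MyTable_Code ON MyTableName (Code) INCLUDE (MyNvarcharMaxColumn)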

What are the issues for migrating varchar to nvarchar?

Your migrated data is going to double in size. Some rows might no longer fit in-row; they may overflow to out-row. Performance might be affected.
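
A hedged sketch of such a migration (names are hypothetical; keep the column’s existing nullability and take a backup first):

ALTER TABLE MyTableName ALTER COLUMN MyColumn NVARCHAR(100) NULL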

MS SQL Server Recovery Model

Recovery model

A recovery model is a database property that controls how transactions are logged. Backup and restore operations occur within the context of the recovery model of the database.

Why log transactions?

To guarantee ACID properties over crashes or hardware failures, every action is logged (to disk). In case of a system failure, MS SQL Server reads the log and redoes/undoes the logged actions to bring the database to a consistent state.

Where is it logged?

MS SQL Server maintains mainly two kinds of files: data files and log files. The log can map to one or more physical files. Also note that the transaction log is different from other logs like the setup log, error log, profiler log, agent log, event log etc.

Types of recovery model

Depending on how extensively MS SQL Server logs, there are 3 types of recovery models (see the sketch after this list for how to check and change the model). They are:

  1. Simple recovery model
  2. Full recovery model
  3. Bulk-logged recovery model
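
A minimal sketch (database name is illustrative) of how to check and change the recovery model:

-- Check the current recovery model
SELECT [name], recovery_model_desc 
FROM sys.databases 
WHERE [name] = N'MyDatabaseName'

-- Switch it
ALTER DATABASE MyDatabaseName SET RECOVERY SIMPLE -- or FULL / BULK_LOGGED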

Simple recovery model

Suppose every day you take a backup of the database. If there is a failure, you lose at most one day’s worth of data, and you can afford that. In that case, you can choose the simple recovery model. Here, every transaction is still logged. However, as soon as those transactions are written to disk (checkpoint), the log space is reclaimed.

Here you don’t have the provision to take a log backup. You only back up data. And as said before, in case of failure, you can recover the data that you backed up last time. You lose everything else done since that data backup.

You get rid of the administrative task of taking log backups. You get rid of the risk of having to manage very large log files.

However you lose the following features:

Log shipping – This is a disaster recovery technique where the transaction log is copied from the primary (production) database on a primary server (the production MS SQL Server instance) to one or more secondary databases (warm backups of the primary database) on separate secondary server instances (where the warm standby copy/copies is/are stored). This feature requires the full (or bulk-logged) recovery model; the full recovery model will be discussed a bit later.

AlwaysOn or Database mirroring – These features support/create a fail-over environment using redundancy. If the primary database fails, the secondary database (not a backup) takes over. These features also work with the help of the full recovery model.

Media recovery without data loss – If media (tape/disk) fails, recovery without data loss is not possible. It needs the full recovery model.

Point-in-time restores – You cannot restore the database to an arbitrary point defined by a date and time. For this to work, you need the full recovery model.

Full recovery model

By this time, it is clear that for the features above to work you need the full recovery model. It logs all the transactions and keeps them even after those transactions are written from memory to disk. So it is obvious that the log will grow big. That is why there is provision to take log backups, and you have to take log backups regularly, or else you might run out of production server disk space. Yes, if you back up the log, the backed-up portion is removed (truncated) from the database log file.
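
A minimal sketch of the routine (database name and backup paths are illustrative):

ALTER DATABASE MyDatabaseName SET RECOVERY FULL

-- A full database backup must exist before the first log backup
BACKUP DATABASE MyDatabaseName TO DISK = N'D:\Backup\MyDatabaseName.bak'

-- Take this regularly; it also truncates the inactive portion of the log
BACKUP LOG MyDatabaseName TO DISK = N'D:\Backup\MyDatabaseName_log.trn'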

Armed with the full recovery model (and the pain of administering the transaction log), you won’t lose any data due to a lost or damaged data file. You can restore the database to any point in time.

As long as the tail of the transaction log file is not damaged, all is good. Otherwise you lose the data since the last time you took a log backup.

Bulk-logged recovery model

Same as the full recovery model, except that for bulk operations (to be precise, a set of operations that can be minimally logged) it does not log everything. This model allows bulk copy to work with high performance by not fully logging those operations.

Point-in-time recovery is not supported. Recovery is possible to the end of any backup (data or log).

If the log is damaged, or bulk-logged operations occurred since the most recent log backup, changes since that last backup are lost. Otherwise no data is lost.

Why is the log size too big?

This is a typical problem people face.

You can check the log_reuse_wait_desc column in sys.databases. It will tell you why.
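
For example:

SELECT [name], recovery_model_desc, log_reuse_wait_desc 
FROM sys.databases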

However, most likely the database is using the default full recovery model and you are unaware of it (especially if the database was created by SharePoint), and you forgot to take regular log backups.

Heartbleed Bug

Let us start by HTTP

When we browse the internet by typing a web address (URL/server address) starting with http into a browser (client), we are using the HTTP protocol. It functions as a request-response protocol in a client-server model. That means the client (say, a browser) sends a request and the server sends back a response. Note that a browser is just one type of client; a client does not necessarily have to be a browser.

HTTP is built over TCP

The HTTP protocol is implemented on top of a reliable transport layer protocol, mostly TCP.

HTTP has security problem

However, an attacker can sit on the line between the client and the server and read what is being transmitted between the two. Not only that, an attacker can impersonate the server, i.e., act as if it is the server, and collect client information that might be too sensitive to be revealed.

There comes TLS/SSL to build HTTPS

Hence it is important to make sure that the client is talking to the real server. It is also important that all communication between the client and the server is kept secret between the two. TLS/SSL comes into the picture here. It implements transport layer security to protect the HTTP line, and then HTTP is called HTTPS. When you type a URL starting with https you are using a secured HTTP line.

TLS/SSL uses asymmetric encryption

Asymmetric encryption, also known as public-key cryptography, uses a key pair: a private key and a public key. The private key remains known only to the server while the public key is open/known to all. Data can be encrypted by either of the two keys, and then only the other key can decrypt it.

There are two things in public-key cryptography:

  1. Public key encryption – public key encrypts, private key decrypts
  2. Digital signature – private key signs, public key verifies

When a server gets a certificate (X.509) from a certificate authority (CA), it essentially generates/gets a key pair. The certificate then contains the public key of the website and a signature of the CA that issued it (i.e., signed by the private key of the CA), among other identity information. When a browser talks to a website, the website first has to provide its certificate.

The browser has a list of trusted CAs and their public keys. The browser can use the right CA’s public key (as mentioned in the certificate) to verify the certificate (signature).

And symmetric encryption

Now that the client trusts the website, it uses the public key of the website to encrypt a key that it would like to use as the symmetric key for further communication. Well, in reality there can be a number of steps here that essentially decide how the symmetric key is generated/negotiated/used. Typically a disposable session key is generated from a master secret (generated at the beginning of secure line creation) to implement forward secrecy. That means, even if at some point a session key is compromised, only the data transferred using that session key will be compromised. The attacker won’t be able to figure out the next session key that will be generated after a while.

Unlike asymmetric encryption, where either of the two keys (private and public) can encrypt data that only the other key can decrypt, in symmetric encryption a single key does both the encryption and decryption. Symmetric encryption is better for performance and key distribution, and it helps implement forward secrecy by continuously changing session keys.

Here comes OpenSSL

So if you want a server that supports TLS/SSL, you need a component that implements it. OpenSSL is one such open source implementation. There are many other implementations, like GnuTLS.

But TCP has an issue

So now we have an HTTPS connection built over TCP where the client sends a request and the server gives back a response. But how long should the client wait for the server to respond? At times we do need to wait in a blocking call till the response comes back. But at times TCP can fail to detect whether the other side is still alive, especially if the other side closes the connection without following the normal tear-down process.

Also, at times firewalls sitting on the line close idle TCP connections.

Here comes heartbeat

To make sure the client is not waiting for a response from a server sitting on the other side of a closed connection, and to make sure firewalls are not closing the connection when it is idle, a packet is periodically sent to the server, and the server sends back a response.

OpenSSL’s heartbeat implementation went wrong

OpenSSL’s heartbeat implementation went wrong, and this bug was introduced in 2012. The buggy code could read up to around 64 KB of the server process’s memory and send that back to the client in each heartbeat response. The client could construct the message in a way that would, to some extent, dictate where/how much it wants to read from the server. That would in turn allow the client to steal some sensitive information to exploit.

How to exploit Heartbleed bug?

A client, as said, need not be a browser. Rather, one could write one’s own HTTPS client and construct the heartbeat message oneself. By using multiple, and especially simultaneous, heartbeat messages one can accumulate a large section of the memory of the server process. After analysing the data one could possibly even get the server’s private key. If the server belongs to an e-commerce site, one could possibly get the credit card information, user IDs and passwords of the users who were using that site at the time.

Affected areas

Even though so far we have mostly focused on web servers, they are not the only thing that gets affected. OpenSSL (the heartbeat extension) is also used in email servers, VPNs and other TLS/SSL-secured client-server systems. Also, it is not only the server but the client as well that can be affected.

Solution

There are a few sites, like ssllabs, where one can paste the URL and check whether it is vulnerable. OpenSSL has already released a fixed version. If you own a server that is vulnerable, it is better to assume that it has been compromised. It is then best to change the certificate. If you are using a server where you have an ID and password, it is better to change your password.

FILESTREAM – Considerations, Restrictions and Limitations

There are considerations, restrictions and limitations to using FILESTREAM. They are listed below:

Works as it is or with some considerations

  1. SQL Server Integration Services (SSIS) – uses the DT_IMAGE SSIS data type; can use the Import Column and Export Column transformations
  2. Log Shipping – supports – both primary and secondary servers must run MS SQL Server 2008 or above
  3. SQL Server Express – supports
  4. Full Text Indexing – works if there is an additional column in the table with filename extension for each BLOB
  5. Failover clustering – FILESTREAM filegroup must be on a shared disk, enable FILESTREAM on each node hosting FILESTREAM instance

Limited work

  1. Replication – use NEWSEQUENTIALID() for the required ROWGUIDCOL column (see the sketch after this list); Merge Replication synchronization over https using Web Synchronization has a 50 MB limit
  2. Distributed Queries and Linked Servers – cannot use PathName() with four-part names
  3. Contained Database – requires extra configuration outside the database, hence not fully contained
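
For the replication point above, a hedged sketch of a FILESTREAM table whose ROWGUIDCOL defaults to NEWSEQUENTIALID() (names are made up; assumes FILESTREAM is enabled on the instance and a FILESTREAM filegroup exists):

CREATE TABLE MyDocuments
(
    DocumentId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
    DocumentData VARBINARY(MAX) FILESTREAM NULL
)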

Does not work

  1. Transparent Data Encryption (TDE) – does not work
  2. Database Snapshots – does not work with FILESTREAM filegroups
  3. Database Mirroring – not supported
