MS SQL Server Data (Table) Storage

What does MS SQL Server store?

MS SQL Server has data files and (transactional) log files.

How is the log file organized?

Log files contain a series of log records, not divided by pages.

How is the data file organized?

Data files are grouped by extent and each extent by pages. Every page belongs to an extent.

What is page and extent?

Each page is 8 KB (128 pages/MB) and 8 contiguous pages make an extent of 64KB (16 extent/MB).

Disk I/O is performed at page level. That means, the minimum SQL server would read/write at one go is a page ~ 8KB.

MS SQL Server can also read a whole extent at one go instead of reading a single page when it makes sense. Look out for partition alignment so that reading one extent does not end up touching two disk blocks, affecting performance.

What are the different types of pages and what do they store?

Each page starts with a 96-byte header that stores information about the page including page number, page type etc.

Page types include Data, Index, Text/Image, Page Free Space (PFS, information about page allocation/free pages), Global Allocation Map (GAM, extent allocation information), Shared Global Allocation Map (SGAM, mixed extent with at least one unused page information), Bulk Changed Map (information about extents that got changed by a bulk logged operation since last BACKUP LOG statement) Differential Changed Map (information about extents that got changed since last BACKUP DATABASE statement), Index allocation map (IAM, page/extent allocation for an object, say a heap of a table).

So basically, pages store data and metadata for the MS SQL Server to make efficient use of them for various I/O (related) operations.

What are the different types of extents?

There are two types of extents: uniform and mixed extent. When a table is created instead of allocating a whole extent (8 pages), a single page (a page from a mixed extent type) is allocated. This goes on until the table crosses 8-page size. Onwards, every time a whole extent (uniform extent type) is allocated.

IAM pages (come from a mixed extent) keep track of these allocations.

Where does my table data go?

First of all, a table can have partitions. Let’s say we have not partitioned a table. Then we have just one partition. By the way, partition resides in a single file group.

Table can have a clustered index or not. If clustered index then the data organization type is B-tree else Heap (no logical order of data). It can be mentioned here that nonclustered index has same b-tree structure with the difference that it does not have data pages. Rather, it has index pages containing index rows, each having a row locator pointing to a data row in the clustered index or heap. Heap or b-tree of a partition can have up to 3 allocations units.

What is an allocation unit?

Allocation unit is a collection of pages within a heap or B-tree. We have three different types of allocation units based on the kind of data they would store. They are:

  1. IN_ROW_DATA: for data (data page) or index (index page) rows, can save all but LOB.
  1. LOB_DATA: for text, ntext, image, xml, varchar (max), nvarchar (max), varbinary (max) data types. Uses Text/Image pages.
  1. ROW_OVERFLOW_DATA: for variable length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8060 byte row size limit. Uses Text/Image pages.

How are the data rows saved in a data page?

We are talking about data pages (used by both heap and b-tree to store data rows and not index rows) in IN_ROW_DATA allocation unit. They are saved serially starting from the end of 96-byte header, not necessarily in the logical order as dictated by a clustered index, if any. It is the row offset table that maintains the logical order.

What is row offset table?

For each data row there is a 2-byte entry, saving the offset. It tells from where the data row starts. The entry for the first data row (logical, not physical) is there at the very end of the page. Suppose the data page contains 20 rows. Then there will be 20 entries and these entries make a row offset table. The first (0th) entry would say where the first logical data row would start. The physical position of this might be somewhere in the middle of the page and not necessarily at the beginning meaning just after the header.

What is the maximum data size in a data page?

Maximum data and overhead size in a data page is 8060 bytes.

How to save a row bigger than 8060 bytes?

A table can contain maximum 8060 bytes per row. Unless it has a variable type column defined. It would then use a 24-byte pointer in data page to point a Text/Image page in ROW_OVERFLOW_DATA allocation unit where they would move the data, starting with the largest width column. Data of a single variable column must not exceed 8000 bytes but combined width can exceed 8060 bytes. However, if a row is less than 8060 bytes and there is no space in the page then page split – not a row-overflow will occur.

A variable column value must be completely in-row or completely out-row. Multiple such columns can spread over several pages in ROW_OVERFLOW_DATA allocation unit. Such Text/Image pages are managed by a chain of IAM pages. Note that, the index key of a clustered index cannot contain varchar columns that have existing data in the ROW_OVERFLOW_DATA allocation unit.

How to save single column data bigger than 8000 bytes?

LOB column can store up to 2 GB data. Whether at attempt to save the data in-row would take place depends on the settings. If saved out-row, a 16-byte pointer from data page for each LOB column would point to the initial Text/Image page in LOB_DATA allocation unit. A page in LOB_DATA can store multiple columns from multiple rows but for a single table. Text/Image pages storing LOB data are managed by a chain of IAM pages.

Index