Category: Table Partitioning

Stairway to SQL Server Table Partitioning – Whats is Table Partitioning in SQL Server

There are two types of partitioning in SQL Server – Vertical & Horizontal. Today, we are quickly going to see these two types in SQL Server and its different use cases. One of the easiest way to explain Vertical partitioning is a partition at Column level otherwise Horizontal Partitioning is at Row level.

Vertical Partitioning

Vertical Partitioning is a process of splitting a really big table into multiple smaller tables for various reasons. Performance improvement is the major benefit of vertical partitioning. If your table is having large number of columns and lots of Large Object columns or very lengthy textual information, then partitioning the table into multiple smaller tables would really help to boost the performance especially if not all of your columns are frequently accessed. A simple pictorial representation as below: Lets quickly see how vertical partitioning helps in performance. (Note that this is a very simple example for readers to understand the impact of query execution). Few guys will have different arguments, please post those questions in comment, we will discuss in upcoming blog post. But for now, I strictly wanted to limit the scope of this post ONLY to the context of vertical partitioning.

Script:

Drop table if exists Employee_Tab1
Create Table Employee_Tab1
(EmployeeID int,FirstName varchar(10),
MiddleName varchar(10)
,LastName varchar(10)
,DOB Date
)DECLARE @i int
SET @i = 1
BEGIN TRAN
WHILE @i<>100000
BEGIN
INSERT INTO Employee_Tab1 
VALUES(@i,'a' ,'b','c', getdate())
SET @i=@i+1
END
COMMIT TRAN
GO

Set statistics io on
Select EmployeeID,DOB From Employee_Tab1
where dob = cast(getdate() as date)
Set statistics io off
Drop table if exists Employee
Create Table Employee
(EmployeeID int,FirstName varchar(10),
MiddleName varchar(10),
LastName varchar(10),
DOB Date,Address1 nvarchar(max),
Address2 nvarchar(max),
Address3 nvarchar(max))
DECLARE @i int
SET @i = 1
BEGIN TRAN
WHILE @i<>100000
BEGIN
INSERT INTO Employee 
VALUES(@i,'a' ,'b','c',
getdate(),Replicate('a',8000),
Replicate('a',8000),Replicate('a',8000))
SET @i=@i+1
END
COMMIT TRAN
GO
Set statistics io on
Select EmployeeID,DOB From Employee
where dob = cast(getdate() as date)
Set statistics io off

So if you look at the code, we tried to create two tables Employee and Emploee_Tab1 and we inserted almost 100000 records to these tables. You can observe the difference between these two tables – Employee table contains 3 large text columns for address where as the Employee_Tab1 does not have those columns.When we query these tables (note that we are not referring the address columns), we can observe a huge number of logical reads happens with Employee table where as the table employee_tab1 which does not have address columns has to do only less logical reads(532 compared to 2000). This indicates that SQL has to get more data from disk to cache for second query (using employee table) than first query (using vertically partitioned table Employee_Tab1). You may observe the behavior in the below side by side query window screenshot clearly.

Points to ponder:

Use Vertical partitioning

1. If your table is having more columns

2. If your table has lots of BLOB or wide text columns those are not frequently used

3. If your table is a highly transaction table

4. with extreme care if you are restructuring your table since it may have big impact like many objects may need to change.

Horizontal Partitioning

As mentioned earlier, horizontal partitioning is a partition process that carries out at row level. That means, if you a table with (m/b)illions of data, you can partition the table horizontally. This would enable better manageability/maintainability etc. Horizontal partitioning may also have performance benefits, however, we should know how to implement this technique efficiently.

Let us quickly see a pictorial representation of the Horizontal partitioning as below.

Horizontal Partitioning requires understanding of few important terminologies.

1. Partition Key Column

Partition Key column determines on what basis the table has to be partitioned. In our example, the partition key column is AuditDate. It is important to be noted that the choice of partition key column is very important for performance aspects. We will cover this in details in next blog posts.

2. Partition Function

Just partition key column does not partition the table. We need a partition logic to be implemented on key column to partition the table. This logic can be implemented through a function called partition function. We can implement the function using RANGE Right and RANGE Left. Simply illustration of Range as below.
    1. Range left means that the actual boundary value belongs to its left partition, it is the last value in the left partition.

    2. Range right means that the actual boundary value belongs to its right partition, it is the first value in the right partition.

Usually, for an year column partition, the last day of the year (31st Dec year) is the range left and first day of the year (1st Jan year) is the range right. If range has not been specified properly, then there will be an overlapping of two years data in the same partition that can cause lots of confusion and difficulties, in fact it may even defeat the purpose of partition altogether.

3. Partition Scheme

The partition scheme maps the logical partitions to physical filegroups. It is possible to map each partition to separate filegroup or all partitions to one filegroup. There are many benefits with a proper mapping of partition and filegroup that we can cover in next blog posts.

Points to ponder:

Use Horizontal partitioning

1. If you know your table and data in detail (I am not joking – šŸ™‚ )

2. If your table has lots of data and you have a definite partition requirement

3. If your table needs an archival of data or efficient movement of data

We will cover more on this topic in coming days, until then stay tuned!

Refer Also:

Stairway to SQL Server Table Partitioning – How do we partition an existing table

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

Stairway to SQL Server Table Partitioning – How do we partition an existing table

This post explains how to implement table partitioning for an existing table in SQL Server. Let us create a table as below for our explanation.

Create Table and Populate table data

Create Table AuditData (ID int identity(1,1) Primary Key not null, AuditDate Datetime not null)

Insert into AuditData Values(getdate() - (365*5)) ,(getdate() - (365*4)) ,(getdate() - (365*3))
,(getdate() - (365*2)) ,(getdate() - (365*1)) ,(getdate()) ,(getdate() + (365*5))
,(getdate() + (365*4)) ,(getdate() + (365*3)) ,(getdate() + (365*2)) ,(getdate() + (365*1))

--Check the data
Select * From AuditData

The above code will create a table called AuditData and inserts a single row for every previous and upcoming 5 years from now. Please note this test data is created on year 2020, so there might be difference in the data at the time you refer this blog.

Check the partition and its data allocation

select Object_name(p.object_id) Table_Name ,(Select name From sys.indexes where Object_id = p.object_id and index_id = p.index_id) 'Index_name',
partition_number, lv.value leftValue, rv.value rightValue,p.rows,
s.name 'Partition_Schema_Name',f.name 'Partition_Function_Name'
from sys.partitions p
join sys.allocation_units a on p.hobt_id = a.container_id
join sys.indexes i on p.object_id = i.object_id
Left join sys.partition_schemes s on i.data_space_id = s.data_space_id
left join sys.partition_functions f on s.function_id = f.function_id
left join sys.partition_range_values rv on f.function_id = rv.function_id and p.partition_number = rv.boundary_id
left join sys.partition_range_values lv on f.function_id = lv.function_id and p.partition_number - 1 = lv.boundary_id
where p.object_id = object_id('AuditData')

From the above, its clear that the index created on the table as part of Primary key has only one partition and all the data is part of that partition. Now, let us quickly convert this existing table as as partitioned table. To do so, we need to create partition function and partition schema as below. The function fn_AuditDate has been created on Datetime field so that in our example, we can create partition based on AuditDate.
--Create partition function on Datetime
Create partition function fn_AuditDate (Datetime) as
Range right for values('20150101', '20160101','20170101','20180101', '20190101','20200101')
/*Please note this test data is created on year 2020, so there might be difference in the data at the time you refer this blog.*/

--Create partition Schema to associate the function
Create partition scheme sc_AuditDate As 
Partition  fn_AuditDate ALL to ([Primary])
On successful execution, we will get the below Message:
Partition scheme ‘sc_AuditDate’ has been created successfully. ‘PRIMARY’ is marked as the next used filegroup in partition scheme ‘sc_AuditDate’.

Recreate clustered index to make the partition column as part of clustered index

ALTER TABLE dbo.AuditData DROP CONSTRAINT [PK__AuditDat__3214EC2760DC1A18]
GO
ALTER TABLE dbo.AuditData ADD CONSTRAINT [PK__AuditDat__3214EC2760DC1A18] PRIMARY KEY NONCLUSTERED  (ID)
   ON [PRIMARY]
GO
CREATE CLUSTERED INDEX IX_AuditData_AuditDate_ID ON dbo.AuditData (AuditDate)
  ON sc_AuditDate(AuditDate)
GO

Now, it is important to make sure that the clustered index to be recreated on partition schema to partition the AuditData table. If you look at the above example, since the clustered index is created as part of Primary key creation, we need to drop the primary key and re create the primary key as non clustered index and create a separate clustered index on AuditDate. Note that the clustered index is created on partition that we created recently – sc_AuditDate.

Check the partition and its data allocation

select Object_name(p.object_id) 'Table_Name',(Select name From sys.indexes where Object_id = p.object_id and index_id = p.index_id) 'Index_name',
partition_number, lv.value leftValue, rv.value rightValue,p.rows,
s.name 'Partition_Schema_Name',f.name 'Partition_Function_Name'
from sys.partitions p
join sys.allocation_units a on p.hobt_id = a.container_id
join sys.indexes i on p.object_id = i.object_id
join sys.partition_schemes s on i.data_space_id = s.data_space_id
join sys.partition_functions f on s.function_id = f.function_id
left join sys.partition_range_values rv on f.function_id = rv.function_id and p.partition_number = rv.boundary_id
left join sys.partition_range_values lv on f.function_id = lv.function_id and p.partition_number - 1 = lv.boundary_id
where p.object_id = object_id('AuditData')

Now, you can see the clustered index has 7 partitions as per the range that we defined in the schema and function. The last partition contains all the data right to the range (eg: 2020/2021/2022/2023/2024/2025); 6 rows in our AuditData table.

Hope this explains how to partition an existing table simply. But wait, that may not be so easy as I explained for your real time scenario. You may want to implement partition on a table that has lots of data/ multiple indexes/foreign key relations defined etc. We need to carefully evaluate the steps to reduce the down time during the implementation of partition. I would suggest partitioning a table (especially bigger ones) should be an offline activity rather than an online for better performance and easy implementation and testing.

Cleanup Objects

Clean up is very important, so my test objects.

DROP TABLE AuditData
DROP PARTITION SCHEME sc_AuditDate
DROP PARTITION FUNCTION fn_AuditDate

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!