How do we set up Change Data Capture in SQL Server

This may be an old topic for many who have worked with SQL Server, but as I was recently doing a proof of concept on CDC for one of my projects, I thought of sharing it for those who are yet to explore this feature.

What is Change Data Capture?

Change Data Capture (CDC) in SQL Server tracks DML changes, recording insert/update/delete operations in corresponding change tables. The source of truth is the transaction log: the capture job picks up the required table information from the transaction log and writes it to the change tables. As these jobs are asynchronous, you may see some latency between a commit and the population of the change tables; however, this has a performance advantage over trigger-based solutions. There is also a cleanup job to prune older data from the change tables.

Supported editions: Enterprise, Developer, and Standard.

How do we implement CDC on a database?

Step 1: Create a database and a table for the proof of concept


--Create a Test Database
Create Database CDCORigin;

--Create a Table 
create Table TestCDC(FirstName Varchar(100),SecondName Varchar(100))

--Insert values into tables
Insert into TestCDC Values('Ron','David'),('Chirs','Martin')

Step 2: Enable CDC (two steps, as below)


--CDC is enabled in the context of the target database
USE CDCORigin;
GO
EXEC sys.sp_cdc_enable_db
GO
EXEC sys.sp_cdc_enable_table  
@source_schema = N'dbo',  
@source_name   = N'TestCDC',  
@role_name     = NULL

EXEC sys.sp_cdc_add_job @job_type = N'capture'; 
EXEC sys.sp_cdc_add_job @job_type = N'cleanup'; 

You need to make sure that SQL Server Agent is up and running while configuring CDC; otherwise, the respective jobs may not get created in the environment.
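As a quick sanity check (a sketch; run it in the CDC-enabled database), the capture and cleanup jobs and their configuration can be listed as below:

```sql
--List the CDC jobs and their configuration
EXEC sys.sp_cdc_help_jobs;

--Alternatively, check the job metadata directly
SELECT * FROM msdb.dbo.cdc_jobs;
```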

Step 3: Verify the CDC configuration

Once CDC has been set up, we can see the CDC-specific tables and change tables in Object Explorer.

In addition, we can verify the data changes using the scripts below.


--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]  --No data

Insert into TestCDC Values('Reddick','Pitt')

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Update TestCDC Set SecondName = 'Pi' Where FirstName='Chirs'

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Delete From TestCDC Where FirstName='Ron'

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Precisely, the __$operation column indicates the action performed on the table: 1 = delete, 2 = insert, 3 = row values before update, 4 = row values after update.
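Rather than querying the change table directly, the changes can also be read through the table-valued function CDC generates for the capture instance. A sketch, assuming the default capture instance name dbo_TestCDC from the setup above:

```sql
DECLARE @from_lsn binary(10), @to_lsn binary(10);

--LSN range covering all captured changes for this capture instance
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_TestCDC');
SET @to_lsn   = sys.fn_cdc_get_max_lsn();

--Return all changes (inserts, updates, deletes) in the range
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_TestCDC(@from_lsn, @to_lsn, N'all');
```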

Hope you enjoyed this post; I am excited to write more about CDC, with real client experience, in upcoming posts. Until then, happy and continuous learning!


Configure BlockedProcessReport in SQL Server

In this blog we will discuss how to configure the blocked process report for a particular database. This is useful for troubleshooting processes that get blocked for more than a specific time threshold. Blocking is common behavior in SQL Server, but if the block duration is longer than expected, we can experience performance issues. There are many ways to identify blocking in SQL Server (not covered here); I am going to cover one method using event notifications.

By default, the “blocked process threshold” is zero, meaning that SQL Server won’t generate blocked process reports. We need to set the “blocked process threshold” to a specific value (in seconds) using sp_configure. For example, if we set it to 15 seconds, then the event will fire three times if a session is blocked for 45 seconds.
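A sketch of setting the threshold to 15 seconds (it is an advanced option, so it must be made visible first):

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold', 15;
RECONFIGURE;
```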

Let's see the step-by-step process for configuring the blocked process report:

First the database should be enabled for service broker.

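The original screenshot is unavailable; enabling service broker might look like this (the database name is illustrative):

```sql
ALTER DATABASE YourDatabase SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
```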

We need to create a QUEUE, SERVICE, ROUTE, and EVENT NOTIFICATION to capture the blocking/deadlock events.

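The original screenshots are unavailable; a sketch of the objects, using the queue name referenced later in the stored procedure (the service, route, and notification names are illustrative):

```sql
--Queue to hold the event notification messages
CREATE QUEUE dbo.BlockedProcessReportQueue;

--Service on the queue, using the standard event notification contract
CREATE SERVICE BlockedProcessReportService
ON QUEUE dbo.BlockedProcessReportQueue
([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

--Route the messages locally
CREATE ROUTE BlockedProcessReportRoute
WITH SERVICE_NAME = 'BlockedProcessReportService', ADDRESS = 'LOCAL';

--Server-level event notification for blocked process reports,
--delivered to the service in the current database
CREATE EVENT NOTIFICATION BlockedProcessReportNotification
ON SERVER
FOR BLOCKED_PROCESS_REPORT
TO SERVICE 'BlockedProcessReportService', 'current database';
```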

Please note that in the steps above, the event notification is created at the server level, while the queue, service, and route live in the database where you want to track blocking that exceeds the configured threshold. You can get the list of already configured event notifications on your server with the query below:

SELECT * FROM sys.server_event_notifications

Now let's create a table to hold the blocking information:

IF OBJECT_ID('[dbo].[BlockedProcessReports]') IS NULL
CREATE TABLE [dbo].[BlockedProcessReports]
(
blocked_process_id int IDENTITY(1,1) PRIMARY KEY,
database_name sysname,
post_time datetime,
blocked_process_report xml
);
GO

Let’s create a stored procedure to read the messages received from the service broker queue and record each event in the table we created:

CREATE PROCEDURE [dbo].[ProcessBlockProcessReports]
WITH EXECUTE AS OWNER
AS
SET NOCOUNT ON

DECLARE @message_body XML
    ,@message_type INT
    ,@dialog UNIQUEIDENTIFIER

WHILE (1 = 1)
BEGIN
    BEGIN TRANSACTION

    -- Receive the next available message from the queue
    WAITFOR (
        RECEIVE TOP(1) -- just handle one message at a time
            @message_type = message_type_id,           -- the type of message received
            @message_body = CAST(message_body AS XML), -- the message contents
            @dialog = conversation_handle              -- the identifier of the dialog this message was received on
        FROM dbo.BlockedProcessReportQueue
    ), TIMEOUT 1000 -- if the queue is empty for one second, give up and go away

    -- If we didn't get anything, bail out
    IF (@@ROWCOUNT = 0)
    BEGIN
        ROLLBACK TRANSACTION
        BREAK
    END

    INSERT INTO [dbo].[BlockedProcessReports]
    (
        database_name
        ,post_time
        ,blocked_process_report
    )
    SELECT DB_NAME(@message_body.value('(/EVENT_INSTANCE/DatabaseID)[1]', 'int'))
        ,@message_body.value('(/EVENT_INSTANCE/PostTime)[1]', 'datetime')
        ,@message_body

    -- Commit the transaction. At any point before this, we could roll
    -- back - the received message would be back on the queue and the response
    -- wouldn't be sent.
    COMMIT TRANSACTION
END;
GO

The final step is to activate the queue so that it calls the stored procedure:

ALTER QUEUE BlockedProcessReportQueue
WITH
ACTIVATION
(STATUS=ON,
PROCEDURE_NAME = [dbo].[ProcessBlockProcessReports],
MAX_QUEUE_READERS = 1,
EXECUTE AS OWNER);
GO

We will test this by manually creating a blocking scenario, and then check whether the blocking information shows up in the table we created. Let's take an exclusive lock on a table in one session, issue a conflicting read from another, and then query our table for the blocking info.
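The original screenshots are unavailable; a sketch of the manual blocking scenario (the table name is illustrative):

```sql
--Session 1: create a table, take an exclusive lock, and keep the transaction open
CREATE TABLE dbo.BlockTest (id int);
INSERT INTO dbo.BlockTest VALUES (1);
BEGIN TRANSACTION;
UPDATE dbo.BlockTest SET id = 2;   -- X lock held; transaction intentionally left open

--Session 2 (a separate window): this read will be blocked by session 1
SELECT * FROM dbo.BlockTest;

--After the configured threshold elapses, check the captured reports
SELECT * FROM dbo.BlockedProcessReports;
```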

In the captured XML, we can see the queries involved in the blocking.

Hope you have enjoyed the post; please share your comments.

String_Split function in SQL Server 2016

Introduction:

SQL Server 2016 introduced a built-in table-valued function, string_split, to split a string into rows based on a separator.

Earlier, developers needed to write a custom/user-defined function to do the split work. With the new function, it's built-in and can be used directly. The problem with user-defined functions lies in how they are defined and used: developers usually write them as scalar or table-valued functions, and in many cases scalar functions have been observed not to scale as desired, causing serious performance issues.

Usages:

1. Split the variable string


Select * From string_split('first,second,third',',')

2. Split the table rows with the help of CROSS APPLY


Create Table splitTable(Col int, string varchar(100))
Insert into splitTable values(1,'First,Second'),(2,'Primary,Secondary')

Select * From splitTable a
Cross apply string_split(a.string,',') B

Drop Table splitTable

If we analyse the execution plan of the above query, we can see “Table Valued Function” operator is used to implement the function.

Limitations:

1. ONLY a single-character delimiter can be used at a time. If you want to apply multiple delimiters, you can use multiple CROSS APPLYs.

2. There is no way to know the ordinal position of each item, or to eliminate duplicates or empty strings, within the split results.

3. This function works ONLY if the compatibility level of your database is 130 or later, irrespective of the SQL Server version.
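As a workaround for the single-character limit (limitation 1), one sketch is to normalize a second delimiter with REPLACE before splitting:

```sql
--Normalize ';' to ',' first, then split on the single remaining delimiter
Select value
From string_split(replace('first,second;third', ';', ','), ',')
```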

I would recommend using this built-in function for future development; please share your experience as part of learning and sharing.

SQL Server 2016 Database Scoped Configuration to force The Legacy CE

Problem Statement:

Recently, we identified a performance issue with a query after upgrading the database server to SQL Server 2016. The query uses system catalogs to fetch some data; it could be optimized or rewritten in a better way, but here we are going to discuss how the same query behaves under different scenarios/settings.


SELECT UPPER(ccu.table_name) AS SourceTable
    ,UPPER(ccu.constraint_name) AS SourceConstraint
    ,UPPER(ccu.column_name) AS SourceColumn
    ,LOWER(ccu.TABLE_SCHEMA) AS SourceSchema
    ,UPPER(kcu.table_name) AS TargetTable
    ,UPPER(kcu.column_name) AS TargetColumn
    ,LOWER(kcu.TABLE_SCHEMA) as TargetSchema
    ,LOWER(kcu.constraint_name) as TargetConstraint
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    INNER JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
    ON ccu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
    ON kcu.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
WHERE ccu.TABLE_NAME not like 'tablename%' and ccu.TABLE_SCHEMA = 'dbo'
ORDER BY ccu.TABLE_SCHEMA, ccu.table_name, ccu.COLUMN_NAME 

The above query runs for over 3 minutes in SQL Server 2016 with compatibility level 130, but is much faster in SQL Server 2008 R2, returning in 3 seconds. It is also observed that when we change the compatibility level to 100 in SQL Server 2016, the query runs fast as expected.

Analysis:

 
--SQL 2016 (with compatibility 130) Took 3 min 16 sec
(424 row(s) affected)
Table 'sysidxstats'. Scan count 7971, logical reads 38154, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 13063, logical reads 104556, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 3265, logical reads 8681635, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysiscols'. Scan count 2696, logical reads 6120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


--SQL 2008 R2 (with compatibility 100) Took 0 min 3 sec
(424 row(s) affected)
Table 'sysidxstats'. Scan count 235, logical reads 1280, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 11, logical reads 1314787, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 2659, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysiscols'. Scan count 2, logical reads 410, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Looking at the IO statistics, we can clearly see that the number of IOs is much higher with compatibility level 130 than with 100. So it is evident that there are fundamental changes at the optimizer level (the cardinality estimator) that cause this difference.

Solutions:

We have multiple ways to solve this difference:

1. Change the compatibility to 100


Alter database DBNAME SET COMPATIBILITY_LEVEL = 100

Considerations:

Once we set the lower compatibility level, we will not be able to use new features introduced in SQL Server 2016, like RLS/In-Memory OLTP, and many others.

2. Use Trace Flag 9481

Trace flag 9481 forces the legacy CE model used by SQL Server 7.0 through 2012.
There are three ways to use this option, at different scopes.

a. Query Scope: Use QUERYTRACEON query hint specified in the OPTION clause for a query.


Select * From Table OPTION ( QUERYTRACEON 9481 )

Considerations:

1. This option applies the trace flag at the query level. The change is applicable ONLY to that query; it has no impact on other queries.
2. This option requires the sysadmin fixed server role. In production, application logins are unlikely to have this privilege in most cases.
3. This requires changes in the procedures/application code.

b. Session Scope: If you don't want to modify the query or procedures, or there are many places where changes would be needed, you can use this option at the session level to use the legacy cardinality estimator.


DBCC TRACEON (9481);
GO
Select * From Table
GO
DBCC TRACEOFF (9481);

Considerations:

1. It may appear to be a quick fix at first, but it will not use the latest CE capabilities.
2. If the procedure plan is already cached, the old plan will continue to be used.

c. Global Scope: Trace flags can be set at the server level, and the change will impact every connection to the SQL Server instance.

Global trace flags can be set in two different ways as below:


1. Using a query:

DBCC TRACEON (9481, -1);

2. Using startup parameters:

Go to "SQL Server 2016 Configuration Manager" -> "SQL Server Services" -> right-click the service -> Properties -> Startup Parameters -> add "-T9481".

Considerations:

1. This is useful when you want to move your databases to compatibility level 130 (to use the new features) but still need time to test the existing code's performance with the new CE.
2. You may be restricting the new CE's cost-based benefits when plans are created.

3. Use LEGACY_CARDINALITY_ESTIMATION Database Scoped Configuration

This is a new feature added in SQL Server 2016: Database Scoped Configuration. To stick to the subject, let's cover the LEGACY_CARDINALITY_ESTIMATION option of this feature. We have seen the QUERYTRACEON option at the query, session, and server scopes; SQL Server 2016 provides another option at the DATABASE level using LEGACY_CARDINALITY_ESTIMATION. There are two ways we can use this option.

a. Using GUI:


OR


ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;

b. Using a query HINT: SQL Server 2016 SP1 introduced a new query hint that influences the query optimizer. Unlike QUERYTRACEON, this does not require the sysadmin fixed server role; however, it is the logical equivalent.


SELECT * FROM Tablename
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

Additional information: using the DMV sys.database_scoped_configurations, we can see the database scoped configuration settings.


SELECT configuration_id, name, value, value_for_secondary
FROM sys.database_scoped_configurations OPTION (RECOMPILE);

Hope this helps you understand the various options for using the legacy CE in SQL Server 2016.
Happy reading and learning!!!

WITH Encryption clause in SQL Server

This is quite an old feature, however, a very useful one in SQL Server.

Applies to: SQL Server 2005 and above.

Why do we need this or its Usages?

The “WITH ENCRYPTION” clause can be used for procedures and functions in SQL Server.
I have personally used this clause in many places where I wanted to hide a procedure definition. When we manage multiple environments and different people have access to them, it is possible that a procedure definition gets changed by others. So, if we have to keep the same definition across all environments, we can encrypt the definition, which prevents users from changing it.

Usually, DBAs use this option to prevent changes or to hide logic implemented for certain purposes. Sometimes business-critical logic is also encrypted using this option to hide it from others.

How to Implement?

The implementation is very simple: provide the clause at the time of creation or alteration.


CREATE OR ALTER procedure Test_Encryption
WITH ENCRYPTION
AS
Begin
	
	Select 'This is an encrypted procedure.'
		
End

If we try to get the definition using sp_helptext, we get the message “The text for object ‘Test_Encryption’ is encrypted.”
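For completeness, the check looks like this:

```sql
EXEC sp_helptext 'Test_Encryption';
--Returns: The text for object 'Test_Encryption' is encrypted.
```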

If we try to get the definition through SSMS, we will get the below error message.


And you can see a lock symbol to identify the encrypted procedures.

Gotchas:
1. Once the procedure is encrypted, you will not be able to see the execution plan for the procedure.
2. It is very important to keep the definition saved somewhere safe; you will never be able to retrieve the procedure definition once it is encrypted.
3. CLR procedures and CLR User Defined Functions cannot be encrypted.
4. Encrypted procedures or functions cannot be replicated.

Python Dictionaries

A dictionary is a set of key-value pairs. The keys must be unique, and they are stored in an unordered manner.

In this tutorial you will learn the basics of how to use the Python dictionary.

Creating a Dictionary:

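The original screenshot is unavailable; a minimal reconstruction (the names and values are illustrative):

```python
# Create a dictionary with the literal {key: value} syntax
person = {'first_name': 'Ron', 'last_name': 'David', 'age': 30}

# An empty dictionary, and the dict() constructor form
empty = {}
scores = dict(maths=90, physics=85)

print(person)
```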

Accessing Items:
You can access the items of a dictionary by referring to its key name, inside square brackets:

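A reconstructed example in place of the missing screenshot (names are illustrative):

```python
person = {'first_name': 'Ron', 'last_name': 'David'}

# Access an item by its key name, inside square brackets
print(person['first_name'])

# get() returns a default instead of raising KeyError for a missing key
print(person.get('middle_name', 'N/A'))
```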

Updating Dictionary:
You can update a dictionary by adding a new entry or a key-value pair, modifying an existing entry, or deleting an existing entry as shown below in the simple example −

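A reconstructed example in place of the missing screenshot, showing all three kinds of update:

```python
person = {'first_name': 'Ron', 'age': 30}

person['age'] = 31            # modify an existing entry
person['city'] = 'London'     # add a new entry
del person['first_name']      # delete an existing entry

print(person)
```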

Loop Through a Dictionary:
You can loop through a dictionary using a for loop. When looping through a dictionary, the return values are the keys of the dictionary, but there are methods to return the values as well.

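A reconstructed example in place of the missing screenshot:

```python
person = {'first_name': 'Ron', 'last_name': 'David'}

for key in person:                  # iterating a dict yields its keys
    print(key, person[key])

for value in person.values():       # .values() yields the values
    print(value)

for key, value in person.items():   # .items() yields (key, value) pairs
    print(key, value)
```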

Check if Exists:

You can test the presence of a key using ‘in’ or ‘not in’

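A reconstructed example in place of the missing screenshot:

```python
person = {'first_name': 'Ron'}

print('first_name' in person)   # membership tests check the keys
print('age' not in person)
```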

Restrictions on Dictionary Keys:
Almost any type of value can be used as a dictionary key in Python. As an example, integer, float, and Boolean objects can be used as keys:

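A reconstructed example in place of the missing screenshot:

```python
# Integers, floats, and Booleans are all hashable, so all can serve as keys
mixed = {42: 'integer key', 3.14: 'float key', True: 'boolean key'}
print(mixed[42])
```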

However, there are a couple of restrictions that dictionary keys must abide by.

First, a given key can appear in a dictionary only once; duplicate keys are not allowed. A dictionary maps each key to a corresponding value, so it doesn’t make sense to map a particular key more than once.
When you assign a value to an already existing dictionary key, it does not add the key a second time, but replaces the existing value:

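A reconstructed example in place of the missing screenshot:

```python
person = {'name': 'Ron'}
person['name'] = 'Chris'   # re-assigning an existing key replaces the value
print(person)              # still one entry, now mapped to 'Chris'
print(len(person))
```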

Restrictions on Dictionary Values:
By contrast, there are no restrictions on dictionary values. Literally none at all. A dictionary value can be any type of object Python supports, including mutable types like lists and dictionaries, and user-defined objects. There is also no restriction against a particular value appearing in a dictionary multiple times.

Hope you have enjoyed the post. Keep reading

How to identify table rows and size in a database in SQL Server

Problem Statement:

At times, we may need to understand the size of the tables in a database for various reasons. I often query these details to understand data growth and to clean up data in lower environments.

Code:
Here is a simple script to list the row counts and sizes of the tables in a database.
You can further filter for specific tables or group the results by index/partition, etc.


SELECT  sc.name + '.' + t.name AS TableName,
        -- Count rows only from the heap or clustered index,
        -- so rows are not multiplied per nonclustered index
        SUM(CASE WHEN i.index_id IN (0, 1) THEN p.[rows] ELSE 0 END) AS NumberOfRecords,
        ( SUM(a.total_pages) * 8 ) / 1024 AS TotalReservedSpaceMB,
        ( SUM(a.used_pages) * 8 ) / 1024 AS UsedDataSpaceMB,
        -- Free space = reserved pages minus used pages
        ( ( SUM(a.total_pages) - SUM(a.used_pages) ) * 8 ) / 1024 AS FreeUnusedSpaceMB
FROM    sys.tables t
        INNER JOIN sys.schemas sc ON sc.schema_id = t.schema_id
        INNER JOIN sys.indexes i ON t.object_id = i.object_id
        INNER JOIN sys.partitions p ON i.object_id = p.object_id
                                   AND i.index_id = p.index_id
        INNER JOIN sys.allocation_units a ON p.partition_id = a.container_id
WHERE   t.type_desc = 'USER_TABLE'
        -- Replace with a valid table name if you want to filter for one table
        --AND t.name = 'YourTableName'
GROUP BY sc.name + '.' + t.name
ORDER BY ( SUM(a.total_pages) * 8 ) / 1024 DESC