Tag: Change Data Capture

CDC Jobs in SQL Server – Capture and Cleanup

Earlier, we found how do we set up Change Data Capture for a Database. Today, lets us quickly look at how CDC is working internally. Once CDC is configured on a database, we can see there two SQL Agent Jobs are created – Capture and Cleanup.

By now, we know CDC will capture the data modification information of a CDC enabled table and stored in separate placeholder – Changetables. Changetables are created for every table for which CDC is enabled under a schema – cdc. In addition to that, there is another system table called – cdc.change_tables which stores all information about CDC enabled tables(changetables).

Select * From cdc.change_tables

Now, we will see How these changetables are getting populated and maintained by SQL Server. This is been done by two SQL Agent jobs – Capture and Cleanup.

CDC Capture Job

Capture Job is responsible to push the DML changes into change tables.This job is created per database in SQL Server once the database is enabled CDC.Capture Job reads Transaction logs and writes the DML changes to respective change tables (those are CDC tracked) asynchronously.
CDC uses the same technology(sp_replcmds) as of Replication to read the transaction logs. So if we use both replication and CDC on an environment, it uses and shares the same log reader that minimizes any resource contentions at environment. However, the SQL agent job does not really uses sp_replcmds directly, but another procedure sys.sp_MScdc_capture_job which again internally uses sys.sp_cdc_Scan.

sys.sp_cdc_scan uses few configuration parameter from msdb.dbo.cdc_jobs system table for capture jobs as below:

Query:

Select db_name(database_id) database_name, job_type, B.name,
maxtrans,continuous,pollinginterval,retention,threshold from msdb.dbo.cdc_jobs A
inner join msdb.dbo.sysjobs B on A.job_id= B.job_id
Order by job_type asc

To change the settings, you may need to use another procedure sys.sp_cdc_change_job as below:

EXEC sp_cdc_change_job @job_type='capture', @maxtrans = 500, @maxscans = 10, @continuous = 1, @pollinginterval = 5

If you make changes to these settings, a restart of the capture job is required. To stop the capture job, use:

EXEC sys.sp_cdc_stop_job @job_type = 'capture'

And to start the job again, use:

EXEC sys.sp_cdc_start_job @job_type = 'capture'

Let us quickly see the parameters of sp_cdc_change_job @job_type=’capture’ as below:

maxtrans
The maxtrans configuration option set the maximum number of transactions to read from the transaction log in each scan cycle. The default is 500. If this parameter is set to NULL, it is interpreted as no change.

maxscans
Configures how many scans will be done to capture all rows from the transaction log. Each scan reads the number of transaction specified in the maxtrans option. The default is 10. 10 scans times 500 transactions means 5000 rows will be read at the most each time. If this parameter is set to NULL, it is interpreted as no change.

continuous
A bit where 1 means the job will run continuously (the default) or only one time (0). If the job is set to run continuously, maxtrans rows will be read from the log maxscans times. The job will then wait pollinginterval seconds and then read from the log again. If this parameter is set to NULL, it is interpreted as no change. Normally, 1 would be used. One time reads (0) are for testing purposes only and must not be used in a production environment since records in the transaction log will be kept active until CDC captures them.

pollinginterval
The number of seconds between log scan cycles. The default is 5. This parameter is only used if continuous = 1. If this parameter is set to NULL, it is interpreted as no change. In effect, a WAITFOR is issued between reads from the log, and this 5 second wait is the main reason why you may have to wait a few seconds before the change is captured to the change table.

Cleanup Job

Cleanup job is responsible to clean up the records from the changetables. This job is created automatically by SQL Server to minimize the number of records in the changetables, failing this job execution will be resulting to a larger changetable. This job internally invokes a procedure sys.sp_MScdc_cleanup_job in MSDB database with no parameter.The procedure reads the configuration again from the table – dbo.cdc_jobs that uses two columns retention and threshold.

Retention
This value specifies, in minutes, how long rows in the change table are kept. If NULL is used as value, this is interpreted as “no change”. The default is 4320 minutes (or 72 hours / 3 days). The maximum value is 52494800 minutes (100 years).

Threshold
This parameter is used to limit the number of rows that can be deleted in a cleanup delete statement. The default is 5000 rows. If NULL is used as value, this is interpreted as “no change”.

Again, To change the settings, you may need to use another procedure sys.sp_cdc_change_job as below:

EXEC sp_cdc_change_job @job_type='cleanup', @retention = 4320, @threshold = 5000

If you make changes to these settings, a restart of the cleanup job is required. To stop the cleanup job, use:

EXEC sys.sp_cdc_stop_job @job_type = 'cleanup'

And to start the job again, use:

EXEC sys.sp_cdc_start_job @job_type = 'cleanup'

Few points to be noted:

1. You must make sure SQL Agent is up and running all the time
2. cdc_jobs configurations are very important to set correctly.Overestimating/underestimating the configurations will have a deterimental impact on you application performance. You may need to genuinely configure as per your workload, a performance test can be carried out as per your workload to reach out your optimal values
3. Cleanup job is scheduled by default to run at 02:00 AM every day
4. Capture job is scheduled as “Start automatically when SQL Server Agent starts”. As it uses continuous parameter further, you may not need to make any change for “Schedule type”.

See Also:

Could not update the metadata that indicates database is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’
How do we move or restore a database with CDC enabled in SQL Server
Msg 22841, Level 16, State 1, Procedure sp_cdc_vupgrade, Line 323 – While restoring from a cdc enabled database backup
Tuning the Performance of Change Data Capture in SQL Server

Could not update the metadata that indicates database <> is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’. The error returned was 15517: ‘Cannot execute as the database principal because the principal “dbo” does not exist, this type of principal cannot be impersonated, or you do not have permission.’. Use the action and error to determine the cause of the failure and resubmit the request.

While configuring Change Data Capture on a database that is restored from different environment, we might end up with an error message as below:

Could not update the metadata that indicates database <> is enabled for Change Data Capture. The failure occurred when executing the command ‘SetCDCTracked(Value = 1)’. The error returned was 15517: ‘Cannot execute as the database principal because the principal “dbo” does not exist, this type of principal cannot be impersonated, or you do not have permission.’. Use the action and error to determine the cause of the failure and resubmit the request.

To resolve the issue, we need to change the owner of database and then enable CDC for the database as below. Please note that, the database would have restored irrespective of the error. You just have to run the below statement to avoid similar issues in future.

EXEC sp_changedbowner 'sa'
EXEC sys.sp_cdc_enable_db ;

Ref: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-changedbowner-transact-sql?view=sql-server-2017

How do we move or restore a database with CDC enabled in SQL Server

This post explains how efficiently we can move a CDC enabled database to different environment.

To move a database from one environment to another environment, Firstly, we need to take back up of the database from the source environment.

BACKUP DATABASE CDCOrigin TO DISK = 'C:\Backup\CDCOrigin.BAK'

Once we move the backup file to the new environment, we have two options:

1. Restore database as always and configure/set up CDC freshly as in this post.

2. Restore database with KEEP_CDC option.

RESTORE DATABASE CDCCopy_CDC From DISK = 'C:\Backup\CDCOrigin.BAK'
WITH
	MOVE 'CDCOrigin' To 'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016\MSSQL\DATA\CDCCopy1.mdf',
	MOVE 'CDCOrigin_log' To 'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016\MSSQL\DATA\CDCCopy1_log.ldf'
	--The below option would keep the CDC of original database
	, KEEP_CDC	

Once restore is completed successfully, we can quickly verify the configurations with below queries.

Select is_CDC_Enabled ,name from sys.databases where is_CDC_Enabled=1 
USE CDCCopy_CDC
Select name from sys.tables where is_tracked_by_cdc = 1

Wait! still there are few things to be done, configuring jobs for capture and cleanup process. KEEP_CDC option with RESTORE command will keep the CDC enabled and the tables tracked by default. However, we need to create capture and cleanup jobs manually as below.

USE CDCCopy_CDC
EXEC sys.sp_cdc_add_job @job_type = N'capture'; 
EXEC sys.sp_cdc_add_job @job_type = N'cleanup'; 

Caveats:

Do not enable this feature unless there is a good reason. The feature adds more space requirement to store the change data and increases disk activities. Be careful on configuring the capture and clean up jobs. If there are many databases with CDC ( due to non-multi tenant support), CDC on multiple databases will have a performance impact.

How do we set up Change Data Capture in SQL Server

This may be an old topic for many who worked with SQL Server, but, As I was doing a proof of concept on CDC recently for one of my projects, thought of sharing about CDC for those who is yet to explore this feature in SQL Server.

What is Change Data Capture?

Change Data Capture or CDC in SQL Server is introduced to track the DML changes and records insert/update/delete operations in respective change tables. The source of truth is Transaction Logs in SQL Server. Capture process jobs will be picking up the required table information from the Transaction Log and records in the Change tables. As these jobs are asynchronous, you may see a latency ideally between the commit and population of change tables. However, this has a performance advantages compared to Trigger based solutions. Similarly there is another clean up job to prune the data (older) from the change tables.

Supporting Editions: Enterprise/Developer/Standard.

How do we implement CDC on a database?

Step 1: : Let us create a database and table to see the proof of Concept


--Create a Test Database
Create Database CDCORigin;

--Create a Table 
create Table TestCDC(FirstName Varchar(100),SecondName Varchar(100))

--Insert values into tables
Insert into TestCDC Values('Ron','David'),('Chirs','Martin')

Step 1: : Enable CDC (Need two steps as below)


EXEC sys.sp_cdc_enable_db
EXEC sys.sp_cdc_enable_table  
@source_schema = N'dbo',  
@source_name   = N'TestCDC',  
@role_name     = NULL

EXEC sys.sp_cdc_add_job @job_type = N'capture'; 
EXEC sys.sp_cdc_add_job @job_type = N'cleanup'; 

You need to make sure that the SQLServerAgent is up and running while configuring the CDC, failing on this, it may end up with not creating the respective jobs in the environment.

Step 3: : Verification of CDC configuration

Once, CDC has set up, we can see the CDC specific tables and Changes tables in Object explorer as below.

In addition, we can also verify the data changes using below scripts.


--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]  --No data

Insert into TestCDC Values('Reddick','Pitt')

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Update TestCDC Set SecondName = 'Pi' Where FirstName='Chirs'

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Delete From TestCDC Where FirstName='Ron'

--Check the CDC tables
Select * From [cdc].[dbo_TestCDC_CT]

Precisely, the __$operation will indicate the action on the table as below.

Hope you enjoyed this post, and I am excited to write more about CDC with my real client experience in next posts. Until then, happy and continuous learning!