Category: SQL Server 2016

Compress and Decompress in SQL Server 2016

Recently, I needed to apply a very light level of data masking. After weighing different approaches, I ended up using the COMPRESS and DECOMPRESS functions introduced in SQL Server 2016.

Beware: this is a very bad example of a data encryption solution! I chose it only because the functionality was trivial, I did not want to spend more time on encryption or worry about its performance, and it was for a demo rather than a production application. But again, let me reiterate: this is the wrong approach for encryption. Do not apply it as such!

If you are looking for a full-fledged solution for encryption/data masking, SQL Server has many options; you can refer to them here.

Now, let's quickly look into the two functions introduced in SQL Server 2016 – COMPRESS and DECOMPRESS.

The COMPRESS function compresses its input using the GZIP algorithm and returns a VARBINARY(MAX) value. More details here.

The DECOMPRESS function does the opposite: it decompresses a compressed value using the GZIP algorithm. More details here.
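As a quick, minimal round trip (the literal is just an example): COMPRESS produces a VARBINARY(MAX) GZIP payload, and the output of DECOMPRESS must be cast back to the original string type.

Select Cast(Decompress(Compress('This is a Compress test by SQLZealot!')) As varchar(max)) RoundTripValue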

Since Microsoft already covers the subject well, let us not reinvent the wheel; instead, let us quickly look at some of its characteristics and usages.

Storage and its significance

Storage is an important factor; perhaps less so in the modern world, but it still matters when you want to save cost. It matters a great deal if you have plans for cloud migration or a "you pay for what you use" model, and COMPRESS and DECOMPRESS make more sense in such situations. Let us demonstrate with a simple example.

I am going to create a table with varchar and nvarchar columns to store values.

drop table if exists SQLZealot_Compress_Test
CREATE TABLE SQLZealot_Compress_Test
(
ID int identity(1,1) NOT NULL Primary Key,
Varchar_Value VARCHAR(MAX),
NVarchar_Value NVARCHAR(MAX)
)

Now, let us insert some sample data as below.

INSERT INTO SQLZealot_Compress_Test(Varchar_Value, NVarchar_Value)
VALUES 
 ( 'This is a Compress test by SQLZealot!', N'This is a Compress test by SQLZealot!!')
,( '1234567890', N'1234567890')
,( REPLICATE('X', 9000), REPLICATE('X' , 4500))
,( REPLICATE(CAST('X' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'X' AS NVARCHAR(MAX)), 4500))
,( REPLICATE(CAST('SQLZealot|' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'SQLZealot|' AS NVARCHAR(MAX)), 4500))

Now, let us insert some bigger data. I used a sample table from my database; you can use any large table in your environment by replacing "tablename".

Insert into SQLZealot_Compress_Test
Select (Select * From tablename FOR JSON PATH, ROOT('Tables')) varcharval,
		(Select * From tablename FOR JSON PATH, ROOT('Tables')) nvarcharval

Here comes our testing script. We are going to compare the lengths of the raw and compressed values using DATALENGTH, as below.

Select ID,
		Varchar_Value,
		DataLength(Varchar_Value) DL_NonCompress_Varchar,
		Datalength(Compress(Varchar_Value)) DL_Compress_Varchar,
		NVarchar_Value,
		DataLength(NVarchar_Value) DL_NonCompress_NVarchar,
		Datalength(Compress(NVarchar_Value)) DL_Compress_NVarchar
From SQLZealot_Compress_Test

Complete Demo Script

drop table if exists SQLZealot_Compress_Test
CREATE TABLE SQLZealot_Compress_Test
(
ID int identity(1,1) NOT NULL Primary Key,
Varchar_Value VARCHAR(MAX),
NVarchar_Value NVARCHAR(MAX)
)

INSERT INTO SQLZealot_Compress_Test(Varchar_Value, NVarchar_Value)
VALUES 
 ( 'This is a Compress test by SQLZealot!', N'This is a Compress test by SQLZealot!!')
,( '1234567890', N'1234567890')
,( REPLICATE('X', 9000), REPLICATE('X' , 4500))
,( REPLICATE(CAST('X' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'X' AS NVARCHAR(MAX)), 4500))
,( REPLICATE(CAST('SQLZealot|' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'SQLZealot|' AS NVARCHAR(MAX)), 4500))

-- OC_TABLEDEFS is a sample table from my database; replace it with any large table in your environment
Insert into SQLZealot_Compress_Test
Select (Select * From OC_TABLEDEFS FOR JSON PATH, ROOT('Tables')) varcharval,
		(Select * From OC_TABLEDEFS FOR JSON PATH, ROOT('Tables')) nvarcharval

Select ID,
		Varchar_Value,
		DataLength(Varchar_Value) DL_NonCompress_Varchar,
		Datalength(Compress(Varchar_Value)) DL_Compress_Varchar,
		NVarchar_Value,
		DataLength(NVarchar_Value) DL_NonCompress_NVarchar,
		Datalength(Compress(NVarchar_Value)) DL_Compress_NVarchar
From SQLZealot_Compress_Test

Screenshot

Observations

1. Do not use COMPRESS & DECOMPRESS as a replacement for encryption/data masking.

2. If the table holds small values, COMPRESS brings no benefit; instead there may be a small overhead, since the compressed VARBINARY value can be longer than the original.

3. If the table holds large values, COMPRESS seems to be a good option.

4. For Unicode (NVARCHAR) columns, the benefit seems smaller than for character (VARCHAR) columns.

5. Evaluate the extra CPU cycles spent if you do COMPRESS and DECOMPRESS inside SQL Server.

6. It is worth considering doing the compression and decompression in the application layer instead.

7. If you have an audit feature, consider storing the compressed data as part of the audited info instead of the actual data (provided the audited data is not read frequently); see the sketch below.
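As a minimal sketch of observation 7 (the audit table and values here are hypothetical), the compressed payload is stored as VARBINARY(MAX) and expanded only when the audit row is actually read:

drop table if exists SQLZealot_Audit_Test
CREATE TABLE SQLZealot_Audit_Test
(
ID int identity(1,1) NOT NULL Primary Key,
Audited_Info VARBINARY(MAX) -- compressed audit payload
)

INSERT INTO SQLZealot_Audit_Test(Audited_Info)
VALUES (COMPRESS(N'Old row value captured by an audit process'))

-- Decompress only when the audited data is actually needed
Select ID, Cast(Decompress(Audited_Info) As nvarchar(max)) Audited_Info
From SQLZealot_Audit_Test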

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

String_Split function in SQL Server 2016

Introduction:

SQL Server 2016 introduces a built-in table-valued function, string_split, to split a string into rows based on a separator.

Earlier, developers needed to write a custom/user-defined function to do the split work. With the new function, it is built in and can be used directly. The problem with user-defined functions is how they are defined and used: developers typically write them as scalar or table-valued functions, and in many cases scalar functions do not scale as desired and cause serious performance issues.

Usages:

1. Split the variable string


Select * From string_split('first,second,third',',')

2. Split the table rows with the help of CROSS APPLY


Create Table splitTable(Col int, string varchar(100))
Insert into splitTable values(1,'First,Second'),(2,'Primary,Secondary')

Select * From splitTable a
Cross apply string_split(a.string,',') B

Drop Table splitTable

If we analyse the execution plan of the above query, we can see that a "Table Valued Function" operator implements the function.

Limitations:

1. Only a single-character delimiter can be used at a time. If you want to apply multiple delimiters, you can chain multiple CROSS APPLY calls (see the sketch after this list).

2. There is no way to determine the ordinal position of an item, and the function does not eliminate duplicates or empty strings by itself.

3. The function works only if the database compatibility level is 130 or later, irrespective of the SQL Server version.
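As a hedged sketch of workarounds for limitations 1 and 2 (the input string is made up), multiple CROSS APPLY calls chain two single-character delimiters, and DISTINCT plus a WHERE clause drop duplicates and empty strings:

-- Split on both ',' and ';', then discard empties and duplicates
Select Distinct s2.value
From (Select 'first,second;third,,first' string) a
Cross apply string_split(a.string, ',') s1
Cross apply string_split(s1.value, ';') s2
Where s2.value <> ''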

I would recommend using this built-in function for future development; please share your experience as part of learning and sharing.

SQL Server 2016 Database Scoped Configuration to Force the Legacy CE

Problem Statement:

Recently, we identified a performance issue with a query after we upgraded the database server to SQL Server 2016. If you look at the query, it uses system catalogs to fetch some data. The query could be optimized or rewritten in a better way; however, we are going to discuss how the same query behaves under different scenarios/settings.


SELECT UPPER(ccu.table_name) AS SourceTable
    ,UPPER(ccu.constraint_name) AS SourceConstraint
    ,UPPER(ccu.column_name) AS SourceColumn
    ,LOWER(ccu.TABLE_SCHEMA) AS SourceSchema
    ,UPPER(kcu.table_name) AS TargetTable
    ,UPPER(kcu.column_name) AS TargetColumn
    ,LOWER(kcu.TABLE_SCHEMA) as TargetSchema
    ,LOWER(kcu.constraint_name) as TargetConstraint
FROM INFORMATION_SCHEMA.CONSTRAINT_COLUMN_USAGE ccu
    INNER JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS rc
    ON ccu.CONSTRAINT_NAME = rc.CONSTRAINT_NAME
    INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE kcu
    ON kcu.CONSTRAINT_NAME = rc.UNIQUE_CONSTRAINT_NAME
WHERE ccu.TABLE_NAME not like 'tablename%' and ccu.TABLE_SCHEMA = 'dbo'
ORDER BY ccu.TABLE_SCHEMA, ccu.table_name, ccu.COLUMN_NAME 

The above query runs for over 3 minutes in SQL Server 2016 with compatibility level 130, but much faster in SQL Server 2008 R2, returning in 3 seconds. We also observed that when we change the compatibility level to 100 in SQL Server 2016, the query runs fast as expected.

Analysis:
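The IO figures below can be reproduced by enabling the session statistics first; these SET options are the standard way to capture them (an assumption about how the original numbers were gathered):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;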

 
--SQL 2016 (with compatibility 130) Took 3 min 16 sec
(424 row(s) affected)
Table 'sysidxstats'. Scan count 7971, logical reads 38154, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 13063, logical reads 104556, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 3265, logical reads 8681635, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysiscols'. Scan count 2696, logical reads 6120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


--SQL 2008 R2 (with compatibility 100) Took 0 min 3 sec
(424 row(s) affected)
Table 'sysidxstats'. Scan count 235, logical reads 1280, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 11, logical reads 1314787, physical reads 0, read-ahead reads 6, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syscolpars'. Scan count 1, logical reads 2659, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysiscols'. Scan count 2, logical reads 410, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Looking at the IO statistics, we can clearly see far more IO with compatibility level 130 than with 100. It is evident that a fundamental change at the optimizer level (the new cardinality estimator) causes this difference.

Solutions:

We have multiple ways to address this difference:

1. Change the compatibility to 100


Alter database DBNAME SET COMPATIBILITY_LEVEL = 100
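To confirm the level before and after the change, sys.databases exposes the current compatibility level (replace DBNAME as above):

Select name, compatibility_level From sys.databases Where name = 'DBNAME'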

Considerations:

Once we set a lower compatibility level, we will not be able to use new features introduced in SQL Server 2016, like RLS, In-Memory OLTP, and many others.

2. Use Trace Flag 9481

Trace flag 9481 forces the legacy CE model used by SQL Server 7.0 through 2012.
There are three ways to use this option, at different scopes.

a. Query scope: use the QUERYTRACEON query hint in the OPTION clause of a query.


Select * From Table OPTION ( QUERYTRACEON 9481 )

Considerations:

1. This option applies the trace flag at the query level. The change affects only that query; it has no impact on other queries.
2. This option requires the sysadmin fixed server role. In production, application logins are unlikely to have this privilege.
3. It requires changes to procedures/application code.

b. Session scope: if you do not want to modify the query or procedures, or there are too many places to change, you can enable the trace flag at the session level to use the legacy cardinality estimator.


DBCC TRACEON (9481);
GO
Select * From Table
GO
DBCC TRACEOFF (9481);

Considerations:

1. It may appear to be a quick fix at first, but it forgoes the abilities of the latest CE.
2. If the procedure plan is already cached, it will continue to use the cached plan.

c. Global scope: trace flags can be set at the server level, and the change will affect every connection to the SQL Server instance.

Global trace flags can be set in two different ways as below:


	1. Using a query:
		DBCC TRACEON (9481, -1);
	2. Using startup parameters:
		Go to "SQL Server 2016 Configuration Manager" -> "SQL Server Services" -> right-click the service -> Properties -> Startup Parameters -> add "-T9481".

Considerations:

1. Useful when you want to move your databases to compatibility level 130 (to use new features) but still need time to test existing code performance against the new CE.
2. You may be giving up the new CE's cost-based benefits when plans are created.
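Either way, DBCC TRACESTATUS reports whether a given trace flag is currently enabled globally, which is handy for verifying the setting:

DBCC TRACESTATUS (9481, -1);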

3. Use LEGACY_CARDINALITY_ESTIMATION Database Scoped Configuration

Database Scoped Configuration is a new feature in SQL Server 2016. To stick with the subject, let's cover its LEGACY_CARDINALITY_ESTIMATION option. We have seen the QUERYTRACEON option at query, session, and server scopes; SQL Server 2016 adds another option at the database level using LEGACY_CARDINALITY_ESTIMATION. There are two ways we can use it.

a. Using the GUI in SSMS, or the equivalent T-SQL:


ALTER DATABASE SCOPED CONFIGURATION
SET LEGACY_CARDINALITY_ESTIMATION = ON;

b. Using a query hint: SQL Server 2016 SP1 introduced a new query hint that influences the query optimizer. Unlike QUERYTRACEON, it does not require the sysadmin fixed server role; however, it is logically equivalent.


SELECT * FROM Tablename
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
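As a side note, SQL Server 2016 SP1 also exposes the DMV sys.dm_exec_valid_use_hints, which lists every hint name accepted by USE HINT; querying it is a quick way to confirm the exact spelling:

SELECT name FROM sys.dm_exec_valid_use_hints;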

Additional information: using the DMV sys.database_scoped_configurations, we can see the current database scoped configuration settings.


SELECT configuration_id, name, value, value_for_secondary
FROM sys.database_scoped_configurations OPTION (RECOMPILE);

Hope this helps you understand the various options for using the legacy CE in SQL Server 2016.
Happy reading and learning!!!

Always Encrypted – A new column level security feature in SQL Server 2016

Always Encrypted (AE) is a new feature introduced in SQL Server 2016 to secure your data at the column level. While SQL Server has many options to secure data, Always Encrypted stands out from the list with a unique characteristic: the data is, literally, always encrypted.

Before we get into the details of Always Encrypted, let us quickly compare SQL Server's security features as an overview.

AE-Always Encrypted, DDM – Dynamic Data Masking, TDE – Transparent Data Encryption

Why do we call it Always Encrypted?

As the name depicts, the Always Encrypted feature ensures your data is always encrypted, both at rest and in motion. Encryption and decryption happen in the client application using an Always Encrypted-enabled driver. This separates encryption from the SQL Server database engine and enforces security in a better-controlled manner.

How do we implement Always Encrypted?

The first and foremost action is to install the right version of SQL Server 2016. If you do not have the right version, you will not find the "Encrypt Columns" option under "Tasks" in the database context menu. Likewise, if you are not using SSMS version 13.0.4001.0 or above, you will not see this option in your SSMS.

You can find and download SP1 here.

There are two ways to implement Always Encrypted in SQL Server: using the wizard and using T-SQL. However, note that for existing table/column data, there is no way to implement AE using T-SQL in SQL Server 2016 (SP1). I mention the service pack as a caveat because Microsoft may change this behavior in the future, though that seems unlikely as of now. For existing data, AE has to be implemented using the wizard. It is worth noting that the wizard can generate a PowerShell script to encrypt the existing data, which can be run later.

Using Wizard

Using T-SQL


/*1 - Register the column master key metadata; the actual key lives in the client-side key store (here, the current user's Windows certificate store) */
CREATE COLUMN MASTER KEY CMK_Auto2
WITH (  
  KEY_STORE_PROVIDER_NAME = 'MSSQL_CERTIFICATE_STORE',   
  KEY_PATH = 'CurrentUser/my/B27A4A9FCC37F2C5B1807249FE1285CD4A40B88F');
/*2 - Create the column encryption key, itself encrypted by the master key; ENCRYPTED_VALUE is generated by the wizard */
CREATE COLUMN ENCRYPTION KEY AEColumnKey
WITH VALUES  
(  
COLUMN_MASTER_KEY = CMK_Auto2,   
ALGORITHM = 'RSA_OAEP',   
ENCRYPTED_VALUE = 0x016E000001630075007200720065006E00740075007300650072002F006D0079002F00620032003700610034006100390066006300630033003700660032006300350062003100380030003700320034003900660065003100320038003500630064003400610034003000620038003800660053FD933BC3E3E6E5FFD935F452A5C4113FF56E4D946D78B22A69415FF8EF69D9B3A5541F2463BBC32D06AC88AE95B4CDBBEE7A9D1DD80043D7C900F28917637F4414565CB3F2B29CEEE5C03DF182C4F62395CDAD59A59BFCBD421889DB9EFB2B5250AA597268011B8ACCFFA7A1B5D846BD476BBD8F8239D2681C800E3BCD848485AEC6E69FE76D06D2E213FB36FCBCA5E8B75FE67D21C1C05EB7CF819AD9F96701116A2B642F690455FC7DC48AEEB1825BB20ECD428F910C002EE3D186706E00F76C608EF78FBB147ABA798309092517A39C9C4031B3857C599B238174AA1E8433A649D63D194278B0A4EFBF15DF4E4B5B4468FB73FC8992B3E34606AB306E2E19BADEE4B38288FF77B9A8E45A56BB321091EF0CF3567076ED27D875286CB2232177F610B9A0DAEBFA34ABA9856A094E26E995987AD050D27954DDB08BED9A34C6D19CBE6B2271A7E716C33850DB8781C9D3B762C0920EED57BB9D2BA581F7AC1A46EA55962200FD26405FE31005D413BA5B624E5AF2770377A13EB68FB681242B8B719499175113E84073013BDC6E03E5F82EC070B9151705F1C564106B93E3C7566E41BAD00209AB4587278640FE225F797DD9BB83284E8A674DFC7F48558441E00BC856161FC93A38E337B050915450E7B0ED848CDB63272B65319B26B45119ED081852DEBE53DFF7A6CD21935FC3CBF2C4852AD01CFF0153B76C196F7667  
);
/*3 - Create the table with the SSN column encrypted deterministically using the column encryption key */
CREATE TABLE AlwaysEncryptedSampleSQL(
  EmpID INT PRIMARY KEY,
  SSN NVARCHAR(15) 
    COLLATE Latin1_General_BIN2 
    ENCRYPTED WITH (COLUMN_ENCRYPTION_KEY = AEColumnKey, 
    ENCRYPTION_TYPE = Deterministic, 
    ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256') NOT NULL);

Why do we need two encryption keys?

Yes, Always Encrypted uses two keys: a column encryption key and a column master key. The column encryption key encrypts the values in the column, and the database engine stores it (in encrypted form) in the SQL Server instance. The column master key, however, is stored in an external key store such as the Windows certificate store or Azure Key Vault; the database engine holds only metadata pointing to that key store. The column master key is responsible for encrypting the column encryption key.

How do we verify the above implementation?

1. Check the master key

SELECT name KeyName,column_master_key_id KeyID,
  key_store_provider_name KeyStore,
  key_path KeyPath
FROM sys.column_master_keys;

2. Check the column key

SELECT name KeyName,
  column_encryption_key_id KeyID
FROM sys.column_encryption_keys;

3. Check the sys.columns

Select name,collation_name,encryption_type_desc, encryption_algorithm_name,column_encryption_key_id 
From sys.columns where object_id in (object_id('AlwaysEncryptedSample'),object_id('AlwaysEncryptedSampleSQL'))

How does the application encrypt and decrypt the values?

The client application uses an Always Encrypted-enabled driver. I would suggest going through "Using Always Encrypted with the ODBC Driver for SQL Server" to understand the usage better. This blog post will be followed by a next post, which I am currently working on, about the performance impact.

What are the different types of encryption in the Always Encrypted feature?

AE comes with two different types of encryption.

1. Deterministic
As the name suggests, this type always produces the same encrypted value for a given plaintext. This may not be a good option for every column: an intruder can infer values by analyzing data patterns, for example in gender or other yes/no columns. Because the encrypted value for a given plaintext is always the same, the encrypted column can very well be part of joins, grouping, and indexing.

2. Randomized
As the name suggests, it produces a randomized encrypted value each time, which makes the encryption more secure than deterministic.

Gotchas!!!!…..

1. There is no straightforward method to implement AE for existing data apart from using the wizard. However, the wizard can generate a PowerShell script to do the action later.
2. The deterministic encryption method is less secure than randomized.
3. Randomized-encrypted columns cannot be part of joins/grouping/indexing.
4. INSERT/UPDATE operations against the table are not allowed except through the client driver; otherwise we receive the error message below (see also the sketch after this list).
Msg 206, Level 16, State 2, Line 5
Operand type clash: varchar is incompatible with varchar(8000) encrypted with (encryption_type = 'DETERMINISTIC', encryption_algorithm_name = 'AEAD_AES_256_CBC_HMAC_SHA_256', column_encryption_key_name = 'CEK_Auto1', column_encryption_key_database_name = 'test') collation_name = 'SQL_Latin1_General_CP1_CI_AS'

5. By specifying the encryption setting (Column Encryption Setting=Enabled) under "Additional Connection Parameters" in the SSMS connection window, a login that has access to the encrypted table can see the actual data. However, that login still cannot modify or insert new data.

6. Column encryption changes the collation of the string column to Latin1_General_BIN2.
7. Encryption will increase the size of the table.
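As a minimal sketch of gotcha 4, reusing the AlwaysEncryptedSampleSQL table created earlier (the SSN value is made up), a plain T-SQL insert fails because the literal is not encrypted by a client driver:

-- Fails with the Msg 206 operand type clash shown above
INSERT INTO AlwaysEncryptedSampleSQL (EmpID, SSN) VALUES (1, N'123-45-6789');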

See Also

Please refer to Transparent Data Encryption.

To know more about other Always Encrypted limitations, please refer to Aaron's blog.

It's always important to look at the license type of your SQL Server

It's always important to look at the license type of your SQL Server!!!

Problem Statement:

We recently had an issue with CPU utilization constantly exceeding 95% on the database server in one of our performance test environments. Load test environments run resource-intensive tests, so high CPU utilization is expected. However, we observed that the number of tests processed and the number of transactions were quite low for CPU utilization to spike to 95%.

Let me explain the environment a bit more. We have 4 sockets with 10 physical cores each and hyper-threading (HT) enabled in our test environment, so the configuration provides a total of 80 logical CPUs (4 x 10 x 2). The SQL Server version information is below:


Microsoft SQL Server 2014 (SP2) (KB3171021) - 12.0.5000.0 (X64) 
               Jun 17 2016 19:14:09 
               Copyright (c) Microsoft Corporation
               Enterprise Edition (64-bit) on Windows NT 6.3 (Build 9600: )

Here we can observe that the version string is missing "Core-based Licensing", which means this SQL Server is not core-based but CAL-based. Let us look at the excerpt from the MSDN article:

“Enterprise Edition with Server + Client Access License (CAL) based licensing (not available for new agreements) is limited to a maximum of 20 cores per SQL Server instance. There are no limits under the Core-based Server Licensing model”

So, though we have 4 sockets with 10 cores each, only 20 cores are visible to SQL Server. In our environment, that meant 20 x 2 (HT enabled) = 40 logical CPUs were visible.

Ref: https://technet.microsoft.com/en-us/library/cc645993(v=sql.120).aspx
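A quick way to check the licensing-relevant edition string, without parsing the full @@VERSION output, is SERVERPROPERTY; under CAL-based licensing the edition string lacks the "Core-based Licensing" tag:

SELECT SERVERPROPERTY('Edition') AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion;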

Let's confirm the above with the DMV sys.dm_os_schedulers.


Select parent_node_id,Count(cpu_id) Total_Schedulers,
 count(Case when Status = 'VISIBLE ONLINE' Then 1 Else null End) Visible_Count,
 count(Case when Status = 'VISIBLE OFFLINE' Then 1 Else null End) NotVisible_Count
From sys.dm_os_schedulers 
where status in ('VISIBLE ONLINE','VISIBLE OFFLINE') 
and parent_node_id not in (64)--DAC
Group by parent_node_id

The result looks like below:


This makes it clear that SQL Server was not able to utilize more than 40 logical CPUs in the above environment.

Once we upgraded to a core-based license, we were able to use all available CPUs in our environment, and the database CPU utilization came down to 65%, resulting in more tests and transactions.

Let me reiterate: it's always important to look at the license type of your SQL Server!!!