Compress and Decompress in SQL Server 2016

Recently, to enforce a very small level of masking of data, I was thinking of different ways and finally used COMPRESS & DECOMPRESS functions introduced in SQL Server 2016.

Beware, This is a very bad example of implementing as a solution for data encryption! Since, it was just so trivial functionality and I do not really want to spend more time on encryption and do not bother on its performance, also it was not for a full time production application and just for demo purpose, it has been taken as an option.But, again, I would like to reiterate this is a wrong example! Do not apply !

If you are looking for a full fledged solution for encryption/data masking – SQL Server has many options, you can refer the options here.

Now, Lets quickly look into the two functions introduced in SQL Server 2016 – COMPRESS & DECOMPRESS.

COMPRESS function quickly compresses the input using GZIP algorithm. More details here.

DECOMPRESS function otherwise, it decompresses the compressed value using GZIP algorithm. More details here.

Since Microsoft covers a good explanation on the subject, let us not try to re-invent the wheel, instead, let us quickly see some of characteristics and its usages.

Storage and its significance

Storage is important factor, hmm, not really in modern world, but yes since if you want to save cost. This is really matters a lot if you have any plan for cloud migration or something like “you pay for what you use”. So COMPRESS and DECOMPRESS makes more sense in such situations. Let us demonstrate a simple example as below.

I am going to create a table with varchar and nvarchar columns to store values.

drop table if exists SQLZealot_Compress_Test
CREATE TABLE SQLZealot_Compress_Test
(
ID int identity(1,1) NOT NULL Primary Key,
Varchar_Value VARCHAR(MAX),
NVarchar_Value NVARCHAR(MAX)
)

Now, let us insert some sample data as below.

INSERT INTO SQLZealot_Compress_Test(Varchar_Value, NVarchar_Value)
VALUES 
 ( 'This is a Compress test by SQLZealot!', N'This is a Compress test by SQLZealot!!')
,( '1234567890', N'1234567890')
,( REPLICATE('X', 9000), REPLICATE('X' , 4500))
,( REPLICATE(CAST('X' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'X' AS NVARCHAR(MAX)), 4500))
,( REPLICATE(CAST('SQLZealot|' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'SQLZealot|' AS NVARCHAR(MAX)), 4500))

Now, let us insert some more big sized data as below. I used a sample table from my database, you can use any of your big table in your environment by replacing “tablename”.

Insert into SQLZealot_Compress_Test
Select (Select * From tablename FOR JSON PATH, ROOT('Tables')) varcharval,
		(Select * From tablename FOR JSON PATH, ROOT('Tables')) nvarcharval

Here comes our testing script. We are going to see the length of the values using DATALENGTH as below.

Select ID,
		Varchar_Value,
		DataLength(Varchar_Value) DL_NonCompress_Varchar,
		Datalength(Compress(Varchar_Value)) DL_Compress_Varchar,
		NVarchar_Value,
		DataLength(NVarchar_Value) DL_NonCompress_NVarchar,
		Datalength(Compress(NVarchar_Value)) DL_Compress_NVarchar
From SQLZealot_Compress_Test

Complete Demo Script

drop table if exists SQLZealot_Compress_Test
CREATE TABLE SQLZealot_Compress_Test
(
ID int identity(1,1) NOT NULL Primary Key,
Varchar_Value VARCHAR(MAX),
NVarchar_Value NVARCHAR(MAX)
)

INSERT INTO SQLZealot_Compress_Test(Varchar_Value, NVarchar_Value)
VALUES 
 ( 'This is a Compress test by SQLZealot!', N'This is a Compress test by SQLZealot!!')
,( '1234567890', N'1234567890')
,( REPLICATE('X', 9000), REPLICATE('X' , 4500))
,( REPLICATE(CAST('X' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'X' AS NVARCHAR(MAX)), 4500))
,( REPLICATE(CAST('SQLZealot|' AS VARCHAR(MAX)), 9000), REPLICATE(CAST(N'SQLZealot|' AS NVARCHAR(MAX)), 4500))

Insert into SQLZealot_Compress_Test
Select (Select * From OC_TABLEDEFS FOR JSON PATH, ROOT('Tables')) varcharval,
		(Select * From OC_TABLEDEFS FOR JSON PATH, ROOT('Tables')) nvarcharval

Select ID,
		Varchar_Value,
		DataLength(Varchar_Value) DL_NonCompress_Varchar,
		Datalength(Compress(Varchar_Value)) DL_Compress_Varchar,
		NVarchar_Value,
		DataLength(NVarchar_Value) DL_NonCompress_NVarchar,
		Datalength(Compress(NVarchar_Value)) DL_Compress_NVarchar
From SQLZealot_Compress_Test

Screenshot

Observations

1. Do not use COMPRESS & DECOMPRESS as a replacement for encryption/data masking.

2. If table has less size of data, then COMPRESS will not have a benefit, instead there may be a small fraction of overhead of long value of varbinary datatype.

3. If table has big size data, then COMPRESS seems to be a good option.

4. If table has unicode character datatype, the benefit seems to be lesser than character datatype.

5. Need to evaluate the CPU cycles effort (if you do COMPRESS & DECOMPRESS) in SQL Server.

6. It would be a good option to consider doing COMPRESS & DECOMPRESS functions in application layer.

7. If you have audit feature, it is a good option to consider the compressed data as part of audited info instead of actual data (if audited data is not being frequently used).

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s