HASHBYTES for a large string in SQL Server

HASHBYTES function in SQL Server is used to hash the input using the algorithm specified.

This is a very efficient method to compare the string, for an example, lets compare the view definition between two different databases etc.

I do not really want to cover the usage or basic information of HASHBYTES in this post, but, a limitation and how to overcome the same. The first input parameter of HASHBYTES is the algorithm that needs to be used and the second one is the actual character or binary string that needs to be hashed.Now, the limitation is it will not accept the string more than 8000 bytes.

Lets quickly look at the example.


CREATE TABLE dbo.Test1 (c1 nvarchar(MAX));  
Insert into dbo.Test1 Select Replicate('a',4001) 
Select hashbytes('SHA1',c1) From dbo.Test1
Drop table dbo.Test1

The above code will throw an exception “String or binary data would be truncated.” as below:

To overcome the limitation, I have come up with a solution to break down the string into multiple and apply the hashing separately and later combined.

The script is as below:


Create FUNCTION [dbo].[GenerateHASHforLargeValue]
(   
    @TextValue nvarchar(max)
)

RETURNS varbinary(20)

AS
BEGIN

    if @TextValue = null
        return hashbytes('SHA1', 'null')

    Declare @TextLength as integer
	Declare @BinaryValue as varbinary(20)

    Set @TextLength = len(@TextValue)
	Declare @LenCount int  = 3500
    if @TextLength > @LenCount
    Begin
    ;With cte 
    as
    (
	Select substring(@TextValue,1, @LenCount) textval, @LenCount+1 as start, @LenCount Level,
               hashbytes('SHA1', substring(@TextValue,1, @LenCount)) hashval
	Union All 
	Select substring(@TextValue,start,Level), start+Level ,@LenCount  Level, 
               hashbytes('SHA1', substring(@TextValue,start,Level) + convert( varchar(20), hashval )) 
	From cte where Len(substring(@TextValue,start,Level))>0
    ) Select @BinaryValue = (Select Top 1 hashval From cte Order by start desc)
			return @BinaryValue
    End
    else
    Begin
	Set @BinaryValue = hashbytes('SHA1', @TextValue)
	return @BinaryValue
    End
    return null
END

If we apply the above function, then it will generate the hash code without the string truncation issue.

Hope this will help you whenever you may need to generate hash for larger strings!!!

6 thoughts on “HASHBYTES for a large string in SQL Server”

  1. Thank you! I was looking for a good workaround to the 8000 character limit, and this is the best one I found. I have one suggestion to make. Instead of “if @TextLength > @LenCount” use “IF DATALENGTH(@TextValue) > 8000” so that any value up to the 8k limit of the function doesn’t require the CTE to be processed.

    Like

    1. Hi there. Just FYI: the 8000 byte limit was removed starting in SQL Server 2016. So, a “fix” is only needed for SQL Server 2014 or earlier. Also, another option for sending in > 8000 bytes and getting the actual hash value (same as “HASHBYTES(N’SHA1′, …)” produces on SQL Server 2016 and newer) is to use SQLCLR. I created a SQLCLR library of functions, SQL# ( https://sqlsharp.com/ ), that contains a function to do this, called “Util_Hash”. And yes, It is in the Free version.

      Like

  2. Mh…

    select hashbytes(‘SHA2_256’, ‘a’) — 0xCA978112CA1BBDCAFAC231B39A23DC4DA786EFF8147C4E72B9807785AFEE48BB

    — modified function to use SHA2_256
    select dbo.[GenerateHASHforLargeValue](‘a’) — 0xFFE9AAEAA2A2D5048174DF0B80599EF0197EC024

    Like

    1. Thank you for your feedback. The above discrepancy must be something related to your datatype. simple ‘a’ will take the datatype as varchar, but the function input datatype is defined as nvarchar. You may feel free to change the code to align with your requirement, in your case, the function input parameters can be varchar(max).

      Hope the above would help you.

      Like

Leave a comment