HASHBYTES function in SQL Server is used to hash the input using the algorithm specified.
This is a very efficient method to compare the string, for an example, lets compare the view definition between two different databases etc.
I do not really want to cover the usage or basic information of HASHBYTES in this post, but, a limitation and how to overcome the same. The first input parameter of HASHBYTES is the algorithm that needs to be used and the second one is the actual character or binary string that needs to be hashed.Now, the limitation is it will not accept the string more than 8000 bytes.
Lets quickly look at the example.
CREATE TABLE dbo.Test1 (c1 nvarchar(MAX));
Insert into dbo.Test1 Select Replicate('a',4001)
Select hashbytes('SHA1',c1) From dbo.Test1
Drop table dbo.Test1
The above code will throw an exception “String or binary data would be truncated.” as below:
To overcome the limitation, I have come up with a solution to break down the string into multiple and apply the hashing separately and later combined.
The script is as below:
Create FUNCTION [dbo].[GenerateHASHforLargeValue]
(
@TextValue nvarchar(max)
)
RETURNS varbinary(20)
AS
BEGIN
if @TextValue = null
return hashbytes('SHA1', 'null')
Declare @TextLength as integer
Declare @BinaryValue as varbinary(20)
Set @TextLength = len(@TextValue)
Declare @LenCount int = 3500
if @TextLength > @LenCount
Begin
;With cte
as
(
Select substring(@TextValue,1, @LenCount) textval, @LenCount+1 as start, @LenCount Level,
hashbytes('SHA1', substring(@TextValue,1, @LenCount)) hashval
Union All
Select substring(@TextValue,start,Level), start+Level ,@LenCount Level,
hashbytes('SHA1', substring(@TextValue,start,Level) + convert( varchar(20), hashval ))
From cte where Len(substring(@TextValue,start,Level))>0
) Select @BinaryValue = (Select Top 1 hashval From cte Order by start desc)
return @BinaryValue
End
else
Begin
Set @BinaryValue = hashbytes('SHA1', @TextValue)
return @BinaryValue
End
return null
END
If we apply the above function, then it will generate the hash code without the string truncation issue.
Hope this will help you whenever you may need to generate hash for larger strings!!!
Thank you! I was looking for a good workaround to the 8000 character limit, and this is the best one I found. I have one suggestion to make. Instead of “if @TextLength > @LenCount” use “IF DATALENGTH(@TextValue) > 8000” so that any value up to the 8k limit of the function doesn’t require the CTE to be processed.
LikeLike
Hi there. Just FYI: the 8000 byte limit was removed starting in SQL Server 2016. So, a “fix” is only needed for SQL Server 2014 or earlier. Also, another option for sending in > 8000 bytes and getting the actual hash value (same as “HASHBYTES(N’SHA1′, …)” produces on SQL Server 2016 and newer) is to use SQLCLR. I created a SQLCLR library of functions, SQL# ( https://sqlsharp.com/ ), that contains a function to do this, called “Util_Hash”. And yes, It is in the Free version.
LikeLike
Mh…
select hashbytes(‘SHA2_256’, ‘a’) — 0xCA978112CA1BBDCAFAC231B39A23DC4DA786EFF8147C4E72B9807785AFEE48BB
— modified function to use SHA2_256
select dbo.[GenerateHASHforLargeValue](‘a’) — 0xFFE9AAEAA2A2D5048174DF0B80599EF0197EC024
LikeLike
Thank you for your feedback. The above discrepancy must be something related to your datatype. simple ‘a’ will take the datatype as varchar, but the function input datatype is defined as nvarchar. You may feel free to change the code to align with your requirement, in your case, the function input parameters can be varchar(max).
Hope the above would help you.
LikeLike
thanks, shall i use the MD5 instead of SHA1 for conversion in the same function ?
LikeLike
Ofcourse, but it warrants thorough testing as always with any development.
LikeLike