Tag: SOUNDEX function in SQL Server

SOUNDEX() and DIFFERENCE() in SQL Server

SOUNDEX() and DIFFERENCE() functions are string functions in SQL Server. Honestly, I have not seen these functions used in development except rare cases. But, its good to understand about these two functions as part of our learning purpose so that in case you see any requirement in future, it would help you.

SOUNDEX() in SQL Server

SOUNDEX() function returns a 4 character code based on how the string sounds on speak. There are definite rules to generate the 4 character code.

1. Every soundex code consists of a letter and three numbers. The letter is always the first letter of the string. The numbers are assigned to the remaining letters of the string according to the soundex guide shown below.

Number Represents the Letters

1-> B, F, P, V

2-> C, G, J, K, Q, S, X, Z

3-> D, T

4-> L

5-> M, N

6-> R

Disregard the letters A, E, I, O, U, H, W, and Y.

SELECT SOUNDEX('CRAFT')
/*
    First Letter is C
    6 for R
    A is ignored letter
    1 for F
    3 for T
*/

2. Zeroes are added at the end if necessary to produce a four-character code.

3. Additional letters are disregarded.

4. If the word has any double letters, they should be treated as one letter.

5. If the surname has different letters side-by-side that have the same number in the soundex coding guide, they should be treated as one letter.

SELECT SOUNDEX('SQL')
/*
    S - the first character
    Q is coming in the same soundex group as S, hence both are treated as one letter.
    L - has soundex digit 4

So, the value is S400 (00 is added to produce four character code as per rule 2.) 
*/

6. If a vowel (A, E, I, O, U) separates two consonants that have the same soundex code, the consonant to the right of the vowel is coded. Example:

7. If “H” or “W” separate two consonants that have the same soundex code, the consonant to the right of the vowel is not coded.

DIFFERENCE() in SQL Server

DIFFERENCE function is used to compare and return an integer (0-4) that represents the similarity of the two words. 0 indicates weak or no similarity between the SOUNDEX values. 4 indicates strong similarity or identically SOUNDEX values.

SELECT SOUNDEX('Juice'), SOUNDEX('Jucy'), difference('Juice','Jucy')

I’d like to grow my readership. If you enjoyed this blog post, please share it with your friends!