Tag: SQL Error Troubleshooting

Error message: complete login process due to delay in opening server connection in SQL Server

Problem Statement:
Some days ago, one of my collegue faced an issue with one of her SQL jobs. SQL Jobs were intermittendly breaking its execution and stopped the processing.When we were analysing the issue in detail, we observed the below error log in SQL Server Error logs.

10/23/2014 @ 00:36:58: Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : Unable to
10/23/2014 @ 00:36:58: complete login process due to delay in opening server connection.
10/23/2014 @ 01:03:39: Finish External Commandline (1): sqlcmd.exe -Q “EXEC
10/23/2014 @ 01:03:39: spBatchScript
10/23/2014 @ 01:03:39: ‘xxxxxxxxxxxxxxx’, ’10/23/2014
10/23/2014 @ 01:03:39: 00:36:12′, ’10/23/2014 00:36:37′, ’00:00:25.2342135′, ‘Success’,
10/23/2014 @ 01:03:39: ‘0’” -S”xxxxxxx,xxxx” -dSchedulerLog -E

Troubleshooting the Issue:
The below error message was so clear that there were some issues with connectivity.

10/23/2014 @ 00:36:58:        Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : Unable to 
10/23/2014 @ 00:36:58:        complete login process due to delay in opening server connection. 

As long as the issue was so sporadic in nature, we were not able to simulate the issue and find the cause at the time.So, we decided to go with RING_BUFFER and collect if that could give us some lights.

Step 1: We tried to collect the network issues using the below query,However, there was a 5 days gap between the fact and the reporting, we were not able to collect the info from RING_BUFFER. Finally, we decided to wait for the next event to happened to collect the information:

SELECT CAST(record AS XML) FROM sys.dm_os_ring_buffers

Step 2: The output from the above query was too large and it was very difficult to analyse the file manually. Hence, we come up with the following shredding method to get the info only related with Connectivity.
You may find the script used:

--You may need assign the result of above query to variable @x.
Declare @x XML = ''

--Actual Query
;With cte as
	e.value('../@id[1]','VARCHAR(10)') AS RecordId,
	e.value('RecordType[1]','VARCHAR(10)') AS RecordType,
	e.value('Spid[1]','VARCHAR(10)') AS Spid,
	e.value('SniConnectionId[1]','VARCHAR(100)') AS SniConnectionId,
	e.value('SniProvider[1]','VARCHAR(10)') AS SniProvider,
	e.value('SniConsumerError[1]','VARCHAR(10)') AS SniConsumerError,
	e.value('RemoteHost[1]','VARCHAR(50)') AS RemoteHost,
	e.value('RemotePort[1]','VARCHAR(10)') AS RemotePort,
	e.value('LocalHost[1]','VARCHAR(50)') AS LocalHost,
	e.value('LocalPort[1]','VARCHAR(10)') AS LocalPort,
    e.value('RecordTime[1]','datetime') AS RecordTime,
	e.value('OSError[1]','VARCHAR(10)') AS OSError,
	e.value('State[1]','VARCHAR(10)') AS State,
	x1.e1.value('PhysicalConnectionIsKilled[1]','VARCHAR(10)') AS PhysicalConnectionIsKilled,
	x1.e1.value('DisconnectDueToReadError[1]','VARCHAR(10)') AS DisconnectDueToReadError,
	x1.e1.value('NetworkErrorFoundInInputStream[1]','VARCHAR(10)') AS NetworkErrorFoundInInputStream,
	x1.e1.value('ErrorFoundBeforeLogin[1]','VARCHAR(10)') AS ErrorFoundBeforeLogin,
	x1.e1.value('SessionIsKilled[1]','VARCHAR(10)') AS SessionIsKilled,
	x1.e1.value('NormalDisconnect[1]','VARCHAR(10)') AS NormalDisconnect
FROM @x.nodes('//ConnectivityTraceRecord') x(e)
Cross Apply x.e.nodes('TdsDisconnectFlags')x1(e1)
) Select * From cte where RecordTime between '2014-12-09 22:00:19.940' and '2014-12-09 22:50:19.940'

Step 3: When we analysed the query result, we observed some pattern as below:


Here, we would like to look at two things: SniConsumerError number and State columns. And state column will provide information on how we should further proceed.

Please find the below table for different States and its descriptions. (The below is logically grouped for very common issues. If you are looking for an exhuastive list of descriptions, I would recommand you to go through Aaron Bertrand’s blog post).


From the above, it is clear that there were some issues with server access failures. We reported back to the respective team (Network team) to proceed further on troubleshooting the issue with our observations.

Hope, this would help you for similar situations. I request you to share your experiences as part of our sharing and learning process.