Windows 2003 SP2, the Scalable Networking Pack, and SQL Server – can be a bad combo

Learned something new this week (and kind of the hard way). Hopefully this will help someone else out there.

This past Sunday morning, the Windows server group applied the monthly Windows updates to our SQL Servers. This is a routine process and typically no issues arise, because the updates are applied to our development and test servers the week prior. This month, however, immediately after the updates were applied to the servers hosting our SQL instances and the servers restarted, performance between the servers (a lot of ETL processes here) became dismal. Jobs and processes that would normally run in seconds or minutes wouldn’t finish at all. It seemed as though any SQL call to our SQL 2005 servers (all running Windows 2003 SP2) was sucked into the great SQL black hole.

We hadn’t had problems with these scheduled ETL processes earlier that morning, prior to the patches. But the following simple linked server SELECT statement took minutes from the affected servers and returned in less than 50ms from an unaffected server:

SELECT 1 FROM [LinkSvrA].[ApplDon].[dbo].[Customer] WHERE [CUST_NO_PK] = 1

Here LinkSvrA is the linked server running Windows 2003 SP2 (plus the Windows updates applied Sunday morning). The column [CUST_NO_PK] is the only column in the primary key, which is also the clustered index. The statement was executed from a Windows 2008 server running SQL Server 2008 SP1.

After hours and hours of troubleshooting the performance problems, a couple of key indicators were pointing me at network connectivity and throughput. The SQL Server wait types that dominated the wait stats were OLEDB and ASYNC_NETWORK_IO, which ultimately indicates that the provider (SQL Native Client in this case) is waiting on the network.
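If you want to see which wait types dominate on your own instance, something like the following rough Python sketch can rank them. It is not what we ran during the incident. The helper name `top_waits` is made up for illustration, and it assumes you have already pulled `wait_type` and `wait_time_ms` from the `sys.dm_os_wait_stats` DMV with whatever database driver you prefer. Keep in mind that the DMV is cumulative since the instance started (or since the stats were last cleared), so the numbers describe the whole period, not just the last few minutes.

```python
from typing import Iterable, List, Tuple

# T-SQL that produces the input rows; run it with the driver of your choice.
WAIT_STATS_QUERY = "SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats"


def top_waits(rows: Iterable[Tuple[str, int]], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n wait types with the largest share of total wait time.

    Each result is (wait_type, percent_of_total_wait_time).
    Hypothetical helper for illustration only.
    """
    # Ignore wait types that have accumulated no wait time at all.
    nonzero = [(wait_type, ms) for wait_type, ms in rows if ms > 0]
    total = sum(ms for _, ms in nonzero)
    if total == 0:
        return []
    # Largest accumulated wait time first, then keep the top n.
    ranked = sorted(nonzero, key=lambda row: row[1], reverse=True)[:n]
    return [(wait_type, round(100.0 * ms / total, 1)) for wait_type, ms in ranked]
```

If OLEDB and ASYNC_NETWORK_IO sit at the top of that list on a server that is otherwise idle on CPU and disk, the network path is a reasonable next place to look.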

In the end, the case was escalated to Microsoft, who determined that the network card settings on LinkSvrA were the root cause. After we followed the steps below, the issue was resolved.

Changing the settings at the TCP level is often not enough; they need to be disabled at the NIC driver level as well. Open NCPA.cpl, navigate to the Advanced settings for your network card(s), and disable the “Chimney Offload” and “Receive Side Scaling” settings.

Steps to disable RSS and TCP offloading (the exact option names are specific to the card, but will usually look something like this):

• Open your network connection properties and click on “Configure”.

• In the advanced tab, set Large Offload to Disabled for all IP versions.

• Ensure Receive Side Scaling is Disabled.

• One additional recommendation we did not end up changing was the TCP and UDP Checksum Offload option. Having this set to Rx and Tx Enabled is working fine for us.

• Restart the server to ensure these settings take effect and retest your processes.
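The steps above cover the NIC driver side. For the TCP-level side, here is a small Python sketch that checks whether the OS-wide SNP switches are off. It looks at the three registry values that, as far as I know, the Microsoft KB fix sets to 0 (`EnableTCPChimney`, `EnableRSS`, `EnableTCPA`). The function names are hypothetical, and treating a missing value as "still enabled" is my own conservative assumption, so check it against your OS version. It does not look at the per-adapter driver settings at all.

```python
from typing import Dict, List

# Registry key (under HKEY_LOCAL_MACHINE) that holds the OS-wide SNP switches.
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
SNP_VALUES = ("EnableTCPChimney", "EnableRSS", "EnableTCPA")


def enabled_snp_features(values: Dict[str, int]) -> List[str]:
    """Return the SNP value names that are not explicitly set to 0.

    Assumption: a value that is missing from the registry is reported
    as enabled, so it gets looked at rather than silently skipped.
    """
    return [name for name in SNP_VALUES if values.get(name, 1) != 0]


def read_snp_values() -> Dict[str, int]:
    """Read the SNP values from the local registry (Windows only)."""
    import winreg  # imported here so the module still loads on non-Windows hosts

    found: Dict[str, int] = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
        for name in SNP_VALUES:
            try:
                found[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value absent; enabled_snp_features treats it as enabled
    return found
```

On the server itself, `enabled_snp_features(read_snp_values())` returning an empty list would suggest the OS-level switches are all off. You still need to check the card’s Advanced tab as described above.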

Take Away:
The SNP (Scalable Networking Pack) features that SP2 for Windows 2003 added and enabled by default caused normally well-performing processes to become abysmally slow. This has been a well-known issue since the introduction of SP2 for Windows 2003, and Microsoft later released a patch to disable the SNP features because of the problems SQL Server and Exchange Server ran into after SP2 was installed. On our Windows 2008 servers, the SNP features are disabled by default, so this shouldn’t be a problem there.

Update: MS has a KB Article with a patch of sorts for this: http://support.microsoft.com/kb/948496

 
