Note – this is a cross-post from 4sysops
Well, as we move from theory to practice, our first batch of Windows 7 machines has been rolled out into the production environment, and so far, so very, very good. Microsoft has done a very impressive job with its newest suite of client and server products, and our deployment is being managed end-to-end with no third-party products required.
I thought this would be an opportune time to document some of the problems I’ve encountered so far in building our Windows 7 Standard Operating Environment (SOE). Because we are moving from a Windows XP/Novell NetWare environment, there is a whole raft of changes happening, and we also have to deal with problems that were lurking behind the scenes. Those hidden problems are what I’ll talk about today.
One of our critical line-of-business applications is an authoritative administration/HR system, with a locally installed GUI application that talks back to a SQL database. The database is hosted on SQL Server 2005 SP3 x64, running on Windows Server 2003 SP2 x64. We started noticing that on the Windows 7 machines, the local GUI took forever to talk to the SQL database. There were no error messages (irritatingly), but performance was so slow as to be unusable.
Of course, the first assumption is that the application is incompatible. That’s not unreasonable, given that any IT pro looking at Windows 7 has been conditioned to expect appcompat problems, particularly with line-of-business (LOB) applications. But on further investigation, a simple ODBC connectivity test was just as slow. In other words, with the LOB application taken out of the equation, the problem was still present.
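If you want a number to compare between an XP machine and a Windows 7 machine, one crude check that sits even lower than ODBC is to time the TCP connection to the SQL port. This is a minimal sketch under stated assumptions, not the ODBC test described above: it times only name resolution plus the TCP handshake, not the Kerberos/NTLM negotiation that happens afterwards, and the function name and any host you pass it are placeholders.

```python
import socket
import time


def time_tcp_connect(host: str, port: int = 1433, timeout: float = 5.0) -> float:
    """Return the seconds taken to open (and immediately close) a TCP connection.

    Covers name resolution and the TCP three-way handshake only; the SQL
    Server login and its Kerberos/NTLM authentication are NOT included.
    """
    start = time.perf_counter()
    # create_connection resolves the host name, then connects to (host, port)
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start
```

For example, time_tcp_connect("servername.domain.com") run from both kinds of workstation gives a rough comparison. If the TCP connect is fast but the ODBC test is still slow, the delay is further up the stack.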
Next step: three cheers for Wireshark. A packet trace on the Windows 7 machine showed Kerberos traffic between the client, server and domain controller, with the DC returning KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN, meaning it could not find the service principal for which the client had requested a ticket. The client and server then fell back to NTLM and the connection was made. That explained why there were no error messages, but did it explain the slow-down?
The error was due to a missing or incorrect Service Principal Name (SPN) for the SQL server. This can happen when the SQL services are started under a local or domain user account rather than the Local System account. By default, standard user accounts don’t have the rights in AD to update SPN records, whereas Local System acts as the server’s computer account, which does have sufficient rights. Why didn’t this problem emerge before? Even though our XP workstations are connected to both AD and eDirectory, the “functionality” of the Novell Client means that when they talk out across the network, they don’t identify themselves using the standard AD domain\username syntax. As a result, Kerberos authentication isn’t attempted.
To update and verify the SPN in AD I used two tools: QUERYSPN.VBS and SETSPN. I ran these from the DC, but you can run them from any domain-joined workstation using an account that has sufficient rights to modify AD. To check which SQL Server SPNs AD currently holds, run:
cscript queryspn.vbs mssqlsvc*
This queries AD for every SPN beginning with MSSQLSvc, i.e. all registered SQL Server instances. If the SQL services on the problematic server are started with a local/domain user account and the SPN is registered correctly, the query should return something like:
CN=Username,CN=Users,DC=domain,DC=com
Class: user
User Logon: Username
MSSQLSvc/servername.domain.com:1433
If this isn’t returned (as it wasn’t in my case), use SETSPN to register the SPN:
SETSPN -S MSSQLSvc/servername.domain.com:1433 username
Using -S instead of -A makes SETSPN check for duplicate SPNs before adding the new one. This also assumes that SQL Server is listening on the default port, 1433; if it isn’t, substitute the appropriate port. Run QUERYSPN.VBS again and the correct result should be returned. Allow AD replication to take place and then restart the SQL services.
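With more than a handful of SQL servers, the register-and-verify steps can be scripted. The sketch below is a hypothetical Python helper, not part of QUERYSPN.VBS or SETSPN: it builds the MSSQLSvc/<FQDN>:<port> SPN, returns the SETSPN argument list (using -S so duplicates are checked), and checks saved QUERYSPN.VBS output for the expected SPN, assuming one SPN per line as in the sample output above. The function names are my own and the server and account names are placeholders.

```python
from typing import List

DEFAULT_SQL_PORT = 1433  # SQL Server default instance port


def sql_spn(fqdn: str, port: int = DEFAULT_SQL_PORT) -> str:
    """Build a SQL Server SPN of the form MSSQLSvc/<FQDN>:<port>."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"MSSQLSvc/{fqdn}:{port}"


def setspn_args(fqdn: str, account: str, port: int = DEFAULT_SQL_PORT) -> List[str]:
    """Argument list for SETSPN; -S checks for duplicates before adding (-A does not)."""
    return ["setspn", "-S", sql_spn(fqdn, port), account]


def spn_registered(queryspn_output: str, fqdn: str, port: int = DEFAULT_SQL_PORT) -> bool:
    """True if the expected SPN appears on its own line in saved QUERYSPN.VBS output.

    Assumes one SPN per line. Surrounding whitespace and letter case are
    ignored, so SERVERNAME.domain.com and servername.domain.com both match.
    """
    expected = sql_spn(fqdn, port).lower()
    return any(line.strip().lower() == expected
               for line in queryspn_output.splitlines())
```

On a domain-joined Windows machine you could hand the list to subprocess.run(setspn_args("servername.domain.com", "username"), check=True); elsewhere the helpers are only useful for generating commands or checking captured output. Building the arguments in code also avoids pasting a typographic dash in place of the hyphen before S.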
I then ran another packet trace with Wireshark: the Kerberos issue was resolved, but the speed problem was (depressingly) still present. Then, while remotely connected from home, I noticed that the Windows server hosting SQL Server was quite sluggish compared with the other systems I was connected to. This triggered a memory of something that cropped up when Windows Vista first came onto the scene, whereby Server 2003-based systems were very slow to respond to network requests from operating systems newer than Windows XP, a problem that also affected Server 2003 SP2.
It turned out to need a simple registry fix on the server. Under the key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
the DWORD value EnableTCPA was set to 1. I changed it to EnableTCPA=0, rebooted the server, and all the performance issues were resolved. Phew!
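If you have several Server 2003 SP2 machines to fix, the change can be scripted. A minimal sketch, assuming Python is available on the admin workstation or server: enable_tcpa_reg returns the text of a .reg file you could import with regedit, and set_enable_tcpa writes the value directly through the standard-library winreg module (Windows only, run as an administrator). Both function names are my own, and the server still needs the reboot afterwards.

```python
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def enable_tcpa_reg(value: int = 0) -> str:
    """Return .reg file text that sets the EnableTCPA DWORD under HKLM."""
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[HKEY_LOCAL_MACHINE\\{TCPIP_PARAMS}]\n"
        f"\"EnableTCPA\"=dword:{value:08x}\n"  # DWORDs are written as 8 hex digits
    )


def set_enable_tcpa(value: int = 0) -> None:
    """Write EnableTCPA directly (Windows only; needs admin rights, then a reboot)."""
    import winreg  # imported here so the rest of the module loads on non-Windows systems

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS, 0,
                        winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "EnableTCPA", 0, winreg.REG_DWORD, value)
```

Generating a .reg file keeps the change reviewable before anything touches the server; the winreg route suits a scripted rollout where you are already running Python on each box.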
So the moral of this particular tale is that if you’re planning a Windows 7 deployment, then depending on your environment there may be hidden problems that Windows 7 will uncover but that the operating system itself isn’t causing. As our deployment progresses I’ll keep bringing you our discoveries.
Until next time…
