Monday, May 12, 2008

Adding nodes to a SQL Server 2005 Cluster

This past weekend I had the task of adding two nodes to our existing two-node SQL Server 2005 x64 cluster. When I logged into an existing node, the new nodes had already been added to the cluster and were ready to have the SQL Server binaries installed. I started off by following the steps recommended by Microsoft (http://msdn.microsoft.com/en-us/library/ms191545.aspx). I went into Add/Remove Programs, clicked Change for SQL Server 2005, and the installation process started. I easily made my way through the setup screens and soon had an installation running for one of our three virtual SQL Servers.
The installation seemed to be running smoothly, but after a few minutes it reported a failure and rolled back. The error reported was "Setup failed to start on the remote machine. Check the Task scheduler event log on the remote machine." After doing some research I learned this was caused by open RDP connections on some of the machines. I opened Terminal Services Manager and made sure all remote connections had been logged off on each node in the cluster. I then retried setup, and it failed again with the same error. After a little more research I found that some people had to not only kill all RDP connections but also reboot the machines. So I rebooted each node, and this solved the first error of the night.
The next error I encountered was "Task did not appear to start on machine: CP-ITS-SQL64P-4: 267013". This machine was one of the new nodes being introduced. I decided to reboot that machine and restart the installation. Once I did that, the next error was "Task did not appear to start on machine: CP-ITS-SQL64P-5: 267013". I thought I was making progress, but after rebooting that machine (the other new node) and restarting the installation, I received the first error again: "Task did not appear to start on machine: CP-ITS-SQL64P-4: 267013".
So basically I was in an installation failure loop. While researching this error I was unable to find anyone who had experienced a similar issue. After having no luck on Google, I decided to shut down one of the new nodes, CP-ITS-SQL64P-5, and work strictly with CP-ITS-SQL64P-4. Finally, success. I was able to run through the setup of a virtual server and successfully install SQL Server on a new node. I decided to apply SP2 to the new node right away to confirm that the install of the first of my three virtual servers had finished successfully. SP2 applied with no problems.
Afterwards I returned to the install for the remaining two virtual servers. This time I had a new error: "Error String : The setup has encountered an unexpected error while Completing Commit. The error is: The object already exists." I couldn't find much information on this one, so I decided to reboot the node I was running the install from. This fixed the issue, as I was able to install the instance after the reboot. I can only assume setup was holding on to some state from the install of the first virtual server. After that I rebooted again and installed the remaining virtual server without trouble.
By this time I was already an hour over my estimated three-hour window. I decided to quickly fail over the two unpatched instances to CP-ITS-SQL64P-4 and apply SP2. The service pack was applying smoothly until I got the dreaded error that no passive nodes were patched and setup failed. It was late and I had to let the customer applications back in, so I left the first virtual server that was installed and patched running on CP-ITS-SQL64P-4 and failed the other groups back to the original nodes. I am assuming a full round of reboots will allow SP2 to apply. I plan to finish the setup during my next maintenance window.
My major question regarding the installation is “Does a major issue exist when trying to add more than 1 node at a time to a cluster in SQL 2005?”
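
For anyone who wants to script the "no remote sessions on any node" check I did by hand in Terminal Services Manager, here is a minimal sketch in Python. It is not what I ran that night. It shells out to the Windows `qwinsta` command with its `/server:` switch and lists sessions that still have a user attached. The names `parse_sessions`, `query_node` and `USER_STATES` are made up for illustration, and the parsing assumes the usual column layout of `qwinsta` output (a header row containing USERNAME, then a numeric ID followed by a state such as Active or Disc), so check it against what your own servers print.

```python
import subprocess
import sys

# Session states that mean a user is still logged on (assumption: these
# are the state names qwinsta prints for active and disconnected sessions).
USER_STATES = {"Active", "Disc"}


def parse_sessions(qwinsta_output):
    """Return (session_name, username, session_id, state) per logged-on user."""
    lines = [ln for ln in qwinsta_output.splitlines() if ln.strip()]
    if not lines or "USERNAME" not in lines[0]:
        raise ValueError("unexpected qwinsta output")
    # Column where usernames start; used to tell a username apart from a
    # session name when only one of the two is present on a row.
    user_col = lines[0].index("USERNAME")
    sessions = []
    for raw in lines[1:]:
        # A leading '>' marks the current session; blank it to keep columns.
        line = " " + raw[1:] if raw.startswith(">") else raw
        tokens = line.split()
        # Locate the state token, which directly follows the numeric ID.
        state_at = next(
            (i for i in range(1, len(tokens))
             if tokens[i] in USER_STATES and tokens[i - 1].isdigit()),
            None,
        )
        if state_at is None:
            continue  # listener or other row without a logged-on user
        before = tokens[:state_at - 1]
        if len(before) == 2:
            name, user = before
        elif len(before) == 1 and line.index(before[0]) >= user_col:
            name, user = "", before[0]  # disconnected: no session name
        else:
            continue  # session name only, nobody logged on
        sessions.append((name, user, int(tokens[state_at - 1]), tokens[state_at]))
    return sessions


def query_node(node):
    """Run qwinsta against one node and parse the result (Windows only)."""
    result = subprocess.run(
        ["qwinsta", f"/server:{node}"],
        capture_output=True, text=True, check=True,
    )
    return parse_sessions(result.stdout)


if __name__ == "__main__":
    # Node names are passed on the command line.
    for node in sys.argv[1:]:
        for name, user, session_id, state in query_node(node):
            print(f"{node}: {user} (id {session_id}, {state}) {name}")
```

Run it with the node names as arguments, for example `python check_sessions.py CP-ITS-SQL64P-4 CP-ITS-SQL64P-5` (the script name is arbitrary). Anything it lists can be logged off from Terminal Services Manager or with `logoff <id> /server:<node>`. Keep in mind that in my case clearing the sessions alone was not enough and the nodes still needed a reboot.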

4 comments:

Anonymous said...

Well said.

Anonymous said...

We've been going through the same problems: a 3-node cluster, 13 SQL instances, 4000 databases, adding two nodes. The company won't let us take a full outage, so rebooting and shutting down aren't options. After endless errors and many, many hours of troubleshooting we were able to complete the install of 4 new instances. However, after even more errors and more troubleshooting, attempts to add existing instances to the new nodes caused all instances on nodes 1 and 3 to fail over to new node 4 (where they were not patched), and the instances could not be failed back. Every node in the cluster had to be rebooted to clear the Group Move problems. My company and 30,000 users were down for over an hour. MS has no clue why it's happening or how to fix it.

mohsin said...

Guys using VMware

I overcame this problem by sharing the SQL Server 2005 folder from the domain controller, which is on the same network as NODE1 and NODE2, and then installing from any node. It works.

No further errors.

"Task did not appear to start on machine 267013"



Regards

Mohsin Ali