There are many things that you might need to update on a server, ranging from a major upgrade of the operating system to just an update of a single piece of software (such as the Apache server itself).
One simple approach to performing an upgrade painlessly is to have a backup machine, of similar capacity and identical configuration, that can replace the production machine while the upgrade is happening. It is a good idea to have such a machine handy and to use it whenever major upgrades are required. The two machines must be kept synchronized, of course. (For Unix/Linux users, tools such as rsync and mirror can be used for synchronization.)
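As a rough illustration of the synchronization step, the following Python sketch builds and runs an rsync command that mirrors a directory from the live machine to the backup. The paths and the host name `backup` are hypothetical placeholders. The flags shown (`-a`, `--delete`, `--dry-run`) are only a starting point, so check them against your rsync version and decide which directories actually need mirroring.

```python
import subprocess

def rsync_command(src_dir, dest, dry_run=False):
    """Build an rsync command that mirrors src_dir onto dest (e.g. 'backup:/path')."""
    # -a: archive mode (recurse, preserve permissions, times, symlinks, ...)
    # --delete: remove files on the backup that no longer exist on the source
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself, into dest.
    cmd += [src_dir.rstrip("/") + "/", dest]
    return cmd

def sync(src_dir, dest, dry_run=False):
    # Raises subprocess.CalledProcessError if rsync exits with an error.
    subprocess.run(rsync_command(src_dir, dest, dry_run), check=True)
```

For example, `sync("/home/httpd", "backup:/home/httpd", dry_run=True)` would show what a real run would change before anything is copied.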
However, a dedicated standby machine may not be necessary. Unless the service is hosted elsewhere and you cannot easily switch machines, the development machine is probably the best choice for a backup: all the software and scripts are tested on it as a matter of course, and its software setup is probably identical to that of the production machine. The development machine may be less powerful than the live server, but this is usually acceptable for a short period, especially if the upgrade is scheduled for a time when the site's traffic is light. A slightly slower service is much better than no service at all. A web log analysis tool such as analog can be used to determine the hour of the day when the server is under the least load.
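The idea of finding the quietest hour can be sketched in a few lines of Python. This is not a substitute for analog. It is a minimal sketch that assumes the access_log is in Common Log Format and simply counts requests per hour of the day, using the hour as recorded in the log's own time zone.

```python
import re

# Captures the hour from the timestamp field of a Common Log Format line, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326
HOUR_RE = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")

def requests_per_hour(lines):
    """Return a list of 24 request counts, indexed by hour of the day."""
    counts = [0] * 24
    for line in lines:
        m = HOUR_RE.search(line)
        if m:  # silently skip lines that are not in Common Log Format
            counts[int(m.group(1))] += 1
    return counts

def quietest_hour(lines):
    """Return the hour (0-23) with the fewest requests; ties go to the earliest hour."""
    counts = requests_per_hour(lines)
    return min(range(24), key=counts.__getitem__)
```

To use it on a real log, pass an open file: `with open("access_log") as f: print(quietest_hour(f))`. A log covering several weeks gives a more reliable answer than a single day.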
Switching between the two machines is very simple:
1. Shut down the network on the backup machine.
2. Configure the backup machine to use the same IP address and domain name as the live machine.
3. Shut down the network on the live machine (do not shut down the machine itself!).
4. Start up the network on the backup machine.
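Why the order of these steps matters can be shown with a small Python model. It is a hypothetical simulation, not a script that touches real network interfaces. Each step either brings a machine's network down, brings it up, or assigns it the live machine's IP address (`take_ip`, allowed only while that machine's network is down). The model raises an error if both machines ever serve the shared address at once.

```python
# Hypothetical model of the switch; action names are illustrative only.
SWITCH_STEPS = [
    ("backup", "down"),     # 1. shut down the network on the backup machine
    ("backup", "take_ip"),  # 2. give it the live machine's IP address and name
    ("live", "down"),       # 3. shut down the live machine's network only
    ("backup", "up"),       # 4. start up the network on the backup machine
]

def run_switch(steps):
    """Replay steps; return the machine serving the shared IP at the end (or None).

    Raises ValueError on an IP conflict or on reconfiguring a machine whose
    network is still up.
    """
    # Initial state: the live machine serves the shared IP, and the backup
    # is up on its own, different address.
    net_up = {"live": True, "backup": True}
    has_shared_ip = {"live": True, "backup": False}
    for machine, action in steps:
        if action == "down":
            net_up[machine] = False
        elif action == "up":
            net_up[machine] = True
        elif action == "take_ip":
            if net_up[machine]:
                raise ValueError(f"{machine}: reconfigure only while its network is down")
            has_shared_ip[machine] = True
        else:
            raise ValueError(f"unknown action {action!r}")
        holders = [m for m in net_up if net_up[m] and has_shared_ip[m]]
        if len(holders) > 1:
            raise ValueError(f"IP conflict between {holders}")
    serving = [m for m in net_up if net_up[m] and has_shared_ip[m]]
    return serving[0] if serving else None
```

Running `run_switch(SWITCH_STEPS)` returns `"backup"`. Swapping steps 3 and 4 raises an IP conflict, and skipping step 1 raises a reconfiguration error. Between steps 3 and 4 neither machine answers on the shared address, and that interval is the short downtime every manual switch incurs.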
When you are certain that the backup server has successfully replaced the live server (that is, requests are being serviced, as shown by the backup machine's access_log), it is safe to switch off the live machine or perform any necessary upgrades.
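Checking the backup machine's access_log can also be scripted. The sketch below, again assuming Common Log Format, counts successful (2xx or 3xx) requests logged at or after the moment of the switch. The function name and the status-code range are illustrative choices, not part of any standard tool. A nonzero and growing count is a reasonable sign that the backup is serving real traffic.

```python
import re
from datetime import datetime

# Captures the timestamp and the status code from a Common Log Format line.
CLF_RE = re.compile(r'\[([^\]]+)\] "[^"]*" (\d{3}) ')

def served_since(lines, switch_time):
    """Count successful (2xx/3xx) requests logged at or after switch_time.

    switch_time must be timezone-aware, because the logged timestamps are.
    """
    count = 0
    for line in lines:
        m = CLF_RE.search(line)
        if not m:
            continue  # skip malformed or non-CLF lines
        stamp = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
        status = int(m.group(2))
        if stamp >= switch_time and 200 <= status < 400:
            count += 1
    return count
```

One convenient way to get an aware `switch_time` is to parse it with the same format, for example `datetime.strptime("10/Oct/2000:14:00:00 -0700", "%d/%b/%Y:%H:%M:%S %z")`.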
Why bother waiting to check that everything is working correctly with the backup machine? If something goes wrong, the change can immediately be rolled back by putting the known working machine back online. With the service restored, there is time to analyze and fix the problem with the replacement machine before trying it again. Without the ability to roll back, the service may be out of operation for some time before the problem is solved, and users may become frustrated.
We recommend that you practice this technique with two unused machines before using the production boxes.
After the backup machine has been put into service and the original machine has been upgraded, test the original machine. Once it has passed testing and is ready for service, repeat the server replacement technique described above in reverse. If the original machine does not work correctly once it is back in service, the backup machine can immediately be brought online again while the problems with the original are fixed.
Two machines cannot use the same IP address at the same time, so the first machine must release the address by bringing down the network interface that uses it before the second machine can bring up its own interface with that address. This causes a short downtime during the switch. The heartbeat utility can automate this process and may thus shorten the downtime. See the references section at the end of this chapter for more information about heartbeat.