The most common scenario is a live running service that needs to be upgraded with a new version of the code. The new code has been prepared and uploaded to the production server, and the server has been restarted. Unfortunately, the service does not work anymore. What could be worse than that? There is no way back, because the original code has been overwritten with the new but non-working code.
Another scenario is where a whole set of files is being transferred to the live server but some network problem has occurred in the middle, which has slowed things down or totally aborted the transfer. With some of the files old and some new, the service is most likely broken. Since some files were overwritten, you can't roll back to the previously working version of the service.
No matter what file transfer technique is used, be it FTP, NFS, or anything else, live running code should never be directly overwritten during file transfer. Instead, files should be transferred to a temporary directory on the live machine, ready to be moved when necessary. If the transfer fails, it can then be restarted safely.
Both scenarios can be made safer with two approaches. First, do not overwrite working files. Second, use a revision control system such as CVS so that changes to working code can easily be undone if the working code is accidentally overwritten. Revision control will be covered later in this chapter.
We recommend performing all updates on the live server in the following sequence. Assume for this example that the project's code directory is /home/httpd/perl/rel. When we're about to update the files, we create a new directory, /home/httpd/perl/test, into which we copy the new files. Then we do some final sanity checks: we check that the files are readable and executable by the user the server is running under, and we run perl -Tcw on the new modules to make sure there are no syntax errors in them.
To save typing, we set up aliases for several of the apachectl commands and for tailing the error_log file:
panic% alias graceful /home/httpd/httpd_perl/bin/apachectl graceful
panic% alias restart /home/httpd/httpd_perl/bin/apachectl restart
panic% alias start /home/httpd/httpd_perl/bin/apachectl start
panic% alias stop /home/httpd/httpd_perl/bin/apachectl stop
panic% alias err tail -f /home/httpd/httpd_perl/logs/error_log
Finally, when we think we are ready, we do:
panic% cd /home/httpd/perl
panic% mv rel old && mv test rel && stop && sleep 3 && restart && err
Note that all the commands are typed as a single line, joined by &&, and only at the end should the Enter key be pressed. The && ensures that if any command fails, the following commands will not be executed.
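The short-circuit behavior of && can be seen in isolation. This is a small illustrative sketch in POSIX sh; the function name run_chain is hypothetical and stands in for the real upgrade steps.

```shell
#!/bin/sh
# Hypothetical demo: a three-step chain whose middle step is supplied
# by the caller ("true" always succeeds, "false" always fails).
run_chain() {
    echo "step 1" && "$1" && echo "step 3"
}

run_chain true                               # prints "step 1" and "step 3"
run_chain false || echo "chain stopped early"  # prints "step 1", then the message
```

In the upgrade command line the same rule means that if mv rel old fails, the server is never stopped.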
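The elements of this command line are:
mv rel old: backs up the working directory to old, in case the new code doesn't work.
mv test rel: puts the new code in place of the original.
stop: stops the server.
sleep 3: gives the server a few seconds to shut down.
restart: restarts the server.
err: tails the error_log file so that any problems show up immediately.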
If mv is overridden by a global alias to mv -i, which asks for confirmation of every action, you will need to call mv -f to override the -i option.
When updating code on a remote machine, it's a good idea to prepend nohup to the beginning of the command line:
panic% nohup mv rel old && mv test rel && stop && sleep 3 && restart && err
This approach is meant to ensure that if the connection is suddenly dropped while the sequence is running, the server is not left down because stop was the last command to execute.
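One caveat: because && separates commands, nohup as typed above applies only to the mv that immediately follows it. A way to protect the whole sequence is to put it in a small script and run that script under nohup. The following is a sketch under stated assumptions, not a definitive implementation: the script name upgrade.sh, the function name upgrade, and the CODE_ROOT, APACHECTL, and STOP_WAIT variables are all hypothetical, with defaults taken from the paths used in this section. The script calls apachectl by its full path because interactive aliases are not available inside a script, and it leaves out err, since tail -f never exits.

```shell
#!/bin/sh
# upgrade.sh (hypothetical name): swap in the new code and restart.
# Intended to be started as:  nohup sh upgrade.sh run &
CODE_ROOT=${CODE_ROOT:-/home/httpd/perl}
APACHECTL=${APACHECTL:-/home/httpd/httpd_perl/bin/apachectl}
STOP_WAIT=${STOP_WAIT:-3}   # seconds to wait between stop and restart

# The body runs in a subshell so that cd does not affect the caller.
upgrade() (
    cd "$CODE_ROOT" || exit 1
    # If "old" already exists, mv would move rel inside it, so refuse.
    [ ! -e old ] || { echo "old already exists" >&2; exit 1; }
    mv rel old && mv test rel &&
        "$APACHECTL" stop && sleep "$STOP_WAIT" && "$APACHECTL" restart
)

# Only act when explicitly asked to, so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    upgrade
fi
```

Once the script has been started, the error_log can be followed separately with the err alias; if the connection drops, only the tail is lost.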
apachectl generates its status messages a little too early. For example, when we execute apachectl stop, a message saying that the server has been stopped is displayed, when in fact the server is still running. Similarly, when we execute apachectl start, a message is displayed saying that the server has been started, while it is possible that it hasn't yet. In both cases, this happens because these status messages are not generated by Apache itself. Do not rely on them. Rely on the error_log file instead, where the running Apache server indicates its real status.
Also note that we use restart and not just start. This is because of Apache's potentially long stopping times if it has to run lots of destruction and cleanup code on exit. If start is used and Apache has not yet released the port it is listening on, the start will fail and the error_log will report that the port is in use. For example:
Address already in use: make_sock: could not bind to port 8000
However, if restart is used, apachectl will wait for the server to quit and unbind the port and will then cleanly restart it.
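Since the stop message cannot be trusted and shutdown time varies, a fixed sleep can be replaced by polling for a sign that the server has really gone. The sketch below, in POSIX sh, assumes that the server records its process ID in a PID file and removes that file on a clean shutdown; the helper name wait_for_stop is hypothetical, and the PID file location has to be taken from your own configuration.

```shell
#!/bin/sh
# Hypothetical helper: wait until a PID file disappears, or give up.
# $1 = path to the PID file, $2 = maximum number of seconds to wait.
# Returns 0 once the file is gone, 1 if it is still there at timeout.
wait_for_stop() {
    pidfile=$1
    remaining=$2
    while [ -e "$pidfile" ]; do
        [ "$remaining" -gt 0 ] || return 1   # timed out
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

In a script, a call such as wait_for_stop /home/httpd/httpd_perl/logs/httpd.pid 30 (the path is an assumption) could take the place of sleep 3 between stopping and starting the server.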
Now, what happens if the new modules are broken and the newly restarted server reports problems or refuses to start at all?
The aliased err command executes tail -f on the error_log, so that the failed restart or any other problems will be immediately apparent. The situation can quickly and easily be rectified by returning the system to its pre-upgrade state with this command:
panic% mv rel bad && mv old rel && stop && sleep 3 && restart && err
This command line moves the new code to the directory bad, moves the original code back into the runtime directory rel, then stops and restarts the server. Once the server is back up and running, you can analyze the cause of the problem, fix it, and repeat the upgrade. Usually everything will be fine if the code has been extensively tested on the development server. When upgrades go smoothly, the downtime should be only about 5-10 seconds, and most users will not even notice anything has happened.
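The rollback can be wrapped in a function that checks its preconditions first, because mv silently moves a directory inside the target when the target already exists. This is a minimal sketch in POSIX sh; the function name rollback and the CODE_ROOT, APACHECTL, and STOP_WAIT variables are hypothetical, with defaults taken from the paths used in this section.

```shell
#!/bin/sh
CODE_ROOT=${CODE_ROOT:-/home/httpd/perl}
APACHECTL=${APACHECTL:-/home/httpd/httpd_perl/bin/apachectl}
STOP_WAIT=${STOP_WAIT:-3}   # seconds to wait between stop and restart

# Hypothetical helper: put the previous code back and restart.
# The body runs in a subshell so that cd does not affect the caller.
rollback() (
    cd "$CODE_ROOT" || exit 1
    # There must be a backup to restore.
    [ -d old ] || { echo "no old directory to restore" >&2; exit 1; }
    # A leftover "bad" would swallow rel instead of replacing it.
    [ ! -e bad ] || { echo "bad already exists" >&2; exit 1; }
    mv rel bad && mv old rel &&
        "$APACHECTL" stop && sleep "$STOP_WAIT" && "$APACHECTL" restart
)
```

After a successful rollback, the broken release remains in bad for analysis and should be moved or removed before the next rollback is attempted.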