When writing automated monitoring tools, you should start by monitoring the tools themselves until they are reliable and stable enough to be left to work by themselves.
Even when everything is automated, you should check at regular intervals that everything is working, since a minor change in a single component can silently break the whole monitoring system. A good example is a silent failure of the mail system: if all alerts from the monitoring tools are delivered through email, having no messages from the system does not necessarily mean that everything is OK. If emails alerting about a problem cannot reach the webmaster because of a broken email system, the webmaster will not realize that a problem exists. (Of course, the mailing system should be monitored as well, but then problems must be reported by means other than email. One common solution is to send messages both by email and to a mobile phone's short message service.)
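The idea of reporting through more than one channel can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name send_alert and the notion of a "channel" as a plain callable are hypothetical, and the actual email and short message senders are left to whatever delivery mechanism is available on your system.

```python
from typing import Callable, Dict, List

# Assumption: a channel is any callable that takes the message text and
# raises an exception when delivery fails (for example, an SMTP sender or
# a wrapper around an SMS gateway).
Channel = Callable[[str], None]


def send_alert(message: str, channels: Dict[str, Channel]) -> List[str]:
    """Try every channel and return the names of those that succeeded."""
    delivered = []
    for name, send in channels.items():
        try:
            send(message)
            delivered.append(name)
        except Exception as exc:
            # One broken channel must not stop the others from being tried.
            print(f"alert channel {name!r} failed: {exc}")
    if not delivered:
        # Nothing got through: make the failure loud instead of silent.
        raise RuntimeError("alert could not be delivered through any channel")
    return delivered
```

Because every channel is tried independently, a broken mail system does not prevent the short message from going out, and a failure of all channels raises an error rather than passing unnoticed.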
Another very important (albeit often-forgotten) period of risk is the time right after an upgrade. Even after a minor upgrade, the whole service should be monitored closely for a while.
The first and simplest check is to visit a few pages from the service to make sure that things are working. Of course, this might not suffice, since different pages might use different resources—while code that does not use the database system might work properly, code that does use it might not work if the database server is down.
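A minimal sketch of such a check, written in Python, might look like the following. The function names and the choice of treating anything other than status 200 as a failure are assumptions made for illustration. The list of URLs should include at least one page for each resource the service depends on, such as a page that queries the database.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url; network errors propagate."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code


def check_pages(urls: Iterable[str],
                fetch: Callable[[str], int] = fetch_status) -> Dict[str, str]:
    """Return a mapping of failing URL to a short description of the problem."""
    failures = {}
    for url in urls:
        try:
            status = fetch(url)
        except Exception as exc:
            # Connection refused, timeout, DNS failure, and so on.
            failures[url] = f"unreachable: {exc}"
            continue
        if status != 200:
            failures[url] = f"HTTP {status}"
    return failures
```

An empty result means that every listed page answered normally. The fetch parameter exists only so that the check can be exercised without a live server.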
The second thing to check is the web server's error_log file. If there are any problems, they will probably be reported here. However, only obvious failures, such as syntax errors and runtime malfunctions, will appear here; the subtle bugs that result from bad program logic will be revealed only through careful testing (which should have been completed before upgrading the live server).
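Reading the error_log by eye quickly becomes tedious, so the check can be scripted. The Python sketch below assumes that log entries carry a bracketed severity tag such as [error] or [core:crit]; the exact format depends on the server version and configuration, so the pattern may need adjusting. Messages that code writes straight to the error stream without such a tag would not be matched. The function name and the idea of remembering a byte offset recorded just before the upgrade are illustrative choices.

```python
import re
from typing import List, Tuple

# Assumption: severe entries contain a tag like "[error]" or "[core:error]".
SEVERE = re.compile(r"\[(?:\w+:)?(emerg|alert|crit|error)\]")


def severe_lines_since(path: str, offset: int = 0) -> Tuple[List[str], int]:
    """Return severe log lines written after byte offset, plus the new offset."""
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
        new_offset = log.tell()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return [line for line in lines if SEVERE.search(line)], new_offset
```

Recording the offset before the upgrade and passing it in afterward limits the report to entries that appeared since the change. Passing the returned offset to the next call makes each run report only new entries.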