Let's write another similar mod_perl script that hangs, and deploy strace to find the point at which it hangs (see Example 21-11).

Example 21-11. hangme.pl

local $|=1;
my $r = shift;

print "PID = $$\n";

my $i = 0;
while (1) {
    sleep 1;

The reason this simple code hangs is obvious. It never breaks from the while loop. As you can see, it prints the PID of the current process to the browser. Of course, in a real situation you cannot use the same trick—in the previous section we presented several ways to detect the runaway processes and their PIDs.

We save the above code in a file and make a request. As usual, we use $|=1; in our demonstration scripts to unbuffer STDOUT so we will immediately see the process ID. Once the script is requested, the script prints the PID and obviously hangs. So we press the Stop button, but the process continues to hang in this code. Isn't Apache supposed to detect the broken connection and abort the request? Yes and no—you will understand soon what's really happening.

First let's attach to the process and see what it's doing. We use the PID the script printed to the browser—in this case, it is 10045:

panic% strace -p 10045

[...truncated identical output...]
SYS_175(0, 0xbffff41c, 0xbffff39c, 0x8, 0) = 0
SYS_174(0x11, 0, 0xbffff1a0, 0x8, 0x11) = 0
SYS_175(0x2, 0xbffff39c, 0, 0x8, 0x2)   = 0
nanosleep(0xbffff308, 0xbffff308, 0x401a61b4, 0xbffff308, 0xbffff41c) = 0
time([940973834])                       = 940973834
time([940973834])                       = 940973834
[...truncated the identical output...]

It isn't what we expected to see, is it? These are some system calls we don't see in our little example. What we actually see is how Perl translates our code into system calls. We know that our code hangs in this snippet:

while (1) {
    sleep 1;

so these must be the system calls that represent this loop, since they are printed repeatedly.

Usually the situation is different from the one we have shown. You first detect the hanging process, then you attach to it and watch the trace of calls it does (or observe the last few system calls if the process is hanging waiting for something, as when blocking on a file-lock request). From watching the trace you figure out what it's actually doing, and probably find the corresponding lines in your Perl code. For example, let's see how one process hangs while requesting an exclusive lock on a file that is exclusively locked by another process (see Example 21-12).

Example 21-12. excl_lock.pl

use Fcntl qw(:flock);
use Symbol;

fork( ); # child and parent do the same operation

my $fh = gensym;
open $fh, ">/tmp/lock" or die "cannot open /tmp/lock: $!";
print "$$: I'm going to obtain the lock\n";
flock $fh, LOCK_EX;
print "$$: I've got the lock\n";
sleep 30;
close $fh;

The code is simple. The process executing the code forks a second process, and both do the same thing: generate a unique symbol to be used as a file handle, open the lock file for writing using the generated symbol, lock the file in exclusive mode, sleep for 30 seconds (pretending to do some lengthy operation), and close the lock file, which also unlocks the file.

The gensym function is imported from the Symbol module. The Fcntl module provides us with a symbolic constant, LOCK_EX. This is imported via the :flock tag, which imports this and other flock( ) constants.

The code used by both processes is identical, so we cannot predict which one will get its hands on the lock file and succeed in locking it first. Thus, we add print( )statements to find the PID of the process blocking (waiting to get the lock) on a lock request.

When the above code is executed from the command line, we see that one of the processes gets the lock:

panic% perl ./excl_lock.pl

3038: I'm going to obtain the lock
3038: I've got the lock
3037: I'm going to obtain the lock

Here we see that process 3037 is blocking, so we attach to it:

panic% strace -p 3037

about to attach c10
flock(3, LOCK_EX

It's clear from the above trace that the process is waiting for an exclusive lock. The missing closing parenthesis is not a typo; it means that strace didn't yet receive a return status from the call.

After spending time watching the running traces of different scripts, you will learn to more easily recognize what Perl code is being executed.