6.2.1. The First Mystery: Why Does the Script Go Beyond 5?

If we look at the error_log file (we did enable warnings), we'll see something like this:

Variable "$counter" will not stay shared 
at /home/httpd/perl/counter.pl line 13.

This warning is generated when a script contains a named (as opposed to an anonymous) nested subroutine that refers to a lexically scoped (with my( )) variable defined outside this nested subroutine.

Do you see a nested named subroutine in our script? We don't! What's going on? Maybe it's a bug in Perl? But wait, maybe the Perl interpreter sees the script in a different way! Maybe the code goes through some changes before it actually gets executed? The easiest way to check what's actually happening is to run the script with a debugger.

Since we must debug the script when it's being executed by the web server, a normal debugger won't help, because the debugger has to be invoked from within the web server. Fortunately, we can use Doug MacEachern's Apache::DB module to debug our script. While Apache::DB allows us to debug the code interactively (as we will show in Chapter 21), we will use it noninteractively in this example.

To enable the debugger, modify the httpd.conf file in the following way:

PerlSetEnv PERLDB_OPTS "NonStop=1 LineInfo=/tmp/db.out AutoTrace=1 frame=2"
PerlModule Apache::DB
<Location /perl>
    PerlFixupHandler Apache::DB
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options ExecCGI
    PerlSendHeader On
</Location>

We have added a debugger configuration setting using the PERLDB_OPTS environment variable, which has the same effect as calling the debugger from the command line. We have also loaded and enabled Apache::DB as a PerlFixupHandler.

In addition, we'll load the Carp module, using <Perl> sections (this could also be done in the startup.pl file):

<Perl>
    use Carp;
</Perl>

After applying the changes, we restart the server and issue a request to /perl/counter.pl, as before. On the surface, nothing has changed; we still see the same output as before. But two things have happened in the background:

The file /tmp/db.out was written, with a complete trace of the code that was executed.
Since we have loaded the Carp module, the error_log file now contains the real code that was actually executed. This is produced as a side effect of reporting the "Variable "$counter" will not stay shared at..." warning that we saw earlier.

Here is the code that was actually executed:

package Apache::ROOT::perl::counter_2epl;
use Apache qw(exit);
sub handler {
    BEGIN {
        $^W = 1;
    };
    $^W = 1;

    use strict;

    print "Content-type: text/plain\n\n";

    my $counter = 0;

    for (1..5) {
        increment_counter( );
    }

    sub increment_counter {
        $counter++;
        print "Counter is equal to $counter !\n";
    }
}

Note that the code in error_log wasn't indented; we've indented it here to make it obvious that the code was wrapped inside the handler( ) subroutine.

From looking at this code, we learn that every Apache::Registry script is cached under a package whose name is formed from the Apache::ROOT:: prefix and the script's URI (/perl/counter.pl) by replacing all occurrences of / with :: and . with _2e. That's how mod_perl knows which script should be fetched from the cache on each request: each script is transformed into a package with a unique name and with a single subroutine named handler( ), which includes all the code that was originally in the script.
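The naming rule can be sketched in a few lines of Perl. The uri_to_package( ) helper below is hypothetical and applies only the two substitutions described above; it is not the code that Apache::Registry itself uses, which may treat other characters as well.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: applies just the two
# substitutions described in the text to a script's URI.
sub uri_to_package {
    my $uri = shift;
    (my $name = $uri) =~ s/\./_2e/g;   # "." becomes "_2e" (0x2e is the ASCII code of ".")
    $name =~ s{/}{::}g;                # "/" becomes "::"
    return "Apache::ROOT" . $name;
}

print uri_to_package('/perl/counter.pl'), "\n";
# prints: Apache::ROOT::perl::counter_2epl
```

The result matches the package line seen in the code dumped to error_log.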

Essentially, what's happened is that because increment_counter( ) is a subroutine that refers to a lexical variable defined outside of its scope, it has become a closure. Closures don't normally trigger warnings, but in this case we have a nested subroutine. That means that the first time the enclosing subroutine handler( ) is called, both subroutines are referring to the same variable, but after that, increment_counter( ) will keep its own copy of $counter (which is why $counter is not shared) and increment its own copy. Because of this, the value of $counter keeps increasing and is never reset to 0.
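The effect can be reproduced outside the web server. The following is a minimal sketch that assumes a plain handler( ) subroutine is a fair stand-in for the wrapper shown above; calling it twice simulates two requests served by the same process. Running it with warnings enabled should also print the "will not stay shared" warning to STDERR.

```perl
use strict;
use warnings;

# Stand-in for the wrapper that Apache::Registry builds around the script.
sub handler {
    my $counter = 0;
    my @seen;
    push @seen, increment_counter() for 1..5;

    # Named nested subroutine: compiled once, bound to the first $counter.
    sub increment_counter {
        return ++$counter;
    }

    return "@seen (handler sees $counter)";
}

my $first  = handler();   # simulates the first request
my $second = handler();   # simulates the second request
print "$first\n$second\n";
# prints:
# 1 2 3 4 5 (handler sees 5)
# 6 7 8 9 10 (handler sees 0)
```

On the second call, handler( ) gets a fresh $counter set to 0, while increment_counter( ) carries on from the value left in the variable it captured during the first call.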

If we were to use the diagnostics pragma in the script, which by default turns terse warnings into verbose warnings, we would see a reference to an inner (nested) subroutine in the text of the warning. By observing the code that gets executed, it is clear that increment_counter( ) is a named nested subroutine since it gets defined inside the handler( ) subroutine.
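The verbose explanation of this warning in perldiag also notes that the problem can usually be solved by making the inner subroutine anonymous. The following sketch illustrates that variant, using a plain handler( ) subroutine as an assumed stand-in for the Apache::Registry wrapper and a hypothetical $increment_counter code reference:

```perl
use strict;
use warnings;

sub handler {
    my $counter = 0;

    # An anonymous sub is created anew on each call to handler()
    # and binds to that call's $counter.
    my $increment_counter = sub { return ++$counter };

    my @seen;
    push @seen, $increment_counter->() for 1..5;
    return "@seen";
}

my $first  = handler();   # simulates the first request
my $second = handler();   # simulates the second request
print "$first\n$second\n";
# prints:
# 1 2 3 4 5
# 1 2 3 4 5
```

The cost is that every call site has to change from increment_counter( ) to $increment_counter->( ), which may be impractical when porting a large existing script.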

Any subroutine defined in the body of the script executed under Apache::Registry becomes a nested subroutine. If the code is placed into a library or a module that the script require( )s or use( )s, this effect doesn't occur.

For example, if we move the code from the script into the subroutine run( ), place the subroutines in the mylib.pl file, save it in the same directory as the script itself, and require( ) it, there will be no problem at all.[29] Examples 6-1 and 6-2 show how we spread the code across the two files.

[29]Don't forget the 1; at the end of the library, or the require( ) call might fail.

Example 6-1. mylib.pl

my $counter;
sub run {
    $counter = 0;
    for (1..5) {
        increment_counter( );
    }
}
sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !\n";
}
1;

Example 6-2. counter.pl

use strict;
require "./mylib.pl";
print "Content-type: text/plain\n\n";
run( );

This solution is the easiest and fastest way to solve the nested subroutine problem. All you have to do is to move the code into a separate file, by first wrapping the initial code into some function that you later call from the script, and keeping the lexically scoped variables that could cause the problem out of this function.

As a general rule, it's best to put all the code in external libraries (unless the script is very short) and have only a few lines of code in the main script. Usually the main script simply calls the main function in the library, which is often called init( ) or run( ). This way, you don't have to worry about the effects of named nested subroutines.

As we will show later in this chapter, however, this quick solution might be problematic on a different front. If you have many scripts, you might try to move more than one script's code into a file with a similar filename, like mylib.pl. A much cleaner solution would be to spend a little bit more time on the porting process and use a fully qualified package, as in Examples 6-3 and 6-4.

Example 6-3. Book/Counter.pm

package Book::Counter;

my $counter = 0;

sub run {
    $counter = 0;
    for (1..5) {
        increment_counter( );
    }
}

sub increment_counter {
    $counter++;
    print "Counter is equal to $counter !\n";
}

1;
__END__

Example 6-4. counter-clean.pl

use strict;
use Book::Counter;

print "Content-type: text/plain\n\n";
Book::Counter::run( );

As you can see, the only difference is in the package declaration. As long as the package name is unique, you won't encounter any collisions with other scripts running on the same server.

Another solution to this problem is to change the lexical variables to global variables. There are two ways global variables can be used:

Using the vars pragma. With the use strict 'vars' setting, global variables can be used after being declared with vars. For example, this code:
```
use strict;
use vars qw($counter $result);
# later in the code
$counter = 0;
$result  = 1;
```
is similar to this code if use strict is not used:
```
$counter = 0;
$result  = 1;
```
However, the former style of coding is much cleaner, because it allows you to use global variables by declaring them, while avoiding the problem of misspelled variables being treated as undeclared globals.

The only drawback to using vars is that each global declared with it consumes more memory than the undeclared but fully qualified globals, as we will see in the next item.

Using fully qualified variables. Instead of using $counter, we can use $Foo::counter, which will place the global variable $counter into the package Foo. Note that we don't know which package name Apache::Registry will assign to the script, since it depends on the location from which the script will be called. Remember that globals must always be initialized before they can be used.
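The two styles can be compared side by side in a short sketch. The subroutine names and the package name Foo are placeholders chosen for illustration. Because both counters are package globals rather than lexicals, no closure is formed, but they must be reset explicitly at the start of each run:

```perl
use strict;
use warnings;
use vars qw($counter);    # declared global in the current package

sub increment_declared  { return ++$counter }
sub increment_qualified { return ++$Foo::counter }   # Foo is a placeholder package name

sub run {
    # Globals keep their values between calls, so reset them explicitly.
    $counter      = 0;
    $Foo::counter = 0;
    for (1..5) {
        increment_declared();
        increment_qualified();
    }
    return "$counter $Foo::counter";
}

my $first  = run();   # simulates the first request
my $second = run();   # simulates the second request
print "$first\n$second\n";
# prints:
# 5 5
# 5 5
```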

Perl 5.6.x also introduces a third way, with the our( ) declaration. our( ) can be used in different scopes, similar to my( ), but it creates global variables.
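As a sketch of the our( ) approach (requires Perl 5.6 or later), here is the counter again inside a plain handler( ) subroutine, which is assumed to stand in for the Apache::Registry wrapper. Since our( ) makes $counter an alias for a package global, the nested subroutine does not capture a lexical:

```perl
use strict;
use warnings;

sub handler {
    our $counter = 0;    # alias to the package global, reset on every call
    my @seen;
    push @seen, increment_counter() for 1..5;

    sub increment_counter {
        return ++$counter;   # the same package variable, not a captured lexical
    }

    return "@seen";
}

my $first  = handler();   # simulates the first request
my $second = handler();   # simulates the second request
print "$first\n$second\n";
# prints:
# 1 2 3 4 5
# 1 2 3 4 5
```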

Finally, it's possible to avoid this problem altogether by always passing the variables as arguments to the functions (see Example 6-5).

Example 6-5. counter2.pl

#!/usr/bin/perl -w
use strict;

print "Content-type: text/plain\n\n";

my $counter = 0;

for (1..5) {
    $counter = increment_counter($counter);
}

sub increment_counter {
    my $counter = shift;

    $counter++;
    print "Counter is equal to $counter !\n";

    return $counter;
}

In this case, there is no variable-sharing problem. The drawback is that this approach adds the overhead of passing the variable to the function and returning it. On the other hand, it ensures that your code does the right thing and does not depend on whether the functions are wrapped in other blocks, as they are with the Apache::Registry family of handlers.

When Stas (one of the authors of this book) had just started using mod_perl and wasn't aware of the nested subroutine problem, he happened to write a pretty complicated registration program that was run under mod_perl. We will reproduce here only the interesting part of that script:

use CGI;
$q = CGI->new;
my $name = $q->param('name');
print_response( );

sub print_response {
    print "Content-type: text/plain\n\n";
    print "Thank you, $name!";
}

Stas and his boss checked the program on the development server and it worked fine, so they decided to put it in production. Everything seemed to be normal, but the boss decided to keep on checking the program by submitting variations of his profile using The Boss as his username. Imagine his surprise when, after a few successful submissions, he saw the response "Thank you, Stas!" instead of "Thank you, The Boss!"

After investigating the problem, they learned that they had been hit by the nested subroutine problem. Why didn't they notice this when they were trying the software on their development server? We'll explain shortly.

To conclude this first mystery, remember to keep warnings enabled on the development server and to watch the error_log file for warnings.

Written by
Eric Cholet (Logilune) and
Stas Bekman (StasoSphere & Free Books).