Since many web application developers are interested in the content delivery phase and come from a CGI background, mod_perl includes packages designed to make the transition from CGI simple and painless. Apache::PerlRun and Apache::Registry run unmodified CGI scripts, and they run them much faster than mod_cgi does.[10]

[10]Apache::RegistryNG and Apache::RegistryBB are two new experimental modules that you may want to try as well.

The difference between Apache::Registry and Apache::PerlRun is that Apache::Registry caches all scripts, and Apache::PerlRun doesn't. To understand why this matters, remember that if one of mod_perl's benefits is added speed, another is persistence. Just as the Perl interpreter is loaded only once, at child process startup, your scripts are loaded and compiled only once, when they are first used. This can be a double-edged sword: persistence means global variables aren't reset to initial values, and file and database handles aren't closed when the script ends. This can wreak havoc in badly written CGI scripts.

Whether you should use Apache::Registry or Apache::PerlRun for your CGI scripts depends on how well written your existing Perl scripts are. Some scripts initialize all variables, close all file handles, use taint mode, and give only polite error messages. Others don't.

Apache::Registry compiles scripts on first use and keeps the compiled scripts in memory. On subsequent requests, all the needed code (the script and the modules it uses) is already compiled and loaded in memory. This gives you enormous performance benefits, but it requires that scripts be well behaved.

Apache::PerlRun, on the other hand, compiles scripts at each request. The script's namespace is flushed and is fresh at the start of every request. This allows scripts to enjoy the basic benefit of mod_perl (i.e., not having to load the Perl interpreter) without requiring poorly written scripts to be rewritten.
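As a rough illustration of how the two modules are selected, the following httpd.conf fragment is a minimal sketch for mod_perl 1.x. The URI prefixes and the directory paths are hypothetical; adjust them to your own layout. The only difference between the two sections is the PerlHandler line.

```apache
# Hypothetical locations: well-behaved scripts live under /perl/,
# scripts that have not been cleaned up yet live under /cgi-perl/.
Alias /perl/     /home/httpd/perl/
Alias /cgi-perl/ /home/httpd/cgi-perl/

<Location /perl>
    SetHandler  perl-script
    PerlHandler Apache::Registry
    Options     +ExecCGI
    PerlSendHeader On
</Location>

<Location /cgi-perl>
    SetHandler  perl-script
    PerlHandler Apache::PerlRun
    Options     +ExecCGI
    PerlSendHeader On
</Location>
```

With a split like this, a script can be moved from one directory to the other once it has been made safe for caching, without touching the script itself.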

A typical problem some developers encounter when porting from mod_cgi to Apache::Registry is the use of uninitialized global variables. Consider the following script:

use CGI;
$q = CGI->new( );
$topsecret = 1 if $q->param("secret") eq 'Muahaha';
# ...
if ($topsecret) {
    display_topsecret_data( );
}
else {
    security_alert( );
}

This script will always do the right thing under mod_cgi: if secret=Muahaha is supplied, the top-secret data will be displayed via display_topsecret_data( ), and if the authentication fails, the security_alert( ) function will be called. This works only because under mod_cgi, all globals are undefined at the beginning of each request.

Under Apache::Registry, however, global variables preserve their values between requests. Now imagine a situation where someone has successfully authenticated, setting the global variable $topsecret to a true value. From now on, anyone served by the same process can access the top-secret data without knowing the secret phrase, because $topsecret will stay true until the process dies or the variable is modified elsewhere in the code.

This is an example of sloppy code. It will do the right thing under Apache::PerlRun, since all global variables are undefined before each iteration of the script. However, under Apache::Registry and mod_perl handlers, all global variables must be initialized before they can be used.
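The effect can be reproduced outside the web server. The following is a minimal sketch, not Apache::Registry itself: it assumes only that a cached script behaves like a subroutine that is compiled once and then called for every request. The handle_request( ) function is a hypothetical stand-in for the script body.

```perl
use strict;
use warnings;

# Package global: like a global in a cached script, it is never
# reset between calls.
our $topsecret;

# Hypothetical stand-in for the script body, called once per "request".
sub handle_request {
    my ($secret) = @_;
    $topsecret = 1 if defined $secret && $secret eq 'Muahaha';
    return $topsecret ? 'top-secret data' : 'security alert';
}

# Three consecutive requests served by the same process.
my @results = map { handle_request($_) } ('wrong', 'Muahaha', 'wrong');
print "$_\n" for @results;
# security alert
# top-secret data
# top-secret data   <- the third request never supplied the secret
```

Once one caller has supplied the right phrase, every later call in the same process is treated as authenticated.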

The example can be fixed in a few ways. It's a good idea to always use the strict mode, which requires the global variables to be declared before they are used:

use strict;
use CGI;
use vars qw($topsecret $q);
# init globals at the start of every request
$topsecret = 0;
$q = undef;
# code
$q = CGI->new( );
$topsecret = 1 if $q->param("secret") eq 'Muahaha';
# ...

But of course, the simplest solution is to avoid using globals where possible. Let's look at the example rewritten without globals:

use strict;
use CGI;
my $q = CGI->new( );
my $topsecret = $q->param("secret") eq 'Muahaha' ? 1 : 0;
# ...

The last two versions of the example will run perfectly under Apache::Registry.

Here is another example that won't work correctly under Apache::Registry. This example presents a simple search engine script:

use CGI;
my $q = CGI->new( );
print $q->header('text/plain');
my @data = read_data( );
my $pat = $q->param("keyword");
foreach (@data) {
    print if /$pat/o;
}

The example retrieves some data using read_data( ) (e.g., lines in a text file), tries to match the keyword submitted by a user against this data, and prints the matching lines. The /o regular expression modifier is used to compile the regular expression only once, to speed up the matches. Without it, the regular expression may be recompiled once for every element of the @data array.

Now consider that someone is using this script to search for something inappropriate. Under Apache::Registry, the pattern will be cached and won't be recompiled in subsequent requests, meaning that the next person using this script (running in the same process) may receive something quite unexpected as a result. Oops.
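The stale-pattern problem is easy to demonstrate in a standalone script. This is a minimal sketch under the same assumption as before, namely that the cached script behaves like a subroutine compiled once and reused; search( ) is a hypothetical stand-in for the script body.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cached search script. The sub is compiled
# once, so the /o pattern is frozen with the value from the first call.
sub search {
    my ($pat, @data) = @_;
    return grep { /$pat/o } @data;
}

my @data   = ('apples', 'oranges');
my @first  = search('apples',  @data);
my @second = search('oranges', @data);

print "first:  @first\n";    # first:  apples
print "second: @second\n";   # second: apples (still the first caller's pattern)
```

The second caller asked for oranges but received the results of the first caller's search.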

The proper solution to this problem is discussed in Chapter 6, but Apache::PerlRun provides an immediate workaround, since it resets the regular expression cache before each request.

So why bother to keep your code clean? Why not use Apache::PerlRun all the time? As we mentioned earlier, the convenience provided by Apache::PerlRun comes at the price of reduced performance.

In Chapter 9, we show in detail how to benchmark the code and server configuration. Based on the results of the benchmark, you can tune the service for the best performance. For now, let's just show the benchmark of the short script in Example 1-6.

Example 1-6. readdir.pl

use strict;

use CGI ( );
use IO::Dir ( );

my $q = CGI->new;
print $q->header("text/plain");
my $dir = IO::Dir->new(".");
print join "\n", $dir->read;

The script loads two modules (CGI and IO::Dir), prints the HTTP header, and prints the contents of the current directory. If we compare the performance of this script under mod_cgi, Apache::Registry, and Apache::PerlRun, we get the following results:

  Mode                Requests/sec
  --------------------------------
  Apache::Registry             473
  Apache::PerlRun              289
  mod_cgi                       10

Because the script does very little, the performance differences between the three modes are very significant. Apache::Registry thoroughly outperforms mod_cgi (473 versus 10 requests per second), and you can see that Apache::PerlRun is much faster than mod_cgi as well, although at 289 requests per second it handles only about 60% of the load that Apache::Registry does. The performance gap usually shrinks a bit as more code is added, as the overhead of fork( ) and code compilation becomes less significant compared to execution times. But the overall picture won't change significantly.

Jumping ahead, if we convert the script in Example 1-6 into a mod_perl handler, we can reach 517 requests per second under the same conditions, which is a bit faster than Apache::Registry. In Chapter 13, we discuss why running the code under the Apache::Registry handler is a bit slower than using a pure mod_perl content handler.

It can easily be seen from this benchmark that Apache::Registry is what you should use for your scripts to get the most out of mod_perl. But Apache::PerlRun is still quite useful for making an easy transition to mod_perl. With Apache::PerlRun, you can get a significant performance improvement over mod_cgi with minimal effort.

Later, we will see that Apache::Registry's caching mechanism is implemented by compiling each script in its own namespace. Apache::Registry builds a unique package name using the script's name, the current URI, and the current virtual host (if any). Apache::Registry prepends a package statement to your script, then compiles it using Perl's eval function. In Chapter 6, we will show how exactly this is done.
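To make the idea concrete, here is a simplified sketch of the technique, not Apache::Registry's actual code. It derives the package name from the URI alone (ignoring the virtual host), and the Sketch::Registry::ROOT prefix and both function names are hypothetical.

```perl
use strict;
use warnings;

# Turn a URI into a package name: escape characters that are not legal
# in identifiers, then turn path separators into "::".
sub package_for_uri {
    my ($uri) = @_;
    (my $name = $uri) =~ s{([^A-Za-z0-9_/])}{sprintf("_%02x", ord $1)}eg;
    $name =~ s{/+}{::}g;
    return "Sketch::Registry::ROOT$name";
}

# Prepend a package statement, wrap the script body in a subroutine,
# and compile the result with a string eval.
sub compile_script {
    my ($uri, $code) = @_;
    my $package = package_for_uri($uri);
    my $wrapped = "package $package;\nsub handler {\n$code\n}\n1;";
    eval $wrapped or die "compile failed: $@";
    return $package;
}

my $pkg = compile_script('/perl/test.pl', 'return "hello from " . __PACKAGE__;');
my $out = $pkg->can('handler')->();
print "$out\n";   # hello from Sketch::Registry::ROOT::perl::test_2epl
```

Because each script ends up in a package of its own, two scripts that define a subroutine or global with the same name do not collide.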

What happens if you modify the script's file after it has been compiled and cached? Apache::Registry checks the file's last-modification time, and if the file has changed since the last compile, it is reloaded and recompiled.
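The reload check can be sketched in a few lines. This is an illustration of the general technique under simplifying assumptions, not the module's real implementation: the cache is a plain hash keyed by filename, and load_script( ) and write_script( ) are hypothetical helpers.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my %cache;          # filename => { mtime => ..., code => coderef }
my $compiles = 0;   # how many times we actually compiled

# Return the compiled script, recompiling only if the file has changed.
sub load_script {
    my ($file) = @_;
    my $mtime = (stat $file)[9];
    die "cannot stat $file: $!" unless defined $mtime;
    my $entry = $cache{$file};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $fh, '<', $file or die "cannot open $file: $!";
        my $src = do { local $/; <$fh> };
        close $fh;
        my $code = eval "sub {\n$src\n}" or die "compile failed: $@";
        $cache{$file} = { mtime => $mtime, code => $code };
        $compiles++;
    }
    return $cache{$file}{code};
}

# Write a script and force its modification time, so the demo does not
# depend on the clock ticking between writes.
sub write_script {
    my ($file, $src, $mtime) = @_;
    open my $fh, '>', $file or die "cannot write $file: $!";
    print {$fh} $src;
    close $fh;
    utime $mtime, $mtime, $file or die "cannot set mtime on $file: $!";
}

my $file = tempdir(CLEANUP => 1) . '/hello.pl';
my $now  = time;

write_script($file, 'return "version 1";', $now);
my $r1 = load_script($file)->();   # compiled
my $r2 = load_script($file)->();   # served from the cache
write_script($file, 'return "version 2";', $now + 10);
my $r3 = load_script($file)->();   # file changed, recompiled

print "$r1, $r2, $r3 (compiled $compiles times)\n";
# version 1, version 1, version 2 (compiled 2 times)
```

The stat( ) call on every request is cheap compared to recompiling, which is why the check can be done unconditionally.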

In case of a compilation or execution error, the error is logged to the server's error log, and a server error is returned to the client.