14.2.6. Memory Leakage

It's normal for a process to grow when it processes its first few requests. They may be different requests, or the same requests processing different data. You may try to reload the same request a few times, and in many cases the process will stop growing after only the second reload. In any case, once a representative selection of requests and inputs has been executed by a process, it won't usually grow any more unless the code leaks memory. If it grows after each reload of an identical request, there is probably a memory leak.

The experience might be different if the code works with some external resource that can change between requests. For example, if the code retrieves database records matching some query, it's possible that from time to time the database will be updated and that a different number of records will match the same query the next time it is issued. Depending on the techniques you use to retrieve the data, format it, and send it to the user, the process may increase or decrease in size, reflecting the changes in the data.

The easiest way to see whether the code is leaking is to run the server in single-process mode (httpd -X) and issue the same request a few times, checking whether the process grows after each request. If it does, you probably have a memory leak. If the code leaks 5 KB per request, then after 1,000 requests for the leaking code a process will have leaked 5 MB of memory. If you have 20 processes in production, that adds up to about 100 MB of leakage once each of them has served 1,000 such requests, that is, after roughly 20,000 requests in total.
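The arithmetic above can be checked with a few lines of Perl. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and it assumes that every process serves the same number of leaking requests.

```perl
use strict;
use warnings;

# Figures taken from the text; the variable names are illustrative only.
my $leak_kb_per_request  = 5;
my $requests_per_process = 1_000;
my $processes            = 20;

# Leakage accumulated by one process, then by all of them together.
my $per_process_kb = $leak_kb_per_request * $requests_per_process;
my $total_kb       = $per_process_kb * $processes;
my $total_requests = $requests_per_process * $processes;

printf "%d KB leaked per process\n", $per_process_kb;
printf "%d KB leaked by %d processes after %d requests\n",
    $total_kb, $processes, $total_requests;
```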

This technique to detect leakage can be misleading if you are not careful. Suppose your process first runs some clean (non-leaking) code that acquires 100 KB of memory. In an attempt to make itself more efficient, Perl doesn't give the 100 KB of memory back to the operating system. The next time the process runs any script, some of the 100 KB will be reused. But if this time the process runs a script that needs to acquire only 5 KB, you won't see the process grow even if the code has actually leaked these 5 KB. Now it might take 20 or more requests for the leaking script served by the same process before you would see that process start growing again.
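The masking effect can be illustrated with a toy model. The sketch below is not a measurement of Perl's allocator; it simply assumes a pool of 100 KB of reusable memory and a leak of 5 KB per request, and reports the first request at which the process would have to ask the operating system for more memory.

```perl
use strict;
use warnings;

# Toy model (assumption): memory that Perl kept after the clean code ran.
my $reusable_kb = 100;
my $leak_kb     = 5;     # leaked by each request to the leaking script

my $growth_kb = 0;       # memory newly obtained from the operating system
my $first_growth_request;

for my $request (1 .. 25) {
    # Satisfy the leak from the reusable pool while anything is left in it.
    my $from_pool = $reusable_kb < $leak_kb ? $reusable_kb : $leak_kb;
    $reusable_kb -= $from_pool;

    # Whatever the pool could not cover makes the process grow.
    my $from_os = $leak_kb - $from_pool;
    if ($from_os > 0) {
        $growth_kb += $from_os;
        $first_growth_request = $request
            unless defined $first_growth_request;
    }
}

print "first visible growth at request $first_growth_request\n";
print "process grew by $growth_kb KB after 25 requests\n";
```

In this model the first 20 requests consume the pool without any visible growth, which is why a short test run can miss the leak.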

A process may leak memory for several reasons: badly written system C/C++ libraries used in the httpd binary and badly written Perl code are the most common. Perl modules may also use C libraries, and these might leak memory as well. Also, some operating systems have been known to have problems with their memory-management functions.

If you know that you have no leaks in your code, then for detecting leaks in C/C++ libraries you should either use the technique of sampling the memory usage described above, or use C/C++ developer tools designed for this purpose. This topic is beyond the scope of this book.
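One way to sample memory usage is sketched below. It is a minimal example, assuming a system that either provides /proc/PID/status (as Linux does) or a ps(1) that accepts the -o rss= and -p options; rss_kb( ) is a hypothetical helper name, not part of mod_perl or Apache.

```perl
use strict;
use warnings;

# Hypothetical helper: return the resident set size of a process in KB,
# or undef if it cannot be determined on this system.
sub rss_kb {
    my ($pid) = @_;

    # Linux: read the VmRSS line from /proc/PID/status.
    if (open my $fh, '<', "/proc/$pid/status") {
        while (my $line = <$fh>) {
            return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
        }
    }

    # Fallback (assumption): a ps that supports "-o rss= -p PID".
    my $out = `ps -o rss= -p $pid 2>/dev/null`;
    return defined $out && $out =~ /(\d+)/ ? $1 : undef;
}

# Sample the process given on the command line, or this script itself.
my $pid = @ARGV ? $ARGV[0] : $$;
my $kb  = rss_kb($pid);
print defined $kb ? "process $pid uses $kb KB\n"
                  : "cannot determine the size of process $pid\n";
```

Run it with the process ID of the httpd -X process before and after each request, and compare the reported sizes.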

The Apache::Leak module (derived from Devel::Leak) might help you to detect leaks in your code. Consider the script in Example 14-3.

Example 14-3. leaktest.pl

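use Apache::Leak;

my $global = "FooA";

# leak_test runs the block twice and reports any newly created SVs
leak_test {
    $$global = 1;    # symbolic reference: creates the variable named by $global
    ++$global;       # string increment: "FooA" becomes "FooB"
};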

You do not need to be inside mod_perl to use this script. The argument to leak_test( ) is an anonymous sub or a block, so you can just throw in any code you suspect might be leaking. The script will run the code twice. The first time, new scalar values (SVs) are created, but this does not mean the code is leaking. The second pass will give better evidence.

From the command line, the above script outputs:

ENTER: 1482 SVs
new c28b8 : new c2918 : 
LEAVE: 1484 SVs
ENTER: 1484 SVs
new db690 : new db6a8 : 
LEAVE: 1486 SVs
!!! 2 SVs leaked !!!

This module uses the simple approach of walking the Perl internal table of allocated SVs. It records them before entering the scope of the code under test and after leaving the scope. At the end, a comparison of the two sets is performed, sv_dump( ) is called for anything that did not exist in the first set, and the difference in counts is reported. Note that you will see the dumps of SVs only if Perl was built with the -DDEBUGGING option. In our example the script will dump two SVs twice, since the same code is run twice. The volume of output is too great to be presented here.

Our example leaks because $$global = 1; creates a new global variable, FooA (with the value of 1), which will not be destroyed until this module is destroyed. Under mod_perl the module doesn't get destroyed until the process quits. When the code is run the second time, $global will contain FooB because of the increment operation at the end of the first run. Consider:

$foo = "AAA";
print "$foo\n";
$foo++;
print "$foo\n";

which prints:

AAA
AAB

So every time the code is executed, a new variable (FooC, FooD, etc.) will spring into existence.
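The effect can be observed without Apache::Leak by looking at the symbol table. The following is a minimal sketch, assuming the code runs in package main; the suspect( ) subroutine is a name chosen for illustration, and the symbol-table comparison is a stand-in for the SV counting that Apache::Leak performs, not a reimplementation of it.

```perl
use strict;
use warnings;

my $global = "FooA";

# The same code as in the example, wrapped in a subroutine.
sub suspect {
    no strict 'refs';    # the symbolic reference is the point of the example
    $$global = 1;
    ++$global;
}

# Remember which names exist in package main before running the code.
my %before = map { $_ => 1 } keys %main::;

suspect() for 1 .. 3;

# Every run has created one more package variable.
my @new_names = sort grep { /^Foo[A-Z]$/ && !$before{$_} } keys %main::;
print "new package variables: @new_names\n";
print "next name to be created: $global\n";
```

After three runs the new names are FooA, FooB, and FooC, and $global already holds FooD for the next run.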

Apache::Leak is not very user-friendly, so you may want to take a look at B::LexInfo instead. Keep in mind that something that appears to be a leak may actually be just a Perl optimization. Consider this code, for example:

sub test { my ($string) = @_;}
test("a string");

B::LexInfo will show you that Perl does not release the value from $string unless you undef( ) it. This is because Perl anticipates that the memory will be needed for another string the next time the subroutine is entered. You'll see similar behavior for @array lengths, %hash keys, and scratch areas of the padlist for operations such as join( ), the concatenation operator (.), etc.
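The effect of undef( ) on a string buffer can be observed with the core B module. This is only a sketch: it inspects a single file-scoped variable rather than a subroutine's padlist, so it shows the release on undef( ) but not the retention between subroutine calls that B::LexInfo reports. The buffer_len( ) helper is a name made up for this example, and the exact size of the buffer before undef( ) depends on the Perl build.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: number of bytes allocated for a scalar's string
# buffer (the LEN field that also appears in the B::LexInfo output).
sub buffer_len {
    my ($ref) = @_;
    return B::svref_2object($ref)->LEN;
}

my $string = "a string";
my $len_before = buffer_len(\$string);

undef $string;    # explicitly release the string buffer
my $len_after = buffer_len(\$string);

print "allocated before undef: $len_before bytes\n";
print "allocated after undef:  $len_after bytes\n";
```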

Let's look at how B::LexInfo works. The code in Example 14-4 creates a new B::LexInfo object, then runs cvrundiff( ), which creates two snapshots of the lexical variables' padlists—one before the call to LeakTest1::test( ) and the other, in this case, after it has been called with the argument "a string". Then it calls diff -u to generate the difference between the snapshots.

Example 14-4. leaktest1.pl

package LeakTest1;
use B::LexInfo ( );

sub test { my ($string) = @_;}

my $lexi = B::LexInfo->new;
my $diff = $lexi->cvrundiff('LeakTest1::test', "a string");
print $$diff;

In case you aren't familiar with how diff works, - at the beginning of the line means that that line was removed, + means that a line was added, and other lines are there to show the context in which the difference was found. Here is the output:

--- /tmp/B_LexInfo_3099.before        Tue Feb 13 20:09:52 2001
+++ /tmp/B_LexInfo_3099.after        Tue Feb 13 20:09:52 2001
@@ -2,9 +2,11 @@
   {
     'LeakTest1::test' => {
       '$string' => {
-        'TYPE' => 'NULL',
+        'TYPE' => 'PV',
+        'LEN' => 9,
         'ADDRESS' => '0x8146d80',
-        'NULL' => '0x8146d80'
+        'PV' => 'a string',
+        'CUR' => 8
       },
       '__SPECIAL__1' => {
         'TYPE' => 'NULL',

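The diff shows that $string changed from an empty (NULL) entry to a PV holding "a string", and that it still holds this value after test( ) has returned. Perl optimizes for speed by keeping the memory allocated for $string, even after the variable has gone out of scope.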

Let's run the script from Example 14-5.

Example 14-5. leaktest2.pl

package LeakTest2;
use B::LexInfo ( );

my $global = "FooA";

sub test {
    $$global = 1;
    ++$global;
}

my $lexi = B::LexInfo->new;
my $diff = $lexi->cvrundiff('LeakTest2::test');
print $$diff;

Here's the result:

--- /tmp/B_LexInfo_3103.before        Tue Feb 13 20:12:04 2001
+++ /tmp/B_LexInfo_3103.after        Tue Feb 13 20:12:04 2001
@@ -5,7 +5,7 @@
         'TYPE' => 'PV',
         'LEN' => 5,
         'ADDRESS' => '0x80572ec',
-        'PV' => 'FooA',
+        'PV' => 'FooB',
         'CUR' => 4
       }
     }

We can clearly see the leakage, since the value of the PV entry has changed from one string to a different one. Compare this with the previous example, where a variable didn't exist and sprang into existence for optimization reasons. If you find this confusing, probably the best approach is to call cvrundiff( ) twice when you test your code and to look at the second diff.

Now let's call the cvrundiff( ) function twice on this example and print only the second diff, as shown in Example 14-6.

Example 14-6. leaktest3.pl

package LeakTest2;
use B::LexInfo ( );

my $global = "FooA";

sub test {
    $$global = 1;
    ++$global;
}

my $lexi = B::LexInfo->new;
my $diff = $lexi->cvrundiff('LeakTest2::test');
$diff    = $lexi->cvrundiff('LeakTest2::test');
print $$diff;

Here's the output:

--- /tmp/B_LexInfo_3103.before        Tue Feb 13 20:12:04 2001
+++ /tmp/B_LexInfo_3103.after        Tue Feb 13 20:12:04 2001
@@ -5,7 +5,7 @@
         'TYPE' => 'PV',
         'LEN' => 5,
         'ADDRESS' => '0x80572ec',
-        'PV' => 'FooB',
+        'PV' => 'FooC',
         'CUR' => 4
       }
     }

We can see the leak again, since the value of PV has changed again, from FooB to FooC. Now let's call cvrundiff( ) twice on the LeakTest1 script from Example 14-4, as shown in Example 14-7.

Example 14-7. leaktest4.pl

package LeakTest1;
use B::LexInfo ( );

sub test { my ($string) = @_;}

my $lexi = B::LexInfo->new;
my $diff = $lexi->cvrundiff('LeakTest1::test', "a string");
   $diff = $lexi->cvrundiff('LeakTest1::test', "a string");
print $$diff;

No output is produced, since the snapshots taken before and after the second call are identical. All the data structures were allocated during the first execution, so we can be confident that no memory is leaking here.

Apache::Status includes a StatusLexInfo option that can show you the internals of your code via B::LexInfo. See Chapter 21 for more information.

Written by Eric Cholet (Logilune) and Stas Bekman (StasoSphere & Free Books).