The Devel::SmallProf profiler reports the time a program spends in each line of code. It is called "small" because it's supposed to impose very little extra load on the machine (speed- and memory-wise) while profiling the code.
Let's take a look at the simple example shown in Example 9-9.
Example 9-9. table_gen.pl

    for (1..1000) {
        my @rows = ( );
        push @rows, Tr( map { td($_) } 'a'..'d' );
        push @rows, Tr( map { td($_) } 'e'..'h' );
        my $var = table(@rows);
    }
    sub table { my @rows  = @_;    return "<table>\n@rows</table>\n"; }
    sub Tr    { my @cells = @_;    return "<tr>@cells</tr>\n"; }
    sub td    { my $cell  = shift; return "<td>$cell</td>"; }
On each iteration it builds the same HTML table in $var, with the cells of the table filled with single letters. The functions table( ), Tr( ), and td( ) wrap the data in the appropriate HTML tags. Notice that we have used Tr( ) and not tr( ), since the latter is a built-in Perl function; Tr( ) is also the name CGI.pm uses for the function that does the same thing. If we print $var, we will see something like this:
    <table>
    <tr><td>a</td> <td>b</td> <td>c</td> <td>d</td></tr>
    <tr><td>e</td> <td>f</td> <td>g</td> <td>h</td></tr>
    </table>
We have looped a thousand times through the same code in order to get a more precise speed measurement. If the code runs very quickly we won't be able to get any meaningful results from just one loop.
If we run this code with Devel::SmallProf:
panic% perl -d:SmallProf table_gen.pl
we get the following output in the autogenerated smallprof.out file:
    count  wall tm   cpu time  line
     1001  0.003855  0.030000     1:  for (1..1000) {
     1000  0.004823  0.040000     2:      my @rows = ( );
     5000  0.272651  0.410000     3:      push @rows, Tr( map { td($_) }
     5000  0.267107  0.360000     4:      push @rows, Tr( map { td($_) }
     1000  0.067115  0.120000     5:      my $var = table(@rows);
        0  0.000000  0.000000     6:  }
     3000  0.033798  0.080000     7:  sub table { my @rows = @_; return
     6000  0.078491  0.120000     8:  sub Tr { my @cells = @_; return
    24000  0.267353  0.490000     9:  sub td { my $cell = shift; return
        0  0.000000  0.000000    10:
We can see in the CPU time column that Perl spends most of its time in the td( ) function; it's also the code that's visited by Perl the most times. In this example we could find this out ourselves without much effort, but if the code is longer it will be harder to find the lines of code where Perl spends most of its time. So we sort the output by the third column as a numerical value, in descending order:
    panic% sort -k 3nr,3 smallprof.out | less

    24000  0.267353  0.490000     9:  sub td { my $cell = shift; return
     5000  0.272651  0.410000     3:      push @rows, Tr( map { td($_) }
     5000  0.267107  0.360000     4:      push @rows, Tr( map { td($_) }
     1000  0.067115  0.120000     5:      my $var = table(@rows);
     6000  0.078491  0.120000     8:  sub Tr { my @cells = @_; return
     3000  0.033798  0.080000     7:  sub table { my @rows = @_; return
     1000  0.004823  0.040000     2:      my @rows = ( );
     1001  0.003855  0.030000     1:  for (1..1000) {
According to the Devel::SmallProf manpage, the wallclock measurements are fairly accurate (assuming an unloaded machine), but the CPU time measurements are always more accurate. That's because if a statement takes more than one CPU time slice to complete, the time during which some other process uses the CPU is included in the wallclock count. Since the load on the machine may vary greatly from moment to moment, rerunning the same test a few times may yield inconsistent wallclock results.
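If you do want to rank lines by wallclock time rather than CPU time, the same sort invocation works with the key changed to the second field. Here is a minimal, self-contained sketch; the profile lines in the sample file are fabricated for illustration, but they follow the same "count, wall time, CPU time, line" layout as a real smallprof.out:

    # Create a tiny SmallProf-style report (fabricated numbers)
    cat > sample_prof.out <<'EOF'
     1000 0.004823 0.040000     2: my @rows = ( );
    24000 0.267353 0.490000     9: sub td { my $cell = shift; return
     5000 0.272651 0.410000     3: push @rows, Tr( map { td($_) }
    EOF

    # -k 2nr,2 means: sort key is field 2 only, compared
    # numerically (n), in reverse (descending) order (r)
    sort -k 2nr,2 sample_prof.out

On a loaded machine the wallclock ranking and the CPU-time ranking can disagree, which is exactly why the CPU-time column is the safer guide.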
Let's try to improve the td( ) function and at the same time the Tr( ) and table( ) functions. We will not copy the passed arguments, but we will use them directly in all three functions. Example 9-10 shows the new version of our script.
Example 9-10. table_gen2.pl

    for (1..1000) {
        my @rows = ( );
        push @rows, Tr( map { td($_) } 'a'..'d' );
        push @rows, Tr( map { td($_) } 'e'..'h' );
        my $var = table(@rows);
    }
    sub table { return "<table>\n@_</table>\n"; }
    sub Tr    { return "<tr>@_</tr>\n"; }
    sub td    { return "<td>@_</td>"; }
Now let's rerun the code with the profiler:
panic% perl -d:SmallProf table_gen2.pl
The results are much better now: only 0.34 seconds of CPU time are spent in td( ), versus 0.49 in the earlier run.
    panic% sort -k 3nr,3 smallprof.out | less

     5000  0.279138  0.400000     4:      push @rows, Tr( map { td($_) }
    16000  0.241350  0.340000     9:  sub td { return "<td>@_</td>"; }
     5000  0.269940  0.320000     3:      push @rows, Tr( map { td($_) }
     4000  0.050050  0.130000     8:  sub Tr { return "<tr>@_</tr>\n"; }
     1000  0.065324  0.080000     5:      my $var = table(@rows);
     1000  0.006650  0.010000     2:      my @rows = ( );
     2000  0.020314  0.030000     7:  sub table { return "<table>\n@_</table>\n"; }
     1001  0.006165  0.030000     1:  for (1..1000) {
You will also notice that Devel::SmallProf reports that the functions were executed different numbers of times in the two runs. That's because in our original example all three functions had two statements on each line, but in the improved version they each had only one. Devel::SmallProf looks at the code after it's been parsed and optimized by Perl—thus, if some optimizations took place, it might not be exactly the same as the code that you wrote.
In most cases you will probably find Devel::DProf more useful than Devel::SmallProf, as it allows you to analyze the code by subroutine and not by line.
Just as there is the Apache::DProf equivalent for Devel::DProf, there is the Apache::SmallProf equivalent for Devel::SmallProf. It uses a configuration similar to Apache::DProf—i.e., it is registered as a PerlFixupHandler—but it also requires Apache::DB. Therefore, to use it you should add the following configuration to httpd.conf:
    <Perl>
        if (Apache->define('PERLSMALLPROF')) {
            require Apache::DB;
            Apache::DB->init;
        }
    </Perl>

    <Location /perl>
        SetHandler perl-script
        PerlHandler Apache::Registry
        <IfDefine PERLSMALLPROF>
            PerlFixupHandler Apache::SmallProf
        </IfDefine>
    </Location>
Now start the server:
panic% httpd -X -DPERLSMALLPROF &
This will activate Apache::SmallProf::handler during the request. As a result, the profile files will be written to the $ServerRoot/logs/smallprof/ directory. Unlike with Devel::SmallProf, the profile is split into several files based on package name. For example, if CGI.pm was used, one of the generated profile files will be called CGI.pm.prof.
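Since each package gets its own profile file, the same sorting trick shown earlier can be applied per package to find the most CPU-expensive line in each. The sketch below fabricates the directory and sample profile data for illustration; on a real server the files live under $ServerRoot/logs/smallprof/:

    # Fabricated stand-ins for real per-package profile files
    mkdir -p logs/smallprof
    printf ' 100 0.010000 0.050000   12: sub param { ...\n' \
        > logs/smallprof/CGI.pm.prof
    printf ' 200 0.020000 0.090000    7: sub handler { ...\n' \
        > logs/smallprof/MyApp.pm.prof

    # Print the single most CPU-expensive line of each package
    for f in logs/smallprof/*.prof; do
        echo "== $f =="
        sort -k 3nr,3 "$f" | head -n 1
    done

This gives a quick per-package overview before you drill into any one file with less.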