Perl IO:: modules are very convenient, but let's see what it costs to use them. The following command (Perl 5.6.1 on Linux) reveals that when we use IO we also load the IO::Handle, IO::Seekable, IO::File, IO::Pipe, IO::Socket, and IO::Dir modules. The command also shows us how big they are in terms of code lines. wc(1) reports how many lines of code are in each of the loaded files:

panic% wc -l `perl -MIO -e 'print join("\n", sort values %INC, "")'`
  124 /usr/lib/perl5/5.6.1/Carp.pm
  602 /usr/lib/perl5/5.6.1/Class/Struct.pm
  456 /usr/lib/perl5/5.6.1/Cwd.pm
  313 /usr/lib/perl5/5.6.1/Exporter.pm
  225 /usr/lib/perl5/5.6.1/Exporter/Heavy.pm
   93 /usr/lib/perl5/5.6.1/File/Spec.pm
  458 /usr/lib/perl5/5.6.1/File/Spec/Unix.pm
  115 /usr/lib/perl5/5.6.1/File/stat.pm
  414 /usr/lib/perl5/5.6.1/IO/Socket/INET.pm
  143 /usr/lib/perl5/5.6.1/IO/Socket/UNIX.pm
   52 /usr/lib/perl5/5.6.1/SelectSaver.pm
  146 /usr/lib/perl5/5.6.1/Symbol.pm
  160 /usr/lib/perl5/5.6.1/Tie/Hash.pm
   92 /usr/lib/perl5/5.6.1/base.pm
 7525 /usr/lib/perl5/5.6.1/i386-linux/Config.pm
  276 /usr/lib/perl5/5.6.1/i386-linux/Errno.pm
  222 /usr/lib/perl5/5.6.1/i386-linux/Fcntl.pm
   47 /usr/lib/perl5/5.6.1/i386-linux/IO.pm
  239 /usr/lib/perl5/5.6.1/i386-linux/IO/Dir.pm
  169 /usr/lib/perl5/5.6.1/i386-linux/IO/File.pm
  612 /usr/lib/perl5/5.6.1/i386-linux/IO/Handle.pm
  252 /usr/lib/perl5/5.6.1/i386-linux/IO/Pipe.pm
  127 /usr/lib/perl5/5.6.1/i386-linux/IO/Seekable.pm
  428 /usr/lib/perl5/5.6.1/i386-linux/IO/Socket.pm
  453 /usr/lib/perl5/5.6.1/i386-linux/Socket.pm
  129 /usr/lib/perl5/5.6.1/i386-linux/XSLoader.pm
  117 /usr/lib/perl5/5.6.1/strict.pm
   83 /usr/lib/perl5/5.6.1/vars.pm
  419 /usr/lib/perl5/5.6.1/warnings.pm
   38 /usr/lib/perl5/5.6.1/warnings/register.pm
14529 total

About 14,500 lines of code! If you run a trace of this test code, you will see that it also puts a big load on the machine to actually load these modules, although this is mostly irrelevant if you preload the modules at server startup.

CGI.pmsuffers from the same problem:

panic% wc -l `perl -MCGI -le 'print for values %INC'`
  313 /usr/lib/perl5/5.6.1/Exporter.pm
  124 /usr/lib/perl5/5.6.1/Carp.pm
  117 /usr/lib/perl5/5.6.1/strict.pm
   83 /usr/lib/perl5/5.6.1/vars.pm
   38 /usr/lib/perl5/5.6.1/warnings/register.pm
  419 /usr/lib/perl5/5.6.1/warnings.pm
  225 /usr/lib/perl5/5.6.1/Exporter/Heavy.pm
 1422 /usr/lib/perl5/5.6.1/overload.pm
  303 /usr/lib/perl5/5.6.1/CGI/Util.pm
 6695 /usr/lib/perl5/5.6.1/CGI.pm
  278 /usr/lib/perl5/5.6.1/constant.pm
10017 total

However, judging the bloat by the number of lines is misleading, since not all the code is used in most cases. Also remember that documentation might account for a significant chunk of the lines in every module.

Since we can preload the code at server startup, we are mostly interested in the execution overhead and memory footprint. So let's look at the memory usage.

Example 13-12 is the perlbloat.pl script, which shows how much memory is acquired by Perl when you run some code. Now we can easily test the overhead of loading the modules in question.

Example 13-12. perlbloat.pl

#!/usr/bin/perl -w

use GTop ( );

my $gtop = GTop->new;
my $before = $gtop->proc_mem($$)->size;

for (@ARGV) {
    if (eval "require $_") {
        eval { $_->import; };
    }
    else {
        eval $_;
        die $@ if $@;
    }
}

my $after = $gtop->proc_mem($$)->size;
print "@ARGV added " . GTop::size_string($after - $before) . "\n";

The script simply samples the total memory use, then evaluates the code passed to it, samples the memory again, and prints the difference.

Now let's try to load IO:

panic% ./perlbloat.pl 'use IO;'
use IO; added  1.3M

"Only" 1.3 MB of overhead. Now let's load CGI.pm (v2.79) and compile its methods:

panic% ./perlbloat.pl 'use CGI; CGI->compile(":cgi")'
use CGI; CGI->compile(":cgi") added 784k

That's almost 1 MB of extra memory per process.

Let's compare CGI.pm with its younger sibling, whose internals are implemented in C:

%. /perlbloat.pl 'use Apache::Request'
use Apache::Request added   36k

Only 36 KB this time. A significant difference, isn't it? We have compiled the :cgi group of the CGI.pm methods, because CGI.pm is written in such a way that the actual code compilation is deferred until some function is actually used. To make a fair comparison with Apache::Request, we compiled only the methods present in both.

If we compile :all CGI.pm methods, the memory bloat is much bigger:

panic% ./perlbloat.pl 'use CGI; CGI->compile(":all")'
use CGI; CGI->compile(":all") added  1.9M

The following numbers show memory sizes in KB (virtual and resident) for Perl 5.6.0 on four different operating systems. Three calls are made: without any modules, with only -MCGI, and with -MIO (never with both). The rows with -MCGI and -MIO are followed by the difference relative to raw Perl.

  OpenBSD      FreeBSD       RedHat         Linux        Solaris
              vsz   rss     vsz   rss     vsz   rss    vsz    rss
  Raw Perl    736   772     832  1208    2412   980    2928  2272

  w/ CGI     1220  1464    1308  1828    2972  1768    3616  3232
  delta      +484  +692    +476  +620    +560  +788    +688  +960

  w/ IO      2292  2580    2456  3016    4080  2868    5384  4976
  delta     +1556 +1808   +1624 +1808   +1668 +1888   +2456 +2704

Which is more important: saving enough memory to allow the machine to serve a few extra concurrent clients, or using off-the-shelf modules that are proven and well understood? Debugging a reinvention of the wheel can cost a lot of development time, especially if each member of your team reinvents in a different way. In general, it is a lot cheaper to buy more memory or a bigger machine than it is to hire an extra programmer. So while it may be wise to avoid using a bloated module if you need only a few functions that you could easily code yourself, the place to look for real efficiency savings is in how you write your code.