Perl IO:: modules are very convenient, but let's see what it costs to use them. The following command (Perl 5.6.1 on Linux) reveals that when we use IO we also load the IO::Handle, IO::Seekable, IO::File, IO::Pipe, IO::Socket, and IO::Dir modules. The command also shows us how big they are in terms of code lines. wc(1) reports how many lines of code are in each of the loaded files:
panic% wc -l `perl -MIO -e 'print join("\n", sort values %INC, "")'`
    124 /usr/lib/perl5/5.6.1/Carp.pm
    602 /usr/lib/perl5/5.6.1/Class/Struct.pm
    456 /usr/lib/perl5/5.6.1/Cwd.pm
    313 /usr/lib/perl5/5.6.1/Exporter.pm
    225 /usr/lib/perl5/5.6.1/Exporter/Heavy.pm
     93 /usr/lib/perl5/5.6.1/File/Spec.pm
    458 /usr/lib/perl5/5.6.1/File/Spec/Unix.pm
    115 /usr/lib/perl5/5.6.1/File/stat.pm
    414 /usr/lib/perl5/5.6.1/IO/Socket/INET.pm
    143 /usr/lib/perl5/5.6.1/IO/Socket/UNIX.pm
     52 /usr/lib/perl5/5.6.1/SelectSaver.pm
    146 /usr/lib/perl5/5.6.1/Symbol.pm
    160 /usr/lib/perl5/5.6.1/Tie/Hash.pm
     92 /usr/lib/perl5/5.6.1/base.pm
   7525 /usr/lib/perl5/5.6.1/i386-linux/Config.pm
    276 /usr/lib/perl5/5.6.1/i386-linux/Errno.pm
    222 /usr/lib/perl5/5.6.1/i386-linux/Fcntl.pm
     47 /usr/lib/perl5/5.6.1/i386-linux/IO.pm
    239 /usr/lib/perl5/5.6.1/i386-linux/IO/Dir.pm
    169 /usr/lib/perl5/5.6.1/i386-linux/IO/File.pm
    612 /usr/lib/perl5/5.6.1/i386-linux/IO/Handle.pm
    252 /usr/lib/perl5/5.6.1/i386-linux/IO/Pipe.pm
    127 /usr/lib/perl5/5.6.1/i386-linux/IO/Seekable.pm
    428 /usr/lib/perl5/5.6.1/i386-linux/IO/Socket.pm
    453 /usr/lib/perl5/5.6.1/i386-linux/Socket.pm
    129 /usr/lib/perl5/5.6.1/i386-linux/XSLoader.pm
    117 /usr/lib/perl5/5.6.1/strict.pm
     83 /usr/lib/perl5/5.6.1/vars.pm
    419 /usr/lib/perl5/5.6.1/warnings.pm
     38 /usr/lib/perl5/5.6.1/warnings/register.pm
  14529 total
About 14,500 lines of code! If you run a trace of this test code, you will see that it also puts a big load on the machine to actually load these modules, although this is mostly irrelevant if you preload the modules at server startup.
CGI.pm suffers from the same problem:
panic% wc -l `perl -MCGI -le 'print for values %INC'`
    313 /usr/lib/perl5/5.6.1/Exporter.pm
    124 /usr/lib/perl5/5.6.1/Carp.pm
    117 /usr/lib/perl5/5.6.1/strict.pm
     83 /usr/lib/perl5/5.6.1/vars.pm
     38 /usr/lib/perl5/5.6.1/warnings/register.pm
    419 /usr/lib/perl5/5.6.1/warnings.pm
    225 /usr/lib/perl5/5.6.1/Exporter/Heavy.pm
   1422 /usr/lib/perl5/5.6.1/overload.pm
    303 /usr/lib/perl5/5.6.1/CGI/Util.pm
   6695 /usr/lib/perl5/5.6.1/CGI.pm
    278 /usr/lib/perl5/5.6.1/constant.pm
  10017 total
However, judging the bloat by the number of lines is misleading, since not all the code is used in most cases. Also remember that documentation might account for a significant chunk of the lines in every module.
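To get a rough idea of how much of each file is documentation rather than code, the line counts can be split by category. The following is only a sketch under simple assumptions: POD is taken to start at any line beginning with "=" plus a letter and to end at "=cut", and everything after __END__ or __DATA__ is treated as non-code (which would miscount modules that keep autoloaded source there). The sub name count_lines and the script name podcount.pl are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify each line of Perl source text as code, POD, or other
# (blank lines, comment-only lines, and text after __END__/__DATA__).
sub count_lines {
    my ($source) = @_;
    my %count = (code => 0, pod => 0, other => 0);
    my ($in_pod, $past_end) = (0, 0);

    for my $line (split /\n/, $source) {
        if ($in_pod) {
            $count{pod}++;
            $in_pod = 0 if $line =~ /^=cut\b/;    # POD block ends here
        }
        elsif ($line =~ /^=[a-zA-Z]/) {
            $count{pod}++;
            $in_pod = 1 unless $line =~ /^=cut\b/;
        }
        elsif ($past_end) {
            $count{other}++;
        }
        elsif ($line =~ /^__(?:END|DATA)__\s*$/) {
            $past_end = 1;
            $count{other}++;
        }
        elsif ($line =~ /^\s*(?:#.*)?$/) {
            $count{other}++;                      # blank or comment-only
        }
        else {
            $count{code}++;
        }
    }
    return \%count;
}

# Report the three counts for every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $source = do { local $/; <$fh> };
    $source = '' unless defined $source;
    my $c = count_lines($source);
    printf "%6d code %6d pod %6d other  %s\n",
        @{$c}{qw(code pod other)}, $file;
}
```

It could be run over the same list of files as before, for example: perl podcount.pl `perl -MCGI -le 'print for values %INC'`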
Since we can preload the code at server startup, we are mostly interested in the execution overhead and memory footprint. So let's look at the memory usage.
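As a reminder of what preloading looks like, here is a minimal startup file sketch. It assumes a mod_perl 1.x server whose httpd.conf pulls the file in with a PerlRequire directive; the path and the choice of the :cgi group are placeholders, not recommendations.

```perl
# startup.pl: loaded once by the parent server process, e.g. via
#   PerlRequire /path/to/startup.pl      (path is a placeholder)
use strict;

# Empty import lists: load the modules without importing any symbols.
use IO  ();
use CGI ();

# Precompile the CGI.pm methods we expect to use, so that they are
# compiled in the parent process rather than in each child.
CGI->compile(':cgi');

1;
```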
Example 13-12 is the perlbloat.pl script, which shows how much memory is acquired by Perl when you run some code. Now we can easily test the overhead of loading the modules in question.
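#!/usr/bin/perl -w

use GTop ();

my $gtop = GTop->new;

# total process size before loading anything
my $before = $gtop->proc_mem($$)->size;

for (@ARGV) {
    # treat the argument as a module name first...
    if (eval "require $_") {
        eval { $_->import; };
    }
    # ...and fall back to evaluating it as arbitrary Perl code
    else {
        eval $_;
        die $@ if $@;
    }
}

# total process size after loading
my $after = $gtop->proc_mem($$)->size;
print "@ARGV added " . GTop::size_string($after - $before) . "\n";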
The script simply samples the total memory use, then evaluates the code passed to it, samples the memory again, and prints the difference.
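If GTop is not installed, the same before/after sampling can be approximated on Linux by reading the VmSize field from /proc. This is a sketch only: it assumes a Linux-style /proc/<pid>/status file, reports kilobytes rather than GTop's formatted sizes, and the sub names are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the VmSize value (in KB) from the text of a
# /proc/<pid>/status file; returns undef if the field is missing.
sub vmsize_kb {
    my ($status_text) = @_;
    return $status_text =~ /^VmSize:\s+(\d+)\s+kB/m ? $1 : undef;
}

# Virtual size of the current process in KB (Linux-specific assumption).
sub current_vmsize_kb {
    open my $fh, '<', "/proc/$$/status" or return;
    my $text = do { local $/; <$fh> };
    return vmsize_kb($text);
}

# Evaluate a string of Perl code and return how many KB the process
# grew by, or undef if the size could not be read.
sub bloat_kb {
    my ($code) = @_;
    my $before = current_vmsize_kb();
    eval $code;
    die $@ if $@;
    my $after = current_vmsize_kb();
    return defined $before && defined $after ? $after - $before : undef;
}

for my $code (@ARGV) {
    my $delta = bloat_kb($code);
    print "$code added ",
        (defined $delta ? "${delta}k" : "an unknown amount"), "\n";
}
```

Numbers obtained this way will not match GTop's output exactly, but the relative differences between modules should tell the same story.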
Now let's try to load IO:
panic% ./perlbloat.pl 'use IO;'
use IO; added 1.3M
"Only" 1.3 MB of overhead. Now let's load CGI.pm (v2.79) and compile its methods:
panic% ./perlbloat.pl 'use CGI; CGI->compile(":cgi")'
use CGI; CGI->compile(":cgi") added 784k
That's almost 1 MB of extra memory per process.
Let's compare CGI.pm with its younger sibling, whose internals are implemented in C:
panic% ./perlbloat.pl 'use Apache::Request'
use Apache::Request added 36k
Only 36 KB this time. A significant difference, isn't it? We have compiled the :cgi group of the CGI.pm methods, because CGI.pm is written in such a way that the actual code compilation is deferred until some function is actually used. To make a fair comparison with Apache::Request, we compiled only the methods present in both.
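The general idea of deferring compilation can be sketched as follows. This is not CGI.pm's actual code: the package Lazy::Demo and its subs are hypothetical, and the sketch only shows the technique of keeping sub source as strings, compiling each one on first call through AUTOLOAD, and offering a compile() method to force compilation up front.

```perl
#!/usr/bin/perl
package Lazy::Demo;    # hypothetical package, for illustration only
use strict;
use warnings;

our $AUTOLOAD;

# Sub bodies are stored as plain strings, so loading the module
# does not compile them.
my %SOURCE = (
    greet => 'sub greet { my ($name) = @_; return "Hello, $name" }',
    shout => 'sub shout { my ($text) = @_; return uc $text }',
);
my %COMPILED;

# Compile one stored sub into this package, unless already done.
# Returns false if no source is known under that name.
sub _compile_one {
    my ($name) = @_;
    return 1 if $COMPILED{$name};
    my $src = $SOURCE{$name} or return 0;
    eval "$src; 1" or die $@;
    return $COMPILED{$name} = 1;
}

# Force compilation of the named subs ahead of time.
sub compile {
    my ($class, @names) = @_;
    _compile_one($_) for @names;
    return;
}

sub is_compiled {
    my ($class, $name) = @_;
    return $COMPILED{$name} ? 1 : 0;
}

# Called for any sub that is not defined yet: compile it on demand,
# then jump to it with the original arguments.
sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;
    return if $name eq 'DESTROY';
    _compile_one($name) or die "No such function: $name";
    my $code = \&{"Lazy::Demo::$name"};    # \&{"..."} is allowed under strict refs
    goto &$code;
}

package main;

print Lazy::Demo::greet("world"), "\n";    # greet is compiled on this first call
```

Only the subs that are actually called (or explicitly passed to compile) cost compilation time and memory, which is why the measurement depends so heavily on which method groups are compiled.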
If we compile :all CGI.pm methods, the memory bloat is much bigger:
panic% ./perlbloat.pl 'use CGI; CGI->compile(":all")'
use CGI; CGI->compile(":all") added 1.9M
The following numbers show memory sizes in KB (virtual and resident) for Perl 5.6.0 on four different operating systems. Three calls are made: without any modules, with only -MCGI, and with -MIO (never with both). The rows with -MCGI and -MIO are followed by the difference relative to raw Perl.
            OpenBSD       FreeBSD     RedHat Linux    Solaris
             vsz   rss     vsz   rss     vsz   rss     vsz   rss
Raw Perl     736   772     832  1208    2412   980    2928  2272

w/ CGI      1220  1464    1308  1828    2972  1768    3616  3232
delta       +484  +692    +476  +620    +560  +788    +688  +960

w/ IO       2292  2580    2456  3016    4080  2868    5384  4976
delta      +1556 +1808   +1624 +1808   +1668 +1888   +2456 +2704
Which is more important: saving enough memory to allow the machine to serve a few extra concurrent clients, or using off-the-shelf modules that are proven and well understood? Debugging a reinvention of the wheel can cost a lot of development time, especially if each member of your team reinvents in a different way. In general, it is a lot cheaper to buy more memory or a bigger machine than it is to hire an extra programmer. So while it may be wise to avoid using a bloated module if you need only a few functions that you could easily code yourself, the place to look for real efficiency savings is in how you write your code.
 