Developers often use sample inputs for testing their new code. But sometimes they forget that the real inputs can be much bigger than those they used in development.

Consider code like this, which is common enough in Perl scripts:

{
    open my $in, '<', $file or die "Can't open $file: $!";
    local $/;           # undefine the input record separator
    $content = <$in>;   # slurp the whole file in
    close $in;
}

If you know for sure that the input will always be small, this code might be fine. But if the file is 5 MB, the child process that executes this script when serving the request will grow by at least that amount. If you have 20 children and each one executes this code, together they will consume 20 × 5 MB = 100 MB of RAM! If the input file was very small when the code was developed and tested, this potential for excessive memory usage probably went unnoticed.

Try to think about the many situations in which your code might be used. For example, the input may originate from a source you did not envisage, and your code might behave badly as a result. To protect against this possibility, consider other approaches to processing the file. If the file is line-oriented, perhaps you can process one line at a time instead of reading all the lines into a variable at once. If you need to modify the file, write the changes to a temporary file and overwrite the source file once the processing is finished. Make sure that you lock the files when you modify them.
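A minimal sketch of the line-at-a-time approach follows. The helper name count_matching_lines and its arguments are hypothetical, chosen only for illustration; the point is that the loop holds a single line in memory at any moment, so memory use no longer depends on the size of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines that match a pattern without
# ever holding more than one line of the file in memory.
sub count_matching_lines {
    my ($file, $pattern) = @_;

    open my $in, '<', $file or die "Can't open $file: $!";
    my $count = 0;
    while (my $line = <$in>) {    # reads one line per iteration
        $count++ if $line =~ $pattern;
    }
    close $in;

    return $count;
}
```

Calling count_matching_lines($file, qr/^#/), for example, would count the comment lines in a file of any size while the process's memory footprint stays roughly constant.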

Often you just don't expect the input to grow. For example, you may want to write a birthday reminder process intended for your own personal use. If you have 100 friends and relatives about whom you want to be reminded, slurping the whole file in before processing it might be a perfectly reasonable way to approach the task.

But what happens if your friends (who know you as one who usually forgets their birthdays) are so surprised by your timely birthday greetings that they ask to use your cool invention as well? If each of your 100 friends has another 100 friends, you could end up with 10,000 records in your database. The code may not work well with input of this size: if you continue to store the records in a flat file and read the whole database into memory, your code will use a lot of memory and be very slow. The answer is to rewrite the code to use a DBM file or a relational database.
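As a rough illustration of the DBM approach, the sketch below ties a hash to an on-disk file using SDBM_File, which ships with Perl. The helper names store_birthday and lookup_birthday, the name-to-date record layout, and the choice of SDBM_File are all assumptions made for this example; another DBM implementation could be substituted. Each lookup touches only the requested record rather than loading the whole database.

```perl
use strict;
use warnings;
use Fcntl;        # provides O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Hypothetical helper: add or update one record in the DBM file.
sub store_birthday {
    my ($dbfile, $name, $date) = @_;

    tie my %birthdays, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666
        or die "Can't open $dbfile: $!";
    $birthdays{$name} = $date;
    untie %birthdays;
}

# Hypothetical helper: fetch one record; returns undef if it is absent.
sub lookup_birthday {
    my ($dbfile, $name) = @_;

    tie my %birthdays, 'SDBM_File', $dbfile, O_RDONLY, 0666
        or die "Can't open $dbfile: $!";
    my $date = $birthdays{$name};
    untie %birthdays;

    return $date;
}
```

SDBM limits the combined size of a key and its value to roughly 1 KB, which is adequate for short records like these but is one reason you might prefer a different DBM library or a relational database for larger data.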