The Apache server processes CGI scripts via an Apache module called mod_cgi. (See later in this chapter for more information on request-processing phases and Apache modules.) mod_cgi is built by default with the Apache core, and the installation procedure also preconfigures a cgi-bin directory and populates it with a few sample CGI scripts. Write your script, move it into the cgi-bin directory, make it readable and executable by the web server, and you can start using it right away.
Should you wish to alter the default configuration, there are only a few configuration directives that you might want to modify. First, the ScriptAlias directive:
ScriptAlias /cgi-bin/ /home/httpd/cgi-bin/
ScriptAlias controls which directories contain server scripts. Scripts are run by the server when requested, rather than sent as documents.
When a request is received with a path that starts with /cgi-bin, the server searches for the file in the /home/httpd/cgi-bin directory. It then runs the file as an executable program, returning to the client the generated output, not the source listing of the file.
The other important part of httpd.conf specifies how the files in cgi-bin should be treated:
<Directory /home/httpd/cgi-bin> Options FollowSymLinks Order allow,deny Allow from all </Directory>
The above setting allows the use of symbolic links in the /home/httpd/cgi-bin directory. It also allows anyone to access the scripts from anywhere.
mod_cgi provides access to various server parameters through environment variables. The script in Example 1-5 will print these environment variables.
#!/usr/bin/perl print "Content-type: text/plain\n\n"; for (keys %ENV) { print "$_ => $ENV{$_}\n"; }
Save this script as env.pl in the directory cgi-bin and make it executable and readable by the server (that is, by the username under which the server runs). Point your browser to http://localhost/cgi-bin/env.pl and you will see a list of parameters similar to this one:
SERVER_SOFTWARE => Server: Apache/1.3.24 (Unix) mod_perl/1.26 mod_ssl/2.8.8 OpenSSL/0.9.6 GATEWAY_INTERFACE => CGI/1.1 DOCUMENT_ROOT => /home/httpd/docs REMOTE_ADDR => 127.0.0.1 SERVER_PROTOCOL => HTTP/1.0 REQUEST_METHOD => GET QUERY_STRING => HTTP_USER_AGENT => Mozilla/5.0 Galeon/1.2.1 (X11; Linux i686; U;) Gecko/0 SERVER_ADDR => 127.0.0.1 SCRIPT_NAME => /cgi-bin/env.pl SCRIPT_FILENAME => /home/httpd/cgi-bin/env.pl
Your code can access any of these variables with $ENV{"somekey"}. However, some variables can be spoofed by the client side, so you should be careful if you rely on them for handling sensitive information. Let's look at some of these environment variables.
SERVER_SOFTWARE => Server: Apache/1.3.24 (Unix) mod_perl/1.26 mod_ssl/2.8.8 OpenSSL/0.9.6
The SERVER_SOFTWARE variable tells us what components are compiled into the server, and their version numbers. In this example, we used Apache 1.3.24, mod_perl 1.26, mod_ssl 2.8.8, and OpenSSL 0.9.6.
GATEWAY_INTERFACE => CGI/1.1
The GATEWAY_INTERFACE variable is very important; in this example, it tells us that the script is running under mod_cgi. When running under mod_perl, this value changes to CGI-Perl/1.1.
REMOTE_ADDR => 127.0.0.1
The REMOTE_ADDR variable tells us the remote address of the client. In this example, both client and server were running on the same machine, so the client is localhost (whose IP is 127.0.0.1).
SERVER_PROTOCOL => HTTP/1.0
The SERVER_PROTOCOL variable reports the HTTP protocol version upon which the client and the server have agreed. Part of the communication between the client and the server is a negotiation of which version of the HTTP protocol to use. The highest version the two can understand will be chosen as a result of this negotiation.
REQUEST_METHOD => GET
The now-familiar REQUEST_METHOD variable tells us which request method was used (GET, in this case).
QUERY_STRING =>
The QUERY_STRING variable is also very important. It is used to pass the query parameters when using the GET method. QUERY_STRING is empty in this example, because we didn't pass any parameters.
HTTP_USER_AGENT => Mozilla/5.0 Galeon/1.2.1 (X11; Linux i686; U;) Gecko/0
The HTTP_USER_AGENT variable contains the user agent specifications. In this example, we are using Galeon on Linux. Note that this variable is very easily spoofed.
SERVER_ADDR => 127.0.0.1 SCRIPT_NAME => /cgi-bin/env.pl SCRIPT_FILENAME => /home/httpd/cgi-bin/env.pl
The SERVER_ADDR, SCRIPT_NAME, and SCRIPT_FILENAME variables tell us (respectively) the server address, the name of the script as provided in the request URI, and the real path to the script on the filesystem.
Now let's get back to the QUERY_STRING parameter. If we submit a new request for http://localhost/cgi-bin/env.pl?foo=ok&bar=not_ok, the new value of the query string is displayed:
QUERY_STRING => foo=ok&bar=not_ok
This is the variable used by CGI.pm and other modules to extract the input data.
Keep in mind that the query string has a limited size. Although the HTTP protocol itself does not place a limit on the length of a URI, most server and client software does. Apache currently accepts a maximum size of 8K (8192) characters for the entire URI. Some older client or proxy implementations do not properly support URIs larger than 255 characters. This is true for some new clients as well—for example, some WAP phones have similar limitations.
Larger chunks of information, such as complex forms, are passed to the script using the POST method. Your CGI script should check the REQUEST_METHOD environment variable, which is set to POST when a request is submitted with the POST method. The script can retrieve all submitted data from the STDINstream. But again, let CGI.pm or similar modules handle this process for you; whatever the request method, you won't have to worry about it because the key/value parameter pairs will always be handled in the right way.
 
Continue to: