This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
”; print “You’re using the browser: $ENV{ ‘HTTP_USER_AGENT’ }.”;
135
12_556801-ch10.indd 135
8/31/10 9:20 AM
Configure a Directory to Use the CGI Handler
O
nce the user’s personal Web directory is active, and you have enabled both the CGI module and handler, the final step is to configure a directory to use the CGI handler. This is usually a subdirectory called cgi-bin, but you can customize its name. Apache executes Perl scripts that are placed in this new directory through the CGI handler, allowing you to produce dynamic Web pages for users. After you create the folder, you need to configure Apache to use the folder. You need to open the Apache configuration file and edit the
1
Open the Apache UserDir configuration file in a text editor.
2
Locate the
3
Type Options as a new value in the AllowOverride directive.
4
Save the UserDir configuration file and exit the editor.
5
Restart Apache.
6
In a terminal, go to the My Website directory (public_ html in Linux).
7
Type mkdir cgi-bin and press Enter.
8 9
Type cd cgi-bin and press Enter.
you are using Windows, edit Apache Install Dir\ conf\httpd.conf. If you are on Debian- or Ubuntubased Linux, edit /etc/apache2/mods-enabled/ userdir.conf. For Mandriva Linux, edit /etc/httpd/ modules.d/67_mod_userdir.conf. For all other Red Hat-based Linux distributions, edit /etc/httpd/conf/ httpd.conf. Apache comes with a pre-configured global CGI directory, as defined by the ScriptAlias directive, at the URL http://localhost/cgi-bin/. On a Windows system, this points to the directory Apache Install Dir\ cgi-bin\. On Linux, it is either /var/www/cgi-bin/ or /usr/lib/cgi-bin/. Files placed in this directory are automatically granted access to the Apache CGI handler. It is far better to create a new directory under your personal Web site URL, and grant it access to the CGI handler so that your CGI scripts run as unprivileged, regular users.
1
2 3
6
7 8
9
Type echo Options +ExecCGI > .htaccess and press Enter.
Note: Directives saved as .htaccess files do not require you to restart Apache.
136
12_556801-ch10.indd 136
8/31/10 9:20 AM
Create a new Perl CGI script called hello_world.pl.
! @ # $
Insert the Perl interpreter path header.
%
In a browser, go to the Perl CGI script’s URL.
•
! @
Insert some static CGI text. Save the script. (Unix-only) On a command-line, type chmod +x hello_world.pl and press Enter.
%
The browser displays the output of the Perl CGI.
Chapter 10: Configuring Apache to Execute Perl
0
The first line of the Perl CGI script is paramount. It must be the path to the Perl interpreter on your server. PERL DISTRIBUTION
PERL BINARY PATH HEADER
Any Linux Perl
#!/usr/bin/perl
Windows ActiveState Perl
#!C:\Perl\bin\perl.exe
Windows Strawberry Perl
#!C:\strawberry\perl\bin\perl.exe
If you are on Linux, remember to set the script’s executable-bit, as described in Chapter 6. You must do this once for every Perl script in your cgi-bin directory. It is technically possible to make your entire personal Web directory executable, and not create a specific directory for CGI scripts at all. Just create the .htaccess file directly inside of the directory My Websites (public_html on Linux), as previously described. In fact, you could eliminate the need for an .htaccess file entirely by appending ExecCGI to the Options directive in the UserDir
137
12_556801-ch10.indd 137
8/31/10 9:20 AM
Understanding the Apache Logs
A
ll Web activity that Apache handles is stored in log files on the server. The standard access log stores information about every HTTP request, requester, the time, and the results. The standard error
log stores information about every unexpected failure, warning, and anomaly that happens on the server; this includes any errors induced by Perl CGI scripts.
Accessing the Apache Logs
You can read the Apache logs by opening the log file directly in a text editor. While this accesses the file, you need to close and re-open the file in order to see new content. On Linux, you can use the command tail -f logfile to stream activity in a terminal in real-time.
The Log Directory The log directory into which Apache stores its logs differs, depending on your operating system.
OPERATING SYSTEM
APACHE LOG DIRECTORY
Windows
Apache Install Dir\logs
Debian/Ubuntu Linux
/var/log/apache2/
Red Hat Linux
/var/log/httpd/
The Activity Log
The Error Log
The activity log is saved in the log directory as access.log (or access_log). It describes all Web-based transactions the Apache server has responded to.
The error log is saved in the log directory as error.log (or error_log). It describes any error messages triggered by any failed Web-based transactions the Apache server could not respond to.
remote_host - - [timestamp] “GET URL HTTP/1.1” result bytes
In this format, an activity line describes the remote host’s IP address or hostname, the remote log name (as a dash if not available), the remote username (as a dash if not available), the timestamp of the request, the actual HTTP request, the status results code of the request, and the number of bytes transferred. This represents the Apache common log format. Apache also provides a combined log format which extends the common format by adding the referring URL and user-agent environment values. The status result code follows RFC 2616. Typically, 200 means success, 403 is a “permission denied” error, 404 is a “file not found” error, and 500 is a server error. You can find the full list of status codes online at www. w3.org/Protocols/rfc2616/rfc2616-sec10.html.
[timestamp] [type] [client remote_host] Error message text
In this format, an error line describes the timestamp of the request, the type of failure (error, warning, or info), the remote host’s IP address or hostname, and the raw error text describing the failure. The actual text contained in the error log is free form, meaning that there is no set formatting standard that is followed. The error message text is usually generated by the program being executed. Typically, if a Perl CGI script failed with a syntax error, you would see the error description in this log, exactly as it would be described if the script were run from the command-line.
Auto-Rotating the Apache Logs
Some versions of Linux auto-rotate the Apache log files on a daily basis. This means that the entries stored in /var/log/ apache/access.log and /var/log/apache/error. log may only describe today’s activity. If this is the case, yesterday’s activity is available in access.log.1 and error.log.1, activity from the day before is in access. log.2 and error.log.2, and so on. Most of the time, this
type of log rotation has a built-in upper limit, usually seven days. After the seventh day, the log file is deleted. Log deletion is usually a good idea, as it prevents the log directory from taking up too much space on the Web server’s hard drive. Often you can configure this by editing the file / etc/logrotate.d/apache2 on Debian or Ubuntu, or /etc/logrotate.d/httpd on Red Hat.
138
12_556801-ch10.indd 138
8/31/10 9:20 AM
T
he log files produced by an Apache server describe all Web activity that has been requested. Two log files are generated: access.log (or access_log) contains a summary of all HTTP requests received, and error.log (or error_log) contains all specific error messages if a problem occurs. You can configure these log files to customize their format, directory, and filename. You can save the format of the access log file as a template using the LogFormat directive. You can then apply the template, directory, and filename using the CustomLog directive. You cannot customize the format of the error log file, but you can change its directory and filename using the ErrorLog directive. Configure the Apache Logs
1
Open the Apache configuration file in a text editor.
2
Locate the LogFormat directives.
• •
From a single server, Apache can simultaneously provide Web page hosting for multiple domains using the
1
Chapter 10: Configuring Apache to Execute Perl
Configure the Apache Logs
The log format fields.
2
The log format name.
3 4
Locate the CustomLog directive.
5
Uncomment the combined log format.
6
Save the Apache configuration file and restart the Apache server.
Type # to comment out the common log format.
5
3
4
3
139
12_556801-ch10.indd 139
8/31/10 9:20 AM
Read the Apache Logs
Y
ou can access the Apache log files using any program that can open text files. On Windows this could be Notepad, or even using the type command at the DOS prompt. Opening the log file as a text file is okay for a server with very little activity, but it is not practical to view real-time data. The tail utility is very useful for examining the access and error log files for real-time activity. It is available by default on Unix systems but you may need to manually install it on Windows. The log directory Apache stores its logs into differs, depending on your operating system. On Windows, this is Apache Install Dir\logs, on Debian and Ubuntu this is /var/log/apache2/, and on Red Hat this is /var/log/httpd/.
Tail can accept multiple files on the command line, making it possible to monitor both the access.log and error.log files at the same time. Tail on Windows is only a standard component on Windows Server 2003 and 2008 installations. Unfortunately, it is not standard on Windows desktop installations, such as XP, Vista, and 7. A graphical version of Tail is available for Windows desktops at http://tailforwin32.sourceforge. net. Once downloaded, simply double-click the enclosed tail.exe file and open the access.log file.
Read the Apache Logs
1
Open a command-line prompt.
2
Navigate to the log file directory.
3
Type tail -vf access.log (or access_log) and press Enter.
1 2
3
Note: You may need elevated privileges to view the log file. Use sudo or su.
The terminal displays the most recent log activity.
140
12_556801-ch10.indd 140
8/31/10 9:20 AM
P
erl automatically forwards any error messages, such as syntax or interpreter errors, into the Apache error log. You can also define custom log messages in your Perl script that are relayed to this log file. These may include warning messages, debug statements, or any type of information that is manually coded. You can use this feature to monitor your Perl CGI scripts’ activity and performance. You can use the file handle called standard-error, or STDERR, for this purpose. Apache automatically picks up any text printed to this file handle and stores it into the error log. This can be especially useful if you are debugging a problem on a live Web site, but you cannot actually
see what the user sees. When you print important data to the user’s Web browser, print it again to standard-error.
Try replacing print STDERR with the warn or die functions. Along with printing to standard-error, warn and die also append the script’s name and line number into the error log, but only if the message does not end in a carriage-return (\n). The function die differs from warn in that it stops the execution of the script after relaying the message text. Alternatively, the Apache error messages can be forwarded back to CGI and the browser. For more information, see Chapter 12.
Forward Perl Activity into the Apache Error Log
1
Open a Perl CGI script in a text editor.
2
Type print STDERR TEXT; and press Enter.
3
Save the Perl script.
4
Access the Perl CGI script in a Web browser.
5 6
Open the Apache error log file.
•
1
Chapter 10: Configuring Apache to Execute Perl
Forward Perl Activity into the Apache Logs
2 2
5
Scroll down to the bottom. The Perl script’s log entry.
6 141
12_556801-ch10.indd 141
8/31/10 9:20 AM
Create an HTML Form
A
n HTML form collects information from a user by way of his browser, and relays that information to a Web server. Along with standard HTML formatting, an HTML form’s syntax defines the input fields that can be populated, any predefined values, the buttons to submit the form, and the URL of the script accepting the form’s data. To make this form useful, you need a CGI component at the other end of that URL which accepts the submitted data, interprets the data, reports any errors, and moves the user onto the next page according to the site’s workflow. An HTML form always begins with a form tag. This instructs the browser that an HTML form is beginning, and tells it the method type, and the URL that will receive the data when it is submitted. to close the HTML form.
Type text to create a new radio button and its text value. Duplicate the radio input fields, changing only the value and text.
Note: You can use radio buttons for multiplechoice questions, where the user can only select one choice at a time.
The input fields within the form can include text boxes, pull-down lists, multi-select lists, check boxes, and radio buttons. You assign each of these input fields a name, which is an optional value within the input HTML tag. If you provide a value, it appears as the default value when the form is rendered; if you do not provide a value, the input field appears blank. The Submit button normally appears at the bottom of the form. It does not require a name or a value, but these fields are submitted to the CGI if you define them within the HTML form. Finally, you complete the HTML form by closing the form tag.
1 2 3
4 5 6 7
142
13_556801-ch11.indd 142
8/31/10 9:20 AM
Chapter 11: Introducing Do-It-Yourself Perl/CGI Interaction
8
Type to create a Submit button.
Note: If a Submit button contains a name or value field, this information is relayed to the CGI script as a standard key/value pair. As a result, you can define multiple buttons, each with a unique name and value. This helps the CGI script identify what the user clicked.
8
Open the HTML form in a browser.
• • • •
The input text field. The check box field. The radio button fields. The Submit button.
An HTML form can be a static HTML file stored on a Web server, or it can be generated dynamically by another server-side CGI script. To validate the fields submitted by the user, they should be error corrected by the accepting CGI script. This refers to the portion of the code that validates that an “e-mail address is an e-mail address” and a “phone number is a phone number.” Finally, if any errors are discovered, you need to communicate this back to the user. A good practice is to re-display the same form, pre-populated with the user’s known-good values, and displaying a red background beside the fields where an error is discovered. This way, the user only needs to correct the missing fields and press the Submit button again. It is also possible to use JavaScript to validate the user’s data as a client-side control. This can be used prior to the form’s actual submission as a convenience to the user. However, if used, client-side validation should only complement server-side validation, and you should not use it as the primary means of error correction. Clever users can bypass JavaScript validation controls; users cannot bypass Perl’s validation controls when you write them correctly.
143
13_556801-ch11.indd 143
8/31/10 9:20 AM
Read HTTP GET/POST Parameters
W
hen a user fills in an HTML form and submits it, a CGI script needs to receive the data, parse it, decode it, and act upon it. For example, the raw form data is delivered to the CGI script in this format: firstname=John&lastname=Smith&email=jsmith%40 domain.com The parsing process involves splitting the string at each ampersand (&) to get each field, and then splitting each field at each equal sign (=) to get each key/value pair. The decoding process converts the non-alphanumeric characters from a percent-hexadecimal-ASCII representation, “%40”, into the original character, “@”. HTML forms are submitted using a method, either GET or POST. The GET method provides the form data within the query string in the script’s URL. The POST method does Read HTTP GET/POST Parameters
1 2
Open a static HTML script in a text editor.
3 4
Insert a series of Input fields.
the same thing as GET, except the data is sent as a part of the HTTP request. The Perl script interprets this data as being a part of standard input, or STDIN. Regardless of whether you use GET or POST, the CGI Perl script must convert the raw data into a hash reference. $param->{ ‘firstname’ } $param->{ ‘lastname’ } $param->{ ‘email’ } It is up to you to decide which method to use in your Web site. Generally speaking, GET forms are easier to test and debug new code on, as the data being submitted is visible right in the URL. POST forms are cleaner, but more difficult to validate visually without adding some additional Perl debug code. You should limit the total length of a URL to 256 characters; however, some browsers support more. Do not use GET if you are expecting a lot of data from the HTML form.
1
Type to close the form.
4
Save the file as form.html.
5 6 7 8
Open a new Perl CGI script.
9
Type my $data = {}; to declare a new hash reference.
Create a new ReadParams subroutine. Type my $input = $ENV{ 'QUERY_STRING' }; to declare a new scalar, and store the QUERY_ STRING environment value.
7 9
8
9
144
13_556801-ch11.indd 144
8/31/10 9:20 AM
Chapter 11: Introducing Do-It-Yourself Perl/CGI Interaction
0 !
Type return $data; to return it. Start a new foreach loop; split the input by the “&” character.
@
Split each element by the “=” character into key and value scalars.
#
Type $val =~ s/%(..)/chr( hex( $1 ) ) /ge; to decode any encoded characters in $val.
0 ! @
# $
Store each key and value into the hash reference.
% ^ &
Open the static HTML in a browser.
$ %
Type data into the fields. Click Submit.
^
The CGI script executes.
•
The submitted form data appears in the URL, encoded.
•
The parsed form data returned by Data::Dumper.
To handle the POST method, you need to make two changes. First, the static HTML form needs to specify the POST method: