TABLE OF CONTENTS
php|architect

Features
Secure SOAP Transactions in Command Line Applications, by Ron Korving
Database Abstraction in PHP, by Lukas Smith
Advanced Sessions and Authentication in PHP 5, by Ed Lecky-Thompson
Building a MySQL Database Abstraction Class, by Tom Whitbread
An XML approach to Templating using PHPTAL, Part II, by José Pablo Ezequiel Fernández Silva

Departments
Editorial: Out with the Old
What’s New!
Test Pattern: Spring Cleaning, by Marcus Baker
Product Review: Visustin 3.0: The Flowcharter of the People?, by Peter B. MacIntyre
Security Corner: BBCode
exit(0);: Old School, New School, NO SCHOOL, by Marco Tabini
EDITORIAL
Out with the Old

Editing php|architect is, at the same time, a blessing and a curse. On the plus side, I get to read some really exciting material every month. On the minus side… I have to read all that material every month before the deadline for the next issue!

Being an editor is very challenging, something I would never have guessed when I got into this line of work. I dare anybody to do it for six months and read a book the way they used to before. Gone is the lust for knowledge, replaced by a compulsive, incurable need to find typos and fix someone else’s grammar. Of course, someone else is the key here: it’s never your own mistakes you catch (regardless of whether you actively made them part of your own writing or didn’t catch them in another author’s work).

There are, of course, many reasons why being the editor of this magazine no longer makes sense for me. First, our activities have grown so much, from a single PDF publication to a group that encompasses PHP education on so many levels (print, books, training and conferences), that I constantly feel guilty that I’m not dedicating as much time as I should to making sure that the contents of php|a are always the best of the best (even though editing the magazine keeps me up many nights every month). Second, and most important, we must ensure that our supply of fresh ideas is, well, fresh. Change is everything, and it’s been time for some new thought patterns to be formed in the php|a brain for a while now.

Faced with these problems, we have been working hard at finding a new Editor-in-Chief for php|architect. It’s not been easy, but I hope that you’ll join me in welcoming Sean Coates to the gang. Sean is an active member of the PHP team (he works on the documentation, and I can’t think of a better way to be exposed to as much PHP technology as possible) and, like the rest of us, uses PHP in his everyday life. But don’t take my word for it: he will be introducing himself shortly.

For my part, I bid you all farewell.
Of course, you can’t get rid of me quite that easily—I’m still hanging on to my exit(0), and I will as always lurk on our forums trying my very best to confuse as many people as possible for every single post. Until next month… well, it’s up to Sean now!
In with the New

One random afternoon, on IRC, I noticed Marco complaining about having to go edit an article, when he’d rather be doing something else. I naively retorted with “I actually like editing!” and, over the next few days, we worked out the details, evaluated my skills, and speculated on how much work was involved in editing an issue of php|architect. Now, only a month later, here I am.

Allow me to introduce myself. As Marco indicated, I’ve been actively involved in the PHP community for approximately two years now (and not-so-actively involved, before that, for another year). My attention and keystrokes are primarily spent writing and editing the PHP manual, but I’m also involved in several other projects, including documentation meta-projects and the maintenance of a popular PEAR package. I’ve been writing PHP professionally for over 5 years for various companies in many sectors, from marketing to credit card processing.

It is with great pleasure (and already some late nights) that I take the reins of what I believe to be the best recurring resource that is currently available for professional PHP developers. I’m also happy that Marco can offload some of his work to me, freeing him up to do the things he mentioned above. I believe that the owner of a business should be involved in his creation, but not necessarily intimately so. There’s a certain value in having the ability to take a step back, and view the fruits of your labor from a distance.

With this pleasure, though, comes great responsibility. I hope to be accessible to you, our readers, in as many ways as possible. Please don’t hesitate to contact me with any complaints, criticism, snide remarks, ideas, or encouragement you may have. I’m usually very responsive by email ([email protected]), or you might find it more convenient to drop your thoughts in our online discussion forums (http://phparch.com/discuss/). I look forward to hearing from you.

Until next month, happy reading! (Yes, I stole his line.)
php|architect
TM
Volume IV - Issue 4 April, 2005
Publisher Marco Tabini
Editor-in-Chief Sean Coates
Editorial Team Arbi Arzoumani Peter MacIntyre Eddie Peloke
Graphics & Layout Arbi Arzoumani
Managing Editor Emanuela Corso
News Editor Leslie Hill
[email protected]
Authors Marcus Baker, Peter B. MacIntyre, Chris Shiflett, Ron Korving, José Pablo Ezequiel Fernández Silva, Lukas Smith, Ed Lecky-Thompson, Tom Whitbread
php|architect (ISSN 1709-7169) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards of use of the information contained herein or in all associated material.
Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2003-2005 Marco Tabini & Associates, Inc. All Rights Reserved
April 2005
●
PHP Architect
●
www.phparch.com
6
NEW STUFF
What’s New!
php|architect prepares for php|tropics 2005
Ever wonder what it's like to learn PHP in paradise? Well, this year we've decided to give you a chance to find out! We're proud to announce php|tropics 2005, a new conference that will take place between May 11-15 at the Moon Palace Resort in Cancun, Mexico. The Moon Palace is an all-inclusive (yes, we said all inclusive!) resort with over 100 acres of ground and 3,000 ft. of private beach, as well as excellent state-of-the-art meeting facilities. As always, we've planned an in-depth set of tracks for you, combined with a generous amount of downtime for your enjoyment (and your family's, if you can take them along with you). We even have a very special early-bird fee in effect for a limited time only. For more information, go to http://www.phparch.com/tropics.
Fast Template 1.3
Grafxsoftware.com announces the latest release of their PHP templating system, Fast Template. What's new in this version?
• Added a DELETE_CACHE function, which deletes cache files that are older than the expiry time.
• Added a file extension to cache files. For example, a cache file name will now be 62327a34b389dca70c7c15e9d81e57bd.ft (notice the .ft extension). This was necessary because of the DELETE_CACHE function.
• Added an include block, which includes another template by statement (as SSI does). This is useful if you have several different templates for different parts of a page, and you don't need to write any PHP code to gather all the "blocks" of the page. It is also very helpful from the designer's point of view, because the result is visible in a visual editor.
Get more information from http://www.grafxsoftware.com/product.php?id=26.
PHP Input Filter 1.2.0
Need help filtering data and preventing attacks? Check out PHP Input Filter. According to the project's homepage, PHP Input Filter: "is a free php class that allows developers to easily filter input coming from the user (HTML forms, cookies etc) for a number of reasons. The focus of this tool is on customization. v1.2.0 features much more comprehensive anti-XSS protection, as well as the option of auto-stripping bad tags separate from any specified by the developer." To see a demo or to download, visit www.cyberai.com/inputfilter/.

CONFERENCES

Zend/PHP Conference and Expo 2005
Zend.com announces: "Zend Technologies and KB Conferences proudly announce the Zend/PHP Conference & Expo 2005 taking place at the Hyatt Regency San Francisco Airport on October 18-21, 2005. The theme of the conference will be 'Power Your Business With PHP' and will feature sessions in the following four tracks: The Business Case for PHP; Developing, Deploying and Managing Large-Scale PHP Applications; Integrating PHP with the Enterprise (including Web Services and XML); and PHP Resources: Tools, Libraries and Techniques. We invite interested speakers to submit session proposals between now and July 15, 2005. Visit the conference website for more information about the conference or if you are interested in submitting a session proposal." Get all the latest conference information from Zend.com.
International PHP Conference 2005 Spring Edition
Don't want to wait until October for the Zend/PHP Conference? Zend.com brings news of the International PHP Conference coming in May: "The International PHP Conference 2005 Spring Edition will take place from May 2, 2005 to May 4, 2005. The Conference features a PowerWorkshop day on May 2 with PHP/MySQL Best Practices, XML/WebServices with PHP 5, Rapid Application Development and a PHP Starter Workshop for Beginners. The main Conference days will include sessions on PHP Internals, XML, Databases, Migration to PHP 5 and others. Early bird discounts are available until April 1, 2005." For more information, visit phpconference.com.
Check out some of the hottest new releases from PEAR.
Net_Monitor 0.2.2
A unified interface for checking the availability of services on external servers and sending meaningful alerts through a variety of media if a service becomes unavailable.

LiveUser_Admin 0.2.1
LiveUser_Admin is meant to be used with the LiveUser package. It is composed of all the classes necessary to administer data used by LiveUser. You'll be able to add/edit/delete/get things like:
• Rights
• Users
• Groups
• Areas
• Applications
• Subgroups
• ImpliedRights
And all other entities within LiveUser.

LiveUser 0.15.1
LiveUser is a set of classes for dealing with user authentication and permission management. Basically, there are three main elements that make up this package:
• The LiveUser class
• The Auth containers
• The Perm containers
The LiveUser class takes care of the login process and can be configured to use a certain permission container and one or more different auth containers. That means you can have your users' data scattered among many data containers and have the LiveUser class try each defined container until the user is found. For example, you can have all website users who can apply for a new account online on the webserver's local database. Also, you want to enable all your company's employees to log in to the site without the need to create new accounts for all of them. To achieve that, a second container can be defined to be used by the LiveUser class. You can also define a permission container of your choice that will manage the rights for each user. Depending on the container, you can implement any kind of permission scheme for your application while having one consistent API. Using different permission and auth containers, it's easily possible to integrate newly written applications with older ones that have their own ways of storing permissions and user data. Just make a new container type and you're ready to go! Currently available are containers using: PEAR::DB, PEAR::MDB, PEAR::MDB2, PEAR::XML_Tree and PEAR::Auth.

File 1.2.0
Provides easy access to read/write to files along with some common routines to deal with paths. Also provides an interface for handling CSV files.

XML_Wddx 1.0.1
XML_Wddx does 2 things:
a) functions as a drop in replacement for the XML_Wddx extension (if it's not built in)
b) produces an editable WDDX file (with indenting etc.) and uses CDATA, rather than char tags
This package contains 2 static methods: XML_Wddx::serialize($value) and XML_Wddx::deserialize($value). It should be 90% compatible with wddx_deserialize(), and the deserializer will use wddx_deserialize if it is built in. No support for recordsets is available at present in the PHP version of the deserializer.

PHP 5 ionCube Encoder
The good people at ioncube have announced the release of the new ionCube Encoder for PHP 5. "We are happy to announce the release of the new ionCube Encoder for PHP 5! The new Encoder fully supports all PHP 5 language constructs and can deliver a substantial increase in performance over unencoded PHP 5. The PHP 4 Encoder is provided for free with the PHP 5 Encoder. We have added Package Foundry support to the Windows version of the new Encoder, enabling a one-stop solution for those wishing to create, package, and deploy PHP applications. To demonstrate this support the Encoder download bundle now includes a Package Foundry evaluation. Existing PHP 4 Encoder customers are eligible for a discount when purchasing the new PHP 5 Encoder." For more details please visit www.ioncube.com.
Looking for a new PHP Extension? Check out some of the latest offerings from PECL.

pecl_http 0.7.0
pecl_http provides:
• Building absolute URIs
• RFC compliant HTTP redirects
• RFC compliant HTTP date handling
• Parsing of HTTP headers and responses
• Caching by "Last-Modified" and/or ETag (with 'on the fly' option for ETag generation from buffered output)
• Sending data/files/streams with (multiple) ranges support
• Negotiating user preferred language/charset
• Convenient request functions to HEAD/GET/POST if libcurl is available
• HTTP auth hooks (Basic)
• HTTPi, HTTPi_Response and HTTPi_Request classes (HTTPi_Request only with libcurl)

maxdb 7.5.00.24
MaxDB PHP is an extension which provides access to the MySQL MaxDB databases. It is compatible with MySQL's mysqli extension.

big_int 1.0.1
Functions from this package are useful for number theory applications, for example in two-key cryptography. See /tests/RSA.php in the package for an example implementation of an RSA-like cryptographic algorithm. The package has many bitset functions, which allow you to work with arbitrary-length bitsets. This package is much faster than the BCMath extension bundled with PHP and contains almost all of the functions implemented in the PHP GMP extension, but it doesn't need any external libraries.

crack 0.2
This package provides an interface to the cracklib (libcrack) libraries that come standard on most unix-like distributions. This allows you to check passwords against dictionaries of words to ensure some minimal level of password security. From the cracklib README: CrackLib makes literally hundreds of tests to determine whether you've chosen a bad password.
• It tries to generate words from your username and gecos entry and tries to match them against what you've chosen.
• It checks for simplistic patterns.
• It then tries to reverse-engineer your password into a dictionary word, and searches for it in your dictionary.
• After all that, it's PROBABLY a safe(-ish) password. 8-)
The crack extension requires cracklib (libcrack) 2.7, some kind of word dictionary, and the proper header files (crack.h and packer.h) to build. For cracklib RPMs for Red Hat systems and a binary distribution for Windows systems, visit http://www.dragonstrider.com/cracklib.

php-Booba 0.8.1
The php-Booba team announces the release of php-Booba 0.8.1. "php-Booba is a simple framework for developing Web applications. It contains classes for validating incoming data from forms, a powerful ticket-based request handling system, and a very fast template system." For more information, or to download, visit http://sourceforge.net/projects/php-booba

The Zend PHP Certification Practice Test Book is now available!
We're happy to announce that, after many months of hard work, the Zend PHP Certification Practice Test Book, written by John Coggeshall and Marco Tabini, is now available for sale from our website and most book sellers worldwide! The book provides 200 questions designed as a learning and practice tool for the Zend PHP Certification exam. Each question has been written and edited by four members of the Zend Education Board, the very same group who prepared the exam. The questions, which cover every topic in the exam, come with a detailed answer that explains not only the correct choice, but also the question's intention, pitfalls and the best strategy for tackling similar topics during the exam. For more information, visit http://www.phparch.com/cert/mock_testing.php
FEATURE
Secure SOAP Transactions in Command Line Applications
by Ron Korving
Remote procedure calls using PHP have become increasingly popular in the past few years. Since the introduction of PHP 5, a SOAP extension has been bundled with the core PHP distribution. SOAP does not, in itself, provide a security mechanism, nor is the PHP-extension very suitable for command line applications. In this article, I will explain how you can achieve security for your SOAP transactions, and create your own SOAP-driven daemons on your servers.
SOAP (Simple Object Access Protocol) is a protocol that enables you to run functions on a remote system (Remote Procedure Calls). It is derived from XML-RPC, which has been available in PHP since version 4.1, and as we will see later, SOAP messages are formatted in XML. Because it is such an open protocol, SOAP is programming language and operating system independent. This enables you to use PHP to communicate with any application as long as it can communicate using SOAP.

The PHP SOAP extension was introduced in PHP 5 and is particularly useful when combined with PHP 5’s object oriented possibilities, because SOAP handler functions can all be implemented in a single class, and because the extension itself is completely implemented as classes.

One of the nice things about having a SOAP extension in PHP is the ability to use this protocol to communicate with custom-made daemon applications that are running on remote servers. The wonderful thing about having a daemon running on the command line interface (CLI), instead of a web interface, is that you can run it with root permissions, enabling it to do virtually everything a web script is not allowed to do.

Generally, SOAP relies on the HTTP protocol, which is a good thing, since it’s such a commonly spoken protocol. HTTP is, however, insecure by default. Of course, you can use the secure HTTPS protocol for SOAP transactions, but if you want to create a secure command-line daemon in PHP, you’ll have to embed an HTTPS web server in it. Luckily, the SOAP extension allows you to modify requests before they are sent, and responses before they are processed. This allows you to apply the cryptographic algorithms and key-distribution mechanisms of your choice!
REQUIREMENTS
PHP: 5.x
OS: Any
Other Software: N/A
Code Directory: soap

RESOURCES
URL: http://www.php.net/manual/en/ref.soap.php
URL: http://php.net/manual/en/ref.mysql.php
URL: http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation
In this article, you will learn how to write both a SOAP client and a server script that will be able to communicate over an encrypted channel. Later, you will see how to run a SOAP server from the command line and the cool possibilities that are created when doing so.

A Simple SOAP Client and Server

Let’s write a small script that we can use to connect to our server process. If you have never worked with the SOAP extension before, you will see how easy it is to communicate with the server: you can call remote methods as if they were local to your client. First, we will create an instance of the predefined SoapClient class. I will try to keep things as simple as possible, but keep in mind that a lot of extra information (such as a WSDL filename and proxy server information) can be passed to the SoapClient constructor. An example of creating an instance of SoapClient and calling a SOAP server function can be seen in Listing 1. The function we call is systemName(). Why not? If we’re enabling ourselves to talk to other servers, let’s find out what their names are. The SOAP server will be listening at the same host that the client will be running from. In our case, the URI we use won’t even matter; it is totally up to you.

Now that we have set up our client script, let’s have a look at how to write a simple SOAP server script. It is always wise to put SOAP functions in a class so that everything is nicely bundled into a single package. You can tell your SoapServer instance to use the functions within that class. Our server script might look something like Listing 2. When the script is called, the handle() function will create an instance of our SoapHandler class and execute the function that was called by our client. You may wonder how the SoapServer class knows what the client was asking. It gets its information from the superglobal variable $GLOBALS["HTTP_RAW_POST_DATA"], which contains all the data that was posted through HTTP.
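As a rough illustration of the client and server just described, here is a minimal sketch in non-WSDL mode. It is not the article's Listing 1 or 2. The class name SoapHandler, the function systemName(), and the URI http://yourownuri/ come from the text; the helper names serveSoapRequest() and askSystemName() are hypothetical, and returning php_uname() from systemName() is only a guess at what the handler does. The SOAP extension must be loaded before either helper is called.

```php
<?php
// Handler class: each public method becomes a callable SOAP function.
class SoapHandler
{
    // Assumption: the "system name" is whatever php_uname() reports.
    public function systemName(): string
    {
        return php_uname();
    }
}

// Server side (hypothetical helper): would be called from a script that
// is reachable over HTTP, for example server.php.
function serveSoapRequest(): void
{
    // Non-WSDL mode: the first argument is null, so a "uri" is required.
    $server = new SoapServer(null, ['uri' => 'http://yourownuri/']);
    $server->setClass('SoapHandler');
    $server->handle(); // reads the POSTed request and prints the response
}

// Client side (hypothetical helper): the remote method is called as if
// it were local to the client.
function askSystemName(string $location): string
{
    $client = new SoapClient(null, [
        'location' => $location,          // where the server script lives
        'uri'      => 'http://yourownuri/',
    ]);
    return $client->systemName();
}
```

A client script would then call something like askSystemName('http://localhost/soapdemo/server.php'), where the path is whatever your own server script uses.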
The code in Listing 2 would function the same way if we passed that variable as a parameter to handle(). The wonderful thing about this behaviour is that we can alter the incoming data before sending it off to our SoapServer object. The same behaviour can be achieved with our SoapClient object, allowing us to alter the SOAP request just before it’s sent off to the server.

Altering a SOAP Transaction

Let’s start simple and make our SOAP transaction a little smaller by compressing both the request and the response. As we have seen, it is easy to alter the SOAP request that is passed into the SoapServer object. Three additional pieces of data must be altered to make this work: the sent SOAP request, the sent SOAP response and the received SOAP response.
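The compression step itself can be sketched independently of SOAP. The following is a minimal example, assuming the zlib functions gzcompress() and gzuncompress() are used (the text does not say which compression functions the listings rely on); the helper names compressMessage() and decompressMessage() are hypothetical.

```php
<?php
// Compress an outgoing SOAP message (request or response).
function compressMessage(string $xml): string
{
    $packed = gzcompress($xml, 9); // 9 = maximum compression level
    if ($packed === false) {
        throw new RuntimeException('Compression failed');
    }
    return $packed;
}

// Restore an incoming message to its original XML form.
function decompressMessage(string $data): string
{
    $xml = gzuncompress($data);
    if ($xml === false) {
        throw new RuntimeException('Decompression failed');
    }
    return $xml;
}

// Stand-in for a SOAP envelope: repetitive XML compresses well.
$request = '<Envelope>' . str_repeat('<item>value</item>', 20) . '</Envelope>';
$packed  = compressMessage($request);

printf("%d bytes before, %d bytes after\n", strlen($request), strlen($packed));
```

The same pair of helpers can be applied at each of the points the text lists, as long as every compressed message is decompressed on the other side before the SOAP classes see it.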
On the client side, there is a very useful method in SoapClient that, when implemented, allows us to alter the data in the request and the response. This can be done by extending SoapClient to form our own custom class. When a SOAP function is called, the __doRequest() method in our SoapClient object is triggered. By extending the parent class, we can alter its behaviour to compress the request before it is sent to the server. This also allows us to decompress the response from the server before it is used. You can see how this is done in Listing 3.

Now that our client is compressing and decompressing transactions automatically, it is time to move on to our server script. We have already seen how to modify the input of the handle() function, but we haven’t seen how to influence its output. handle() creates a SOAP response (based on the return value of the invoked function) and streams that response to standard output. We can capture the contents of the SOAP response by using PHP’s output buffering mechanism (a simple call to ob_start() and another to ob_get_clean()). Once we have captured the data, it can be altered, and must be sent back to the client explicitly, by echoing the modified response and its corresponding headers. The headers that we specify are Content-Length (we’ve compressed the data, so there’s probably less to send) and Content-Type (the data is no longer formatted as XML; it has become binary). Take a look at Listing 4 to see how our SOAP server looks after these changes.

Using Encryption

Instead of compressing our data, we could encrypt it, or even do both. Encryption can easily be achieved using the mcrypt extension, which has been available in PHP since PHP 3, but really only became powerful in PHP 4. I strongly recommend that you read more about cryptography before you start using it in your software. It’s better to understand why you do something a certain way than to just assume that it will be secure enough when following the instructions I give below.

There are many different kinds of encryption that you can apply with mcrypt, but we will stick to 256 bit AES encryption, which is secure enough for our purposes. We will specify a key that both the client and server will use to encrypt and decrypt their data. We will also generate a non-secret initialization vector (IV), on the fly, which will scramble all data before encrypting it. This way, our key will be harder to crack, even if part of the contents of our data is known, which is the case with plaintext SOAP.

As we have seen in the previous examples, only two functions are necessary to compress our data. We will now implement two functions that can be used on both the client and the server side for encrypting and decrypting information. When sending an initialization vector and encrypted data, we must choose a format in which to put this information. In this example, I will format the information as follows: 1 byte to specify the IV length in bytes, the IV, the encrypted data.
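The wire format described above (one length byte, then the IV, then the encrypted data) can be sketched without any cryptography at all. The helper names packMessage() and unpackMessage() are hypothetical, and the IV and ciphertext in the usage lines are dummy values chosen only to show the layout.

```php
<?php
// Build a packet: [1 byte IV length][IV][encrypted data].
function packMessage(string $iv, string $cipherText): string
{
    $ivLength = strlen($iv);
    if ($ivLength < 1 || $ivLength > 255) {
        // The length must fit in a single byte.
        throw new InvalidArgumentException('IV must be 1 to 255 bytes long');
    }
    return chr($ivLength) . $iv . $cipherText;
}

// Split a packet back into its IV and encrypted data.
function unpackMessage(string $packet): array
{
    if ($packet === '') {
        throw new InvalidArgumentException('Empty packet');
    }
    $ivLength = ord($packet[0]);
    if (strlen($packet) < 1 + $ivLength) {
        throw new InvalidArgumentException('Packet is shorter than its IV');
    }
    $iv         = substr($packet, 1, $ivLength);
    $cipherText = (string) substr($packet, 1 + $ivLength);
    return [$iv, $cipherText];
}

// Dummy values, for illustration only.
$packet = packMessage(str_repeat("\x01", 16), 'ciphertext');
list($iv, $data) = unpackMessage($packet);
```

Storing the IV length in the packet means the receiver does not need to know in advance which algorithm, and therefore which IV size, the sender used.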
These items are concatenated to form the transaction data. For easy handling, we will define a few constants to contain the encryption key, the cryptography algorithm, and the block cipher mode. Packaging these items like this will make it easier to maintain our software if this configuration ever changes.

We now have a good reason to extend the SoapServer class: we can implement the encryption and decryption methods, and make our own handle() implementation. The new encryption and decryption functions will have to initialize (and close) the mcrypt library, declare the cryptography algorithm and the block cipher mode, determine the initialization vector, and encrypt/decrypt the passed data with the proper key. In the case of the encryption function, we can have mcrypt generate the IV and encrypt our data. The function can then return a long string of data that is formed by concatenating the length of the IV (in bytes), the IV itself, and the encrypted request. In the decryption function, we will parse this string to determine the IV and the encrypted data. Once these are separated, they can be passed to the mcrypt library, which will give us what we want: the decrypted SOAP message.

You can see the code for the client and server scripts in Listings 5 and 6, respectively. In the client script you will notice that __doRequest() has hardly changed. The calls to the compression functions have simply been replaced by calls to our crypto functions. Note that the key used in these examples is much too simple to be used in real-world situations. To be safe, generate your own key and make sure it’s as random as possible. Remember that your key doesn’t have to consist of “readable” characters.

When you look at the server script you will see that our handle() function completely mimics the behaviour of SoapServer::handle(). Other than this new implementation, and the addition of our encryption and decryption functions (and the three defined constants), little has changed.
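The encryption and decryption functions can be sketched as follows. Note the substitution: the article uses mcrypt, which was removed from the PHP core in PHP 7.2, so this sketch swaps in the openssl extension with AES-256 in CBC mode. It is therefore not the code from Listings 5 and 6. The constant names, function names, and the demo passphrase are all hypothetical, and random_bytes() requires PHP 7 or later.

```php
<?php
// Shared configuration (hypothetical names and values).
// With openssl, the algorithm and block mode are one cipher string.
define('CRYPT_CIPHER', 'aes-256-cbc');
// Demo only: derive 32 raw key bytes from a passphrase.
// In practice, generate a random key and keep it secret.
define('CRYPT_KEY', hash('sha256', 'replace this demo passphrase', true));

function encryptMessage(string $plainText): string
{
    // A fresh, non-secret IV for every message.
    $iv = random_bytes(openssl_cipher_iv_length(CRYPT_CIPHER));
    $cipherText = openssl_encrypt(
        $plainText, CRYPT_CIPHER, CRYPT_KEY, OPENSSL_RAW_DATA, $iv
    );
    if ($cipherText === false) {
        throw new RuntimeException('Encryption failed');
    }
    // Packet layout from the text: IV length byte, IV, encrypted data.
    return chr(strlen($iv)) . $iv . $cipherText;
}

function decryptMessage(string $packet): string
{
    if ($packet === '') {
        throw new InvalidArgumentException('Empty packet');
    }
    $ivLength   = ord($packet[0]);
    $iv         = substr($packet, 1, $ivLength);
    $cipherText = (string) substr($packet, 1 + $ivLength);
    $plainText  = openssl_decrypt(
        $cipherText, CRYPT_CIPHER, CRYPT_KEY, OPENSSL_RAW_DATA, $iv
    );
    if ($plainText === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plainText;
}

$packet = encryptMessage('<Envelope>ping</Envelope>');
echo decryptMessage($packet), PHP_EOL;
```

Because the IV is random, encrypting the same SOAP message twice produces two different packets. This sketch provides confidentiality only; it does not detect tampering, which would need an additional integrity check.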
SOAP In a Command Line Application

Now that we’ve seen how to apply encryption to SOAP transactions, let’s move on to the tricky job of mimicking a SOAP server within a command line PHP application. The impact that this will have on the code that we have seen so far is that we have to do all of the receiving and responding ourselves. We will write a simple and incomplete, yet functional (for what we need) HTTP server. Don’t let this scare you; it is easier than it may seem.

Since our script now needs to handle the HTTP protocol, I will explain how a SOAP transaction works at this level. There are several types of HTTP requests. The most commonly known are GET and POST, which can be used to retrieve a webpage or, in this case, a SOAP response. The difference is in how the methods send data to the web server. With the GET method, all information (besides cookies) is embedded in the request path like this:
GET /some_dir/some_file.php?name=John&surname=Doe HTTP/1.1
An HTTP POST request is suitable for sending relatively large chunks of data to a web server. The first line of a POST request is simpler:
POST /some_dir/some_file.php HTTP/1.1
After the other HTTP headers are sent, the POST data is appended to the request, in “var=value” form. In the case of a SOAP message, the contents of the request are made up of the POST data. In Listing 7, you can see what a transaction actually looks like (it is unaltered). Our task is to parse this request and pass the correct data, which is everything we receive after the POST headers, to our handle() function. Listing 8 contains a typical HTTP response. We will have to send something like this back to the client, but we will first encrypt the SOAP message.

To handle the network operations, we can use the socket stream API, which was introduced in PHP 5. We will listen on a TCP port for SOAP clients to connect to our server, by calling stream_socket_server(), and when they do, we will retrieve the socket resource with stream_socket_accept(). This sequence will be performed in a loop, so we can accept clients until the process is killed.

Once a client is connected, it will send the SOAP request, starting with the HTTP POST header. By parsing the headers, we can determine the HTTP version spoken by the client, and the length of the data that is being sent (in a line that looks like “Content-Length: 123”). These headers can be received with stream_get_line(), but, unfortunately, this function was buggy up to, and including, PHP 5.0.3. To make things a little easier, I’ve created a function called stream2_get_line() which we can use instead. After all of the headers have been received (that is, after we receive a blank line) we can read the SOAP request data. We will read the proper number of bytes (determined by a header, as described above) with stream_get_contents(). Now that we have received the encrypted data, we have the same situation as before: we can serve the body of the request to our handle() function, which will decrypt and execute the SOAP call, and output the encrypted response.
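The header-parsing step described above can be sketched as a small function that works on any PHP stream. This is an illustration, not the article's Listing 9: it reads lines with fgets() rather than the author's stream2_get_line() helper, the function name readHttpRequest() is hypothetical, and an in-memory stream stands in for the client socket that stream_socket_accept() would return.

```php
<?php
// Read one HTTP request: request line, headers, then Content-Length bytes.
function readHttpRequest($stream): array
{
    // First line, e.g. "POST /soapdemo/server.php HTTP/1.1"
    $requestLine = rtrim((string) fgets($stream), "\r\n");
    if (!preg_match('#^(\S+) (\S+) HTTP/(\d\.\d)$#', $requestLine, $m)) {
        throw new RuntimeException('Malformed request line');
    }

    // Headers run until the first blank line.
    $headers = [];
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            break;
        }
        $parts = array_pad(explode(':', $line, 2), 2, '');
        $headers[strtolower(trim($parts[0]))] = trim($parts[1]);
    }

    // Read exactly as many body bytes as the client announced.
    $length = isset($headers['content-length'])
        ? (int) $headers['content-length']
        : 0;
    $body = $length > 0 ? (string) stream_get_contents($stream, $length) : '';

    return [
        'method'  => $m[1],
        'path'    => $m[2],
        'version' => $m[3],
        'headers' => $headers,
        'body'    => $body,
    ];
}

// Usage with an in-memory stream standing in for a client socket.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "POST /soapdemo/server.php HTTP/1.1\r\n"
    . "Host: localhost\r\n"
    . "Content-Length: 5\r\n"
    . "\r\n"
    . "hello");
rewind($stream);
$request = readHttpRequest($stream);
```

In the daemon, $request['body'] is what would be handed to handle(), and $request['version'] tells the server which HTTP version to answer with.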
We will catch this response and send it to the client, but only after we have sent the proper headers and a blank line, to mimic an HTTP server’s response. Once this is complete, we can close the client socket and wait for a new connection to be made. The modifications to our web-based server script
April 2005
●
PHP Architect
●
www.phparch.com
can be seen in Listing 9. Because our server script is listening on a certain port, the path of our server script no longer matters, and it is not necessary to request a certain path on the client side (the server script can simply ignore the requested path). So, in the case of our example, something similar to this would suffice:

    $soap = new MySoapClient(null, array(
        "location" => "http://localhost:12345/",
        "uri"      => "http://yourownuri/"));
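To give an impression of how the accept loop and the HTTP-style response fit together, here is a rough sketch. It is not Listing 9: serve() and buildHttpResponse() are hypothetical names, $handler stands in for our handle() logic and is assumed to return the (already encrypted) response body as a string, and the response headers are reduced to a minimum.

```php
<?php
// Sketch only. serve() and buildHttpResponse() are hypothetical names;
// $handler is assumed to take the request body and return the response body.

function buildHttpResponse($body, $httpVersion = '1.1')
{
    // Status line, headers, blank line, then the body.
    return "HTTP/" . $httpVersion . " 200 OK\r\n"
         . "Content-Type: text/xml; charset=utf-8\r\n"
         . "Content-Length: " . strlen($body) . "\r\n"
         . "Connection: close\r\n"
         . "\r\n"
         . $body;
}

function serve($address, $handler)
{
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        die("Could not listen on $address: $errstr ($errno)\n");
    }

    while (true) {
        // Returns false when the accept times out; in that case just try again.
        $client = @stream_socket_accept($server);
        if ($client === false) {
            continue;
        }

        // Read up to the blank line, remembering only Content-Length.
        $length = 0;
        while (($line = fgets($client)) !== false && rtrim($line, "\r\n") !== '') {
            if (stripos($line, 'Content-Length:') === 0) {
                $length = (int) trim(substr($line, 15));
            }
        }

        $request = ($length > 0) ? stream_get_contents($client, $length) : '';
        fwrite($client, buildHttpResponse(call_user_func($handler, $request)));
        fclose($client);
    }
}
```

A call such as serve('tcp://127.0.0.1:12345', 'myHandler') would then match the client location shown above; myHandler is, again, only a placeholder name.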
Listing 7

    POST /soapdemo/server.php HTTP/1.1
    Host: localhost
    Connection: Keep-Alive
    User-Agent: PHP SOAP 0.1
    Content-Type: text/xml; charset=utf-8
    SOAPAction: "http://yourownuri/#systemName"
    Content-Length: 442

    [SOAP envelope XML not preserved]

Listing 8

    HTTP/1.1 200 OK
    Date: Sat, 05 Mar 2005 10:05:14 GMT
    Server: Apache/1.3.26 (Unix) Debian GNU/Linux PHP/5.0.3-1.dotdeb.0
    X-Powered-By: PHP/5.0.3-1.dotdeb.0
    Content-Length: 575
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Type: text/xml; charset=utf-8

    [SOAP envelope XML not preserved; only the returned value survives]
    Linux deepthought 2.4.26 #2 Sat Apr 24 14:02:08 CEST 2004 i686

Listing 9

    [listing not preserved]

Listing 10

    [beginning of listing not preserved; the XML markup inside the string literals is missing]
              "\n"
            . "\n"
            . " \n"
            . " \n"
            . " RPC\n"
            . " ".utf8_encode($this->error)."\n"
            . " \n"
            . " \n"
            . "\n";
          }

          echo $this->encrypt($response);
        }
      }

      ...
    }


    $soap->setClass("SoapHandler", $soap);
    ?>

Error handling

Unfortunately, because the SOAP extension was designed to be run in a PHP environment that is hosted on an external HTTP server, throwing exceptions from within it is somewhat problematic, especially in “persistent” mode, which we will use when we implement session support (explained below). Hopefully, this will change in the future, but until then we will have to work around this problem. Luckily, there is a way to throw exceptions that a SOAP client can catch: the server can respond with a SOAP fault. A SOAP fault is not exactly the same as a regular exception, though. When a client receives a fault, the error information can be extracted from public variables called faultcode and faultstring. To account for this type of response, our client script’s remote procedure call would change to the following:

    try {
        echo "system: ".$soap->systemName();
    } catch (SoapFault $e) {
        echo "Error (".$e->faultcode."): ".$e->faultstring;
    }
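For orientation, the kind of fault document that a client turns into a SoapFault can be sketched as below. This is not the code from Listing 10: buildSoapFault() is a hypothetical helper, the envelope follows the general SOAP 1.1 fault layout, and the default fault code is an assumption chosen for illustration.

```php
<?php
// Sketch only: buildSoapFault() is a hypothetical helper. The envelope follows
// the SOAP 1.1 fault layout; the default fault code is an assumption.

function buildSoapFault($faultstring, $faultcode = 'SOAP-ENV:Server')
{
    // htmlspecialchars() keeps characters such as < and & from breaking the XML.
    return '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">' . "\n"
         . " <SOAP-ENV:Body>\n"
         . "  <SOAP-ENV:Fault>\n"
         . "   <faultcode>" . htmlspecialchars($faultcode) . "</faultcode>\n"
         . "   <faultstring>" . htmlspecialchars($faultstring) . "</faultstring>\n"
         . "  </SOAP-ENV:Fault>\n"
         . " </SOAP-ENV:Body>\n"
         . "</SOAP-ENV:Envelope>\n";
}
```

In our setup, a string like this would still have to pass through the same encryption step as any other response before it is sent to the client.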
In our server script, we will have to generate a SOAP error message that contains the appropriate faultcode
handler class. We can pass the handle of our MySoapServer object as a second parameter to SoapServer::setClass() so the constructor of our handler can receive it and store it in a private variable that will be used whenever the handler needs to generate an error. Here’s a summary of what we will have to change in our server script to enable error handling (as shown in Listing 10):

• implement a constructor in SoapHandler to receive a SoapServer object and store its handle in a private variable
• implement a private error() function in SoapHandler that can be called from handler functions
• add a public $error variable to MySoapServer that can be set from our SoapHandler
• extend our MySoapServer::handle() function to deal with errors
• change the call to SoapServer::setClass(): pass the SoapServer’s handle as a parameter when constructing the SoapHandler object

Sessions

It would be nice if our command-line script could communicate with the server at a more advanced level. We could establish a handshake procedure, during which the client could tell the server about itself. The server could then return a unique session key to the client. Using this session key, our server could verify the client’s identity during the following remote procedure calls. Now, it becomes possible for a client to execute multiple commands without losing state information. Other possibilities include:

• applying rules and boundaries to the client, depending on authorization levels
• returning errors and warnings in the preferred language of the user
• making sure the client and server use the same version of your API

I’m sure you can probably think of a thing or two, yourself. To get you started, I’ll show you a way to create a session system, and provide multi-language exceptions.

One thing is vitally important for any session system: persistence. You may not have realized this, but a SoapHandler instance was created (and destructed) for each SOAP request we processed.
It’s impossible to store session information in our handler because it’s destroyed after every request. We can alter this behaviour with the SoapServer::setPersistence() function, by passing the constant SOAP_PERSISTENCE_SESSION. This ensures that our handler is only destroyed once: when our script terminates. As I’ve mentioned, our session system requires that
we implement a handshake procedure. This will simply be a single call (to the handshake() function) with which the server can determine the client’s preferences. In this case, we only need to tell the server that we want to receive error messages in English. Our client code becomes:

    try {
        $session = $soap->handshake("en");
        echo "system: ".$soap->systemName($session);
    } catch (SoapFault $e) {
        echo "Error (".$e->faultcode."): ".$e->faultstring;
    }
As you can see, the server’s handshake() function returns a session key which we will use for the following function call to systemName(). This way, the server will be able to identify us and can, if necessary, throw a SOAP fault with a message in the language that was asked for.

The client was, of course, the easy part. The implications on the server side are a bit more extensive, but our MySoapServer class can remain the same: the error handling mechanism that we’ve already introduced can still be used the same way. The big changes are in our SoapHandler class. First, we’ll add a handshake() SOAP function that will initialize a new session, add it to the stack of sessions and return a generated session key. The easiest and cleanest way to achieve this is to create an instance of the SoapSession class whose constructor will generate the session key which can then be passed, as a parameter, to our systemName() implementation. Of course, this parameter must be passed to all SOAP functions we will write, under this framework, so we should create a private session checking function that will return the handle to an instance of our SoapSession object, if it could be found, for the given session key. This session checking function can set an error flag if an appropriate, existing session cannot be found.

Because the session retrieval function must crawl through the session stack, it’s a very convenient place to destroy sessions that have timed out. Our sessions will remember when they were last validated (this happens once for every SOAP function call) and the destruction function can simply compare the session’s timestamp with the current time. If the difference is greater than a predetermined interval, the session can be destroyed.

In order to return errors in the correct language, the error() function must be called with error messages in multiple languages, and because it will have to pick the correct language, it’s wise to move this function to the SoapSession class. This class, as you’ve seen, contains the session information that enables us to determine which language to use.

Now that we know what needs to change in our SoapHandler class, let’s take a closer look at the SoapSession class that we introduced, above. As I mentioned, its constructor will create a key for the session, and there will be a multi-language error handler. The key-generating constructor can be implemented in many different ways. The approach that I’ve chosen calls mt_rand(), a number of times, to fetch random characters from a constant that contains all of the possible characters that we could use in a session key. The results are concatenated to create a random, pseudo-unique key.

The error handler will have to be changed so it can accept error messages in several languages, at the same time. This can be accomplished by passing an associative array whose keys denote a specific language, and whose values contain the text that we will return. The function call becomes:

    $session->error(array(
        "en" => "Something went very wrong",
        "nl" => "Er ging iets heel erg fout",
        "de" => "Etwas ist sehr falsch gegangen"));

The error handler should be able to fall back to a default language, if the passed pool of messages does not contain the language specified in the session. To satisfy this requirement, we will define a default language constant in our class, so our handler can use the preferred language, if available, and if not, use the default. In a situation where even this is not possible, the error handler will return the first message in the passed array. This way, at least something will be returned, in a case where an error situation does not have a message for the preferred language.

That’s almost all we have to do to implement a multi-language session system. All that’s left is to make our SOAP handling functions use the new error() function in the SoapSession object. To see what our SoapHandler and SoapSession classes look like, take a look at Listing 11, where you will see that if the posix_uname() function does not exist, an error would be triggered.

Listing 11

    [listing not preserved apart from its closing tag]
    ?>

What’s Next?

In this article I’ve shown you how to create a SOAP client and server, web- as well as command-line-based. You’ve seen how to alter transaction data (in-line), apply encryption, handle errors, and create sessions. There are, however, a few things we didn’t cover that you should probably look at before you put all of this into a production environment.

First, ask yourself what would happen if corrupt data were sent to your SOAP server. The decryption would still work, because a crypto algorithm has no idea what “good” or “bad” data is, but the SoapServer class was built to terminate the script on an error event, which is exactly what would happen. So, before you call the handle() function, you might want to verify that the incoming data is a genuine SOAP message.

Next, you should consider how easy it is to mount a DoS (denial of service) attack against our SOAP server. A malicious client could keep sending (partial) requests with just enough time in between to keep the connection from timing out: other SOAP clients would have to wait forever. To solve this issue, you could create a more robust TCP server that can handle multiple connections, simultaneously. Check out stream_select() if you’re planning on implementing such a system.

PHP 5 provides a very useful SOAP extension that, as we’ve seen, is relatively easy to use in a CLI environment. Error handling can’t, currently, be implemented the way I would like, but in the (perhaps near) future, things will probably improve. Until they do, I hope the method I described will help you out. Good luck and have fun coding!

About the Author

Ron Korving is an advanced computer science graduate and senior PHP developer located in Breda (The Netherlands). He enjoys developing complex software architectures and writing low level software. You can contact Ron through his website: http://www.ronkorving.nl.

To Discuss this article: http://forums.phparch.com/212
Database Abstraction in PHP by Lukas Smith
There is this myth that database abstraction is only useful when you need to be able to switch your code from one RDBMS to another. Obviously, this alone can be a key advantage in many situations. For example, when developing a product, do you really want to lose a potential client that has a different technology preference, or even a corporate-wide standard RDBMS? Database abstraction layers just might hold the answer.
The idea behind database abstraction is to stick an RDBMS-independent layer in between your own code and the native interface. To some extent, the PHP database extensions already fit this description. However, these extensions are usually just wrappers around the native C APIs and, therefore, their interfaces are usually quite different. This means that even simple things like opening a database connection will result in radically different code, depending on your choice of RDBMS. The example in Listing 1 illustrates the differences in function names only; it gets even worse when looking at the parameter differences.

The key point here is: coding against a single RDBMS is going to cut your market opportunities short. Coding for multiple database engines, however, will leave you with a maintenance nightmare, and huge development overhead. The time you would waste having to write and maintain all that duplicate code is much better spent optimizing your application’s general flow. With an abstraction layer, you can focus your efforts on RDBMS-dependent code, exactly where you have the most potential gain. For example, most database engines have features that will help you optimize a specific task. If this feature is not available in your database abstraction layer in an abstracted form, there is nothing stopping you from optimizing this exact piece of code for a specific RDBMS. Using a database abstraction layer will save you time on the simple parts of your code, so you’ll have more time to add these optimizations. Time spent writing multiple versions of your database code that doesn’t use an abstraction layer does not even guarantee that you’ll get the best performance!

A common misconception is that database abstraction is only useful for the task mentioned above, but it can do so much more for you. For example, you may be interested in all of the new, advanced functionality of the mysqli extension, but mysqli requires PHP 5 (although a port to PHP 5 shouldn’t be a great challenge for most well written PHP 4 applications). Even if you’ve successfully ported your application to PHP 5, a simple search and replace to turn each mysql_* function call into its mysqli_* equivalent is likely to fail. Even worse, it will mean that you’ll now need to maintain a separate PHP 4 and PHP 5 version of your code to avoid losing the entire PHP 4 market. An abstraction layer will allow you to start using the advanced features of MySQL 4.1, like SSL connections, native prepared queries and sqlstate error codes, while maintaining compatibility with older MySQL versions, through emulation. So, using a database abstraction layer also gives you forward compatibility to newer versions of the RDBMS you are already using!

The example of the mysqli extension is likely to soon be repeated for most other database extensions with the imminent release of the PDO extension with PHP 5.1. I will talk about PDO in more detail later in this article.

REQUIREMENTS
PHP: 4.x, 5.x
OS: N/A
Other Software: N/A
Code Directory: dbal
Even if you never plan to switch your RDBMS for a given project, you may still have to support different database servers, for different customers. Abstraction can save you from having to invest in RDBMS-specific training, which, given the radically different database extensions in PHP, can cost quite a bundle. Using an abstraction layer allows you to design your project around your favourite RDBMS, and promise your customers that they will be able to switch to whatever they prefer, if your choice doesn’t pan out in the end. This allows you to truly leverage the price point advantage that you can offer your customers if you prefer an open source database. The customer doesn’t have to commit to your choice until he has been fully satisfied, and you can show off your RDBMS without a customer constantly breathing down your neck.

Finally, being able to easily escape the grips of any specific database vendor (with little effort) is a large token to hold in your hands. For example, you don’t have to accept changes in a licensing policy, knowing you don’t have an alternative, because you have invested too much money into code dependent on the given RDBMS. Even if the customer is not yet ready to switch, you can use this token while bargaining a new deal with your vendor’s sales team.

Here is an overview of the benefits of using a database abstraction layer:

• support several RDBMS with the same code
• more time for optimizing the performance-relevant areas of your code
• forward compatibility with new versions of the same RDBMS (and new PHP extensions)
• reduced training costs when maintaining several projects that use different RDBMS
• maintain freedom of vendor choice

Listing 1

    // connecting to a firebird/interbase database
    ibase_connect();
    // connecting to an oracle database
    OCILogon();
    // connecting to a sqlite database
    sqlite_open();
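The point of Listing 1 can be made a little more concrete with a toy example. The following sketch is not taken from any of the layers discussed in this article: connectFunctionFor() and the scheme names it accepts are hypothetical, and it only shows how a single DSN-style string could be mapped to the differently named native connect functions.

```php
<?php
// Sketch only: connectFunctionFor() and its scheme names are hypothetical.
// It maps the scheme of a DSN-style string to the name of the native connect
// function that a driver for that RDBMS would end up calling.

function connectFunctionFor($dsn)
{
    $native = array(
        'ibase'  => 'ibase_connect',  // Firebird/InterBase
        'oci8'   => 'OCILogon',       // Oracle
        'sqlite' => 'sqlite_open',    // SQLite
    );

    // Everything before the first colon is treated as the scheme.
    $pos = strpos($dsn, ':');
    $scheme = ($pos === false) ? '' : strtolower(substr($dsn, 0, $pos));

    if (!isset($native[$scheme])) {
        return false; // no driver known for this RDBMS
    }
    return $native[$scheme];
}

echo connectFunctionFor('oci8://scott:tiger@localhost/orcl'), "\n"; // prints "OCILogon"
```

A real abstraction layer does much more than pick a function name (the parameters differ as well), but the calling code only ever sees the one unified entry point.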
What Is Possible?

Obviously, database abstraction layers cannot abstract everything. More specifically, it is not feasible to abstract everything, especially since abstraction will always incur a performance overhead. The point of a good abstraction layer is to provide the greatest common denominator of features, without too much of a performance sacrifice. Functionality that is missing from one RDBMS and must be emulated is likely to reduce performance. Then again, the performance hit is usually minimal.

Among the more popular abstraction layers for PHP are the following, in alphabetical order: ADODB, Creole, Metabase, PDO, PEAR::DB, PEAR::MDB and PEAR::MDB2. Each of these packages has specific features and drawbacks that set it apart from the others, giving each a unique advantage, depending on the needs of the user base.

ADODB has been around for quite a while, so it’s gained a significant following. The API is heavily based on MS ADO, so it should provide a fairly smooth transition for developers making the jump from Microsoft’s offerings. It provides partial abstraction for schema management and data type abstraction. John Lim, the project’s maintainer, has also added a large number of convenience methods that fit well with ADODB, but go far beyond the realm of database abstraction.

A new, emerging choice is Creole. Just as ADODB aims to make the transition easy for MS developers, Creole does the same for Java developers who have a background using JDBC. It makes heavy use of new PHP 5 features like exceptions and iterators, and it adheres strictly to PHP 5’s standards (it is E_STRICT compliant). However, it tends to be somewhat of a disadvantage to be the new kid on the block, with something as delicate as database abstraction: it’s likely that there are still some tricky portability issues lurking in Creole.

Another, rather new, option is PDO (PHP Data Objects). Actually, it’s not all that new: the first meetings date back to the summer of 2003, when a bunch of people from the PHP project who were interested in databases, including myself, sat down together at LinuxTag in Karlsruhe to come up with a plan to improve the database APIs in PHP. A few months ago, this code was made available for end user testing, and it is rapidly approaching beta status. It’s slated to become one of the key new features in the upcoming PHP 5.1 release. Since it’s written in C, and it interfaces directly with the native RDBMS APIs, it is considerably faster than other options and is also able to do some things that are not easily possible in PHP user land code. For example, it can hook much deeper into the native error handling mechanisms, to provide the option of either using PHP errors, or exceptions. However, we decided that it would not make sense to go too far into SQL abstraction, since it will be quite difficult to keep the (native C) drivers up to date with all the little gritty details that are different between database engines or even different versions of the same database.

The oldest abstraction layer mentioned here is probably Metabase, which is maintained by Manuel Lemos. It is actually still compatible with PHP 3! However, this means that it’s rather slow on newer versions of PHP. Beyond the extensive compatibility, it offers an incredibly large range of features, from data type to schema management support. It even abstracts things like the REPLACE syntax that was invented by MySQL and adopted by SQLite. Aside from being slow, it has drawn a lot of criticism for its needlessly verbose API.

Listing 2

    function writeSomeData($query, $ondemand = true)
    {
        global $mdb2;
        // expect no such table error
        $mdb2->expectError(MDB2_ERROR_NOSUCHTABLE);
        // send off query
        $res = $mdb2->query($query);
        $mdb2->popExpect();
        // table didn't exist?
        if ($ondemand
            && MDB2::isError($res, MDB2_ERROR_NOSUCHTABLE)
        ) {
            $res = createSomeDataTable();
            // table was successfully created
            if (!PEAR::isError($res)) {
                // prevent infinite recursion by setting
                // ondemand to false
                return writeSomeData($query, false);
            }
        }
        return $res;
    }
Figure 1

DBAL/Feature rows, in order: Connect, Fetch, Prepare, Data Types, Transactions, Sequences, Extensibility, Drivers, Additional Packages, API Stability, Reliability, Schema Reading, Schema Manipulation, Error Handling, Error Codes, Documentation.

Ratings per layer, in the same row order (cell boundaries not preserved):

    ADODB:     + + ++ + + AUTOCOMMIT + +/++ + ++ + + ++ + OWN ERROR HANDLER +
    Creole:    + +/+/+ + BEGIN, AUTOCOMMIT ++ +/+ + + -+ EXCEPTIONS ++
    PDO:       + +/+ LOB IS PLANNED + BEGIN -+/+/+/+/-PROVIDED BY PROPEL + PHP ERRORS OR EXCEPTIONS ++ --
    Metabase:  + +/+/+ + AUTOCOMMIT + + + + ++ + + ++ +/OWN ERROR HANDLER -+
    DB:        ++ + + +/+ AUTOCOMMIT + +/++ ++ +/+ +/ONLY SEQUENCES + PEAR ERROR + +
    MDB:       + + + + + AUTOCOMMIT + + + + + + ++ + PEAR ERROR + +/-
    MDB2:      ++ ++ ++ ++ + BEGIN ++ ++ +/+ + ++ + + PEAR ERROR +/-

                    ADODB    Creole   PDO      Metabase  DB       MDB      MDB2
    PHP Version     PHP4-5   PHP5     PHP5.1   PHP3-4    PHP4-5   PHP4-5   PHP4-5
    Official State  stable   stable   alpha    stable    stable   stable   beta
The PEAR project also provides some very popular options for database abstraction. There are actually three options which exist in parallel, mainly for historical reasons. PEAR::DB has been around the longest and is actually bundled with all recent PHP releases. This, and the large number of packages that build on top of PEAR::DB, mean it’s fairly hard to establish alternatives, even though they are technically superior. PEAR::MDB is the result of a merge of the Metabase and PEAR::DB layers. It provides most of the features of both, while actually having better performance, but the API is a bit overwhelming, as a result of the merge. For this reason, PEAR::MDB2 consolidates this merge and makes even more improvements in terms of performance and flexibility. Furthermore, PEAR::MDB2 is currently being modified to match the PDO API, making it possible to migrate from and to PDO, easily.

The table in Figure 1 compares the level of portability attained, high level features and other relevant aspects of each of the abstraction layers on a simplified scale of --, -, +/-, +, ++. This list, obviously, hides some of the details, but should help, as a starting point, when making a decision. Also, note that each of these abstraction layers is likely to gain new features, regularly.

Client API and Error Codes

Most abstraction layers in PHP provide a more coherent API than the native APIs, which have usually received little PHP-specific API design attention, and instead, are mostly a direct wrapping of the database’s client API. Abstraction layers are also usually object-oriented, which most people will chalk up as a positive. More importantly, they usually offer some level of error code abstraction, which is very important, since some RDBMS extensions (unfortunately) have very limited error handling capabilities: they often require hacks to translate text error messages to proper error codes which enable you to handle errors more appropriately. For example, a query that has failed due to a connection error should probably alert the administrator, while a malformed query should alert the programmer. Having abstracted error codes will even enable you to write applications that automatically install themselves. Using PEAR::MDB2 you could write code as shown in Listing 2.

Connecting, Querying and Fetching

On top of these features, a database abstraction layer covers the basics like connecting, querying and fetching. The devil, however, is in the details. Some RDBMS convert column cases in a specific way, or convert empty strings to NULL. Since the necessary unification can be costly, these kinds of conversions are usually optional. PEAR::MDB and PDO provide a limited compatibility layer, but PEAR::DB and PEAR::MDB2 both feature a wide range of compatibility options for result set handling. A few database engines require you to connect as a specific user to modify the schema in a specific way, thus requiring a separate connection for these types of tasks.

More advanced abstraction layers provide ways to use both buffered and unbuffered result sets. Buffered result sets can fill up your memory quickly, especially if you work with large result sets. If you are using queries that return few results, you may prefer to be able to use buffered result sets in order to be able to jump around within the data. Choice in this area is, obviously, a good thing.

Listing 3

    require_once 'DB.php';

    $dsn = 'mysql://root:@localhost/example';
    $db =& DB::connect($dsn);
    if (PEAR::isError($db)) {
        die('failed to connect');
    }
    // disable result buffering
    $db->setOption('result_buffering', false);
    // will use mysql_unbuffered_query() instead of mysql_query()
    $res = $db->query('SELECT * FROM foobar');
    $fetchmode = DB_FETCHMODE_ASSOC;
    while (DB_OK === $res->fetchInto($row, $fetchmode)) {
        doSomething($row);
    }

Listing 4

    // create sql string
    $sql = 'INSERT INTO foo VALUES (:url, :value)';
    // prepare the query
    $stmt = $db->prepare($sql);
    // bind values (by reference; $text and $decimal are assumed
    // to hold the first set of data already)
    $stmt->bindParam(':url', $text);
    $stmt->bindParam(':value', $decimal);
    // send off query
    $result = $stmt->execute();
    // bind next set of data
    $text = 'http://pear.php.net';
    $decimal = 23453/8;
    // send off query
    $result = $stmt->execute();
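The error-code translation mentioned under “Client API and Error Codes” can be sketched as follows. None of this is taken from the layers compared above: the constants, the message patterns and both function names are hypothetical, and a real driver would keep one such map per RDBMS rather than a single mixed list.

```php
<?php
// Sketch only: the constants, message patterns and function names are
// hypothetical and not taken from any of the layers discussed here.

define('DBAL_ERROR_UNKNOWN',     -1);
define('DBAL_ERROR_NOSUCHTABLE', -2);
define('DBAL_ERROR_CONNECT',     -3);
define('DBAL_ERROR_SYNTAX',      -4);

function portableErrorCode($nativeMessage)
{
    // Pattern => portable code, checked in order.
    $map = array(
        '/no such table/i'              => DBAL_ERROR_NOSUCHTABLE,
        '/table .* doesn\'t exist/i'    => DBAL_ERROR_NOSUCHTABLE,
        '/(can\'t|could not) connect/i' => DBAL_ERROR_CONNECT,
        '/syntax error/i'               => DBAL_ERROR_SYNTAX,
    );
    foreach ($map as $pattern => $code) {
        if (preg_match($pattern, $nativeMessage)) {
            return $code;
        }
    }
    return DBAL_ERROR_UNKNOWN;
}

function whoToAlert($code)
{
    // Connection problems concern the administrator, everything else
    // (for example bad SQL) the programmer.
    return ($code === DBAL_ERROR_CONNECT) ? 'administrator' : 'programmer';
}
```

With portable codes like these, application code such as Listing 2 can react to “no such table” without knowing which RDBMS produced the message.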
The PDO extension defaults to unbuffered result sets, but provides a fetchAll() method that allows you to fetch an entire result set into a multi dimensional array, enabling you to determine the number of rows in the result set, and providing the ability jump around within the returned data. The example in Listing 3 uses PEAR::DB to connect to a database, make an unbuffered query and fetch all the rows. Note that this feature was only recently added to PEAR::DB. Prepared Queries and LOBs Another feature that most abstraction layers cover is prepared queries, which help in protecting against SQL injection. On RDBMS that support this feature natively, you can expect a significant performance increase. Prepared queries are also the most efficient way to deal with LOBs (large objects), like images and very long text, as they allow you to stream the data to the RDBMS instead of forcing you to read the entire LOB
23
Database Abstraction in PHP
into memory first, and then pass it to the RDBMS in one chunk. Even though many abstraction layers provide an interface for this, the truth is that many just emulate prepared queries, even if they are natively supported by the database engine. The PDO extension provides a very efficient way to handle prepared queries that reduces the amount of costly function/method calls to an absolute minimum. These kinds of speed optimizations, however, can result in hard to read “magic” code. Listing 4 shows an example of this. Listing 5 contains an example that uses Creole with prepared statements to insert a LOB into the RDBMS. Other Data Types LOBs are not the only special kind of data type where you will run into incompatibilities between RDBMS. The topic of data type handling is much broader: things like date and time information tend to be handled slightly (or sometimes even totally) differently between databases. For example, MySQL has a very special column type called “timestamp” that will be automatically updated, under certain conditions. There are also differences in the handling of certain numeric and boolean data types. Some databases, like SQLite, do not even use data types. For these reasons, abstraction layers should provide methods to convert data from a database independent format into the specific format required for the given RDBMS. These abstraction layers will let you define the type of data for each placeholder to ensure proper handling inside the database or will, alternatively, provide methods to ensure properly quoted and escaped data to be embedded directly inside a query. Often, this requires limiting yourself to a subset of Listing 5 1 2 3 4 5 6 7 8 9 10
include_once ‘creole/util/Blob.php’; $blob = new Blob(); $blob->setInputFile(‘/path/to/your/file.gif’); $sql = “INSERT INTO blobtable (name, image) VALUES (?, ?)”; $stmt = $creole->prepareStatement($sql); $stmt->setString(1, ‘file.gif’); $stmt->setBlob(2, $blob); $stmt->executeUpdate();
Listing 6 1 2 3 4 5 6 7 8 9 10 11 12
$adodb->StartTrans(); $adodb->Execute($sql); // 2nd nesting level .. like to be in a different module $adodb->StartTrans(); # ignored if (!CheckRecords()) { $adodb->FailTrans(); } $adodb->CompleteTrans(); # ignored // end of 2nd nesting level $adodb->Execute($Sql2); $adodb->CompleteTrans();
April 2005
●
PHP Architect
●
www.phparch.com
FEATURE supported data types. PEAR::MDB2 takes this a bit further by making it easy for users to define their own data types. I have made use of this in my applications by defining a new abstract data type called “serialize” which will call serialize() when writing to the database and unserialize() when fetching data from the database. When using native prepared queries, the RDBMS can automatically and efficiently determine the proper type to use from the column definition in the database. In
“While abstraction layers carry a natural overhead, they might even help you improve performance.” some cases the type can be determined by looking at the type or format of the data, in PHP. However, this is a rather costly process. For this reason it is necessary for the developer to pass the data types to the abstraction layer, to ensure that the proper conversions are done with minimal overhead. Transactions All of the abstraction layers we covered earlier provide support for transactions, if available. Popular table handlers like MyISAM and HEAP for MySQL do not support transactions, so none of the listed abstraction layers are able to emulate transactions, nor would this really be feasible. Most abstraction layers take the route of providing an auto commit mode while others prefer to require that the user explicitly start every transaction. The abstraction mechanisms also provide interfaces that allow commit and rollback. ADODB takes things one step further and tries to assist the developer by preventing transactions from being committed when the developer was too lazy to implement error handling within all the SQL queries in the given transaction. For this reason, the CompleteTrans() method will automatically issue a rollback if an SQL error occurred. An example, illustrating nested use of this method, can be found in Listing 6. Sequences and Auto Increment Columns Another key area where abstraction is necessary is in the handling of unique identifier generation. Some databases provide this through sequences, while others
offer auto increment columns. While sequences are more flexible (since they do not have to be linked to a table like auto increment), it is not really possible to emulate auto incrementing columns with sequences; the other way around, emulating sequences with auto increment columns, is what most abstraction layers supply and is much more suitable. Creole and PEAR::MDB2 provide a nice mechanism that allows you to use either auto increment columns or sequences, depending on what the RDBMS natively supports. The syntax, however, works slightly differently in each. Listing 7 shows how this works. As you can see, this results in a few extra method calls that would not otherwise be required.

Database Schema
All mentioned abstraction layers provide some mechanism to introspect a result set, in order to determine the columns and tables that were read. All but PDO also provide a way to read out the RDBMS schema. Some of the more advanced layers make it possible to CREATE, ALTER and DROP databases, tables, sequences and indexes. Another feature that some of these layers provide is an XML format that allows RDBMS-independent database schema definition. Even automatic schema altering may be supported if the user makes changes to the XML in their database definition. Metabase was the first layer to provide such a format, which was adopted by MDB and MDB2. ADODB also recently gained such a format. In Listing 8, you can see an example of using the Metabase format to define a database with a single table called users, a few fields with an index and a sequence.

Syntax Sugar
We’ve seen common elements between database abstraction layers; let’s now look at how each has its own specialties and characteristics. For example, Metabase, PEAR::MDB and PEAR::MDB2 provide other nice-to-have features, like support for the MySQL REPLACE syntax. The two MDB layers even support limited subselect emulation for single column subqueries. Listing 9 shows the syntax required to make this work using MDB. All mentioned layers provide some higher level methods to make the developer’s life easier. Among other things, they all feature the ability to fetch a single column or all rows at the same time. Another nice touch that PEAR::DB and PEAR::MDB2 support is the wrapping of result sets in user defined result sets (with
April 2005
●
PHP Architect
●
www.phparch.com
setFetchMode()). This should enable developers to, for example, wrap result sets inside PHP 5 iterators, which are an incredibly powerful way to improve performance and encapsulate logic while iterating over result sets. ADODB has other features that provide nice table outputs of results, and caching. The PEAR abstraction layers are able to provide similar features through many of the packages that are part of the PEAR repository. On the other hand we have Creole, whose most notable extra is that it provides the basis for Propel, an ORM solution modelled after the Apache Torque project. PDO, being the youngest of the bunch, does not yet have as many add-on packages available (see CrtxDB), but this is likely to change quickly with the release of PHP 5.1 in the coming months. Each layer has its own set of such API sugar that is not
really needed to enable portability. However, since these features add few lines of code, their availability doesn’t hurt, and might even affect your final choice.

Writing Proper SQL
It should be noted that, even though all of these great options to unify and emulate differences exist, a lot of portability issues can be avoided by simply writing ANSI-compliant SQL and by watching out for known pitfalls. While abstraction layers will try to provide features that will make your code portable across all RDBMS, it may be sufficient for you to identify a subset of databases you want to support. This may allow you to ignore certain limitations or needlessly awkward unified APIs. The best resource to date on this topic is “SQL Performance Tuning” by Peter Gulutzan and Trudy Pelzer, which gives you good advice on how to optimize your SQL for a number of RDBMS at the same time. Daniel Convissor also put together a set of slides (http://www.analysisandsolutions.com/presentations/portability/slides/toc.htm) that covers many of the common pitfalls and identifies similarities and differences between database systems. Another resource on this topic is maintained on Troels Arvin’s home page (http://troels.arvin.dk/db/rdbms/).

Listing 7
    $idgen = $creole->getIdGenerator();
    // do we get id before or after performing insert?
    if ($idgen->isBeforeInsert()) {
        $id = $idgen->getId($seqname);
        // now add that ID to SQL and perform INSERT
        $creole->executeUpdate("INSERT .... ");
    } else { // isAfterInsert()
        // first perform INSERT
        $creole->executeUpdate("INSERT .... ");
        $id = $idgen->getId();
    }

Listing 8
(Metabase XML schema. The markup of this listing was lost in extraction and only the values survive: database auth; table users with the fields user_id (integer), handle (text, 20) and is_active (boolean, N); index user_id_index on user_id, ascending; sequence users_user_id on user_id.)

Listing 9
    $sql = 'SELECT name FROM foo WHERE id = '.
        $mdb->getValue('time', MDB_Date::now());
    // subselect
    $subselect = $mdb->subSelect($sql, 'text');
    // sql string
    $sql = 'SELECT is_active FROM bar WHERE id IN ('.$subselect.')';
    // send off query
    $res = $mdb->query($sql, 'boolean');

REFERENCES
ADODB: http://phplens.com/lens/adodb/
Creole: http://creole.phpdb.org/wiki/
PDO: http://pecl.php.net/pdo
Metabase: http://www.phpclasses.org/browse/package/20.html
PEAR::DB: http://pear.php.net/DB
PEAR::MDB: http://pear.php.net/MDB
PEAR::MDB2: http://pear.php.net/MDB2
SPL & Iterators: http://www.php.net/~helly/php/ext/spl/
PEAR: http://pear.php.net/
PROPEL: http://propel.phpdb.org/
Apache Torque: http://db.apache.org/torque/
CrtxDB: http://crtx.org/index.php?area=Main&page=CrtxDB
SQL Performance Tuning: http://www.amazon.com/exec/obidos/ASIN/0201791692/104-5164967-0225523

Conclusion
There are many reasons to use an abstraction layer beyond being able to switch the database backend of a specific project. While abstraction layers carry a natural overhead, they might even help you improve performance, since they allow you to spend your time on the performance-relevant parts of your application. There are many choices out there, each with their own user base. For people who prefer long established code, I recommend PEAR::DB, PEAR::MDB and ADODB. If you are still stuck on PHP 3 then there is really no other choice than Metabase, which is a very complete, yet slow, option. People who love PHP 5 and exceptions should have a look at Creole or wait a bit longer for PDO. Even though it will only provide abstraction for prepared statements and errors, it will certainly please the speed freaks, since it is written as a thin layer
on top of the native C APIs. In terms of feature completeness, PEAR::MDB2 can’t be beaten, but due to a lack of a stable release, PEAR::MDB might feel like the safer choice.
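To give a flavour of the PDO style mentioned above (prepared statements, exception-based errors, and an explicit transaction that is rolled back on failure), here is a minimal sketch. It assumes the pdo_sqlite driver is installed and uses an in-memory database; the users table, its columns and the handle value are made up for illustration and are not taken from the article's listings.

```php
<?php
// Sketch only: assumes the pdo_sqlite driver; table and column names are illustrative.
$pdo = new PDO('sqlite::memory:');
// Ask PDO to throw PDOException instead of returning false on errors.
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, handle TEXT NOT NULL UNIQUE)');

// A prepared statement with a named placeholder.
$insert = $pdo->prepare('INSERT INTO users (handle) VALUES (:handle)');

$pdo->beginTransaction();
try {
    $insert->execute(array(':handle' => 'lsmith'));
    // The second insert violates the UNIQUE constraint and throws.
    $insert->execute(array(':handle' => 'lsmith'));
    $pdo->commit();
} catch (PDOException $e) {
    // Undo everything done since beginTransaction().
    $pdo->rollBack();
}

// Count the rows that survived the failed transaction.
$count = (int) $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn();
echo "Rows after rollback: {$count}\n";
```

Because the failure is caught and the whole transaction rolled back, neither insert should survive. Whether a given abstraction layer does this for you automatically or leaves it to your own error handling is one of the differences discussed in the Transactions section.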
About the Author
Lukas Smith is a well-known contributor to the PEAR project. Among other things, his fame comes from his work on the MDB and MDB2 database abstraction layers, and on the LiveUser authentication and permission package. He is probably less known for his work on various RDF packages for PEAR. Being one of the founding members of the PEAR group, Lukas is also an active contributor to the organization of the project itself. He earns his living as one of the co-owners, and the chief software architect, of BackendMedia (www.backendmedia.com), which specializes in network-related services like intranet applications based on PHP.
To Discuss this article:
http://forums.phparch.com/213
Advanced Sessions and Authentication in PHP 5
by Ed Lecky-Thompson

Native session support has been present in PHP since version 4, but its lack of sophistication means it is often found wanting in enterprise-level development environments. In this two-part article, we’ll tackle sessions from the ground up, from recapping PHP’s built-in support right through to the development of a sophisticated brace of classes, especially optimized for session handling and authentication in PHP 5.
The emergence of the web as a powerful platform for building thin client applications that run in the browser is undoubtedly a classic example of using an old solution to solve a relatively modern problem. Indeed, it’s unlikely that Tim Berners-Lee ever imagined his HTTP protocol being used to serve the likes of Google, eBay, Hotmail or any of the other myriad portals, online e-zines and other interactive content so pervasive on the web today. This explosion took everyone a bit by surprise and, indeed, the emergence of such applications quickly brought with it a new requirement: the ability to remember and recognize a particular browser session from one click to the next, and to distinguish between simultaneous sessions with ease. The most obvious and important requirement was authentication; if you require a user to log in to use your web application, you want the fact that they have logged in, and who they are, to be remembered for the duration of their session, or at least until they log out. Otherwise, that user will face the login prompt with every single page they request, which is not exactly appealing. It’s a very simple and reasonable requirement if you think about it; a traditional application built in, say, Visual C++ will have the ability to remember what it was the user did maybe just five seconds ago. But as we’ll find out, there was never any native provision for such memory in the HTTP protocol and, as a result,
web developers have rather had to shoe-horn in the required functionality over the past ten years.

A Quick History Lesson
In the mid 90s, very much in the early days of the web as a whole, people started playing with something called CGI (Common Gateway Interface). Rather than web pages on a server simply being pieces of static HTML content, the web server could use scripts (often written in Perl) to make decisions on what content to display in real time, based on input parameters offered by a user. These input parameters were what are known as GET and POST parameters. GET parameters were simply pairs of name=value entities encoded and appended to the URL being requested. POST parameters used the same principle, but the parameters were specified as additional data in the HTTP request made by the web browser. Of course, regardless of the method used, these parameters were never actually intended to be passed
REQUIREMENTS
PHP: 5.x
OS: Linux/UNIX or Windows
Other Software: N/A
Code Directory: sessions
by the user themselves. The intention was that the <form> tag would be used in the underlying HTML, which would allow the author of the HTML to render simple data entry controls on the page. Probably the simplest example of this would have been the standard site search form (think Google): a single text box, and a submit button. By entering data into the text box and hitting submit, the value of that text box would be passed to a simple CGI script using either GET or POST, as described above. That CGI script would then do something with that value (possibly consult an external database, for example) and then return HTML which had been dynamically assembled in real time. Of course, nowadays, CGI scripts have very much taken a back seat. People are using dedicated web application development languages like PHP to accomplish more or less the same effect. Furthermore, they work more or less the same way: using HTTP to garner input data via the GET or POST methods, processing it, and then delivering output directly to the user’s web browser. Let’s take a closer look at how it works under the hood.

Under the Hood of HTTP
Let’s build a very simple PHP script for the purposes of our example:

    Hello <?php echo $_REQUEST['first_name']; ?>!
Save this onto your local development platform as hello.php. Assuming your server is truly local (i.e. it runs on the same machine you do your development on) you would access this script as http://localhost/hello.php. As you’ve probably guessed, you can get this script to do more or less what you’d expect by firing up a web browser and pointing it to something like http://localhost/hello.php?first_name=Ed. No prizes for guessing that the output in the browser looks like this: “Hello Ed!” This is all very basic stuff, of course, and as you’ll also have realized, you can achieve the same effect with a <form>, replete with text box, on a separate page, with hello.php as its target. But let’s take a look at what your browser is actually doing behind the scenes. Fire up a command prompt (Windows), terminal session (OS X) or shell (UNIX) and execute telnet 127.0.0.1 80. You’ll see something similar to the following:

    ed@ashsrv02:~$ telnet 127.0.0.1 80
    Trying 127.0.0.1...
    Connected to 127.0.0.1.
    Escape character is '^]'.
Feed your telnet session the following, exactly as written. You may find it easier to enter this in a Notepad (or equivalent) document first, and paste it in:

    GET /hello.php?first_name=Ed HTTP/1.1
    Host: localhost
Note that there are two carriage returns after the final line there, so you need to press the enter key twice. You’ll see output similar to the following:

    HTTP/1.1 200 OK
    Date: Tue, 15 Mar 2005 12:13:25 GMT
    Server: Apache/1.3.33 (Unix) PHP/5.0.1
    X-Powered-By: PHP/5.0.1
    Transfer-Encoding: chunked
    Content-Type: text/html

    Hello Ed!
So far, so good. You can probably recognize the HTTP response headers being sent by the server (with various nuggets of useful information contained therein), followed by a blank line, followed by the output data, the portion that is actually displayed by the browser. Let’s take a look at POST. The principle is the same, except that the data isn’t encoded on the URL; it’s supplied separately, in the body of the request. This makes POST a lot better suited where large volumes of data are involved (such as file uploads). There is also a line of thinking that suggests it is appropriate to use GET only where no data is being modified as a result of the request; so, for example, using GET for a search page is appropriate, but for a user registration page it is not. Open another telnet session to your web server on port 80 exactly as you did before. This time, however, paste the following into the session:

    POST /hello.php HTTP/1.1
    Host: localhost
    Content-Type: application/x-www-form-urlencoded; charset="utf-8"
    Content-Length: 13

    first_name=ed
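The same POST request can also be assembled in code, which makes the relationship between the body and the Content-Length header explicit. The following is a minimal sketch: buildPostRequest() is a hypothetical helper written for this illustration (it is not a PHP built-in), and it only builds the request text rather than sending it.

```php
<?php
// Hypothetical helper (not part of PHP): assembles the raw text of an
// HTTP/1.1 POST request so the Content-Length arithmetic is visible.
function buildPostRequest($path, $host, array $fields)
{
    // http_build_query() URL-encodes the name=value pairs, e.g. first_name=ed
    $body = http_build_query($fields);

    return "POST {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . 'Content-Length: ' . strlen($body) . "\r\n" // byte length of the body only
         . "\r\n"                                       // blank line ends the headers
         . $body;
}

$request = buildPostRequest('/hello.php', 'localhost', array('first_name' => 'ed'));
echo $request, "\n";
```

Note that the body carries no leading question mark: the "?" only separates a URL from its query string, and counting it would make the declared length disagree with the 13 bytes of first_name=ed.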
The response from your server should be much the same, since $_REQUEST inspects both GET and POST parameters in the search for the key in question.

A Stateless Protocol
What you’ve probably spotted from the above is that the web server isn’t told any identifying information about the user making the request. In fact, the only data the web server can glean from any request, be it GET or POST, is:
• The URL
• The GET and POST (as well as COOKIE, but we’ll cover this later) variable names and their values
• The remote IP address of the user
• The remote User Agent (browser version)
… along with various other mostly useless bits of information. You might argue that the IP address is an identifying characteristic, and in the early days of the web, you might have been right, but in practice, these days, externally presented IP addresses get shared among numerous workstations, and can even change from request to request when ISPs’ proxy servers are involved. It’s not a good test. For this reason, HTTP is often described as a “stateless” protocol; at the end of the HTTP request, everything is more or less reset back to how it was before it was made. It is as if the request never happened at all. The fact that HTTP is stateless plays havoc with our dream of the web as a platform for thin client applications. If the server cannot associate two requests as being made by the same web browser, then a number of concepts are suddenly out of reach:
• Shopping carts
• Persistent logins
• Remembering user preferences
Clearly, what is required is some mechanism for uniquely identifying a contiguous period of usage of a particular site by a particular user sitting at a particular computer.

Basic Authentication
A somewhat archaic mechanism which went some way to address this has more or less fallen out of fashion these days. It is called HTTP basic authentication, and is usually achieved by means of an .htaccess file being strategically placed in the web server directory you wish to protect. When the web browser requests a page for which Basic Authentication is in place, the web server will respond with headers that essentially inform the web browser that the piece of content being requested is protected, and requires a username and password. The web browser will respond by challenging the user with a simple username and password dialogue box.
When the user enters his username and password, the web browser can re-issue the request, enclosing the username and password as part of the HTTP request, with only a very rudimentary encoding (Base64, which is trivially reversible) having first been applied. The web server will then verify the username and password and, if valid, will issue the content in the normal manner. Usefully, however, the web browser will “remember” that username and password for the lifetime of the browser session, and will automatically issue (as part of the HTTP request) said username and password for any
subsequent request that falls within the same “realm”. Typically this means that the credentials will be delivered for any document residing in the same directory as (or a subdirectory of) the page that originally caused the challenge to take place. This means that the user can browse from page to page without having to re-issue his username or password with each request. Problem solved, right? Well, not quite. There are a few serious limitations with this technique.
• Design: there’s simply no means to customize the less-than-beautiful dialogue box that’s popped up by a user’s browser. If you look at a typical web site’s login page these days, you’ll see there’s quite a lot more information than simply “enter your username and password.”
• Additional login features: extra security steps like memorable words or partially obscured codes are out of the question. You either log in successfully, or you don’t. For this reason, forgotten password functionality is tricky, too.
• Security: should you wish to associate a shopping cart or other state-critical data with the browser, you can only do so by associating such data server-side with the username and password provided, which always remain the same, of course. There is no unique instance identifier which can be used to tell apart two separate user sessions that are potentially many weeks apart, or by which to prevent concurrent usage of the same login by more than one computer.
• The username and password, potentially quite valuable pieces of information, are exposed with every single request, not just within the initial login. This increases the risk that they could be intercepted, especially since Basic Authentication merely encodes the credentials rather than encrypting them.
• In PHP, it is rare that a developer would wish to protect only specific directories or even specific files, so using .htaccess to deliver these Basic Authentication dialogue boxes is something of a non-starter.
In practice, you will want to restrict particular pieces of functionality, and hence you will have to engineer your application to deliver the necessary HTTP headers to challenge the browser–and interpret the returned username and password, yourself. Clearly, this isn’t looking like a very attractive option. There is another way, fortunately—sessions.
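For completeness, this is roughly what delivering the challenge and reading the credentials yourself involves. It is a sketch under assumptions: challengeBrowser() and decodeBasicCredentials() are hypothetical helper names, the realm string is whatever you choose, and in a real request PHP usually exposes the already-decoded values as $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'] (depending on how PHP is hooked into the web server). The decoding function is included mainly to show how thin the protection on the wire is.

```php
<?php
// Hypothetical helper: sends the 401 challenge that makes the browser pop up
// its username/password dialogue. Defined here but not called in this sketch.
function challengeBrowser($realm)
{
    header('WWW-Authenticate: Basic realm="' . $realm . '"');
    header('HTTP/1.0 401 Unauthorized');
    echo 'Authentication required.';
    exit;
}

// Hypothetical helper: decodes the value of an "Authorization: Basic ..." header.
function decodeBasicCredentials($headerValue)
{
    if (strpos($headerValue, 'Basic ') !== 0) {
        return null; // not a Basic Authentication header
    }
    $decoded = base64_decode(substr($headerValue, 6), true);
    if ($decoded === false || strpos($decoded, ':') === false) {
        return null; // malformed payload
    }
    // Split on the first colon only: the password may itself contain colons.
    list($user, $password) = explode(':', $decoded, 2);
    return array('user' => $user, 'password' => $password);
}

// What a browser would send for user "ed" with password "secret".
$header = 'Basic ' . base64_encode('ed:secret');
$credentials = decodeBasicCredentials($header);
echo $credentials['user'], ' / ', $credentials['password'], "\n";
```

Anyone who can see the request can run the same decoding step, which is why the credentials travelling with every request is such a concern unless the connection itself is encrypted.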
Introduction to Sessions
Put simply, a session comprises a series of consecutive HTTP requests made of a single web site, by a single user over a period of time, from a single computer, using a single web browser. The principle is quite simple. A session is distinguished by a unique identifier, called a session identifier (or simply “session ID”). The session identifier is usually of a relatively obscure structure; clearly using something as simple as ‘A’, ‘B’, ‘C’ and so forth is not sufficiently obscure, since a malicious user could simply guess a session identifier which could already be in use and thus gain access to another user’s session in the process. For an application such as online banking, this has clearly catastrophic implications. In fact, the most common format of a session identifier is a 32-character hexadecimal string. PHP’s built-in session handling, which we’ll meet shortly, uses just such a format of identifier. With 32 characters, and 16 possible values for each character, there are 16^32 (roughly 3.4 × 10^38) possible valid identifiers. Clearly, it would be tricky to stumble upon one that’s already in use (though not impossible, as we’ll discover later). But how is the session identifier associated with the session?

Propagating the Session
Assuming the first request made of the application by a web browser will not present a valid session identifier, a new session will be created. Accordingly, a new session identifier must be generated. Details of this session (the identifier, when it was created, and any session-level data such as a shopping cart) will be stored on the server. On subsequent requests, the browser must offer that session identifier to the server, along with the more mundane data of the request. From that, the server is able to identify the session to which the request belongs, and tailor its response accordingly. It all sounds too good to be true, and naturally, there’s a tricky bit.
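The identifier format and the create-or-resume decision described above can be sketched in a few lines. This is an illustration rather than how PHP's own session handler works internally: the function names are hypothetical, the $store array stands in for whatever server-side storage you use, and random_bytes() requires PHP 7 or later, which is newer than the PHP 5 this article targets.

```php
<?php
// Sketch only. random_bytes() needs PHP 7+; all names below are illustrative.
function newSessionId()
{
    // 16 random bytes rendered as hex give a 32-character identifier.
    return bin2hex(random_bytes(16));
}

function looksLikeSessionId($candidate)
{
    return is_string($candidate)
        && preg_match('/^[0-9a-f]{32}$/', $candidate) === 1;
}

// $store stands in for server-side storage (a database or files in practice).
function resumeOrCreate($offeredId, array &$store)
{
    if (looksLikeSessionId($offeredId) && isset($store[$offeredId])) {
        return $offeredId; // known session: carry on with it
    }
    // Nothing offered, or something we do not recognise: start afresh.
    $id = newSessionId();
    $store[$id] = array('created' => time(), 'data' => array());
    return $id;
}

$store  = array();
$first  = resumeOrCreate(null, $store);   // first request: new session
$second = resumeOrCreate($first, $store); // follow-up request: same session
$third  = resumeOrCreate('A', $store);    // guessable junk: gets a new session
echo $first, "\n";
```

The important design point is that an unrecognised identifier is never adopted: the server only resumes sessions it created itself.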
Once a session has been allocated, the browser has to issue that session identifier back to the web server with every single request. PHP’s built-in session handling manages such things for you, of course, but it’s worth looking at the two preferred methods for session perpetuation nonetheless, since in the second part of this article we’ll be more or less abandoning PHP’s built-in technology in favour of something more robust. With this in mind, we’ll need to know how it really works.

URL Rewriting
One very simple way to ensure the session identifier is included with each request is to instruct PHP in such a way that every link in your output HTML is altered to include, as a GET parameter, the session identifier. You
can then reliably look for the session identifier in every HTTP GET request received by your application, safe in the knowledge that if one isn’t offered, this must be the first request made by the browser, and hence a new session is required. For example, let’s say your application contains the link:

    <a href="pagetwo.php">Go to page two</a>

Once doctored with the session identifier, it will read:

    <a href="pagetwo.php?session_id=123456789AB...">Go to page two</a>

In practice, the easiest way to make this happen is to simply doctor the URLs yourself (here $session_id stands for wherever your application keeps the current identifier):

    <a href="pagetwo.php?session_id=<?php echo $session_id; ?>">Go to page two</a>

Don’t forget to modify any JavaScript which might cause the browser to redirect to another URL:

    window.location.replace('pagetwo.php');

would need to become:

    window.location.replace('pagetwo.php?session_id=<?php echo $session_id; ?>');
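If you do doctor URLs by hand, it helps to funnel every link through one function so that the rules live in a single place. The helper below is a hypothetical sketch (addSessionId() is not a PHP built-in, and the session_id parameter name simply follows the examples above); it copes with URLs that already carry a query string or a fragment.

```php
<?php
// Hypothetical helper: appends the session identifier to a URL as a GET parameter.
function addSessionId($url, $sessionId)
{
    // Keep any #fragment aside so the parameter lands before it.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }
    // Use "?" for the first parameter and "&" for any further ones.
    $separator = (strpos($url, '?') === false) ? '?' : '&';

    return $url . $separator . 'session_id=' . rawurlencode($sessionId) . $fragment;
}

$plain  = addSessionId('pagetwo.php', '0123abcd');
$search = addSessionId('search.php?q=php#top', '0123abcd');
echo $plain, "\n", $search, "\n";
```

When the result is written into an HTML attribute it should still be passed through htmlspecialchars(), since the "&" separator needs escaping there. PHP's own session.use_trans_sid setting automates this kind of rewriting for its built-in sessions, with the same drawbacks discussed in this section.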
As you can see, URL rewriting opens up something of a can of worms. Don’t forget that if you miss just one place where a URL needs rewriting, the user’s entire session can be lost in a single click. There’s another pitfall, too: all of the URLs in the browser window will look something like this:

    http://www.example.com/index.php?session_id=123456789AB...
What happens when a user tries to paste a URL to a friend in an instant messaging window, or similar? Quite clearly, it will include the session identifier. You may be smart enough to strip out session identifiers before you send URLs to friends, but your users may not be quite so diligent. One of two things will happen if the recipient clicks that link. With poor session security in place, you’ll find that the original user’s session is hijacked by the recipient. If that original user was logged in at the time they sent the link, the recipient will be logged in too, as that original user. Clearly, this is undesirable behaviour. The alternative isn’t all that much more attractive. With some decent session security in place, which we’ll discuss in more detail in the second part of this article, a significant mismatch of IP address or User Agent would render the session invalid. This avoids any potential security risks, but at the same time, invalidates the original session as a side effect of that security. As a result, although the recipient of the link cannot hijack the session of the original sender, the original sender will find themselves forgotten by the site. Clearly, URL rewriting isn’t the ideal way to perpetuate a session, but like so much in life, there is another way.
Using Cookies
Using URL rewriting to perpetuate session identifiers is conceptually simple, but tricky to implement. Using cookies is the reverse: conceptually rather complex, but in terms of the code required, extremely straightforward. A cookie is simply a small packet of information sent by a web server to the web browser in the HTTP response header, just before the body content arrives. It is, in its simplest form, a variable name (such as session_id) and a value (such as 012345678901234567890123456789ab). Once sent, the web browser stores that cookie on the computer, either in memory or on disk, depending on the lifespan of the cookie. For future requests, the web browser will send the name and value of that cookie to the server, along with the usual HTTP request (including GET or POST) data.

In addition to its name and value, a cookie is blessed with two additional, important properties: lifespan and scope. The lifespan of a cookie determines the duration for which it will be included with requests. After that period has elapsed, the cookie will be destroyed, and will not be included with any further browser-server transactions. The scope of a cookie dictates the servers, paths, and domain names of URL requests for which the browser will send the cookie. Scope may be dictated such that a cookie is only applicable to a particular host name (www.example.com), path (www.example.com/path/), or domain name (*.example.com). It is important to set scope correctly, lest the web browser inadvertently reveal the cookie to a third party web site which does not need (or is not privileged) to see it.

Using a cookie to hold the session identifier is a pretty standard practice these days. The application is configured such that if a request does not offer a cookie containing a session identifier, then a new cookie will be issued at the time of that first request, and before any normal document output is sent to the web browser. The scope of the cookie is limited to the web server in question, and the lifespan set to some arbitrary (but pre-configured) limit, equating to the desired maximum length of any given session, usually something in the order of thirty minutes. Future requests to the web server would include the session identifier, as a cookie. The server application would then be tasked with consulting either a database, a rule set or some other validation method in order to determine whether or not that session was valid and plausible. If it was, the application would be allowed to work with the session indicated; if not, the specified session would be scrapped for safety’s sake, and a new session would be generated and its identifier sent with the output response headers.
Relating Data to a Session
This is all very well and good, but more often than not you will want to use your session to persist some kind of useful data. For sites that use some degree of authentication, it is likely you would want to persist the username (or, more likely, user ID) of the currently authenticated user. Similarly, for sites that offer e-commerce functionality, the contents of the user’s shopping cart would be a perfect candidate for persistence. A shopping cart is a complex structure: at the very minimum a linear array of associative arrays, and more likely an instantiation of some form of ShoppingCart class. PHP has a wonderful function called serialize(), which creates a string representation of a variable and its contents, however complex that variable might be, from a simple string right down to an instantiated class. With this in mind, there are two obvious approaches; only one is a good idea, however.

The first approach is to simply send, as a cookie, the contents of the session-level variable to the browser. After all, you might argue, you are sending a cookie with the session identifier. Why not simply send a cookie for everything else you want to store? The answer is twofold. First, that session identifier is useless information to anything other than your web server. If for some reason it were hijacked or discovered, its value is meaningless. There are very few privacy concerns raised about somebody knowing an alphanumeric string. Conversely, there are quite legitimate concerns about somebody knowing the contents of your shopping cart, or worse. Second, cookies are just text files. Go ahead and have a look in your browser’s profile or cache directory if you find this hard to believe. They’re not stored in any encrypted way whatsoever. This means that you can’t really trust that a cookie being sent along with an HTTP request hasn’t been modified in some way, possibly maliciously, or even by some third party virus or application.
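A small experiment makes both points concrete: what serialize() produces for a cart, and how little effort it takes to alter that string if it is left in the client's hands. The cart structure, the SKU and the price (in cents) are invented for this sketch.

```php
<?php
// Illustrative cart: one line item, with the price held in cents.
$cart = array(
    array('sku' => 'BOOK-1', 'qty' => 1, 'price' => 2995),
);

// What would end up in the cookie under the first approach.
$cookieValue = serialize($cart);
echo $cookieValue, "\n";

// Anyone with a text editor can rewrite the price before sending it back...
$edited = str_replace('i:2995;', 'i:1;', $cookieValue);

// ...and the server has no way of telling that this is not what it issued.
$tampered = unserialize($edited);
echo 'Price now: ', $tampered[0]['price'], "\n";
```

Calling unserialize() on data that has been outside the server's control carries further risks of its own, which is one more reason to keep only the identifier in the cookie and everything else on the server.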
This holds true for the session identifier cookie too, of course, but modifying the session identifier will accomplish very little, given the obscurity and randomness of that identifier. Modify it, and chances are all you’ll do is break it. Clearly, therefore, holding this data in cookies is a bad idea.

The second, and far better, approach is to associate pertinent data with the session in question on the server side. This is typically accomplished by using a database, the file system or some other data store to retain the value of particular variables with a particular session, again using a simple name and value pairing. When the value of a particular variable is needed, the application simply consults that data store, locating the place in which variables pertaining to that session reside, and then looking up its value. A similar technique is used to update the value of an existing variable, or create a new variable. Either way, the data never leaves the server, so the browser cannot tamper with it, and it remains unique to that session.

For authentication purposes, this might work as follows:

• User visits site
• No session is found, so a new session cookie is sent to the browser
• The user tries to access restricted content, and is redirected to a login page
• The user logs in successfully
• The server associates a user ID with that session
• The user browses the restricted area of the site freely; with each request, the server is able to determine that the session is logged in by virtue of the user ID associated with that session
• The user logs out, and the server removes the user ID variable from that user’s session

For our shopping cart, it might work as follows:

• User visits site
• No session is found, so a new session identifier cookie is sent to the browser
• The user tries to add an item to her cart
• The server determines that there is no cart variable associated with the session, and creates an empty cart, before tying it to that session
• The item is added to the cart successfully
• The user browses other areas of the site
• The user returns to “My Cart”
• The server consults the variables associated with that session and, lo and behold, finds a cart
• The value of that variable is retrieved from the database, and the contents of the cart are displayed on screen
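The two flows above can be sketched with a small in-memory stand-in for the server-side data store. ArraySessionStore is a hypothetical class invented for this illustration; a real application would back it with a database table or files, and the user ID and cart contents are made-up values.

```php
<?php
/**
 * In-memory stand-in for a server-side session data store.
 * Hypothetical class for illustration; values are kept serialized,
 * keyed first by session identifier and then by variable name.
 */
class ArraySessionStore
{
    private $data = array();

    public function set($sessionId, $name, $value)
    {
        $this->data[$sessionId][$name] = serialize($value);
    }

    public function get($sessionId, $name, $default = null)
    {
        if (!isset($this->data[$sessionId][$name])) {
            return $default;
        }
        return unserialize($this->data[$sessionId][$name]);
    }

    public function remove($sessionId, $name)
    {
        unset($this->data[$sessionId][$name]);
    }
}

$store = new ArraySessionStore();
$sid   = '012345678901234567890123456789ab'; // identifier received in the cookie

// The user logs in and adds an item to her cart.
$store->set($sid, 'user_id', 42);
$store->set($sid, 'cart', array(array('product_id' => 7, 'qty' => 2)));

// On each later request, the server checks the session's variables.
$loggedIn = $store->get($sid, 'user_id') !== null;
$cart     = $store->get($sid, 'cart', array());

// The user logs out: only the user ID is removed, the cart survives.
$store->remove($sid, 'user_id');
$stillLoggedIn = $store->get($sid, 'user_id') !== null;
```

Only the identifier ever reaches the browser; everything else stays in the store, which is the point of the second approach.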
The most likely candidate for storing these session-level variables is a back-end database, with a table called session_variable or similar, but it is equally feasible (although not always advisable) to use the disk. In fact, that’s just how PHP does it by default.

Session handling in PHP

Enough theory. By now you know how sessions and session-level variables work, inside and out. It’s time for us to look at how it’s done in PHP. We’re not going to look at this in too much detail, however, as you’ve probably done this a hundred times without even thinking about it, although at least now you know how it works. Perhaps more importantly, at the tail end of this article we’re going to look closely at the enormous limitations of PHP’s built-in session handling; in effect, why it is simply not suited to the enterprise. Fear not: next month, we’ll take you through a far better approach.

Making a PHP Script Session-aware

By default, a PHP script is not session-aware. This is quite intentional on the part of the PHP development team; there are applications where you simply don’t need sessions, and unnecessarily introducing the overhead in such cases would quickly prove frustrating. Accordingly, if you want to use sessions in your project, you need to tell PHP about it.

If you’ve never used sessions in PHP before, you might be expecting some kind of well-structured Session class. No such luck. The function you call is (like many others in PHP) globally accessible, and separated entirely from any kind of structured class hierarchy. To make a script session-aware, simply call the session_start() function, which always, slavishly, returns TRUE, so don’t waste your time trying to determine whether it worked or not by capturing its return value. There are two important points to remember here:

• This statement must exist in any script that you want to be session-aware. If it’s not there, no session data will be available.
• This statement must be called before any output is sent to the browser. In practice, this means that a control block containing the call must exist in the script before any HTML or white space. If you’re using Smarty templates or some other best-practice MVC model, this won’t be too much of a worry.

Let’s try this out with the very simple example in Listing 1.

Listing 1

    <?php
    session_start();
    ?>
    Hello! My session ID is <?php echo session_id(); ?>.

Save the code as session_demo.php, fire it up in a web browser, and you should see output similar to this:

    Hello! My session ID is 8f3e585753c0a18bcef7c4ae02e7a2e6

Hit refresh. Notice that the session identifier is the same, no matter how many times you reload? Now close your web browser, re-open it, and fire up the demo again; you’ll see that you’ve been assigned a new session identifier. The act of closing your browser flushed the cookie, so next time you went to the URL, no session identifier was available, and PHP assigned you a new one.

Tap in another very simple example (below), and save it as showcookies.php. Save it alongside session_demo.php and fire it up in a web browser.

    <?php
    var_dump($_COOKIE);
    ?>

Your output should look something like this:

    array(1) { ["PHPSESSID"]=> string(32) "2f9eb85773c0a78bcef7c4ae02e7a2e5" }

Crucially, the value of PHPSESSID should equal the session identifier you just printed out using session_demo.php. As you can see, PHP pushes its session identifier to you using a cookie called PHPSESSID, with its scope limited to the current web server, and its validity equal to the duration for which the browser window is left open.

Take a look on your development server’s disk, in /tmp (UNIX and Mac OS X) or C:\WINDOWS\TEMP (Windows). You’ll see a bunch of files that look like this:

    root@ashsrv04:/tmp# ls sess*
    sess_02a46185cb4f3067f66ce7fca639feb0  sess_7dab27c074853d6c69a74076262a2516
    sess_1e60d310bcbc12f5b7b0276807ddc360  sess_af7b488577ba1e2ec565f30e07e3b8fe
    sess_1ebaeb66be9fc2b431bc565400417817  sess_c5459608baba66d9c47c4bc162cc4a17
    sess_282366a16469843a648c2448b818a2cc

This is how PHP knows that the session identifier you’re offering is a valid session: it exists on disk. You might be thinking this is not the most sophisticated of security mechanisms, and you’d be right.

Session-level variables in PHP

Let’s touch now on how session-level variables are stored using PHP’s built-in session handling. Whip up the quick test found in Listing 2 in your text editor, and save it as firstpage.php. Now, in the same directory, save Listing 3 as secondpage.php.

Listing 2

    <?php
    session_start();
    $_SESSION['favorite_food'] = 'Pasta';
    ?>
    My favourite food is currently <?php echo $_SESSION['favorite_food']; ?>!
    <a href="secondpage.php">Go to page two</a>

Listing 3

    <?php
    session_start();
    ?>
    My session id is <?php echo session_id(); ?><br />
    My favourite food is still <?php echo $_SESSION['favorite_food']; ?>!

Fire up firstpage.php in your web browser, and you should see something like this:

    My favourite food is currently Pasta! Go to page two

Click the link to go to page two. You should see something like:

    My session id is 282366a16469843a648c2448b818a2cc
    My favourite food is still Pasta!

As you’ve probably worked out, by making an assignment to $_SESSION, the variable name (and its contents) are stored and associated with that session. On the second page, PHP simply consults the lookup table of variables associated with this session, and successfully presents you with the expected value. If you’re curious to see how PHP stores this data, go back to the temporary directory around which we were snooping just moments ago, and open up (in a text editor) the sess_ file that corresponds to your session ID (which is revealed in secondpage.php). You should see something that looks like the following (note that on UNIX you may have to become root before you are able to read this file successfully):

    favorite_food|s:5:"Pasta";

Yep, that’s right: PHP stores its session data, serialized, in files in the system’s temporary directory, with a filename that corresponds to the session identifier in question. Once again, it’s not exactly rocket science.

Applications for Session-Level Variables

You can use this technique to provide primitive authentication in PHP. Simply register a $_SESSION variable, perhaps called USER_ID, which contains the user’s ID once they have successfully logged in. If you are using a back-end database to contain user records (username, password, first name, and so forth) you will almost certainly have a numeric primary key. It is best to register this value as the session-level variable,
rather than trying to store the entire user record. Because only your application has the ability to create session-level variables, you can be quite certain that if a session-level variable called USER_ID exists, then it legitimately contains the identifier of a user who has successfully logged in to that session.

Similarly, you can use this technique to store a shopping cart. Simply create a primitive class called ShoppingCart which has the ability to store references to products (either as product IDs, or instances of another class, etc.) and the quantity of each product that the cart contains. You can then simply store an instance of this class in $_SESSION exactly as you would store a user ID; PHP will take care of the serialization for you. When you need to grab the cart’s contents, or need to add a product to it, simply bring it back in again by using the data in $_SESSION.

Limitations of PHP Session Handling

There’s not a great deal more to say about PHP’s built-in session handling other than that it does the job, but barely. Those of you interested in the various additional session handling functions that are available may wish to consult the full reference in the PHP manual at http://www.php.net/manual/en/ref.session.php. As you might have guessed by how briefly we touched on it, using PHP’s built-in session handling is far from best practice. Let’s look at some compelling evidence why.

Coding Style

The first big complaint is really one of coding style, or rather, the lack of it. In version 5, our language is finally given the OOP support it has so desperately needed from day one, but the lack of object encapsulation of PHP’s admittedly vast range of functionality hardly encourages its developers to make use of it. Sessions are a particularly good example of this. Java’s servlet API, by contrast, provides an interface called HttpSession. This is retrieved from the instance of HttpServletRequest that is passed to the doGet method of a servlet.
The accessor methods, getAttribute and setAttribute, are then used to read and write data from and to the session. PHP provides all the same functionality as global function calls and variables, but this is not conducive to PHP’s role in the enterprise. We’ll look at how we can fix this.

Security

The mechanism that PHP uses to store session data (in the /tmp directory) is far from secure. In a shared hosting environment such as that used by many ISPs, this directory is readable and writable by the user as which the web server process runs (such as the user nobody under Apache 1.3.x, by default). In other words, in such an environment, web sites other than your own may have access to your session data files. It would be a relatively trivial task to whip up a script that dumped the entire server’s session data into an email. You could also pick a session that looks interesting, modify your browser cookie to transmit the correct session identifier, and visit the site in question. Wham: you’re in.

This wouldn’t be such a big deal if PHP did some kind of sanity checking on session identifiers to ensure they’ve not been hijacked. There are more sophisticated techniques, some of which we’ll meet next month, which include: ensuring the first two octets of the remote IP address remain consistent from request to request, ensuring the HTTP user agent remains identical from request to request, and looking for unusual patterns or timings in a session’s HTTP requests. PHP, by default, performs none of these checks, and believes you if you claim to be the rightful owner of a particular session.

Even if you weren’t lucky enough to have access to a list of valid session IDs, it’d be pretty straightforward to write a script which simply brute-forced session IDs at a particular server. This is a pretty common attack, and a good reason why an increasing number of PHP server administrators are choosing to hide the PHP version signature (easily accessible using tools such as that found at netcraft.com, which uses data gathered from standard HTTP headers). Admittedly, PHP session identifiers are 32-digit hexadecimal strings, with an enormous number of possible values (16^32). But if you think this session identifier is generated by picking a number at random from that whole range, you’d be wrong; it’s an MD5 hash of a much simpler seed, based on much smaller random numbers. The actual number of permutations is significantly smaller, something in the order of millions, and not at all hard to cycle through until you stumble upon a valid one.
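Two of the checks listed above (a stable IP prefix and a stable user agent) can be sketched in a few lines. This is only an illustration under stated assumptions: sessionFingerprint() and requestMatchesSession() are hypothetical names, the sketch handles dotted IPv4 addresses only, and it is not the toolkit promised for next month.

```php
<?php
/**
 * Derive a fingerprint from the parts of a request that should stay
 * stable for the lifetime of a session. Hypothetical helper, IPv4 only.
 */
function sessionFingerprint($remoteAddr, $userAgent)
{
    // Keep only the first two octets, e.g. "192.168" from "192.168.10.5".
    $octets = explode('.', $remoteAddr);
    $prefix = implode('.', array_slice($octets, 0, 2));

    return sha1($prefix . '|' . $userAgent);
}

/**
 * Compare the fingerprint stored when the session was created with the
 * one derived from the current request.
 */
function requestMatchesSession($storedFingerprint, $remoteAddr, $userAgent)
{
    return $storedFingerprint === sessionFingerprint($remoteAddr, $userAgent);
}

// Stored alongside the session when it is first issued.
$stored = sessionFingerprint('192.168.10.5', 'Mozilla/5.0');

$sameNetwork  = requestMatchesSession($stored, '192.168.77.9', 'Mozilla/5.0');
$otherNetwork = requestMatchesSession($stored, '10.0.0.1', 'Mozilla/5.0');
$otherAgent   = requestMatchesSession($stored, '192.168.10.5', 'curl/7.0');
```

In a live script the two inputs would come from $_SERVER['REMOTE_ADDR'] and $_SERVER['HTTP_USER_AGENT']. Comparing only two octets tolerates users whose address changes within the same network, at the cost of a weaker check.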
Were PHP to employ some kind of default secondary key technology, whereby the primary key must be accompanied by a valid pair key, and the proffering of any invalid pair key renders the session described by that primary key invalid, session hijacking in this manner would be virtually eliminated. But it doesn’t, and as a result, the door is (in theory) wide open.

Multiple Server Environments

Finally, many enterprise-class web applications use multiple web servers in a load-balanced environment. This load balancing either takes place in DNS, or is performed by a dedicated load balancing appliance, such as an F5 BIG-IP controller. In the case of hardware load balancing, any given HTTP request may be directed at any one of a number of web servers. The same server is not necessarily “leased” (sometimes called “sticky”) for the duration of a session.
With session data stored on disk, you can imagine what will happen. The first request will generate the session successfully, but it will only be stored on one of the servers. If the second request happens to hit a different server, the session identifier will appear invalid, and another will be generated. If you try to register a variable against that session, it will be stored on one server, but not on any of the others. You can get around this by using a shared session storage directory that is exposed via NFS, but anybody who’s ever played with NFS will appreciate how slow this can be.

A Better Approach

Of course, there is an approach which gets around all three of the above problems. Next month, we’ll take time to put together a comprehensive toolkit to encapsulate HTTP requests, sessions and user authentication. Our toolkit will be fully object-oriented, it will implement several additional mechanisms to prevent potential breaches of security, and, finally, it will use a database to store both session details and any variables associated with that session, to allow it to be fully functional in a multiple-server environment.

To get the most out of the toolkit, you’ll need to know how it works, and I’ll go to great lengths to walk you through every step of the way when you meet it next month. In the meantime, get yourself prepared so you’ll get the most out of your new toolkit: make sure you understand sessions inside and out.
About the Author
Ed Lecky-Thompson is founder of Ashridge New Media, a professional development agency based in London, England. Ashridge works almost exclusively in PHP as a preference, and Ed has led development on more than fifty large PHP web applications in the past six years. Ed has also co-authored Professional PHP 5, and contributed to Beginning PHP 5, both published worldwide by Wrox.
To Discuss this article:
http://forums.phparch.com/214
FEATURE
Building a MySQL Database Abstraction Class
by Tom Whitbread
In this article, you’ll learn how to tame the MySQL API by creating a class which will handle errors, allow query execution, transport results, and strip or add slashes to your input data.
REQUIREMENTS
PHP: 4.x
OS: Any
Other Software: MySQL
Code Directory: mysql

RESOURCES
URL: http://php.net/manual/en/ref.classobj.php
URL: http://php.net/manual/en/ref.mysql.php

If you are using MySQL as your database server, you will be familiar with built-in PHP functions such as mysql_query(), mysql_connect(), mysql_fetch_array() and so on. After using MySQL for a short while, I noticed that I was using the same chunks of code over and over: a mysql_connect() call, a mysql_query() and then an error-checking statement. I’d just copy and paste every time I needed to run a query. I felt this was getting ridiculous and there must be another way to do things.

Persistent Connections and Includes

When I looked around at other people’s code on pastebin posts, mailing lists and forums, I noticed they were using classes for all sorts of things, but not for accessing the database. Good old functions like mysql_query(), mysql_result() and their siblings were in repeated use. So, what we have is code similar to what is in Listing 1: around 6 or 7 (or possibly more) lines of code which are doing an important, but simple job. If you look at Listing 1 and immediately think that this guy doesn’t know what he’s talking about, please bear with me. The example in Listing 1 is bad practice, and I urge you not to do this.

A more appropriate practice would be to have a common file where we specify a persistent connection to the MySQL server. Store the host, username and password within, then include this file using include() or, more likely, require(), so a connection is formed, and accessible, whenever we include this file. This is still by no means a “perfect” solution, but it means we don’t need a mysql_connect() call every time we want to run a query. We also don’t need to run mysql_close(), because we are using a persistent connection to MySQL. I personally prefer using mysql_pconnect() on all my projects; it provides ease of use because you always know a connection will be present, and it’s a lot more efficient than its non-persistent brother. The PHP manual outlines two ways in which mysql_pconnect() differs from mysql_connect(); in short, this is what they are:
• When using mysql_pconnect(), the function will first try to find a connection that’s already been opened with the same host, username and password. If successful, an identifier for the existing connection will be returned, instead of opening a new connection.
• The connection to the MySQL server will not be closed when the execution of the script ends. Instead, the link will stay open for future use, so mysql_close() is not needed; in fact, mysql_close() will not close connections established by mysql_pconnect().

Beware, though: if your server limits the number of connections and its workload is high, you could easily exceed your connection limit. Your script may not be able to connect, and will, instead, produce an error message. I still always check for errors with the persistent connection just to be on the safe side of things. A simple if block used when you call mysql_pconnect() will do it.

Let’s now rehash the code I used as an example (Listing 1) into something that’s much more reusable. We’re going to employ an included file for database parameters, such as the host, username and password. We’re also going to use a persistent connection. Enter your own server’s values for host, username, and password. Be sure to alter the queries to match your schema, and call mysql_select_db() appropriately.

To summarize what we have in Listing 2: we use a file called common.php to create a persistent connection to the MySQL server. We then select a database with mysql_select_db(), and check to make sure that was successful. Then, in another file (I called mine main.php, but it’s really up to you) we include the code from common.php with require_once(), which does the same as require(), except it checks to see if the file has already been included, and if so, it won’t be included a second time. This ensures we won’t have duplicate declarations trying to create persistent connections.
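The include-once behaviour described above is easy to confirm without a database. The sketch below is only a demonstration: it writes a throwaway stand-in for common.php to the temporary directory (the file and the commonLoads counter are hypothetical) and counts how many times the file actually runs.

```php
<?php
// Create a throwaway stand-in for common.php that counts its own executions.
$commonFile = tempnam(sys_get_temp_dir(), 'common');
file_put_contents(
    $commonFile,
    '<?php $GLOBALS["commonLoads"] = isset($GLOBALS["commonLoads"]) ? $GLOBALS["commonLoads"] + 1 : 1;'
);

require_once $commonFile; // first call: the file is executed
require_once $commonFile; // second call: skipped, it has already been included
$loadsAfterRequireOnce = $GLOBALS['commonLoads'];

require $commonFile;      // plain require runs the file again
$loadsAfterRequire = $GLOBALS['commonLoads'];

// Clean up the temporary file.
unlink($commonFile);
```

With a real common.php, the skipped second inclusion is what keeps the connection code from running twice when several of your files pull it in.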
The script will also stop executing if require_once() cannot find the file. Next up, we run a query and place the result into a variable. Be sure to change the query to match your own MySQL database table so it will actually return something. The query I show has dummy columns for the sake of the example. We then check whether the query went OK. If it didn’t, mysql_error() will let you know which error MySQL has returned. Then, we display the results using a while loop, combined with mysql_fetch_assoc() to get each row from the query into an array called $row (in this example), and then iterate over that row with a foreach loop to trim any whitespace. I also added a count at the bottom, purely to demonstrate this possibility.

Admittedly, this isn’t the best looking code, frontend-wise, and it certainly isn’t a very practical example in its current form. It’s merely a demonstration of one way you might connect to MySQL. With the knowledge of persistent connections, and having implemented error checking, we can get on to what I really want to discuss: extending PHP’s MySQL functions with a class of our own, so we have a reusable library that is both efficient and simple to implement.

Writing a Class to Extend MySQL’s Functions in PHP

If you use MySQL a lot, then this is going to be the obvious progression for you as a developer, but you may be daunted or confused, or maybe you’ve just never thought of doing this before. No worries: I am writing this to guide you in creating a class to extend MySQL’s functions, and also for people who might be new to the whole idea of using classes to extend functionality.

Writing your own classes is incredibly powerful when programming any application, as you can have a huge amount of control over what happens in your code. You can program completely flexible pieces of code which can then be used over and over with just a small amount of modification (i.e. parameters, database information, directory structure etc.). If you’re now thinking, “Wait a minute! You said this wouldn’t involve reusing code, at the beginning of this article,” please don’t get me wrong. I am not going to write about going back to copying and pasting chunks of code; I’m going to write about another way of doing things: Object-Oriented Programming (or “OOP”).

OOP is a way of programming your applications, and consists of a whole skill set in itself. Developing functions and classes that will be flexible enough to recycle in other projects, quickly and with little hassle, is a really useful skill to have. OOP requires logical thinking and the ability to effectively put your ideas into practice.

As always, planning is vital when writing any class or function. Always make sure you have a plan, even if it’s just written in pseudo code, so you know exactly what you are going to do and how you are going to do it. This can save time and avoid unforeseen difficulties later on if, for instance, you want to add a feature and then find out you can’t because another piece of code will “break” because of your changes. When you find yourself working around a knot of code, sometimes it’s just easier to start over from scratch, with good planning, so you can try to avoid this as much as possible, and leave the application open-ended. This allows you to come back and add features without needing to redesign sections of code or worrying about code breakage.
When coming up with the plan for this class, we know we’re going to need a function that connects to MySQL and then selects a database, and another function to execute queries. The connection function will require parameters such as the host, username and password, as well as which database we’re going to use. When we run a query, we’ll want to be able to access the results of this query easily, so what this class is going to do is set the columns (as arrays) as properties of the called object. For instance, say we have a column named title and we want to display it; using this class, we would implement something like Listing 3.

The code in Listing 3 will run a query using our class (which I’ll call “Mysqlx” from now on, for MySQL eXtra functions) and then display the results using a for loop. The $length variable simply contains the number of rows returned, so we know how many times to cycle the loop. We also use the $i variable from the for loop to identify each array item, because each row is contained in the property, as an array, beginning with 0 and going through to the last entry. So, if we were using Listing 3 and wanted to get the first result of title, we would read element 0 of the title property, outside of the for loop.
By looking at Listing 3, we know we need a function called query() to execute the SQL, and we’ll also need to know the length of the array (to determine how many results have been returned). Let’s set a _length property in our object. It is prefixed with an underscore to denote it as a private property of that class; besides, if we just called it length, we might run into trouble if a column in a table was also called length.

Another often needed element is the id that MySQL returned for the last inserted row. So, if we are inserting data, let’s set a property called _last which contains the result of mysql_insert_id() for the executed query.

If the query only returns one row, we don’t need an array because there is only one entry. Instead, we’ll just set the object properties (one for each column) to the retrieved data, in string format.

We can control whether addslashes() is applied to a query by adding a boolean parameter: true when we are inserting data, and false when we are pulling data. We could code two functions, one for inserting and another for selecting data, but I have decided to use this flag approach, because I think it’s more efficient this way. Creating two functions to do a very similar job is really a waste of time, when you can just set a boolean flag to let the function know what it is doing.

Because we are going to use a persistent connection, the function that connects to MySQL can be part of the class declaration. This allows us to drop the include() and also to use more than one database server easily, if ever needed.

I also mentioned error checking, earlier. We can write a function to handle this internally, or if you already have a class for email or error reporting, you could combine your class with our class’ error reporting mechanism. The error checking function will use a parameter that is passed to it internally, and then email the specified address about the error, if one has occurred.
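The result-handling part of this plan can be sketched independently of MySQL. The function below is an assumption made for illustration, not the code of the Mysqlx class: rowsToResult() is a hypothetical name, and its $rows argument stands in for the associative arrays that mysql_fetch_assoc() would return.

```php
<?php
/**
 * Map fetched rows onto object properties as the plan describes.
 * Hypothetical helper; $rows is a list of associative arrays, one per row.
 */
function rowsToResult(array $rows)
{
    $result = new stdClass();
    $result->_length = count($rows);

    if ($result->_length === 1) {
        // A single row: each column becomes a plain scalar property.
        foreach ($rows[0] as $column => $value) {
            $result->$column = $value;
        }
        return $result;
    }

    // Several rows: each column becomes an array indexed from 0.
    foreach ($rows as $i => $row) {
        foreach ($row as $column => $value) {
            if (!isset($result->$column)) {
                $result->$column = array();
            }
            $result->{$column}[$i] = $value;
        }
    }

    return $result;
}

$many = rowsToResult(array(
    array('id' => '1', 'title' => 'First'),
    array('id' => '2', 'title' => 'Second'),
));
$one = rowsToResult(array(
    array('id' => '7', 'title' => 'Only'),
));
```

With several rows, a for loop running from 0 to _length minus one can read $many->title[$i]; with a single row, $one->title is already the value itself. The sketch also makes the naming concern visible: a column called _length would collide with the counter.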
Now that we have all this planned out in writing, let’s get to coding it. The code can be seen in Listing 4. Take a look at the error function: the mail function may need to be changed for your own server, and the address will definitely need changing to your own.

We now have a fairly robust and flexible MySQL class which we can use to easily run queries on the database server, and return the results of these queries as an object. This class also ensures that if you have any problems with the MySQL server on your site, you will learn about it quickly (via email), and any problems are handled and presented to the user via an error screen.
How Can I Use this Practically? Of all the tutorials I have ever read, only a handful have ever actually given useful, practical examples of using the information that they taught. So, I am going to try and write some examples of how to use the class we have created in an everyday, practical, production environment.
to make it work with PostgreSQL or Oracle, go for it. I’d love to see the results. Problem 1: Validating a Username and Password A very common task, in PHP, is to check two or more values from an HTML form against the data stored in a
“We can, embed much more complex expressions inside TAL parameters, as long as we obey the proper syntax.” Hopefully, by setting up a few problems, and their respective answers, I will get your creative juices flowing so you’ll be able to see how this class will be helpful to you. I must make a point of saying this though: the Mysqlx class, on its own, is not going to be any where near as powerful for your application as a specific extension of the class, programmed for your own requirements, would be. For pretty much every site I work on; I will code some specific classes just for it, as they save a lot of time and hassle. The Mysqlx class is simply 3 functions that connect, query and report errors. When you think about how many other things you need to do in the life cycle of a program, this really isn’t going to cut it. One of the major practical uses of the Mysqlx class to incorporate it with a templating system, be it the Smarty engine or one you have coded yourself. With such an implementation, you can still send as much non-database-related output as, and an error message for the data that couldn’t be displayed. I find that a white page with some black text saying “Error…” doesn’t really fill the user (or client) with confidence, or a true understanding of what happened. The risk is that the user will leave your site, and not bother coming back. If you have your own mailing class you should add it to the error function of Mysqlx, as this will enable you to customize the error messages more to your own needs. If it is an incredibly important project, and your MySQL server goes down, perhaps you could find a way of being alerted via an SMS or recorded message to your cell phone. So far, we have focused purely on the MySQL database server but the Mysqlx class can be modified for any database server, as long as PHP supports it. 
Because of the way we have programmed the class, using the built-in functions for MySQL, we haven’t singled out a specific database type. This class will also work on both MyISAM and INNODB types. If you want to modify the class
April 2005
●
PHP Architect
●
www.phparch.com
database table. A password is usually stored in MD5 or SHA-1 hashed form, to improve security. A username or email to identify this user is just a string that will need no encryption. So we just have to check two values against MySQL to determine if they are exist and match, and then output the results. The credentials that were provided from the form are either right or wrong; there is no grey area, apart from a possible misspelling of the username or the wrong password. We need to account for this scenario: we don’t want the user to be scared off by an error message. If it is “EError: Entry Denied” or something similar, and the user thinks their data is, in fact, correct this can be unsettling. So, let’s say we already have a form with two fields: a textfield called username and a password field called password. How can we use the Mysqlx class to access MySQL and check if the credentials are valid? Listing 5 shows you how. Don’t forget to replace the path used by require_once() with the actual path to the mysqlx.php file. Before asking the database, this script checks that the values required for the query ($$_POST[‘uusername’]] and $_POST[‘password’]) are present, with a simple if
FEATURE
Building a MySQL Database Abstraction Class
block. If they aren’t, we issue a horrible black-on-white message; this is just for demonstration, though. On a real project, we would format the output nicely, or use a template for handling errors.
$mysql = &new mysqlx("localhost", "root", "qwerty12345", "users");
$mysql->query("SELECT id FROM users WHERE username='". $_POST['username'] ."' AND password='". md5($_POST['password']) ."'", false);
The above section of code creates a Mysqlx object, called $mysql, and connects to a MySQL server. The second part runs a query, using the query() method of Mysqlx. We set the type parameter to false, because we are requesting data and not inserting it. We use an MD5 hash here, but if you were using the SHA-1 hashing method, you would need to change the md5() call to sha1(). In the listing, I have commented some things that you could do at this point, such as registering session variables and redirecting the page. But, as this is an example, I did not include the code for these things. The script then goes on to check the _length property, to see if any results were returned from the query (i.e. the username and password were found in the database table). By extending the if block with an else clause, we handle the case where _length has a value of 0. Earlier, I mentioned that the user needs to be informed that she is not allowed in because her password and/or username are incorrect. Remember that we don’t want to make it too grim, because the user may be a bit inexperienced at using web-based forms, or could think that their information was 100% correct.
$name = &new mysqlx("localhost", "root", "qwerty12345", "users");
$name->query("SELECT username FROM users WHERE username LIKE '%". $_POST['username'] ."%' LIMIT 1");
if ($name->_length != 0) {
    echo ', did you mean <strong>'. $name->username .'</strong>?';
}
Above, we use another MySQL query to look for possible usernames that are similar to the one provided by the user and, if one is found, we display it. It’s also good practice to display a link to a password reset script, but I’ve left that out for the purposes of this article. We create a new Mysqlx object for this query. I always create a new object for each query, so it’s easy to see how many queries are executed and what I am doing with them; it’s not a good idea to bundle two results together. If inclined, you could further modify the Mysqlx class by adding a method that clears all properties from the object and frees all encapsulated result sets.

Problem 2: Displaying the Most Recent Rows in a Table
Often, I find that I need to fetch the last three entries in a given table, for example to display news headlines. Although this is actually done in the SQL query, I felt that it was so common that I should include an example of using the Mysqlx class to do this. Take a look at Listing 6 and you’ll see how I did it. The main work is performed by the following query:
SELECT DATE_FORMAT(created, '%W %b %D %Y %l:%i %p') AS created, headline, body, author FROM news ORDER BY created DESC LIMIT 3
We then check the _length property to determine if any rows were returned and, if there was only one row, we add that to the $display variable. Because Mysqlx returns the results of a one-row query as plain scalar properties rather than arrays, we must check how many rows the query returned. Next, we use a for loop to iterate over the entire array, again by using the _length property of the $newest object. Because the properties of $newest are now arrays, we need to index them in some way. Here, I use the counter variable of the for loop, $i, as the index into each property of the object. I also thought it would be nice to give the option of viewing all the articles within the database table, as long as there were more than three rows in the table. Take note that the link’s destination is fabricated, as I have not coded this section; it’s just an example.
$more = &new mysqlx("localhost", "root", "qwerty12345", "my_site");
$more->query("SELECT COUNT(id) FROM news");
The query that I used here works properly for both MyISAM and InnoDB tables, because it counts over an index that is often cached (in this case id). COUNT(*) returns the same answer on both table types; the difference is speed: MyISAM stores an exact row count, so an unfiltered COUNT(*) is almost free, whereas InnoDB has to scan an index to count the rows.

Where to Go From Here?
In conclusion, we have created a practical, working MySQL class which enables us to do some powerful stuff; we can run our queries without scattering error checks through our code, because errors are handled by the class itself. Also, you will know right away if an error occurred, because of the email notification. This class works with both MyISAM and InnoDB tables, so all you need to worry about is that specific
queries are ok for the referenced database table. I, personally, write specific classes for all of my applications— I believe it is good practice to adapt this class or any class to your own specific project. If, for instance, you have a project which needs to check if a certain value in a database table has been set, you can write a generic function that uses this class to do that. I know that since I have started using this class, my code has started to look a lot clearer: there is far less code that deals directly with MySQL, and it all appears to “just work.” The best practice I have found for using this class is to create a file called common.php, and require_once() the Mysqlx class file, and then have that common.php included on every page, on the first line. In the commonly included code, you can have such things as session_start() to be sure that sessions will be accessible on all of your pages. I know that it seems like this goes against the practice of avoiding a common file to declare such things as the database connection parameters, but this is the only way I have found to ensure that your classes and sessions will be accessible on any page with minimum hassle. You can now start changing the way you code using MySQL in your applications, and—if you modify the class itself–how you use a database server in PHP, completely. Being able to make sure errors are accounted for and having an object that contains your query results is always incredibly useful when programming large or small scale projects. You can get things done more quickly and be more confident in the stability of your applications. If you take the class any further and add any more features, please get in contact with me via
[email protected] and share your achievements with me. Additionally, if you have any questions, or problems with the class, please feel free to drop me a line!
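Two details from the problems above lend themselves to small helper functions. What follows is a minimal sketch under stated assumptions, not part of the Mysqlx class: build_login_query() and as_rows() are hypothetical names, and addslashes() is only a stand-in so the sketch runs without a database connection. With the MySQL functions used in this article, mysql_real_escape_string() is the appropriate call once the connection is open.

```php
<?php
// Hypothetical helpers; neither is part of the Mysqlx class.

// Builds the Problem 1 query with both conditions joined by AND and the
// username escaped, so a stray quote cannot change the meaning of the SQL.
function build_login_query($username, $password)
{
    // Assumption: addslashes() stands in for the driver's own escaping
    // function (mysql_real_escape_string() with a live connection).
    $safe_username = addslashes($username);

    return "SELECT id FROM users WHERE username='" . $safe_username
         . "' AND password='" . md5($password) . "'";
}

// Problem 2 has to treat one-row and many-row results differently, because
// a property is a scalar in the first case and an array in the second.
// Wrapping the scalar in an array lets a single loop handle both.
function as_rows($value)
{
    return is_array($value) ? $value : array($value);
}

echo build_login_query("o'brien", "secret"), "\n";
```

With as_rows(), a display loop could call as_rows($newest->headline) and iterate over the result without first checking _length for the one-row case.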
About the Author
Tom is completely self-taught and dropped out of school so he could learn PHP, ActionScript, XHTML and CSS. He lives in Somerset, UK and has worked as a freelancer for web design studios in the UK for the last two years. Tom can be contacted via
[email protected], and he also runs an online portfolio/blog site which can be found at http://www.titbread.co.uk
To Discuss this article:
http://forums.phparch.com/216
TEST PATTERN
Spring Cleaning by Marcus Baker
“Design and programming are human activities; forget that and all is lost.” - Bjarne Stroustrup
Code rots. A strange thing to say about a pattern of electrons, but it’s true. You might think that all you have to do is leave the program alone in a corner untouched to keep it squeaky clean. The trouble is that a program that is useful talks to the outside world, and well, the outside world changes all of the time. That means patches, fixes, workarounds and a steady build up of confusion. Soon the original elegant design has been consumed and the code is rotten to the core. It works, but it’s still become a mess. Why is that a problem if it still works? The code cannot stand still, so that means it must continue to accept patches. Patches that are easy to add to clean code take ages when that code is tangled. If the code cannot keep pace with these changes, it becomes a burden. It will likely get rewritten if the process is not reversed, quickly. I am going to start with a real world case study. On one of my current assignments, we have used a well-known XML-RPC library for about eighteen months. As a public library it should be pretty complete, right? Tested by lots of users and pretty much bug free? Well it wasn’t bad, but we’ve discovered a few glitches and changed our architecture around it in that time. Unfortunately, because the code had no unit tests we’ve been disinclined to tidy it properly and instead patched it as we go. This is not good practice I’ll admit,
not to say embarrassing, but it was an external library after all and so the clean up kept being put on our “to do” list. Patches have now been layered on patches to the point where each patch takes longer to create and test. So long, now, that it is interfering with our main work and the developers dread dealing with it. We’ve decided that it’s become so bad that we are going to have to hire someone to rewrite it at great expense. At least there is some good news for you, the reader. In its current state, it gives us a great record of how much code changes, even when it shouldn’t; I had a look in CVS. Here is the list of patches we have made to just the server part of the library...
• Allow selection of UTF-8 or ISO-8859-1 transfer by client request
• Fix bug that lost the type in complicated structures
• Allow type hinting to distinguish arrays from structs when empty
• Fix bug where struct keys were not XML encoded
• Strip illegal control codes from requests to prevent parser crashes
• Correctly detect illegal characters from the XML parser
• Suppress spurious PHP error output making XML packets unparsable
• Add an error handler to give correctly formed XML with PHP fatals
• Separate the act of serving the request from construction
• Add a setter for the socket timeout option
• Add full request and response header debugging on a bad packet
• Properly close socket at end of request rather than script termination
That’s twelve patches, or about one every six weeks, in an external library of five classes that we planned never to touch. Or from another angle, two to three patches for each class. All of these have made the code unmaintainable, and this is a backwater to our code base. If things can get this bad with a pre-packaged module, what would it be like with our own code in production and still in a state of churn? Code doesn’t just rot, it rots fast.

REQUIREMENTS
PHP: 4
OS: Any
Other Software: None
Code Directory: spring

Breaking Out the Elbow Grease
Spring is a traditional time of cleaning. To demonstrate the process of dusting off code I need to start with an example, and Listing 1 is festering nicely. According to the comments it used to count all the files inside a directory. A casual glance and it all looks normal, but when you try to follow a path through the count() method, it suddenly becomes hard work. Stepping through the code, it seems that this class can count the number of lines in those files and even pattern match against the filenames, as well. Times must have changed and even the interface is now awful. To count all of the lines of files in a directory that end in “.txt” I have to do this:
$counter = &new DirectoryCounter('.', true);
$counter->count('\.txt');
This usage is hardly expressive. I defy anyone to figure out what is going on from this without having to look at the code underneath. Encapsulation is more than just hiding variables: if code is encapsulated, the big win is that you never have to look at it at all. Here it’s ruined by bad naming and mysterious parameters, and it’s spreading confusion to the caller. What makes code rotten? Really anything that makes it difficult to work with. What makes code difficult to work with? That’s a psychology career in itself, but as
developers, we are exposed to dodgy code every day and often know it when we see it. In my case, it’s all too often the code I wrote myself, just yesterday. If in doubt, ask your colleagues what they think while you type. There are some rules of thumb that can help as well. The way to tackle the ugly stuff is to go in small steps. After all, if you introduce any new bugs, the code quality won’t help you in finding them, so you want to go slow and steady. Let’s start with something really simple: variable names. There are some single letter variable names in the code that explain nothing. About the only time a single letter is OK is in this idiom:
for ($i = 0; $i < $limit; $i++) { }
The only reason we can get away with it in this loop is that it is so familiar that it is almost part of the language. Everywhere else, short names will be a problem. For
this reason I’ll replace $f with $file, $d with $directory_handle, and also expand any other abbreviations. The next problem is that there are some stray comments that used to be print statements. This is obviously debug code that wasn’t cleaned up. The fact that there was so much debugging that the developer decided to leave it in doesn’t inspire confidence, either. Comments can tell you a lot about a project, although in this case, what they tell is not necessarily the message that the author intended. Of course, we strip them. We also strip the “Added 12/12/03” comment, as we can look that up in the version control system, and especially the “Is there a faster way of counting lines in a file?” comment. Great words when posted to a newsgroup, but not much use when we are trying to understand the code. The results of these and other purges are in Listing 2. The code is starting to express intent, but that was the easy part.

Refactoring
We haven’t actually changed the meaning of the code, yet. Before I do that, I like to wrap the legacy code in regression tests to make sure I don’t break anything. I am going to skip this part, for the sake of brevity, but in reality, after every step I would be running a test suite. Just imagine that part. One barrier to an overall understanding is clever code. Fancy one-liners have their place, but if they are
mixed liberally with the main code, they just serve to obfuscate. With this in mind we’ll move the constructor regular expression into its own method, called _ensureTrailingSlash().
function _ensureTrailingSlash($path) {
    return preg_replace('|/$|', '', $path) . '/';
}
This trick, pulling a lump of code into its own method, is known as “extract method.” This name comes from the classic book, Refactoring by Martin Fowler. What is refactoring? When you break down a problem into functions and methods, you are said to be factoring the code into a design. When we refactor, we change that breakdown, but without changing the external functionality. Refactoring is making small internal design improvements without having to change the behaviour, as a whole. This is a fundamental technique in cleaning up code without introducing new bugs. Another source of complication is excessive branching, not just with if blocks, but also with boolean expressions within them. We can use “extract method” again to pull some of this code into its own blocks. A couple of other tidy-ups we can make are to remove the double setting of _count_lines to false, and we cannot get an empty filename, so the main loop condition can be simplified. Listing 3 contains the much improved version. Slowly, the underlying design is becoming more visible.
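As a small illustration of “extract method”, the sketch below shows the extracted slash helper next to a second, hypothetical extraction of a boolean expression. It is written as plain functions so that it can be run on its own; is_dot_entry() and list_entries() are invented names for this sketch, and the article’s real methods are the ones in Listing 3.

```php
<?php
// Same body as the article's _ensureTrailingSlash(), as a free function:
// remove one trailing slash if present, then always append exactly one.
function ensure_trailing_slash($path)
{
    return preg_replace('|/$|', '', $path) . '/';
}

// Hypothetical second extraction: a boolean expression that would otherwise
// sit inside an if block gets a name that says what it means.
function is_dot_entry($file)
{
    return $file == '.' || $file == '..';
}

// The calling code now reads as a story instead of a puzzle.
function list_entries($directory)
{
    $entries = array();
    $handle = opendir(ensure_trailing_slash($directory));
    if ($handle === false) {
        return $entries;
    }
    while (($file = readdir($handle)) !== false) {
        if (! is_dot_entry($file)) {
            $entries[] = $file;
        }
    }
    closedir($handle);
    return $entries;
}
```

Neither helper changes what the code does from the outside, which is the point: the behaviour stays the same while the intent becomes visible at the call site.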
Real Life Flexibility
Now that the code is a little easier to understand, internally, the problems with the interface are becoming more glaring. First, what on earth is going on with that Perl regular expression being split into a separate pattern and flags? Usage looks like this:
$directory_counter->count('\.php', 'i');
I assume it started life as just an expression with the delimiters added automatically, but then the developer realized that, to make a case-insensitive search, the i flag was needed. Rather than refactor the code so that the caller had to pass the enclosing delimiters as part of the string, he simply added an optional parameter. The result is a bit of a mess. The quickest solution is not always the simplest. To fix this, we would have to refactor the main code, as well as the class we are working on. Just as before, I would only attempt this with the help of tests to catch any mistakes and, just like last time, I am going to ask you to imagine this very important step. It’s getting difficult to recommend other changes without seeing the rest of the code. A class usually has an interface that is designed for the application in which it is needed, and we cannot see the rest of that application. Despite this difficulty, I can make suggestions that not only make the code clearer, but also make it more flexible. Hopefully I am hitting the target regardless of the application design. One change that I definitely want to make is to move the behaviour that changes the $_count_lines flag out of the constructor and into the method where it’s applied. I have a grander plan in mind, here, but let’s take stock with Listing 4. I would run my tests, again, at this point, but of course you didn’t forget that. The code is starting to come under our control, now, and we can start to be more ambitious. Programs are flexible because they can be changed. That change, though, has to be carried out by a human being, and it doesn’t matter how elegant the design is: if no one on your team understands it, it’s still inflexible. My next step turns the object oriented dial up a notch. If your development team is not used to this highly factored style of development, then the next stage won’t seem like much of an improvement. That’s fine. We just want easy to understand code so that your own team of developers can safely make changes. You choose the direction for your own project.

Clean Code is Cohesive
A couple of things have been bothering me from the
start of this exercise. The first is the length of the count() method. The human brain has a limited short term memory, typically being able to hold about five to nine things at once. It’s no surprise, then, that methods longer than about five lines are hard to comprehend at a glance. We’ve managed to get the count() method to tell us the story of how it works, but there is still too much in it when we have to stare so hard. The other problem is the name of the class. Nothing is more important than naming. Again, for encapsulation to work we have to be able to treat items as black boxes. Otherwise we haven’t reduced our mental workload, because we have to remember our own code, plus the code of the class we are using. That’s error prone. When methods and classes are well-named we can forget about their internals and concentrate on the job in hand. Our class does several jobs and so there is no name that isn’t confusing, hence the poor compromise of DirectoryCounter. What to call it? After all it’s not counting directories, but is it counting files or lines? When a class does more than one job it is said to lack cohesion. Our class design is bad; it does a whole bunch of jobs. First, it handles the recursion through the directories. Next, it handles the decision to count lines or files. That decision is made by the caller, but it is passed to us as a flag. Why do we have to have all of the machinery for this when we don’t even make that decision? Anytime the decision and the code to carry it out are separated, we have to pass information around. Again, that is complication we could do without. Finally our class does the actual counting of the score. Yuck! In the February issue, we discussed ways to add flex points, including the Strategy pattern. Here, we are going to use the same trick, but this time we are going to use it to tease apart the roles of this class. Instead of passing in a simple flag for the counting policy, we will instead pass in some kind of Reader. We will place all of the counting code that is currently in DirectoryCounter into the Reader. What happens to a directory counter when the counting is taken away? You end up with Listing 5. Sadly, Directory is already the name of a built-in PHP class, so we have had to rename our class: DirectoryScanner. Usage has become slightly more complicated because the mechanics of
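The split described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the article’s Listing 5: only the DirectoryScanner name comes from the text, while FileCounter, LineCounter, scan() and read() are hypothetical names, and the code is written for PHP 5 or later (visibility keywords and __construct()) rather than the PHP 4 the column targets.

```php
<?php
// Sketch only: apart from DirectoryScanner, every name here is an assumption.

// Reader strategy 1: every matching file scores one point.
class FileCounter
{
    public $count = 0;

    public function read($path)
    {
        $this->count++;
    }
}

// Reader strategy 2: every matching file scores one point per line.
class LineCounter
{
    public $count = 0;

    public function read($path)
    {
        $this->count += count(file($path));
    }
}

// Knows how to walk a directory tree and nothing else; what happens to
// each file is decided by whichever reader the caller passes in.
class DirectoryScanner
{
    private $root;

    public function __construct($root)
    {
        $this->root = rtrim($root, '/') . '/';
    }

    public function scan($reader, $pattern = '/.*/')
    {
        $this->scanDirectory($this->root, $reader, $pattern);
    }

    private function scanDirectory($directory, $reader, $pattern)
    {
        $handle = opendir($directory);
        if ($handle === false) {
            return;
        }
        while (($file = readdir($handle)) !== false) {
            if ($file == '.' || $file == '..') {
                continue;
            }
            $path = $directory . $file;
            if (is_dir($path)) {
                $this->scanDirectory($path . '/', $reader, $pattern);
            } elseif (preg_match($pattern, $file)) {
                $reader->read($path);
            }
        }
        closedir($handle);
    }
}
```

A caller that wants line counts would create a LineCounter, pass it to scan() with a complete regular expression such as '/\.txt$/i', and read its count property afterwards. The flag has disappeared: the decision and the code that carries it out now live in the same object.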
FEATURE
An XML approach to Templating using PHPTAL: Part II
xgettext --add-location --output=example5.pot example5.php
Which generates example5.pot. The problem now is that we have two .pot files and we have to merge them. We have to open example5.pot and set its encoding to UTF-8 before we can do anything with it. Afterwards, we can run the following command:
msgcat --output-file=new.pot example5.pot default.pot
And we’ll have all our strings stored in new.pot.

Listing 7
# Spanish translations for PACKAGE package.
# Copyright (C) 2005 THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# Automatically generated, 2005.
#
msgid ""
msgstr ""
"Project-Id-Version: 0.1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2005-03-01 12:32+0000\n"
"PO-Revision-Date: 2005-03-01 12:40+0000\n"
"Last-Translator: José Pablo Ezequiel Fernández Silva \n"
"Language-Team: Spanish \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

#: example5.php:25
msgid "A multilingual string"
msgstr "Una cadena multilingual"

#: example5.php:26
msgid "The second multilingual PHP string"
msgstr "La segunda cadena PHP multilingual"

msgid "php|architect"
msgstr "php|architect"

msgid "php|architect :: home"
msgstr "php|architect :: principal"

msgid "php|a's logo"
msgstr "logo de php|a"

msgid "The Magazine for PHP Profesionals"
msgstr "La Revista para Profesionales de PHP"

msgid "Welcome ${name}, you have ${number} unreaded mails."
msgstr "Bienvenido ${name}, usted tiene ${number} mensages sin leer."

Listing 8
php|architect :: principal
php|architect
La Revista para Profesionales de PHP
Bienvenido pupeno, usted tiene 103 mensages sin leer.
Una cadena multilingual
La segunda cadena PHP multilingual

You may be tempted to use the standard cat command, but msgcat provides options specific to message catalogs, like --less-than=2, which keeps only the messages that are defined in fewer than two of the input files, so that no duplicated strings show up in the output. We now place new.pot in the locale directory, renaming it to example5.pot in the process. Instead of calling msginit, we’ll let I18NFool do the job for us by running:
i18nfool-update
on the locale directory. This will produce a .po file for every language; its output will be:
* /usr/bin/i18nfool-update working on en
- updating domain 'example4'. done.
- creating domain 'example5'
Created en/LC_MESSAGES/example5.po.
* /usr/bin/i18nfool-update working on es
- updating domain 'example4'. done.
- creating domain 'example5'
Created es/LC_MESSAGES/example5.po.
Oops—it also picked up example4! i18nfool-update will pick every .pot file and generate a .po file for each language directory available. We can now translate the es/LC_MESSAGES/example5.po file (as you can see in Listing 7) and then generate the .mo files by running: i18nfool-build
Which will, again, process everything:
* building ./en/LC_MESSAGES/example4.mo
* building ./en/LC_MESSAGES/example5.mo
* building ./es/LC_MESSAGES/example4.mo
* building ./es/LC_MESSAGES/example5.mo
Don’t worry about having to provide “translations” for English; the English output will be “automagically” right, since the strings will just be copied as-is. All we have to do now is run the application (just view example5.php with a browser) to get the output we see in Listing 8 (strings in Spanish).

PHPTAL vs. the World
Let’s compare PHPTAL’s syntax to two other popular solutions: Smarty and PHP itself.

Replacing variables
With PHP:
<?=$string?>
If you have short tags, if not:
<?php echo $string; ?>
With Smarty:
{$string}
With PHPTAL:
Example
All three cases respect the (X)HTML standard, but only PHPTAL provides a way for the graphical designer to interact in a friendly way with his own work (instead of having to deal with cryptic source code). In this specific case, it may be important to write a long paragraph to see how the page flows, and this is only doable with PHPTAL.

Looping
With PHP:
With Smarty:
{foreach from=$list item=item} {$item} {/foreach}
With PHPTAL:
Yet again, only PHPTAL provides the designer with nice sample content. In this particular case, however, PHPTAL also avoids introducing invalid data inside a table (such as characters inside the table tag but outside of a table column), something that can confuse a browser or a WYSIWYG editor.

Conditionals
With PHP:
Welcome John
With Smarty:
{if $name eq "John"}
Welcome John
{/if}
With PHPTAL:
Welcome John
In this example, the fight is more even, with no clear winner. PHPTAL offers a shorter solution that doesn’t show the graphical designer or the content manager cryptic code. If we had lots of conditionals, however, the output in an editor or browser would be almost useless, since you’d see lots of content with no rhyme or reason, because the conditionals cannot be evaluated. PHPTAL has another drawback here: there is no else clause. We would need to write another paragraph and check for the reverse condition, while Smarty and PHP both provide built-in functionality for this purpose.

Conclusion
Even with its drawbacks, I can honestly say that I have never used a better solution than PHPTAL (or TAL). It recently saved me in a project that was turning into chaos with Smarty. Just as a test, I’ve tried using PHP as the template system and it was even more chaotic than Smarty. That inspired me to write this article. PHPTAL is a rather new project that is not yet very well-known (there aren’t a lot of developers in the PHP world who come from Zope and already know TAL), but its popularity is increasing. Although it has some sharp edges that need to be polished, the development team is at work smoothing them quite rapidly. During the writing of this article series, I’ve found some bugs in PHPTAL, and Laurent Bedubourg (PHPTAL’s maintainer) addressed them promptly. I believe in XML, and I like XSLT because it allows us to move from one format to another; in that scheme, TAL makes a lot of sense.

About the Author
José Pablo Ezequiel Fernández Silva, who is also known as “Pupeno,” is a software developer who has been building web sites since 1997 using Apache, CGI, PHP, MySQL, Zope, Python and Plone, among other tools. He has developed desktop applications, and has spoken at conferences and Free Software events. Pupeno has also participated in the creation of a Linux-based set-top box for DVDs. He is currently researching better concepts for graphical interfaces and desktops.
To Discuss this article:
http://forums.phparch.com/215
SECURITY CORNER
Security Corner
BBCode
by Chris Shiflett

Welcome to another edition of Security Corner. This month’s topic is BBCode, a format used in many PHP applications to allow users to format content. While BBCode can potentially offer a simpler markup vocabulary than HTML, it does nothing to help prevent cross-site scripting (XSS). Because the belief that it does is such a common misconception, I have decided to explain this in more detail.
Markup Basics
There are many ways to mark up content. In a plain text environment, there are some common forms of markup that have been adopted specifically for the purpose of being easy to interpret by a human. Examples are *bold*, /italics/, and _underline_. The markup format most familiar to web developers is HTML. The same examples in HTML are <b>bold</b>, <i>italics</i>, and <u>underline</u>. BBCode introduces a new vocabulary, and unfortunately there is no standard to which developers can adhere. However, the simplest elements are consistently implemented. Examples include [b]bold[/b], [i]italics[/i], and [u]underline[/u]. If the BBCode vocabulary were limited to these simple elements, it would offer very little benefit over HTML, unless you happen to think square brackets are more user-friendly than angle brackets.
HTML Versus BBCode
In order to assess the advantages of BBCode, it is best to compare and contrast the differences in implementation between allowing users to enter a subset of
Note: rather than use an existing solution such as PEAR::HTML_BBCodeParser for this discussion, I perform a manual translation of the markup in order to make the comparison as controlled as possible.
HTML versus a subset of BBCode. As regular readers of Security Corner know, input must always be filtered. When you’re allowing users to enter very complex data, creating a whitelist of acceptable characters can be very difficult. Because of this, many developers employ very weak filtering rules for such input and rely on the escaping performed by htmlentities() for protection. While htmlentities() can save you from poorly filtered data, relying on escaping alone is not ideal. Because an attacker can send any type of data, it’s equally unwise to rely on BBCode for protection—you can’t assume that the attackers will abide by your rules unless you enforce those rules in your programming logic. To better illustrate these points, consider a simple form that allows anonymous users to provide a comment:
Comment:
From a security perspective, the major difference in implementation is when the output is escaped and presented as part of the page, so that users can view previous comments. If a subset of HTML is allowed, the implementation is to escape every character and deliberately remove the escaping on certain characters, allowing them to be interpreted:
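A minimal sketch of this escape-then-restore approach, together with its BBCode counterpart, assuming that only the b, i and u elements are allowed. The function names are illustrative and are not taken from this column, and input filtering is left out to keep the comparison short:

```php
<?php
// Sketch only: render_html_subset() and render_bbcode_subset() are
// hypothetical names, and the allowed tags are an assumption.

// HTML subset: escape every character, then deliberately undo the escaping
// for the handful of tags that users are allowed to enter.
function render_html_subset($comment)
{
    $html = htmlentities($comment, ENT_QUOTES, 'UTF-8');
    foreach (array('b', 'i', 'u') as $tag) {
        $html = str_replace('&lt;' . $tag . '&gt;', '<' . $tag . '>', $html);
        $html = str_replace('&lt;/' . $tag . '&gt;', '</' . $tag . '>', $html);
    }
    return $html;
}

// BBCode subset: escape every character, then translate the allowed
// square-bracket tags into their HTML equivalents.
function render_bbcode_subset($comment)
{
    $html = htmlentities($comment, ENT_QUOTES, 'UTF-8');
    foreach (array('b', 'i', 'u') as $tag) {
        $html = str_replace('[' . $tag . ']', '<' . $tag . '>', $html);
        $html = str_replace('[/' . $tag . ']', '</' . $tag . '>', $html);
    }
    return $html;
}

echo render_html_subset('<b>bold</b> <script>'), "\n";
echo render_bbcode_subset('[b]bold[/b] <script>'), "\n";
```

In both functions the protection comes from htmlentities(); the markup vocabulary only changes which strings are translated back afterwards.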
If the same markup is allowed, but only with BBCode, this example becomes the following:

As you can clearly see, there is very little difference in the treatment of this data. While some might argue that using BBCode allows you to use strip_tags() to eliminate any HTML, it’s important to realize that this is no safer than if strip_tags() were used with the second optional parameter that allows some HTML tags. It has a few notable weaknesses:
1. It is a blacklist approach.
2. It violates the security principle that says that invalid data should not be modified in order to make it valid.
3. It is not as exhaustive as htmlentities().
4. It does not consider character encoding.
There is, in fact, no security benefit to allowing BBCode versus a subset of HTML.

Why Use BBCode?
BBCode isn’t entirely useless. Some BBCode markup can potentially be easier for users to remember and understand. For example, consider using a red font:
[color=red]red text[/color]
There isn’t an HTML equivalent that’s quite this intuitive. Of course, this could be made just as easy with something that closely resembles HTML markup:
red text
For example, I might intend to bold something:
This comment is <b>bold</b>.
Instead, I might intend to tell someone else how to bold something in HTML:
You bold things like this: <b>bold</b>.
If BBCode were allowed, it would be easier for a user to distinguish between these two scenarios. Without BBCode, the user has to enter a comment like this:
You bold things like this: &lt;b&gt;bold&lt;/b&gt;.
After htmlentities() is performed the first time, this becomes:
You bold things like this: &amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;.
Therefore, the bold tags will not get translated back, but they will be displayed in the browser as the user intends. This is where the gap in user-friendliness becomes clear, and I think this is the strongest case in favour of implementing BBCode. Of course, any time users want to explain to other users the actual markup vocabulary used in the comments, the situation is going to be slightly complicated using either approach.

Until Next Time...
Hopefully, you now realize that BBCode is not something that increases the security of your application in any way. It can, however, offer some advantages over a subset of HTML. The appropriate choice depends upon your own needs and the opinions of your users. Choose whichever method best suits you, but don’t fool yourself into thinking that security has anything to do with the decision. Until next month, be safe.

About the Author
Chris Shiflett is an internationally recognized expert in the field of PHP security and the founder and President of Brain Bulb, a PHP consultancy that offers a variety of services to clients around the world. Chris is a leader in the PHP industry, and his involvement includes being the founder of the PHP Security Consortium, the founder of PHPCommunity.org, a member of the Zend PHP Advisory Board, and an author of the Zend PHP Certification. A prolific writer, Chris has regular columns in both PHP Magazine and php|architect. He is also the author
of the HTTP Developer's Handbook (Sams) as well as the highly antici-
Another potential advantage of BBCode is that it helps to eliminate collisions between HTML that users want to be interpreted and HTML that they do not. For
April 2005
●
PHP Architect
●
www.phparch.com
pated PHP Security (O'Reilly). You can contact him at [email protected] or visit his web site at http://shiflett.org/.
69
EXIT(0);
Old School, New School, NO SCHOOL by Marco Tabini
Every now and then, my overcrowded inbox welcomes an e-mail from a developer who has come across a copy of php|a at their local newsstand or on the desk of one of their colleagues. The message usually starts along the lines of “I’m an old-school programmer (still working on COBOL applications at my bank) and I wonder whether I’ll be able to pick up PHP and become a web developer?”

I confess that I am always at a loss for words, to the point that I often let an unforgivable amount of time lapse before I manage to put together a response that doesn’t sound like the rambling of an idiot, or at least not of a complete one. The trouble is that I consider myself “old-school,” although not nearly as old-school as the guy who got started on computers when a debugging cycle on a production-ready application started with a stack of perforated cards.

When I started dabbling in computers, items like memory and storage space came at such a premium that you literally had to watch your bytes. I still clearly remember saving my allowance for months so that I could upgrade my Apple II+ to 48kB of RAM (which was, of course, the maximum it could handle without an expansion card).

Despite the fact that when I tell these stories I am often looked at like some sort of Computersaurus fossil, I am quite happy that things turned out my way. Because of the limitations of the technologies I needed to work with, I have a nasty habit of trying to find simple solutions to any problem, a habit that I find lacking in those who got their start more recently. Why worry about bytes when you have megs at your disposal? I’ll tell you why: for the same reason that children shouldn’t be allowed to get within two hundred feet of a calculator until they’ve mastered at least the basic operations and understood how the others work. Only then does a calculator become a convenience tool rather than a substitute for your own intelligence. Similarly, dealing with the severe limitations of an older computer teaches a developer to find a solution that is as optimal as possible, rather than one that “just works.”

That’s why I never know how to answer the old-school-developer questions. In my mind, their experience is an advantage, rather than a liability. Learning to write code in a new language is relatively easy, but learning how to write good code is not. Of course, every programming environment has nuances that one must be aware of: if you’re used to writing PHP code and want to learn C, pointers are probably going to be one of the biggest stumbling blocks you’re going to have to overcome.

The great difficulty in being an old-school developer is resistance to change. Unlike even ten years ago, computer development today is no longer about functionality for the sake of solving a problem, but about making the problem-solving capabilities of a computer accessible to as many people, and in as convenient a way, as possible. Therefore, while computers in general are in constant evolution, this evolution takes place primarily in the user interface, as opposed to the functionality. Think about it: how many “innovative” applications in which the innovation wasn’t limited to a better user interface were developed over the last three years?

Of course, that’s not to say that the user interface isn’t important. It’s what lets us drive a car without being Michael Schumacher, make a phone call without being Antonio Meucci and write a computer program without being John von Neumann. The fact that this is where computers are making giant steps these days, however, highlights the importance of a solid basic understanding of computer science (that “write a spreadsheet application that fits in 10kB of memory” je-ne-sais-quoi) that so many today must force themselves to discover, but that people who started out with the limited computers of yesteryear had no choice but to deal with.

php|a