TABLE OF CONTENTS

php|architect

Features
Object Overloading in PHP, by Alessandro Sfondrini
Introduction to Version Control with CVS, by Dejan Bosanac
Introduction to PHP-GTK, by Eric Persson
An Introduction to SQLite, by John Coggeshall
Working with PEAR::XML_Serializer, by Stephan Schmidt
Implementing Web Server Load Management, by Rodrigo Becke Cabral

Departments
Editorial
What’s New!
Speaker on the High Seas: An Interview with Wez Furlong
Product Review: PHPEclipse, by Eddie Peloke
Product Review: PhpED 3.2.1, by Marco Tabini
Tips & Tricks, by John W. Holmes
Book Reviews, by Peter MacIntyre
Bits & Pieces: Real. Interesting. Stuff., by Peter James
exit(0); Frequently Annoying Questions, by Marco Tabini

November 2003 ● PHP Architect ● www.phparch.com
php|architect
Volume II - Issue 11
November, 2003

Publisher: Marco Tabini
Editor-in-Chief: Peter James
Editor-at-Large: Brian K. Jones
Editorial Team: Arbi Arzoumani, Peter James, Peter MacIntyre, Brian Jones, Eddie Peloke
Graphics & Layout: Arbi Arzoumani
Managing Editor: Emanuela Corso
Director of Marketing: J. Scott Johnson
Account Executive: Shelley Johnston
Authors: Dejan Bosanac, Rodrigo Becke Cabral, John Coggeshall, Eric Persson, Alessandro Sfondrini, Stephan Schmidt

php|architect (ISSN 1705-1142) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 54526, 1771 Avenue Road, Toronto, ON M5M 4N5, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards to use of the information contained herein or in all associated material.

EDITORIAL RANTS

I like to consider myself a fairly adept PHP developer. I’ve been using PHP for just over two years continuously (plus massive amounts of overtime and hobby development), and have made some pretty cool applications (IMO, at least). Even so, I have yet to actually touch the vast majority of PHP extensions. And those are just the ones documented on www.php.net. Countless others have been developed that may never be profiled there.

This leads me to my point. Often, the article ideas submitted to us at php|a deal with aspects of PHP development that we know little about. As technical editors, though, it is our job to ensure that the material presented here is as accurate as possible, which forces us to (as much as possible) fully understand and grasp the technology being discussed. This presents incredible opportunities for learning and growth.

I really think that there are stages in knowledge. When you first start out learning something, you are ignorant and you (usually) make no bones about it. As your knowledge grows, you can quickly and easily become cocky and arrogant. This cocky and arrogant attitude usually succeeds in serving you plenty of humble pie, which eventually settles you into the comfortable stage where you know what you’re doing, but aren’t going to make a big deal out of it.

When I look back at the time I’ve been at php|a, and the articles I’ve worked with, I’m astonished at how much I’ve learned. When I started at php|a, I was definitely in the “armed and dangerous” camp. I knew it all. After a number of months working with some of PHP’s best and brightest, I now realize that I know nothing, and should just shut up.

It’s unfortunate, therefore, that I must leave php|a. I’ve really enjoyed working with everybody here, as well as you readers. You’re the reason we burn the midnight oil bringing you the best that the community has to offer every month. Your feedback and comments every month help us create a better publication... one that we can all be proud of. Many of our readers are also our authors, and I’ve very much enjoyed working with you as well. Your seemingly limitless practical knowledge of PHP has brought much enlightenment to the team here, and we thank you for that, as well as for the excellent articles that you offer us.

Of course, php|a will continue. I mean, where would it go? Brian Jones will return as “Guest Editor-in-Chief” for the December issue, which is our first anniversary and will feature some of the best content yet (of course!), and our own Marco Tabini will do the honours again starting with January. Good luck Marco! Until then...
php|a
NEW STUFF

What’s New!

PHP 4.3.4
PHP.net announced the release of PHP 4.3.4, the latest version of the PHP 4 branch. It contains, among other things, these fixes, additions and improvements:
• Fixed disk_total_space() and disk_free_space() under FreeBSD.
• Fixed FastCGI support on Win32.
• Fixed FastCGI being unable to bind to a specific IP.
• Fixed several bugs in mail() implementation on Win32.
• Fixed crashes in a number of functions.
• Fixed compile failure on MacOSX 10.3 Panther.
• Over 60 bug fixes.
For more information, visit www.php.net.

PHP 5.0.0 Beta 2
Can’t wait until PHP 5? Well, we are one step closer with the latest Beta 2 release of the new version of PHP. PHP.net announced: “This is the first feature complete version of PHP 5, and we recommend for PHP users to try it. PHP 5 is still not ready for production use! Some of the more major changes include:
• PHP 5 features the Zend Engine 2.
• XML support has been completely redone in PHP 5, all extensions are now focused around the excellent libxml2 library (http://www.xmlsoft.org/).
• SQLite has been bundled with PHP. For more information on SQLite, please visit their website.
• A new SimpleXML extension for easily accessing and manipulating XML as PHP objects. It can also interface with the DOM extension and vice-versa.
• Streams have been greatly improved, including the ability to access low-level socket operations on streams.
There have been many changes since Beta 1, some of them documented in the NEWS file and most language changes are documented in ZEND_CHANGES.”

PEAR 1.3b3
PEAR has released version 1.3b3 of the PEAR base system. The PEAR package contains:
• the PEAR base class
• the PEAR_Error error handling mechanism
• the PEAR installer, for creating, distributing and installing packages
• the OS_Guess class for retrieving info about the OS on which PHP is running
• the System class for quick handling of common operations with files and directories
Changes in this release include changes to the PEAR installer. Get a taste of the new PEAR at PEAR.PHP.net.

PhpDocumentor 1.2.3
phpDocumentor is a JavaDoc-like automatic documentation generator for PHP, written in PHP. The phpDocumentor team announces: “The phpDocumentor team is pleased to announce the release of phpDocumentor 1.2.3. This is a bugfix maintenance release. Only a few small bugs have been found and fixed. Notice: PEAR users will want to read the release notes for directions on how to automatically setup the web interface on install.” Get all the files and more information from the project’s Sourceforge page at: http://sourceforge.net/projects/phpdocu/

PHP2Go
Sourceforge.net announces the release of PHP2Go, a web development framework. The release announces: “After 12 months of development, we are very proud to announce that the first beta version of PHP2Go Web Development Framework was released. We strongly recommend that you don’t use this first beta version in a production environment. Soon, the first stable releases will be available here or at the project area in SourceForge.net. In this development period, almost 100 classes were created and tested. Almost all common features needed by a small, medium or high web application were included in the framework. However, the documentation of this first release is still poor. In a huge framework, that may be used to act like a basement to the development of one or more applications, documentation, tutorials and help stuff are very important. Because of that, as the time goes by, the complete instructions on how to build simple or complex stuff for your Web applications using PHP2Go will be available here. For now, we suggest that you make your first experience with PHP2Go. Bug reports, suggestions or any doubts will be very welcome.” Get more information or download from Sourceforge.net at: http://sourceforge.net/projects/php2go/

:object::kitchen R4
Objectkitchen announces release number 4. What is it? Objectkitchen is a Java application used through a client/server-style connection from your application. It was written with the following goals in mind:
• It should be very easy to use. An average programmer should be able to start storing and retrieving object data within 15 minutes of installation.
• It should be usable from a multitude of languages, but with primary focus on PHP and Java.
• It should provide natural mapping of an object-oriented design. Relations between objects should just work without any extra work.
• It should be fast enough to be usable for reasonably busy websites.
The best way to discover what objectkitchen is actually about is to read the language introduction. It’s a small one-page document, which provides you with an easy to read example using PHP as the client language. Get more information from: Objectkitchen.narcissisme.dk.

MySQL 4.0.16
MySQL.com announces the release of MySQL version 4.0.16. Some changes mentioned in the changelog include:
• Added the following new server variables to allow more precise memory allocation: range_alloc_block_size, query_alloc_block_size, query_prealloc_size, transaction_alloc_block_size, and transaction_prealloc_size.
• mysqlbinlog now reads options files. To make this work one must now specify --read-from-remote-server when reading binary logs from a MySQL server. (Note that using a remote server is deprecated and may disappear in future mysqlbinlog versions.)
• Block SIGPIPE also for non-threaded programs. The blocking is moved from mysql_init() to mysql_server_init(), which is automatically called on the first call to mysql_init().
• Added --libs_r and --include options to mysql_config.
• New `> prompt for mysql. This prompt is similar to the '> and "> prompts, but indicates that an identifier quoted with backticks was begun on an earlier line and the closing backtick has not yet been seen.
Get more information or download from MySQL.com.

ADOdb 4.00
PHPeverywhere announces the release of ADOdb 4.00. The release announces: “ADOdb 4.00 is out after a 3 month beta testing process. The distinguishing feature of this release is the performance monitoring functionality. AFAIK, it is the first Open Source cross-platform, multi-database performance monitoring and health check software in the world. It features:
• A quick health check of your database server using $perf->HealthCheck() or $perf->HealthCheckCLI().
• User interface for performance monitoring, $perf->UI(). This UI displays:
  - the health check,
  - all SQL logged and their query plans,
  - a list of all tables in the current database,
  - an interface to continuously poll the server for key performance indicators such as CPU, Hit Ratio, Disk I/O.
• Gives you an API to build database monitoring tools for a server farm, for example calling $perf->DBParameter('data cache hit ratio') returns this very important statistic in a database independent manner.”
Get more information from PHPEverywhere at: http://php.weblogs.com/2003/11/05#a3100

PHP Meeting in Paris
The French PHP User Group AFUP association is proud to announce the third annual PHP meeting in Paris, on November 26th and 27th, 2003. What is it? Developers and managers will gather to meet Zeev Suraski and other prominent community experts for two days of conferences, packed with solutions and advanced techniques. Get more information from: AFUP.org

PHP-GTK 1.0.0
PHP-GTK announces the release of version 1.0.0. What is it? PHP-GTK is an extension for the PHP programming language that implements language bindings for the GTK+ toolkit. It provides an object-oriented interface to GTK+ classes and functions and greatly simplifies writing client side cross-platform GUI applications. The release announces: “PHP-GTK Version 1.0.0 is finally out after almost a year of being in stasis. This is probably the last major version that will work with PHP 4 and Gtk+ 1.x. There might be more bugfixes, but no new features or upgrades will be implemented. PHP-GTK 2 is under development and will focus on PHP 5 and Gtk+ 2.x.” Get more information from GTK.PHP.net.

Direction|PHP
NEXEN.NET, leading portal for the PHP/MySQL platform in French, and php|architect, the Magazine for PHP Professionals, have announced the immediate availability of DIRECTION|PHP, the French monthly magazine for PHP/MySQL professionals. “The French PHP community is among the largest and the most advanced in the world,” says Damien Seguy, editor-in-chief of Direction|PHP. “It needed a resource of expert knowledge and in-depth coverage of the industry. Naturally, that led us to partner with php|architect.” php|architect provides a monthly technical resource for PHP professionals in the English market that includes in-depth articles, news, editorial comment, and product and book reviews. Direction|PHP is focused tightly on covering French news, and licenses part of its content from php|architect. “Most technical knowledge is available in English, even if its authors are not native English speakers. I believe Direction|PHP will create opportunities for both French authors and companies to step up to the plate and shine,” adds Damien Seguy. For more information about Direction|PHP, visit their homepage.

php|a
Object Overloading in PHP
F E A T U R E
by Alessandro Sfondrini
What is overloading?

Overloading is an important feature of most object-oriented programming (OOP) languages, such as Java and C++. In these languages, overloading allows the programmer to declare several methods with the same name. Which one is used when a call to that name is made depends on the number and type (integer, string, etc.) of the arguments passed to the method. If you aren’t too familiar with OO languages, a very basic example in Java may help you out. Take a look at Listing 1. When the Java compiler finds a call to method hello() of class Greet in this code, it chooses the right method depending on the number of arguments.

LISTING 1: Basic OO example in Java

    class Greet {
        void hello() {
            System.out.println("Hello World");
        }

        void hello(String name) {
            System.out.println("Hello " + name);
        }
    }

Of course, if we try to do this in PHP, all we will obtain is a fatal error, like so:

    Fatal Error: Cannot redeclare hello()

For a strongly-typed language, overloading is important because it allows methods to behave in different ways depending on the number and type of the arguments received. If we overload the constructor of a class, we can obtain different objects depending on the number and type of parameters passed. Overloading is one of the most useful features of OO languages.

Unfortunately (at least, in my humble opinion), PHP wasn’t born to be object-oriented. OOP has only really been supported since PHP 4, and still in quite a poor way. Thankfully, many improvements are being introduced, and PHP 5 will finally provide great OOP support. One of these improvements is the overloading extension. Although this extension differs from the Java version of overloading, we’ll show how to use it to simulate that overloading in PHP.

What is overloading in PHP?

The “Object property and method call overloading” extension was introduced as a built-in, experimental extension in PHP 4.3.0. It is supposed to become stable in PHP 5. Its purpose is different from the overloading present in most object-oriented languages. In fact, it doesn’t even allow you to redeclare a method. Instead, its purpose is to allow you to use methods (and properties) which haven’t ever been declared inside the class. It does this by offering up three special methods that will be executed when an attempt is made to access a method or property that doesn’t exist. These “magic” methods are listed below.

__get(): called when trying to use the value of undefined properties
__set(): called when trying to set the value of undefined properties
__call(): called when accessing undefined methods
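For readers following along on PHP 5 or later, the two property hooks listed above might look roughly like the sketch below. It rests on one assumption that goes beyond this article: that the hooks are built into the language there, so no overload() call is made and __get() hands back the value directly instead of filling a by-reference parameter. The class name PropertyBag and the $elem array are illustrative only.

```php
<?php
// Sketch only: assumes PHP 5+ built-in hooks, not the PHP 4 overload extension.
// PropertyBag and $elem are illustrative names.
class PropertyBag
{
    private $elem = array(); // holds values for properties that were never declared

    public function __set($name, $value)
    {
        $this->elem[$name] = $value; // remember the undeclared property
    }

    public function __get($name)
    {
        // Return the stored value, or a fallback message if it was never set
        return array_key_exists($name, $this->elem)
            ? $this->elem[$name]
            : "$name isn't defined";
    }
}

$bag = new PropertyBag();
$bag->foo = 3.14;      // routed through __set()
echo $bag->foo, "\n";  // routed through __get(), prints the stored value
echo $bag->bar, "\n";  // never set, so the fallback message is printed
```

Keeping the undeclared values in one private array means the hooks fire on every access, because the properties themselves never come into existence on the object.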
The most useful of these three methods to us is __call(), since you’ve always been “allowed” to set and get undefined properties in PHP 4. Among other things, __call() allows us to execute built-in PHP functions or user-defined functions as if they were actually methods of the class.

Overloading properties: __get() and __set()

To get the value of an undefined property, we must use __get().

    boolean __get([string property_name], [mixed return_value])

The property_name parameter is the name of the undefined property being accessed, and the return_value parameter is the value we will set inside the function. return_value should be passed by reference (using the special character “&”) to allow __get() to set the value. Listing 2 shows how the __get() method works.

LISTING 2

    <?php
    class TryGet
    {
        var $greet = "Hello World!<br />";

        function __get($PropName, &$PropValue)
        {
            $PropValue = "$PropName isn't defined<br />"; // Sets the value
            return TRUE; // Returns TRUE = doesn't display errors
        }
    }

    overload('TryGet'); // Overloads the class
    $overloaded = new TryGet();
    echo $overloaded->greet; // Prints the defined var
    echo $overloaded->foo;   // Prints an undefined one
    echo $overloaded->bar;   // Prints another undefined one
    ?>

First, we set the $greet property. This is the only property of class TryGet that we can access without overloading the class. Next, we have to fill in the __get() method with the logic to determine what value ($PropValue) should be returned for an undefined property ($PropName). In this case we set $PropValue to an error message (“$PropName isn’t defined”), but we could also set it to FALSE, to zero, to a blank string, or to whatever makes sense. Finally, we return TRUE from __get(). This is because all of the “magic” methods should only return TRUE or FALSE. This tells the parser whether the access was successful; returning FALSE will make the parser display a notice (“Notice: Undefined property”).

Before we can test this class, we must overload it using the overload() function.

    void overload(string ClassName)

Lastly, we try to print a defined property ($greet) and two undefined ones ($foo and $bar). The output is as follows:

    Hello World!
    foo isn't defined
    bar isn't defined

This way we never display the notice. We could also decide to return TRUE only for some properties and display the notice for the others, using an if-else control (if the property is allowed return TRUE, else return FALSE).

To set an undefined property we must use __set().

    boolean __set([string property_name], [mixed value_to_assign])

This “magic” method must be used together with __get() in order to work, since setting a property without being able to get it is useless. Have a look at Listing 3 for an example.

LISTING 3

    <?php
    class TrySet
    {
        var $elem;

        function __set($PropName, $PropValue)
        {
            $this->elem[$PropName] = $PropValue; // Stores the prop. and its value in $elem array
            return TRUE;
        }

        function __get($PropName, &$PropValue)
        {
            if (isset($this->elem[$PropName]))               // If the prop. is in $elem
                $PropValue = $this->elem[$PropName];         // Assigns the value in the array
            else
                $PropValue = "$PropName isn't defined<br />"; // Else sets another value
            return TRUE;
        }
    }

    overload('TrySet'); // Overloads the class
    $overloaded = new TrySet();
    $overloaded->foo = 3.14;            // Sets an undefined property
    echo $overloaded->foo . "<br />";   // And prints it
    echo $overloaded->bar;              // Prints an undefined and never set property
    ?>

The __set() method stores the value of the undefined property that the user tried to set into an associative array ($elem), and then returns TRUE. As with __get(), returning FALSE will cause a notice. Next, we put an if-else control in method __get(). If the undefined property is stored in the $elem array, we set $PropValue to that value; otherwise we set it to our error message. Finally, we return TRUE. Note that we could return FALSE instead of setting the property to our error message. In that case we could only get the properties we’ve set (trying to get undefined properties we’ve never set would cause a notice).

Let’s test it out. After having overloaded the class, we set an undefined property ($foo) to 3.14. Next, we get $foo and print it. Finally, we try to get and print another undefined property ($bar) we haven’t ever set. The output is shown below:

    3.14
    bar isn't defined

The first value echoed is the one we’ve just set, while the other is the error message we decided to display.

Overloading methods: __call()

__call() is by far the most interesting feature of overloading in PHP, allowing you to call undefined methods. It requires three arguments: the name of the method to call, an array containing the arguments, and the return value (which is passed by reference). The syntax for __call() is as follows:

    boolean __call([string method_name], [array arguments], [mixed return_value])
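To make the first two arguments concrete, here is a minimal sketch of what __call() receives when an undefined method is invoked. It assumes PHP 5 or later, where the hook is built in and returns its result directly, so the third, by-reference argument described above does not appear. The class name CallLogger and the method name anything() are hypothetical.

```php
<?php
// Sketch only: assumes PHP 5+ built-in __call(), not the PHP 4 overload extension.
// CallLogger is a hypothetical class used purely for illustration.
class CallLogger
{
    public function __call($name, $arguments)
    {
        // $name is the undefined method's name,
        // $arguments holds the parameters of the call as an indexed array
        return $name . ' called with ' . count($arguments) . ' argument(s)';
    }
}

$logger = new CallLogger();
echo $logger->anything(1, 2, 3), "\n"; // anything() is not declared, so __call() runs
```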
In dealing with __get(), we decided to return an error message; with __call() we can’t do anything similar. Instead, we will look for the undefined method that was called, and execute it, since there may be a function or a method of another class which has the same name as the one called. This “magic” method can also return TRUE or FALSE. In fact, in the next example we’ll return FALSE if the function isn’t in the array of functions that are allowed to be executed. Just remember that returning FALSE will cause a warning instead of a notice. Listing 4 shows an example of how to use __call().

LISTING 4

    <?php
    function greet($name)
    {
        return "Hi $name!<br />";
    }

    class TryCall
    {
        var $accept;

        function TryCall() { } // Constructor is empty

        function __call($FuncName, $FuncArg, &$RetValue)
        {
            $this->accept = array("greet", "sqrt");   // Array of valid functions
            if (in_array($FuncName, $this->accept)):  // If a valid function is called
                $RetValue = $FuncName($FuncArg[0]);   // Executes the function
                return TRUE;                          // And returns TRUE
            else:
                return FALSE;                         // Else returns FALSE
            endif;
        }
    }

    overload('TryCall'); // Overloads the class
    $overloaded = new TryCall();
    echo $overloaded->greet("John"); // Calls greet() as method
    echo $overloaded->sqrt(16);      // Calls sqrt() as method
    echo $overloaded->log(10);       // Calls log() as method
    ?>

Outside of our class we declare function greet(); we’ll need it to demonstrate that __call() also works with user-defined functions. Inside the class we’ve left the constructor empty, and declared __call(). Our __call() function sets up an array ($accept) where we store the names of the functions we will accept as methods of this class. If the method called is listed in the $accept array, we call the specified function, pass in the parameter as an array element, and return TRUE. If the method is not listed in the array, we return FALSE (which triggers a warning). It’s best to allow only a few functions to be called as methods (just the ones you need).

Now we can test it out. After having overloaded the class, we try to call three methods: method greet(), the PHP function sqrt(), and the PHP function log(), echoing each value returned. The output is shown below:

    Hi John!
    4
    Warning: Call to undefined method trycall::log()

Notice that we called a predefined PHP function, but got a warning. This is because we decided not to accept log() as a valid method of our class. Also notice that in __call() we had to pass the parameter as an array element ($FuncArg[0]). That’s a problem, since it means we can only call one-argument functions! To fix this we can use the call_user_func_array() PHP function. Its syntax is as follows:

    mixed call_user_func_array(callback function[, array paramarr])
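As a small, self-contained illustration of the signature above, the sketch below calls a built-in function, a user-defined function and a method through call_user_func_array(). The function greetAll() and the class Formatter are hypothetical names introduced only for this example.

```php
<?php
// greetAll() and Formatter are hypothetical, for illustration only.
function greetAll($greeting, $first, $second)
{
    return "$greeting $first and $second";
}

class Formatter
{
    function hyphenate($a, $b)
    {
        return $a . '-' . $b;
    }
}

// A function is named by a string; its parameters travel as one array
$power    = call_user_func_array('pow', array(2, 8));
$greeting = call_user_func_array('greetAll', array('Hi', 'John', 'Jane'));

// A method is addressed as array(object, method name)
$joined = call_user_func_array(array(new Formatter(), 'hyphenate'), array('a', 'b'));

echo $power, "\n", $greeting, "\n", $joined, "\n";
```

Because the parameters are passed as a single array, the caller does not need to know in advance how many arguments the target function takes, which is exactly the limitation of $FuncArg[0] noted above.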
call_user_func_array() allows us to call functions and methods, passing the parameters as an array. Since they are passed into __call() as an array, we can now easily call any function we want. As an example, have a look at Listing 5. Here, we set up our class and use call_user_func_array() to call the function and pass the parameter array to it. PHP will automatically execute the function, as long as the script has called the method with the right number of parameters. Note that being able to call any function as a method is not necessarily a good thing. It isn’t necessarily a security issue, since the call is done by a script of ours, and not an external user, but it can certainly cause some confusion. The output of Listing 5 is shown below.

    12
    256
    11111010

LISTING 5

    <?php
    class TryCallParam
    {
        function TryCallParam() { } // The constructor is empty

        function __call($FuncName, $FuncArg, &$RetValue)
        {
            $RetValue = call_user_func_array($FuncName, $FuncArg); // Calls the function
            return TRUE;                                           // And returns TRUE
        }
    }

    overload('TryCallParam'); // Overloads the class
    $overloaded = new TryCallParam();
    echo $overloaded->sqrt(144) . "<br />";
    echo $overloaded->pow(2, 8) . "<br />";
    echo $overloaded->base_convert("FA", 16, 2);
    ?>

As we said, we can use a regular function as a method of our class; but we can also call a method of another class as one of ours, like so:

    $retVal = call_user_func_array(array(new ClassName(), $MethName), $MethParam);

That means that by using overloading, we can also easily extend a class without using the extension mechanism. Now it’s time to write a class which does something useful with the overloading extension.

Overloading methods: A practical example

For our example, we will create a class which manages database operations using the overloading extension. The database chosen is MySQL, a widely-used open-source database, and we will be executing queries on the following sample table:

    CREATE TABLE names (
        id TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name VARCHAR(30) NOT NULL
    )

The purpose of our class is to allow us to use OOP techniques to perform the most common database operations, such as:
• Connecting to MySQL and closing the connection
• Selecting, creating and dropping databases
• Sending queries and counting the rows affected
• Fetching the result data

The code for this example is shown in Listing 6.

Since an open connection to MySQL is always needed to perform any operation, we can put mysql_connect() in the constructor of the class, and store the resource link returned in a class property ($link). In __call() we decide which functions we’ll accept, and we store them in two array properties. The reason for two arrays is that some MySQL functions require the connection link as the last parameter, while others don’t. We will automatically add the link to the parameters of the ones which need it. We put the functions which need the link in $accept_l, and the ones that don’t in $accept. They are both associative arrays: the key is the method name and the value is the MySQL function name. Note that the prefix “mysql_”, which each function should have, is omitted for simplicity and will be added when we execute the call.

Now comes the most important part of our class. It’s all contained in an if-elseif-else block.
• If the function called as a method is valid and requires the connection link, we will add $link to the parameters array, call the function, and return TRUE
• If the function is valid, but doesn’t require a connection link, we will simply call it and return TRUE
• If the function is not valid, we will return FALSE (the script will return a warning)

After having called a MySQL function, the script will check if it has been successful. If it hasn’t, a MySQL error message obtained by mysql_error() will be displayed.

Now our class is complete; we only have to overload and test it. We create a new object, passing the connection parameters to the constructor, and then select the database. Next, we execute a SELECT query on the
table shown above, fetching and echoing the result. Finally, we close the connection. All these operations are performed in an object-oriented way. We wrote less code than with procedural coding, and the code looks clearer. We performed only a few of the operations allowed, but you can be sure that the others work just as well.

This example has shown how we can use the overloading extension to organize our code in an object-oriented manner, and simplify it. Of course the class can be saved in an external file, and included to make the code even clearer. You can use overloading to group any set of functions into a class; for instance, you could create a CURL or Output Buffering class.

LISTING 6
<?php
class MySQL
{
    function MySQL($server, $user, $pass)
    {
        // The constructor opens the connection
        $this->link = mysql_connect($server, $user, $pass) or die(mysql_error());
    }

    function __call($FuncName, $FuncArg, &$RetValue)
    {
        /* Accepted funcs that need a resource link */
        $this->accept_l = array("close"  => "close",     // mysql_close()
                                "db"     => "select_db", // mysql_select_db()
                                "newdb"  => "create_db", // mysql_create_db()
                                "dropdb" => "drop_db",   // mysql_drop_db()
                                "query"  => "query",     // mysql_query()
                               );
        /* Accepted funcs that don't need a resource link.
           The listing also covers mysql_affected_rows(), mysql_num_rows(),
           mysql_fetch_row() and mysql_fetch_object(); only the alias used
           at the end of this listing is shown here. */
        $this->accept = array("afetch" => "fetch_array"); // mysql_fetch_array()

        if (isset($this->accept_l[$FuncName])) { // If the function requires $link
            $FuncArg[] = $this->link; // Adds $link to the parameters of the array
            $RetValue = call_user_func_array("mysql_".$this->accept_l[$FuncName], $FuncArg) or die(mysql_error());
            return TRUE;
        } elseif (isset($this->accept[$FuncName])) {
            /* If the function doesn't require $link calls the function */
            $RetValue = call_user_func_array("mysql_".$this->accept[$FuncName], $FuncArg) or die(mysql_error());
            return TRUE;
        } else            // If the function isn't accepted
            return FALSE; // Returns an error
    } // End __call()
} // End class

overload('MySQL'); // Overloads the class
$db = new MySQL('localhost', 'user', 'pass'); // Creates the object
$db->db("mydb"); // Selects the db
$res = $db->query("SELECT * FROM names"); // Executes the query
/* Fetches and prints the data */
while ( ($data = $db->afetch($res, MYSQL_ASSOC)) !== FALSE )
    echo $data["id"].": ".$data["name"]." ";
$db->close(); // Closes the connection
?>

Creating Java-like overloading in PHP

At the beginning of this article we explained the “classic” meaning of overloading. We also explained that in PHP, we can’t redeclare a method. You might have guessed that __call(), though, can help us to do something similar. It won’t be as elegant as it could be in Java but... it will work!

Imagine we have a method on which we want to allow overloading. What we have to do is to prepare different methods using the original method’s name concatenated with the number of arguments passed to
it. This would look something like:

“method name” . “argument number”

But, and this is important, we mustn’t declare a method called “method name”. If we do, __call() won’t be able to manage the call to our method (since it wouldn’t be undefined).

Listing 7 shows how this all works. In method __call() we first check the number of arguments of the undefined method call. Depending on that, we execute the “right” method. We’ve named the “right” methods methname0, methname1, methname2, etc., depending on how many arguments they need. When we call methname(), the parser will execute __call(), which will determine the right method by appending to “methname” the number of arguments passed. Note that the method to call is indicated as

array(&$this, $FuncName.$NumArgs)
That’s because it isn’t an external function, but a method of the current object ($this). After method __call() we declare all the methods we may need to use. Now we can overload the class, and try to call the methods.

LISTING 7
<?php
class JavaOverLoad
{
    // The opening lines of this listing (up to greet0) are reconstructed
    // from the description in the text
    function __call($FuncName, $FuncArg, &$RetValue)
    {
        $NumArgs = count($FuncArg); // Number of arguments passed
        // Calls the method whose name ends with that number
        $RetValue = call_user_func_array(array(&$this, $FuncName.$NumArgs), $FuncArg);
        return TRUE;
    }

    function greet0()
    {
        echo "Hello World! ";
    }

    function greet1($name)
    {
        echo "Hello $name! ";
    }

    function greet2($name, $from)
    {
        echo "Hello $name! Greetings from $from! ";
    }

    function power1($i)
    {
        echo pow(2, $i)." ";
    }

    function power2($base, $i)
    {
        echo pow($base, $i)." ";
    }
}

overload('JavaOverLoad'); // Overloads the class
$overloaded = new JavaOverLoad();
$overloaded->greet();
$overloaded->greet("Peter");
$overloaded->greet("John", "London");
$overloaded->power(4);    // It should return 2^4
$overloaded->power(3, 4); // It should return 3^4
?>

The output from Listing 7 will be:

Hello World!
Hello Peter!
Hello John! Greetings from London!
16
81
This feature can be particularly useful if we want to overload the constructor, which would allow us to build different objects depending on the number of arguments passed!

Automatically overload a class

As we’ve seen, to overload a class we have to write the line:

overload("ClassName");

This is tedious, especially if the class has been created to be overloaded. PHP 5 is not expected to require this call, but since it is needed for now, we might want to automate it. If we want to automatically overload a class, we only have to add this line to the constructor:

overload(get_class($this));

That means that when we build a new object the constructor automatically enables overloading support. This is another detail which may help us to keep our code clear.

Conclusion

The overloading extension has great potential. Of course you can write any application you need without using it (just as you don’t need to use OOP), but it can provide an intuitive and elegant solution to some problems; for instance, it can help you to group a bunch of procedural functions (user-defined or built-in) into a class library, or to create easily-extensible classes without using the extension mechanism.

Remember that the overloading extension is still experimental, and the interface and inner workings may still change drastically, but I expect that it will become a very powerful coding technique. I hope this article has helped you to find out how overloading can make your coding easier. If you have any questions about it, you can look at the PHP on-line manual (which unfortunately doesn’t treat this topic very well), ask on the php|architect forum, or mail me.

About the Author
Alessandro Sfondrini is a young Italian PHP programmer from Como. He has already written some on-line PHP tutorials and published scripts on the most important Italian web portals. You can contact him at [email protected].
Discuss this article: http://forums.phparch.com/57
Introduction to Version Control with CVS
F E A T U R E
by Dejan Bosanac
Why is it so important?

As you browse through Sourceforge (www.sourceforge.net) in search of the open-source software that you need, you’ll notice that all of the projects offer you their source through CVS. If you are not familiar with Concurrent Versions System (CVS), the first question is usually: “Why don’t they just pack the files and distribute them as an archive?” CVS has multiple uses in software development, so let’s take a few different angles and try to find out how you could benefit from it.

Suppose, for a start, that you want to start an open source project. You want to have a lot of developers work together from all around the world. So the first question is: “How can we all work on the same code base and keep that repository consistent?” Let’s say that you assign two developers to work on the same file (e.g. system/lib/classes/customer.php). You give them both accounts on the development server so that they can copy files to and from their local environment. It all sounds good at the start, but sooner or later you’ll get into the following situation. Both developers get a copy of the file in order to work on different methods. After a while the first developer puts a changed file on the server, but he doesn’t remember to notify the second developer that he should collect a new version of the file. When the second developer is done coding, he puts his version of the file back on the server, overwriting the work of the first developer.

OK, so you can say that communication within the team should be organized in such a manner as to help avoid this situation, and you’re right. But in this example we assumed only two developers and one file; imagine how it would be when your team expands to include tens or hundreds of developers and resources. You shouldn’t have to waste your time and energy keeping track of concurrent changes in the source; let software do the dirty work.

The best way is to install a CVS server and make it public (so everyone can get the latest version of the source using an anonymous account). Then assign accounts with commit privileges to your developers so they can contribute their code to the repository. Your problem is solved. Every developer has to pull the latest changes from the repository before he can start working, and after the work is done he asks the server to synchronize the repository with his changes. If someone else has committed changes to the same files in the meantime, the developer is given the opportunity to resolve any conflicts. This way we are sure that the source repository is always consistent, and that no code can mysteriously vanish.

If you are working in the local environment where your development team is changing the code directly on the server, you could suppose that this article has no value for you, but don’t jump to conclusions. CVS is
much more than just keeping the code consistent. It also offers a total history of all code changes. Have you ever found yourself in the following situation? You learn that you have to make huge changes to some portion of the code. You start working, and after a week you find yourself thinking that you would like to have the original code back because you realized that there’s a much easier way to do it. But now you don’t have that code, and it will take you a lot of time to get it back to that state (if it’s even possible in the first place). With CVS, situations like this are not as horrific. You can just pull the desired code version (or date) from the repository, and start all over again.

I’ll mention another benefit of CVS, and I’m sure that once you start using it, you’ll find that you can’t live without it. Imagine the following. You find yourself doing pretty well with your project. You have a large number of clients that have purchased different versions of the software. For some of them you had to make a few customisations, and in some versions different bugs have been found. All of a sudden you’re in a no-win situation again. You could easily lose track of which code base should be modified or fixed, and you could end up delivering the wrong update package to the customer. With CVS you can easily organize your code in versions and branches, and let the software take care of that. All you need to do is to check out the right branch and version of the code, modify it, check it in again, and update the client’s distribution. Piece of cake, isn’t it?

Learning by example

Now we’ll go through all of the everyday steps in order to get the picture on how to use basic CVS commands. Assume that we started to work on an e-commerce software project in PHP. On the development server, the project is in the /home/web/e-commerce folder. This is known as the “working directory” in CVS terminology. Let’s imagine that we have only three files for now. This will make our examples clear, but still descriptive enough.

index.php
login_form.php
login.php
Before we add our project to CVS, CVS should be properly installed and configured on the desired server. If you are a Unix/Linux user, then you will probably have it already on your machine. If this is not the case, get the source and all the necessary documentation from the official CVS site (http://www.cvshome.org), and follow the instructions regarding your particular system environment. Once you have installed CVS, you can start using it as a client for other repositories. If you want to use it as a source repository for yourself, you should first initialize the repository. You can do this using the init command
$ cvs -d /usr/local/cvs init
This creates a repository in the /usr/local/cvs directory. We’re now ready to use CVS in the local environment. In later sections some other interesting configuration issues for remote servers will be introduced.

When using CVS as a client, we need to tell the client program where the code repository is. We can do this either by using the “-d” switch directly in the command line (e.g. cvs -d /usr/local/cvs/ login) or by setting the $CVSROOT environment variable, like so:

$ export CVSROOT=/usr/local/cvs
The first thing we should do is to import our project into CVS for tracking.

$ cd /home/web/e-commerce
$ cvs import -m "Imported source" e-commerce "OurCompany" start
This command tells CVS that we want to start tracking the source in this folder in the repository under the name “e-commerce” (module name). The “-m” switch is optional, and it allows us to add an appropriate comment for this action. If it is omitted, CVS will start an editor to allow us to enter the comment. CVS keeps track of all the comments that are entered with each change so that we can have a complete history of all the important notes for the resource and the project as a whole. This switch is also used in some other commands, as we will see in a moment. In the above command, “OurCompany” represents what’s known as the vendor tag. “start” is a release tag. These tags may have no role in this context, but because CVS needs them, they must be present.

Now we can check what we have done. We will check out the project from the repository to be sure that everything is fine, and start working on the source that is under CVS control. First, we should move the current project to a temporary folder.

$ mv /home/web/e-commerce /home/web/e-commerce.orig
And do the checkout.

$ cd ..
$ cvs checkout e-commerce
U e-commerce/index.php
U e-commerce/login.php
U e-commerce/login_form.php
The checkout command fetches the latest version of all the files that are in the repository and puts them in the “e-commerce” folder. To be sure that this code is the same as the original code, we could do something like this:

$ diff -r /home/web/e-commerce /home/web/e-commerce.orig
Only in e-commerce: CVS
You’ll see that the files are exactly the same. Maybe now is a good time to remove the original source folder, just to be sure that we don’t accidentally edit it
instead of the files that are under CVS. Before you do this, you may want to consider backing it up somewhere, just in case.

$ rm -r /home/web/e-commerce.orig
If you list the working directory now, you will see that beside the regular project files we have a “CVS” directory. It is the system directory used by CVS, and under normal conditions you should ignore it. It is used to keep information such as which repository is in use, what files and versions have been checked out, as well as all the other useful data about the current working directory’s state. We will talk about this folder in the next section, but unless you know what you are doing, keep away from it or you could damage the tracking process. CVS directories exist in every subdirectory of your project that has been checked out.

We said that each time you start working on the code, you should ask CVS if any changes have occurred in the files that matter to your work. You can do this by using the update command.

$ cvs -qn update
You won’t get anything in this example because we’ve just checked the project out, and all the files are up-to-date. We will see how this command works later in the article. The “-q” switch tells CVS to be “quiet” and print only important information. This is an optional switch, but it is very useful if a large number of files have been changed. The “-n” switch tells CVS not to do any real updates, but just to show us what needs to be done. If we omit it, all the files that are shown as not up-to-date would be updated according to the changes that are in the repository. If you agree that all the files should be updated, just repeat that last command without the “-n” switch.

$ cvs -q update
You can update just the files that you are interested in, rather than getting all the changes. This can be done by specifying the files of interest at the end of the command line.

$ cvs -q update index.php login.php
Now we are ready to start working on our source. Let’s say that our index.php looked like this:

<?php
// TODO
?>

We will modify it so that we can show how CVS reacts to these changes, and describe further steps. Let’s add some code to the file.

<?php
echo "E-commerce 1.0 - under construction";
?>
It’s not much progress for the project, but it should be enough for now. If we check the difference between the working copy and the repository, we get something like this:

$ cvs -qn update
M index.php
A status of “M” means “locally modified”. It means that the local copy of the file has been changed and that those changes have not yet been committed to the repository. This is exactly the case with our index.php. Some other possible statuses are:

U - “update needed” - means that the local file version is out of date and that a newer version of the file has been committed to the repository.
P - “patch needed” - means that the local file version is out of date and that the file has to be patched with the one in the repository. This is usually the case when the changes in the file are not big and CVS could just patch the file with the current version. It has practically the same meaning as a status of “U”.
C - “conflict” - means that the file could not be patched (or updated) automatically. We will see a little bit later how to deal with conflicts.

You can see the differences between the working and repository versions with the diff command.

$ cvs diff index.php
Index: index.php
==================================================
RCS file: /home/office/cvsroot/test/index.php,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 index.php
2c2
< // TODO
---
> echo "E-commerce 1.0 - under construction";
This output shows the exact changes between the working and repository versions. If you look at it carefully, you can see that in the current version the text “// TODO” has been taken out, while the text “echo "E-commerce 1.0 - under construction";” was introduced in line two.

The next thing we want to do is to commit our work to the repository so other developers can use it. We can accomplish this by doing the following:

$ cvs commit -m "Welcome note added"
Checking in index.php;
/usr/local/cvs/e-commerce/index.php,v  <--  index.php
new revision: 1.2; previous revision: 1.1
done
Of course, we can also commit on a file basis, just as with the update command. This is useful in situations when you have finished one piece of functionality and started another. You need to submit only the changes
made to some files, because other files are still unfinished. When you want to do this, just append the relevant file names to the command.

$ cvs commit -m "Welcome note added" index.php
We talked about the “-m” switch with the import command, and it serves the same purpose here: specifying the comment for the action. If you check for changes now, you’ll see that everything is up-to-date and it’s time to move on with our project.

Imagine that we add some new scripts to our project; of course we need to add them to the CVS repository, as well. First we have to create a new file, for example logout.php. You can do it in your favourite editor or integrated development environment. If you now check CVS for changes, you will see something like this.

$ cvs -qn update
? logout.php

Our new script is there but with the unusual status “?”. That means that CVS has no information on this file, and that the add command should be used.

$ cvs add logout.php
cvs add: scheduling file `logout.php' for addition
cvs add: use 'cvs commit' to add this file permanently

If you now check for changes, you’ll see the “A” status, which means that the file has been scheduled for addition, but must be committed in order to finish the process.

$ cvs -qn update
A logout.php
$ cvs commit -m "Added to the project" logout.php
Checking in logout.php;
/home/office/cvsroot/test/logout.php,v  <--  logout.php
initial revision: 1.1
done
Now the process is completed. Of course, one of the common CVS operations is the removal of files from the repository. Like all of the other operations, this is quite easy to do. First you have to remove the file from the working directory.

$ rm logout.php

Then use the CVS remove command to remove it from the repository.

$ cvs remove logout.php
cvs remove: scheduling `logout.php' for removal
cvs remove: use 'cvs commit' to remove this file permanently
$ cvs -qn update
R logout.php

Now you can see that the file has been marked with an “R”, which means that the file is marked for deletion, but we need to call the commit command to complete the operation.

$ cvs commit -m "Removed from the project" logout.php
Removing logout.php;
/usr/local/cvs/e-commerce/logout.php,v  <--  logout.php
new revision: delete; previous revision: 1.1
done
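To keep the status letters seen so far in one place, here is a minimal shell sketch that turns the output of “cvs -qn update” into readable text. The helper names describe_status and summarize_update are made up for this illustration and are not part of CVS; the descriptions simply repeat the ones used in this article, and the sample input is copied from the examples above, so no CVS installation is needed to try it.

```shell
#!/bin/sh
# Sketch only: describe_status and summarize_update are hypothetical helpers,
# not CVS commands. The wording follows the descriptions in this article.
describe_status() {
    case "$1" in
        M) echo "locally modified" ;;
        U) echo "update needed" ;;
        P) echo "patch needed" ;;
        C) echo "conflict" ;;
        A) echo "scheduled for addition" ;;
        R) echo "scheduled for removal" ;;
        "?") echo "unknown to CVS" ;;
        *) echo "unrecognized status" ;;
    esac
}

# Read "<letter> <file>" lines on standard input and print one summary line each.
summarize_update() {
    while read -r code file; do
        [ -n "$code" ] || continue
        printf '%s: %s\n' "$file" "$(describe_status "$code")"
    done
}

# Sample input taken from the update examples in this article.
printf 'M index.php\n? logout.php\n' | summarize_update
```

With a real working directory, the same function could be fed directly, for example with “cvs -qn update | summarize_update”.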
If you are not sure about the statuses that the update command gives you, you can always find more details on the file by using the status command.

$ cvs status index.php
===========================================================
File: index.php         Status: Up-to-date

   Working revision:    1.2     Tue Sep 2 20:34:20 2003
   Repository revision: 1.2     /usr/local/cvs/e-commerce/index.php,v
   Sticky Tag:          (none)
   Sticky Date:         (none)
   Sticky Options:      (none)
We were talking about conflicts before, so now we will intentionally create one and see how to deal with it. For that purpose we will need another working directory.

$ cd /home/web
$ mkdir e-commerce_conflict
$ cvs checkout -d e-commerce_conflict e-commerce
Now we shall modify the index.php in our original e-commerce directory to look like this:

<?php
echo "Debug:"."E-commerce 1.0 - under construction";
?>

and commit it. If we now change the index.php code in the e-commerce_conflict directory like this:

<?php
echo "E-commerce 1.0 - under construction"."-debug";
?>
and run the update command, you’ll see the following message:

$ cvs -q update
RCS file: /usr/local/cvs/e-commerce/index.php,v
retrieving revision 1.1
retrieving revision 1.2
Merging differences between 1.1 and 1.2 into index.php
rcsmerge: warning: conflicts during merge
cvs update: conflicts found in index.php
C index.php
The index.php now looks like this:

<?php
<<<<<<< index.php
echo "E-commerce 1.0 - under construction"."-debug";
=======
echo "Debug:"."E-commerce 1.0 - under construction";
>>>>>>> 1.3
?>
The conflict has been clearly marked, and you can see that two different lines have been found at the same place in the different revisions. You can resolve this conflict in one of two ways. You could choose to delete the local copy and replace it with the version in the repository, or you could manually fix the conflicting part of the code as you see fit, and commit the changes.

<?php
echo "Debug:"."E-commerce 1.0 - under construction"." - debug";
?>
Which approach you choose depends on how many changes have been made to the file. If your changes are small, for instance, then it might be easier to just
remove the local copy and start all over. The commands introduced above cover the most common tasks that you as a developer will perform with CVS, and often they are all you need to know.
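Before committing a file that went through a conflict, it can help to check that no conflict markers were left behind. The following is a minimal sketch under the assumption that the markers appear exactly as in the index.php example above; has_conflict_markers is a hypothetical helper name, not a CVS feature, and a file that legitimately contains a line of seven equals signs would be reported as well.

```shell
#!/bin/sh
# Sketch only: has_conflict_markers is a hypothetical helper, not part of CVS.
# It succeeds (exit status 0) when the file contains a line that looks like
# one of the merge markers shown in the article.
has_conflict_markers() {
    grep -q -e '^<<<<<<< ' -e '^=======$' -e '^>>>>>>> ' "$1"
}

# Demonstration on a temporary copy of the conflicting index.php.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<?php
<<<<<<< index.php
echo "E-commerce 1.0 - under construction"."-debug";
=======
echo "Debug:"."E-commerce 1.0 - under construction";
>>>>>>> 1.3
?>
EOF

if has_conflict_markers "$tmp"; then
    echo "unresolved conflict"
else
    echo "clean"
fi
rm -f "$tmp"
```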
How does it work?

Let’s turn our focus now to a few advanced CVS topics, and see how it works “under the hood”. This can be very useful in cases where you experience difficulty working with CVS. We already mentioned the CVS directories that are placed in the working subdirectories. These directories keep track of the state of the working directory with regards to a certain repository. The CVS directories can contain several files, but here we’ll examine only the ones that are most important for you as a developer.

The Root file contains information about the current CVS “root”, which is the directory where the CVS repository lives. This will likely be the directory that we chose in CVS initialization - /usr/local/cvs.

The Repository file contains the directory for our project in the CVS repository. Remember, one CVS repository can be used for several projects. This could be an absolute or relative path, so our Repository file could contain e-commerce or /usr/local/cvs/e-commerce.

The Entries file lists the files and directories in the current working directory that are under CVS control. It is a plain text file that contains a file or subdirectory on each line. You can tell what kind of entry each line is by examining the first character of that line. If the first character is “/”, then it is a file entry in the following format:

/name/revision/timestamp[+conflict]/options/tagdate

• name is the name of the file within the directory.
• revision is the version of the file in the working directory. It could also be a zero (0) for newly-added files or a dash followed by a revision number (-1.1) for removed files.
• timestamp is a universal time (UT) timestamp that shows when the file was created by CVS. If it differs from the file’s modification time, the file has been changed. If you want to force the file to always be considered as modified, you could put a different string here (e.g. “Always modified”), since CVS always does just a simple string comparison.
• conflict indicates that there was a conflict during the file update, and if the file modification time is the same as the timestamp it means that the developer hasn’t resolved the conflict yet.
• options and tagdate are connected with revision numbers and sticky tags, which are not going to be covered by this article.

An example file entry could look like this:
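As a rough illustration of the file-entry format, the sketch below splits one entry into its fields using plain shell. The entry itself is hypothetical (it reuses the index.php revision and timestamp from the status output shown earlier), and show_file_entry is an illustrative helper name, not something CVS provides.

```shell
#!/bin/sh
# Sketch only: show_file_entry is a hypothetical helper, and the entry passed
# to it below is made up for illustration.
# The function body runs in a subshell so its variables stay local.
show_file_entry() (
    # Splitting on "/" leaves an empty first field because the line starts with "/".
    IFS=/ read -r lead name revision timestamp options tagdate <<EOF
$1
EOF
    # A "+" in the timestamp field means a conflict part is present.
    case "$timestamp" in
        *+*) conflict=yes ;;
        *)   conflict=no ;;
    esac
    printf 'name=%s revision=%s conflict=%s\n' "$name" "$revision" "$conflict"
)

show_file_entry '/index.php/1.2/Tue Sep 2 20:34:20 2003//'
```

The empty options and tagdate fields at the end of the sample entry correspond to the two trailing slashes.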
The format for directory entries in the Entries file is:

D/name/filler1/filler2/filler3/filler4

• name is the name of the subdirectory
• filler fields are left for future enhancement.

An example entry for the system subfolder would be:

D/system////
Now that you know how the CVS directories in the working directory are organized, we will briefly go through the repository organization, which will give us an idea of how the other side works. The repository is just a directory on your server (or a remote server, as we will see later). Project files are stored in the repository with names that are the same as the working names with “,v” appended to the end. These files are known as history files or RCS files. They contain enough information to recreate any version of the file. These files also contain all of the comments that were entered in the import and commit process (remember the “-m” switch). This way a complete log history of each file can be generated (including the usernames of the committers).

Sometimes CVS stores RCS files in the Attic subdirectory. If we suppose that our CVSROOT is /usr/local/cvs, then the history file for the index.php in the e-commerce project would normally be /usr/local/cvs/e-commerce/index.php,v. If the file was removed, it is stored in the Attic subfolder (in our example, /usr/local/cvs/e-commerce/Attic/index.php,v). As an aside, if you followed all of the operations in the first part of this article, and you are able to cd into the CVS repository, you should find logout.php in the Attic directory.

In the repository you will also find the CVSROOT directory. This directory contains administrative files. For a complete list of all possible administrative files and their organization, you should check the CVS documentation. Here we will mention only the modules file, as that is the most important one for us.

The modules file can be used to create aliases and to group project resources into logical modules. It is a plain text file with one line for every module or alias. A definition can be continued over more than one line by appending the backslash (“\”) character to the end of each line. Aliases have the following syntax:
alias_name -a what_to_alias

For example, let’s go back to our e-commerce directory from earlier. If we don’t like typing “e-commerce” all the time, we can define an alias.

ecom -a e-commerce

This would make the result of the following two commands the same.

$ cvs checkout e-commerce
$ cvs checkout ecom
Modules are defined as:

module_name [options] directory [files...]

for example

ecom e-commerce

This way when you do a checkout you will get an ecom directory, instead of e-commerce, although the files will be the same.

$ cvs checkout ecom

By specifying a filename after the directory name in the modules file, you can include only those files in the module, rather than the whole directory. This might be useful, for example, if your project is split into logical sub-systems.

ecom e-commerce login.php

If you now do the checkout, you will only get the file(s) you specified. A module definition can include other modules. If you want to do that you should add an ampersand (“&”) sign before the module name.

ecom &e-commerce &other_module

Administrative files are stored in the CVSROOT directory in RCS format, just like regular project files, but a working copy of each file should also be present. So for the modules file, you should find both modules,v and modules there. You can freely edit any of the administrative files the same way you edit the project files. All you have to do is to check it out, change it, and commit it back to the repository.

$ cvs checkout CVSROOT
$ cd CVSROOT
[edit modules file]
$ cvs commit -m "CMS module added" modules

Of course, you should be careful with this process as it could affect general CVS behaviour. Only modify administrative files if you know exactly what you are doing.

Adapt it to your organization’s needs

Now that we know all the basic things about CVS, we can go a little further and see how you can adapt it to your specific organization’s needs. The first thing you should think about is the location of the CVS server. All of our previous examples assumed that the CVS repository is on the same server as the working directory. This is acceptable in some cases, but not always. If we go back to the example of the open-source project with developers spread all over the world, this configuration would not be very useful. What we need is a CVS server that is publicly available via the Internet for developers to use. Fortunately, using remote repositories is as easy as using local ones. All you have to do is to use the following format for your CVSROOT (specified either with the “-d” switch or in an environment variable as we saw above):

:method:user@hostname:path_to_repository

method indicates what authentication method is in use.
You can connect to CVS using rsh, Kerberos or password (pserver) authentication. In this article we will focus on the password authentication; for more details on other connection methods you should consult the CVS documentation. Connecting with password authentication is useful in situations where rsh or Kerberos are not available for some reason. First of all you should set up your inetd to correctly receive connections for CVS on the appropriate port (2401 by default) and to execute the “cvs” command. Usually, adding the following line in /etc/inetd.conf should be sufficient.

2401 stream tcp nowait root /usr/local/bin/cvs cvs --allow-root=/usr/local/cvs pserver
Separate CVS and system account passwords can be introduced, which is very useful because pserver authentication sends plain-text passwords through the network, meaning that system accounts could be compromised. Setting up a separate CVS account password is done using the $CVSROOT/CVSROOT/passwd file. This is a plain text file that can be created and modified with any text editor. It’s in the same format as the /etc/passwd file except it has only three fields: cvs_username, password, optional_system_username. The optional username allows you to map your desired CVS username to the appropriate system account. This means that if you authenticate as cvs_username, you would end up using the CVS server under the optional_system_username system identity. Passwords are encrypted with the standard Unix crypt() function so it is possible to copy and paste passwords from the /etc/passwd file. In this case you should consider protecting the $CVSROOT/CVSROOT directory with the same privileges as the /etc directory to prevent malicious users from compromising your CVS server. If CVS can’t find a username or the passwd file it will try system user look-up (system password file, LDAP,
etc) in order to try to authenticate the user. This feature can be disabled by specifying a “no” value for the SystemAuth variable in the $CVSROOT/CVSROOT/config file.

On the client side, the user just has to use the cvs command specifying the remote repository with the pserver authentication method (either using “-d” or storing it in the $CVSROOT variable). When you are accessing the remote repository for the first time, you have to log in. That is done using the login command, which will prompt you to enter your password.

$ cvs -d :pserver:[email protected]:/usr/local/cvs login
Logging in to :pserver:[email protected]:2401/usr/local/cvs
CVS password:
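To make the parts of a remote CVSROOT easier to see, here is a small shell sketch that takes a string in the :method:user@hostname:path_to_repository form and prints each component. It is only an illustration: parse_cvsroot is a hypothetical helper name, the host cvs.example.org is a placeholder, and the sketch handles just this basic form (not, for instance, a port number placed before the path).

```shell
#!/bin/sh
# Sketch only: parse_cvsroot is a hypothetical helper, not part of CVS.
# It assumes the basic form  :method:user@hostname:path_to_repository
# The function body runs in a subshell so its variables stay local.
parse_cvsroot() (
    rest=${1#:}            # drop the leading ":"
    method=${rest%%:*}     # text up to the next ":"
    rest=${rest#*:}
    user=${rest%%@*}       # text up to the "@"
    rest=${rest#*@}
    host=${rest%%:*}       # text up to the ":" that precedes the path
    path=${rest#*:}
    printf 'method=%s user=%s host=%s path=%s\n' "$method" "$user" "$host" "$path"
)

# Placeholder host name; the repository path is the one used in this article.
parse_cvsroot ':pserver:anonymous@cvs.example.org:/usr/local/cvs'
```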
After logging in, your password will be stored in the $HOME/.cvspass file. The password is not stored in plain form, but it is only trivially encoded and could easily be compromised. Along with sending passwords in plain text, this is the biggest security flaw of the pserver authentication method. The location of this client password file can be changed by setting the CVS_PASSFILE environment variable.
Now we’ve solved the problem of remote access to the project repository for developers, but that is not enough. We also want to ensure that everyone can obtain the source of the project, but that only the core developers can commit changes. This can be done by using an anonymous account with the pserver authentication method. An anonymous account can be created by inclusion (a username is explicitly marked as read-only) or by exclusion (the username is not on the list of users with write privileges). For this purpose you use the readers and writers files in the $CVSROOT/CVSROOT directory. You can create and modify these files (as well as the passwd file) in the same way as we did the modules file earlier. In the readers file, list all usernames that should explicitly have read-only access, one per line, for example:
anonymous
guest
In the writers file you should explicitly declare all the usernames that have write privileges, for example:
dejanb
arno
All of these usernames must exist in the CVS passwd file or on the system. Once the user is authenticated, access is determined like this:
• If the readers file exists and the username is listed in it, the user gets read-only access.
• If the writers file exists and the username is not listed in it, the user also gets read-only access.
• Otherwise, the user gets write access to the repository.
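The three rules above can be written down as a small function. This is a sketch of the decision logic only, not of how the CVS server implements it; the function name cvs_access() and the username "someone" are invented for the example, and a missing readers or writers file is represented by null.

```php
<?php
/*
 * Hypothetical helper: decide the access level of an authenticated user.
 * $readers and $writers are arrays of usernames, or null when the
 * corresponding file does not exist.
 */
function cvs_access($user, $readers, $writers)
{
    /* Rule 1: listed in readers means read-only. */
    if (is_array($readers) && in_array($user, $readers)) {
        return 'read-only';
    }
    /* Rule 2: a writers file exists, but the user is not in it. */
    if (is_array($writers) && !in_array($user, $writers)) {
        return 'read-only';
    }
    /* Rule 3: everything else may write. */
    return 'write';
}

$readers = array('anonymous', 'guest');
$writers = array('dejanb', 'arno');

echo cvs_access('anonymous', $readers, $writers), "\n";
echo cvs_access('dejanb', $readers, $writers), "\n";
echo cvs_access('someone', $readers, $writers), "\n";
```

Because the readers check comes first, a username that appears in both files is caught by the first rule and stays read-only.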
There can also be a conflict: if a username exists in both the readers and writers files, that user gets read-only access. Now we have all that we need to start working on our project.
Branches
CVS is not restricted to linear development. You can create branches of your source, which is valuable for fixing bugs in previously released versions of your project. Let’s say that we have released version 1.0 of our e-commerce suite and have continued on to work on the 1.1 release. After a while, a client calls and reports a bug in the 1.0 code, but our repository has changed since that release and we no longer have an accurate version of that source to work on. A branch solves this problem. Every branch has its own number. Branch numbers consist of an odd number of period-separated integers: a branch number is the number of the revision being branched followed by a period and an integer, and each revision on the branch is the branch number followed by a period and another integer. In Figure 1 we can see examples of file revision and branch numbers.
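The numbering rule is easy to check in code. The following sketch uses two invented helper names, is_branch_number() and branch_of(), and sample numbers chosen for illustration rather than taken from Figure 1.

```php
<?php
/* Hypothetical helper: TRUE for a branch number, FALSE for a revision number. */
function is_branch_number($number)
{
    /* Branch numbers have an odd count of period-separated integers. */
    return count(explode('.', $number)) % 2 == 1;
}

/* Hypothetical helper: the branch a revision sits on (drop the last integer). */
function branch_of($revision)
{
    return substr($revision, 0, strrpos($revision, '.'));
}

var_dump(is_branch_number('1.2.2'));   /* three integers: a branch */
var_dump(is_branch_number('1.2.2.1')); /* four integers: a revision */
echo branch_of('1.2.2.1'), "\n";       /* the branch that revision is on */
```

Here 1.2.2 would be a branch started from revision 1.2, and 1.2.2.1 the first revision committed on that branch.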
It can be very useful to make a branch of the code after every project release. Use the tag command with the “-b” switch to name the branch.
$ cvs tag -b e-commerce-1-0
cvs tag: Tagging .
T index.php
T login.php
T login_form.php
Now, when you need a working copy of an earlier product release, you can access it by using the “-r” switch with the name of the branch in the checkout and update commands. First, you need to check out the branch and create a working directory for it.
$ cd /home/web
$ cvs checkout -r e-commerce-1-0 e-commerce
After you have made your modifications, you can commit them as before, and the changes will be applied to that branch and not to the main development branch. If you want to get any changes that other developers have made in the branch, use the “-r” switch with the update command:
$ cvs update -r e-commerce-1-0
There are many other possibilities with branching,
such as merging branches back into the main development branch, but they are outside the scope of this article.
CVS Tools
Until now, we have only talked about the CVS server and the command-line client. If you prefer to work in a window-based, graphical environment and are uncomfortable typing commands, there are a lot of solutions that you can use. Many modern integrated development environments (IDEs), for example, support CVS. You could use Eclipse (http://www.eclipse.org) with the PHP plug-in (http://phpeclipse.sourceforge.net) and benefit from the power of its integrated GUI for CVS access. It seems that the majority of script developers tend to just use a good text editor for their development (whether they work in Windows or Unix environments), so we will focus on a few standalone applications that can make your client CVS tasks easier.
Tortoise CVS (http://www.tortoisecvs.org/) is a Windows-based tool that integrates basic CVS commands into Windows Explorer. After installation you will find extra options on your right-click menu that allow you to update, commit, or do anything else you need with your code. See Figure 2.
CVSGui (http://www.wincvs.org) is a set of GUIs available for Windows, Macintosh, and Unix/Linux environments. It’s a powerful application that uses the native look-and-feel of your operating system. It contains a file browser, a module browser, command-line support, and many other advanced features that can help experienced users automate their tasks. See Figure 3.
Chora (http://www.horde.org/chora) is a PHP application that provides powerful browsing of file data (authors, logs, differences) and a graphical representation of branches.
CVSweb (http://www.freebsd.org/projects/cvsweb.html) is a single Perl script originally written for the FreeBSD project. Over time it has earned great popularity among software developers. It lets you browse a repository’s revision history with a web browser.
Figure 3
Further steps
If you’re planning a potentially large project, you’ll definitely need a version control system, and you’re better off adopting one from the start. CVS is a proven tool for the job and has a huge resource base. Many developers are familiar with it, which reduces the need for extra training, and there is a plethora of tools for working with CVS, including source tree analysis. This article has only covered the tip of the iceberg of CVS and its associated tools. There’s much more to learn, and you’ll find a ton of information on the web, so check it out!
Figure 2
About the Author
Dejan Bosanac works as a full-time software developer for DNS Europe Ltd (http://www.dnseurope.net) on the billing software system for ISPs. In his spare time he also serves as a Lead Engineer at Noumenaut Software (http://www.noumenaut.com) on the online journaling project. He holds a Bachelor’s degree in Computer Science and is currently pursuing a master’s degree in the same field.
Click HERE To Discuss This Article
http://forums.phparch.com/58
Introduction to PHP-GTK
F E A T U R E
by Eric Persson
PHP-GTK is a PHP extension that provides an object-oriented interface to the GTK+ toolkit. GTK+ is a multi-platform toolkit for creating graphical user interfaces, or GUIs. PHP-GTK is an excellent tool for the rapid development of stand-alone desktop applications. It gives you the opportunity to create these “real” applications in PHP, without having to bother with more expensive development environments like Visual Basic or Delphi. Another advantage, of course, is that it is multi-platform. PHP with the PHP-GTK extension runs on several platforms, including Linux and Windows 98/NT/2000/XP; some have also reported it to work under the new MacOS X. Before starting your quest through the PHP-GTK world you may wish to brush up on PHP’s object-oriented side, as this extension is heavily object-oriented. Have a peek at http://www.php.net/oop and http://www.php.net/language.references for a refresher (or introduction).
Installation
Before we can get going with some actual coding, you’ll need to install the PHP-GTK extension. Start by downloading the latest stable release from http://gtk.php.net/download.php. For this article I used the 0.5.2a Windows and PHP binary, which is probably the easiest way to get going quickly. The Windows installation is a five-minute job: unzip the package and move some files around. Read the README.TXT file included in the package for the exact steps. Under Linux the install is a bit different, as there are no officially released pre-compiled binaries. This means that you have to download the source and compile it on your own. Again, information on how to go about this can be found in the tarball.
Your first PHP-GTK application
How else should we start than by writing a short “Hello world” application? Let’s have a look at Listing 1, which is actually a bit of an enhanced “Hello world” application, since it includes some extra functionality to make it a little more interesting. In order to work with the GTK functionality we first need to load the PHP-GTK extension. We do this dynamically at runtime using the dl() function. Note that it’s a bit different under *nix and Windows, since Windows uses .dll files and *nix uses .so files. Although we’re choosing to do it manually for clarity, it is possible to load this extension automatically by putting it in your php.ini file along with the other PHP extensions. Loading the extension manually, though, is less resource intensive when you use your PHP binary for things other than PHP-GTK.
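The platform check described above might look something like the following sketch. The helper names php_gtk_library() and load_php_gtk() are invented here, and the file names php_gtk.dll and php_gtk.so are an assumption about how the extension is named in your installation, so check them against the files that came with your package.

```php
<?php
/* Hypothetical helper: pick the library file name for the given OS name. */
function php_gtk_library($os)
{
    /* PHP_OS starts with "WIN" on Windows (for example "WINNT"). */
    if (strtoupper(substr($os, 0, 3)) == 'WIN') {
        return 'php_gtk.dll';
    }
    return 'php_gtk.so';
}

/* Hypothetical helper: load the extension unless it is already available. */
function load_php_gtk()
{
    if (class_exists('Gtk')) {
        return true;
    }
    return dl(php_gtk_library(PHP_OS));
}

/* Show which file would be loaded on this machine. */
echo php_gtk_library(PHP_OS), "\n";
```

A script would call load_php_gtk() once at the top, before any Gtk class is used; if the extension is already listed in php.ini, the call simply returns without loading anything.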
Getting back to Listing 1, we then have the quit() and click_button() functions, which we’ll put to use in a moment. Finally, we get into using PHP-GTK. We start off by creating an instance of the GtkWindow class and setting its title and size. This is the first window that our application will show; it will be 300 pixels wide by 150 pixels high, with “Hello world” in its title bar. To add some excitement, we also fill the window with a small text label and an “OK” button. In GTK, if you want more than one widget in your window, you first have to “pack” the widgets into a container (either vertical or horizontal), and this container is then added to the window (or, optionally, to another container). “Widgets” are the graphical elements with which you build your application’s interface: things like labels, buttons, and lists.
The connect() method calls in Listing 1 serve to connect a user event with a callback function. The first connect() call tells GTK it should call our click_button() function when the button is “clicked”. This function uses the label widget’s set_text() method to change the label’s text to show we clicked the button. The second connect() call tells GTK to call our quit() function when the user clicks the window’s Close button (the X in the upper right corner). The quit() function ends the application by stopping the “main loop”, which is started at the end of Listing 1.
The GTK main loop is really what sets PHP-GTK applications apart from web-based PHP applications. It is what makes the application run and respond to events, and comparing it with a web application is a little tricky. For instance, a PHP web application draws the user interface once (it outputs HTML). A PHP-GTK application, however, redraws its user interface constantly in response to events. While the whole user interface is not redrawn on every iteration of the main loop, any pending changes (such as our label text change) are applied on each iteration.
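If the relationship between connect(), callbacks, and the main loop still feels abstract, a toy model in plain PHP may help. The sketch below does not use PHP-GTK at all and says nothing about how GTK is implemented: the ToyWidget and ToyApp classes are invented for illustration, the “events” are a scripted list instead of real user input, and it assumes a current PHP version in which objects are passed by handle.

```php
<?php
/* Toy stand-in for a widget: remembers one callback per signal name. */
class ToyWidget
{
    public $handlers = array();

    public function connect($signal, $callback)
    {
        $this->handlers[$signal] = $callback;
    }

    public function emit($signal)
    {
        if (isset($this->handlers[$signal])) {
            call_user_func($this->handlers[$signal]);
        }
    }
}

/* Toy application: owns the callbacks and a very simple "main loop". */
class ToyApp
{
    public $running = false;
    public $delivered = 0;
    public $label = 'test';

    public function click_button()
    {
        $this->label = 'Thanks for clicking ok.';
    }

    public function quit()
    {
        $this->running = false;
    }

    /* Hand queued events to their widgets until a callback calls quit(). */
    public function main($events)
    {
        $this->running = true;
        foreach ($events as $event) {
            if (!$this->running) {
                break;
            }
            $this->delivered++;
            $event[0]->emit($event[1]);
        }
    }
}

$app = new ToyApp();
$button = new ToyWidget();
$window = new ToyWidget();
$button->connect('clicked', array($app, 'click_button'));
$window->connect('delete_event', array($app, 'quit'));

/* A scripted "user": click, close the window, then a click that comes too late. */
$app->main(array(
    array($button, 'clicked'),
    array($window, 'delete_event'),
    array($button, 'clicked'),
));

echo $app->label, "\n";
echo $app->delivered, " events delivered\n";
```

The point of the model is the division of labour: the script only registers callbacks and then hands control to the loop, and nothing in the script runs again until the loop delivers an event or is told to stop.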
Now that we know what Listing 1 does, let’s see it in action. PHP-GTK applications, like their command-line cousins, are started by running the script with your PHP interpreter. If you’re running Windows, for instance, you can start it as shown in Figure 1. You may notice that the DOS window from which you start the PHP-GTK script hangs and waits for the script to terminate. This means that every PHP-GTK application you run comes with its very own black DOS window, which is not so nice if you plan to run these applications on a regular basis. In the Windows PHP-GTK distribution package there is a special php_win executable that behaves a bit differently from the normal executable. The php_win executable starts the script, detaching it from its DOS window. This makes it look neater and more professional.
Figure 1
Listing 1
<?php
/* Load the PHP-GTK extension here with dl(), as described in the text. */

/* This function is called when the program quits, quite like a destructor function */
function quit(){
    Gtk::main_quit();
}

/* This function is called when the Ok button is clicked. */
function click_button(){

    /* Change the text in the $label text box. */
    $GLOBALS['label']->set_text('Thanks for clicking ok.');
}

/* Create the window with "Hello world" as title and a fixed size */
$window = &new GtkWindow;
$window->set_title('Hello world');
$window->set_default_size(300,150);

/* Create a small label with the following text */
$label = &new GtkLabel('test');

/* Create a button */
$button = &new GtkButton('Ok');

/* Connect the click_button() function to the clicked signal */
$button->connect('clicked', 'click_button');

/* Use a vertical box container */
$box = &new GtkVBox();

/* Pack both objects in there. */
$box->pack_start($label);
$box->pack_start($button);

/* Fit the box into the window */
$window->add($box);

/* Connect the quit() function to the delete_event signal */
$window->connect('delete_event', 'quit');

/* Show the fantastic window that we created. */
$window->show_all();

/* Start the gtk main loop */
Gtk::main();

?>
More theory
So, the workflow for developing a PHP-GTK application is to instantiate, initialize, and pack the interface widgets, set up callbacks to handle any relevant events, and start the main loop. Almost every GTK object has events (often referred to as signals in GTK) that can be caught and handled. These could be anything from a window losing focus to characters being entered into a textbox. These events form the foundation of GTK applications. If no events happen or get caught, your application will just sit there doing effectively nothing. By catching events, you can do all sorts of weird and wonderful things.
It is important to remember that when developing PHP-GTK applications everything depends on the main loop. As an example, let’s say you have a normal loop in one of your GTK callback functions that performs a large number of iterations. If you change something in the user interface (like a label) in the middle of this loop, the interface won’t actually be updated until your callback function returns. In fact, your loop would lock up the entire application, not allowing any events to be processed until the main loop regained control. Fortunately, there is a way around this, which we’ll see when we get to Listing 4. Let’s now take a look at some of the problems with PHP-GTK applications.
Drawbacks of PHP-GTK
One thing that has followed PHP throughout time is limitations in its memory management. Since most PHP scripts are expected to have a lifetime of a few seconds, allocated memory isn’t released until the script ends. Unfortunately, this also applies to PHP-GTK applications, and results in the process hogging more and more memory over time. This is an important consideration when planning, developing, and deploying a PHP-GTK application, especially for applications that will run for long periods of time, like a whole business day or several weeks.
This problem has been discussed many times on the PHP-GTK mailing list and while there are some solutions that have been reported to work, none of them seem to be a quick fix. Another issue is the packaging and distribution of your finished application. Unfortunately, there is no easy way to make a self-contained package, or application binary. One solution I tried was adding PHP code to the actual PHP interpreter executable itself. This approach worked nicely, but had size limitations that made it almost hopeless to use for anything more than educational purposes. One of the biggest problems I’ve encountered when creating PHP-GTK applications is the excessive number of objects—one for each layout element. In large applications, this can get really hard to keep track of, so I usually organize them all into a global array (such as $WIDGETS). This way I know where everything is, and I
can var_dump() it to easily see what’s in there. An example of this will be shown in Listing 4. As you may have guessed by looking at Listing 1, it would require a lot of code to create the user interface for larger applications. This can be reduced, of course, if you start writing general functions for all kinds of windows and common tasks, but in the end, building the interface in code is just not very pleasant. It would be much better if you could just drag and drop interface elements as in Visual Basic or Delphi, and then add the logic to the interface. Guess what? This is exactly what you can do if you use Glade.
Introducing Glade
Glade is the desktop application equivalent of web page templates, and is a good way to separate our code from the user interface. Glade uses XML to describe the layout of an application, so it would be possible for you to give the design work to someone else and then just program the “behind-the-scenes” logic to activate the user interface. Another benefit of using Glade is that you don’t have to write all the code necessary to create the user interface. Instead, you can use a user-interface builder (see Figure 2) to drag and drop what you want, set all the relevant widget properties (such as name and size), and then export it all to a Glade file. I used wGlade (http://wingtk.sourceforge.net) for the examples in this article. Other Glade interface builders exist, but I find wGlade very easy to use.
To demonstrate how it works, I changed Listing 1 to get the user interface from a Glade file. Have a look at Listing 2, and see how much cleaner it is. Listing 3 shows the Glade file used in Listing 2. Let’s take a look at Listing 2, and explain what we did. The first difference is that we now load the Glade file, creating the $glade_app object, which can be used to reach all the widgets in the layout. Once we have variables pointing to these widgets, the rest of the code is the same as in Listing 1.
So far we have limited ourselves to the GtkWindow, GtkButton and GtkLabel widgets, but there are plenty more; for example, GtkEntry, GtkCombo, GtkMenu, and GtkCalendar. You can find a complete list of widgets at developer.gnome.org/doc/API/gtk/gtkobjects.html.
Building a port scanner
Now that we’ve got the basics down, and can use Glade to ease the interface development, let’s do something practical and create a small port scanner. Along with performing a meaningful task, this example will also serve to introduce some new widgets. The code for this example can be found in Listing 4 (not listed due to length), with the Glade file in Listing 5 (not listed due to length). Both listings can be found in this month’s code package.
Again, wGlade was used to build the interface (Figure 3). The wGlade application is kind of buggy in places, and you may end up having to enter some property values manually in the Glade file—I had to do this with the window’s default width and height. As you can see in Listing 4, I have organized all of the relevant widgets from the Glade file into the $WIDGET array. This way it’s easier to manage them. Next, we see the port_scan() function that does the actual connection attempt, and the port_service() function that returns a service likely to be running on a specific port number. The dialog() and close_dialog() functions are a good example of how you can combine a Glade application with “regular” PHP-GTK constructions. dialog() accepts a string as an argument, and shows this string in a dialog window with an OK-button. We create the dialog window from scratch using the GtkDialog widget, and then add a GtkLabel and GtkButton to it. The functions update_statusbar() and update_progressbar() are used to update the status bar and progress bar, respectively. The scan() function is really the main part of the application. It gets the IP address, port, and timeout information from the options tab and the scan tab, and then performs the actual scanning. scan() also manages to find time to update the user interface while scanning.
The progress bar and the status bar are two of the new widgets we used in this example, and they are especially important for understanding the main loop. The actual scanning consists of a while loop which runs from the start port number up to the end port number. Inside this loop we do the following:
• update the progress bar.
• update the status bar.
• check if the port is open; if it is, add a row to the clist widget.
• start the GTK main loop manually to finish up pending tasks.
• continue to the next port.
In the fourth item we start the GTK main loop to finish up pending tasks. What does that mean? Well, in items 1 and 2 we updated the progress bar and the status bar, but that wasn’t actually reflected in the interface. Instead, the update was queued; the actual update is not done until the main loop gets a chance to do it. That’s what the fourth item is for: we’re giving it a chance to update the interface. At the end of the scan() function we update the status bar once more to tell the user that the scanning is finished. This time, though, we don’t need to start the main loop manually, as the function is finished and will be returning control to the main loop anyway.
The rest of the script is almost the same as in the first example, and is pretty standard for most PHP-GTK applications. The application is now ready to be used (see it running in Figure 4). While it may not contain all the features that you would expect from a port scanner, it is a good example of a PHP-GTK application to look at and learn from. This port scanner is limited to IPv4 addresses, and the way it determines whether ports are open is probably not the most efficient.
Figure 2
Listing 2
<?php
/* Load the PHP-GTK extension here with dl(), as described in the text. */

/* This function is called when the program quits, quite like a destructor function */
function quit(){
    Gtk::main_quit();
}

/* This function is called when the Ok button is clicked. */
function click_button(){

    /* Change the text in the $label text box. */
    $GLOBALS['label']->set_text('Thanks for clicking ok.');
}

/* Create the glade_app object and load the glade file */
$glade_app = &new GladeXML('listing4.glade');

/* Get all interesting objects in the user interface */
$window = $glade_app->get_widget('window');
$label = $glade_app->get_widget('label');
$button = $glade_app->get_widget('okbutton');

/* Connect the click_button() function to the clicked signal */
$button->connect('clicked', 'click_button');

/* Connect the quit() function to the delete_event signal */
$window->connect('delete_event', 'quit');

/* Show the fantastic window that we created. */
$window->show_all();

/* Start the gtk main loop */
Gtk::main();

?>
Listing 3
<?xml version="1.0"?>
<GTK-Interface>

<project>
  <name>helloworld</name>
  <program_name>helloworld</program_name>
  <source_directory>src</source_directory>
  <pixmaps_directory>pixmaps</pixmaps_directory>
  <language>C</language>
  <gnome_support>False</gnome_support>
  <gettext_support>False</gettext_support>
</project>

<widget>
  <class>GtkWindow</class>
  <name>window</name>
  <width>300</width>
  <height>150</height>
  <title>Hello world</title>
  <type>GTK_WINDOW_TOPLEVEL</type>
  <position>GTK_WIN_POS_CENTER</position>
  <modal>False</modal>
  <allow_shrink>False</allow_shrink>
  <allow_grow>False</allow_grow>
  <auto_shrink>False</auto_shrink>

  <widget>
    <class>GtkVBox</class>
    <name>vbox1</name>
    <homogeneous>True</homogeneous>
    <spacing>0</spacing>

    <widget>
      <class>GtkLabel</class>
      <name>label</name>
      <justify>GTK_JUSTIFY_FILL</justify>
      <wrap>False</wrap>
      <xalign>0.5</xalign>
      <yalign>0.5</yalign>
      <xpad>0</xpad>
      <ypad>0</ypad>
      <child>
        <padding>0</padding>
        <expand>True</expand>
        <fill>True</fill>
      </child>
    </widget>

    <widget>
      <class>GtkButton</class>
      <name>okbutton</name>
      <can_focus>True</can_focus>
      <relief>GTK_RELIEF_NORMAL</relief>
      <child>
        <padding>0</padding>
        <expand>True</expand>
        <fill>True</fill>
      </child>
    </widget>
  </widget>
</widget>

</GTK-Interface>
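Since Listing 4 is not printed, here is a rough sketch of the scan loop described above. It is not the code from the code package: port_is_open(), scan_ports(), and pump_events() are names invented for this example, and the Gtk::events_pending() and Gtk::main_iteration() calls are, as far as I know, the usual PHP-GTK way to let the main loop catch up. They are only reached when a Gtk class is actually loaded, so the sketch also runs from a plain command-line PHP.

```php
<?php
/* Hypothetical helper: TRUE if a TCP connection to $host:$port succeeds. */
function port_is_open($host, $port, $timeout)
{
    $errno = 0;
    $errstr = '';
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

/* Hypothetical callback: give the GUI a chance to process queued updates. */
function pump_events($port)
{
    /* Assumption: these two static methods are the usual PHP-GTK idiom. */
    if (class_exists('Gtk')) {
        while (Gtk::events_pending()) {
            Gtk::main_iteration();
        }
    }
}

/*
 * Hypothetical scan loop: tries every port from $first to $last and calls
 * $pump after each one, which is where progress and status bar updates
 * would become visible in a GUI. Returns the list of open ports.
 */
function scan_ports($host, $first, $last, $timeout, $pump)
{
    $open = array();
    $port = $first;
    while ($port <= $last) {
        if (port_is_open($host, $port, $timeout)) {
            $open[] = $port;
        }
        call_user_func($pump, $port);
        $port++;
    }
    return $open;
}
```

A call such as scan_ports('127.0.0.1', 20, 25, 0.5, 'pump_events') returns the open ports in that range. Passing the pump as a callback keeps the scanning logic separate from the interface, so the same loop can be driven from a GUI or from a test script.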
Real-world applications
So, all this sounds wonderful, doesn’t it? But do people actually use PHP-GTK? There are actually quite a number of applications developed in PHP-GTK, and many of them are released with full source code, which offers a potential goldmine for learning. There is a nice list of applications at http://gtk.php.net/apps. I would like to mention a few of these applications and briefly introduce them to you. The first one is an integrated development environment for web/GTK projects. It’s called PHPMole, and is worth checking out at http://www.akbkhome.com/Projects/Phpmole-IDE. Another application is SAC.php, which is a content management system with a PHP-GTK administration system. The last one is actually a game called Deep Dungeons. Deep Dungeons is a role-playing game developed with PHP-GTK, and can be found at http://deepdungeons.sourceforge.net.
Summing up
After a whole article about PHP-GTK, you will hopefully be eager to start writing your own applications. It’s actually quite simple once you wrap your mind around the concept of PHP as a desktop application language. And Glade makes it even easier (although it’s not without its problems). PHP-GTK has great potential, but there are still some key issues holding it back, and those are probably a real pain for the developers of the GTK extension. If we can get the packaging and distribution figured out, and find a solution to the garbage collection and memory usage problems, I think more companies will start exploring it. I hope I inspired you to embark on new adventures through the PHP jungle. Please drop me a note if you do something really cool.
About the Author
When Eric's not out skiing or hiking, he's working as a freelance developer on various projects. His current focus is finishing his education in open-air alpine environments.
Click HERE To Discuss This Article
http://forums.phparch.com/59
Speaker on the High Seas
An Interview with Wez Furlong
I N T E R V I E W
by Marco Tabini
Our continuing coverage of php|cruise takes us all the way “across the pond” to the Old Continent (to Great Britain, to be precise), where this month’s featured speaker resides. Wez Furlong is a well-known PHP author and a developer of PHP itself, responsible for such features as the streams API and the new SQLite extension. I have had the opportunity to chat with Wez on a few occasions, and when we started kicking around names of potential speakers for php|c, he was one of the first we tapped. I’m glad we did, because he has come through with some excellent talks.
php|a: Wez, let’s start with a quick introduction for our readers. What is your role in the development of PHP?
I wrote and implemented the Streams API found in PHP 4.3 (which required a fairly broad understanding of the whole of the PHP internals), and have worked on a number of other extensions and interfaces to PHP, such as ActivePHP SAPI, OpenSSL, COM (which I rewrote for PHP 5), mailparse, sqlite, sysvmsg, and a SAPI interface to load PHP 5 into the Irssi IRC client (known as php-irssi). In addition to code, I also help to keep the php.net systems ticking from time to time, and started up a project to replace the slow and CPU-intensive DocBook build system we use for the PHP manuals with a very fast and flexible alternative using PHP and SQLite. I call this livedocs right now, but since Macromedia has something with that name, it will probably need to be renamed.
php|a: What attracted you to PHP? When did you join the project?
It was initially work related. One of my first tasks in the commercial world of programming was to rewrite and maintain a UK site where people could advertise their property (houses) for sale. I rewrote this from some custom Unix C code into an ASP site. At the time, I’d never heard of PHP before (even ASP was new to me). A year or so later, that same employer had a potential customer that wanted a web-based e-learning system, but it had to run under Unix. So we looked around and found out about this thing called PHP 3. It sounded like ASP, it was free and, even better, it used C-style syntax and could even send email without installing a third-party component. What more could you ask for? :)
“I really like the idea of making PHP work in places it has never been before!”
So we adopted PHP for the project. At that time, PHP 4 was in late beta, so we started out with PHP 3, and later migrated to PHP 4. After a while, we felt the need for SSL sockets to be returned from the fsockopen() function (to use for some credit card authorization stuff), so I cooked up a simple patch to do it and submitted it to the php-dev list. This was sometime in September of 2000. As time went by, I refined my patch a little, but was frustrated by the way that the PHP code handled sockets vs plain files. The code was nasty, and it just wasn’t easy to make the SSL stuff work without making it any uglier. After some discussion, I suggested making the socket stuff more modular, and I think it was Andrei Zmievski who suggested making the whole of the PHP file access more modular. It took some months of juggling in my spare time to get this fully implemented, but finally, the Streams API was born.
“PHP would really benefit from something like the .Net framework,”
php|a: Are there any other PHP-related projects you’re working on? For example, I know you were working on a version of PHP that could be embedded in other applications, and you recently ported the Lemon parser to PHP.
Yes, I really like the idea of making PHP work in places it has never been before! My first foray in this field was the ActivePHP SAPI, which allows PHP to be used from any ActiveScript-enabled application, including (but not limited to) Internet Explorer (client-side), ASP (server-side), and Windows Scripting Host. People often jump when they hear that they can use PHP on the client side. Well, don’t get your hopes up too high: PHP is just far too powerful to safely deploy on the client side of a browser. Just pretend that you didn’t hear that it was possible, and save yourself a lot of security issues! The ActiveScript SAPI is still quite beta (the PHP 5 version, which I need to rewrite, will be much better). In terms of real world use, the WeaverSlave IDE has support for scripting/macros using this. In a similar way, I’ve embedded PHP into the irssi IRC client using Edin Kadribasic’s Embed SAPI, and more recently, I’ve begun work on a cross-platform email
application that uses an embedded version of PHP 5 to reduce the burden of coding the presentation layer, which would otherwise have been implemented using C++. I also initiated a project that I call livedocs to replace the slow, heavy and inflexible openjade way of transforming DocBook XML into HTML.
php|a: You were also one of the principal architects of the new SQLite extension. Can you tell us what prompted you to develop it and in which situations SQLite will be most useful?
Well, I had a deadline to finish an article for a PHP-based magazine. When it comes to writing, I find that I can either sit there and type it out, or not. I was in the “not” phase and needed something to occupy my mind. I’d heard of this thing called SQLite before, and heard that someone else had written an extension for it. I looked at the code for this thing and realized that it was totally broken and was surprised that it could even work as well as it did. As a challenge, I allowed myself 2 hours to implement the basic features of the SQLite library from scratch. 2 hours later (on the nose!) I had a working extension, and it turned out to be pretty good. Since then, others have helped develop the extension, and Marcus Boerger has even come up with a whole OO API for it in PHP 5.
In terms of use, SQLite is particularly well suited to situations where you need to retrieve data based on some selection criteria, e.g. looking up data cross-referenced by names and dates. SQLite is very fast, since it doesn’t have the overhead of a “real” RDBMS, so you get performance and simplicity (flat file with SQL interface!) in a single package. SQLite is NOT well suited for situations where you have a very high degree of concurrent users making updates to the database. SQLite locks the whole file when making an update, and if you have a couple of hundred users attempting to update at the same time, you get a massive blocking problem.
There are tricks you can use to reduce this, but to be honest, if you are in that situation, you can or should be able to afford a real database. Most people using MySQL with PHP aren’t really using it except as a glorified filesystem; in terms of the overhead there (administration, maintenance and runtime access speed) it makes more sense to use SQLite instead. Of course, SQLite doesn’t have all the features of MySQL, so it’s not really a drop-in replacement for it in existing projects, but an alternative choice you can make when you are writing your application.

php|a: What do you think are the best new features of PHP5?
The new object model has to be the number one feature. Aside from the benefits to user code, it finally allows extensions to wrap OO based libraries properly (and more easily). For example, I completely rewrote the COM extension for PHP 5, and it kicks ass. It works the way COM is supposed to work: COM exceptions are mapped to PHP exceptions, and variant types are supported much more thoroughly. This just wasn’t possible before.

One of the other features that not many people know about is that the ZE2 now has streaming support in its scanners/lexers. This allows much greater flexibility for people writing stream implementations to be able to create a stream from any source (not just a file or socket based resource) and feed that into the scanner. In layman’s terms, it makes it easy to implement custom encrypted/encoded storage for their code, without requiring it to be stored in a temporary file on disk (which could potentially reveal the source code). This isn’t really a major point, but it’s just another one of my contributions that means a lot to those that will end up making use of it.

Sterling Hughes (and others) have been looking at the performance for PHP 5, and I think it is actually faster than PHP 4 in a lot of cases. So, our overall performance should be better too.

php|a: Do you think these features will cause PHP to become more popular in new environments (like the enterprise, for example)?

They will definitely make PHP much more attractive to these (almost mythical!) Enterprise people. However, in my opinion, what will really make them sit up and take PHP even more seriously is a rock solid class library. PHP would really benefit from something like the .Net framework, but I just can’t see something like that being produced by a community spread across the globe with no dictator to decide what should be done. PEAR isn’t the answer to this. Its goals are different: a community where anyone can (and does) contribute code, no matter how small, or how useful. Don’t get me wrong, PEAR is a very useful resource, and there are some brilliant packages in there, but it makes the enterprise people nervous. Is the code any good? Is the code safe to use, either as-is, or can we modify it to fix problems? GPL and even the LGPL can present big legal problems to these guys. Luckily, the new COM extension also provides integrated support for .Net, so this will be less of an issue under Windows. However, sweeping aside my negative opinions for a moment, PHP 5 is a much more attractive proposition for the enterprise than PHP 4, although I expect it will be PHP 5.1 before they really start to pick up on it.

php|a: Tell us about what you plan to talk about at php|c.

I will be presenting 3 sessions:

Socket Programming in PHP 5 will focus on using the new socket transport features I added to the Streams API in PHP 5. In addition to introducing these things, I will show how to write portable, working socket code, both client-side and server-side. I’ve seen too many PHP classes where people play with socket settings when they don’t need to. They also abuse things like non-blocking mode; you almost never need to use that. So, if you aspire to be a socket guru, you should find this session very useful.

E-mail Manipulation and Transmission in PHP: correctly building up or working with standards compliant e-mail can be a tricky task. In the past, I’ve worked on some commercial grade web-mail software and had to learn these standards. In addition, the software needed to be capable of sending mail using a Far-Eastern character set. In this session I will be sharing some of my hard-earned knowledge for your benefit.

My final session is about Extending PHP. If you are ready to ascend to the ranks of an internals hacker, or are looking at exposing your business logic (that you have already written in C/C++) to the web using a simpler interface, this session will be an ideal starting point. There isn’t enough time to reveal all the internals magic of PHP, but there will be enough that you can take a C library and make a functional PHP extension for it.
php|a
An Introduction to SQLite
FEATURE
by John Coggeshall
For those of you who have been keeping track of the impending release of PHP 5, one of the most talked about new features will be the introduction of SQLite into the standard PHP release. For those of you who haven’t been following the development of PHP 5, I think some introductions are in order. SQLite is, as its name implies, a relational database package which allows you to store data within tables in databases (just like MySQL). However, SQLite differs from other database packages in a number of ways, which I’ll go over now.

Differences between SQLite and other database packages

The single biggest difference (and greatest benefit) when comparing SQLite to other database packages is its architecture. Most other common RDBMS packages, such as MySQL, work using a client/server system where the SQL client (in our discussion, this would be PHP) stores and retrieves data to and from a database server. SQLite, on the other hand, stores and retrieves data in databases locally without the need for an additional server. Since PHP 5 now ships with both the database package and the interface to use it, developers can leverage their SQL knowledge in developing PHP applications, but not have to worry whether the end user will have another RDBMS package (such as MySQL) or not. To make this look even more appealing, SQLite will be bundled (both the library and the extension) with PHP 5, making the need to develop custom filesystem storage mechanisms for your scripts obsolete.
The end result is that scripts are both easier to develop and maintain.

SQLite is typeless

Now that you understand why SQLite fills a niche which was until now void in PHP, let’s take a look at the differences between SQLite and other RDBMS packages in a bit more detail. For starters, SQLite is a typeless database engine. This means that, like PHP, SQLite does not distinguish an integer from a string or any other type of data. In fact, when defining tables in SQLite just about anything can be used to describe the data type of a column. This is quite different from, for instance, MySQL, which has a predefined notion of different types of data and treats each type differently within the database.

Although SQLite does not require the explicit declaration of typing information for each column within a table, it does have a very generalized concept of typing that is based on the data type provided, and that is essential for the database to properly function. In SQLite, data is classified into one of two categories: “textual” or “numeric”. These classifications are designed to assist the database engine in determining the appropriate action when sorting or otherwise comparing values. Here are the rules by which SQLite will classify each column within a table:

1. Columns with a data type containing “CHAR”, “TEXT”, “BLOB”, or “CLOB” are considered textual.
2. All other columns are considered numeric.

Hence, a column with a provided data type of “MY_SPECIAL_CHAR” would be considered textual, while a data type of “MY_SPECIAL_NUMBER” would be considered numeric. This can be further illustrated through the following SQL CREATE statement defining a table in SQLite:

    CREATE TABLE mytable (name NAME_TEXT, age YEARS);

Based on this table definition, a table mytable would be created containing two columns, name and age, which would be classified as “textual” and “numeric”, respectively, based on the provided data types. Since at times it might not be entirely clear whether a particular piece of data is being classified as textual or numeric by SQLite, a SQL function typeof() is provided which will return the classification of any piece of data.

As a general rule of thumb, SQLite operates quite intuitively when determining the classification it assigns. For instance, any math operation between textual and numeric pieces of data will always produce a numeric result.

Another interesting quirk regarding SQLite is the ever-useful auto-increment column. In MySQL, a column can be defined as auto-incrementing using the AUTO_INCREMENT flag. In SQLite, however, auto-incrementing columns are defined differently. Specifically, any column which is defined using the INTEGER data type and is a primary key will automatically increment unless a value is provided to it during an INSERT statement. As an example, in MySQL the following would be used to create an auto-incrementing column:

    CREATE TABLE autoinc(value INTEGER AUTO_INCREMENT PRIMARY KEY);
The following would be used in SQLite:

    CREATE TABLE autoinc(value INTEGER PRIMARY KEY);
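The classification and auto-increment rules described above can be checked from PHP with a short sketch. It assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension (the sqlite_* functions were dropped from the core distribution in PHP 5.4), and it borrows a few functions that are only introduced later in the article. The label column and the sample values are made up for illustration.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension,
// which provides the sqlite_* functions used here.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}

// The declared type names decide the classification: NAME_TEXT contains
// "TEXT" and is textual; YEARS matches none of the keywords and is numeric.
sqlite_query($db, 'CREATE TABLE mytable (name NAME_TEXT, age YEARS)');
sqlite_query($db, "INSERT INTO mytable VALUES ('Ada', 36)");
$types = sqlite_array_query(
    $db,
    'SELECT typeof(name) AS name_type, typeof(age) AS age_type FROM mytable',
    SQLITE_ASSOC
);

// An INTEGER PRIMARY KEY column is filled in automatically when the INSERT
// leaves it out, and accepts an explicit value when one is supplied.
sqlite_query($db, 'CREATE TABLE autoinc (value INTEGER PRIMARY KEY, label TEXT)');
sqlite_query($db, "INSERT INTO autoinc (label) VALUES ('first')");
sqlite_query($db, "INSERT INTO autoinc (label) VALUES ('second')");
sqlite_query($db, "INSERT INTO autoinc (value, label) VALUES (100, 'explicit')");
$ids = sqlite_array_query($db, 'SELECT value FROM autoinc ORDER BY value', SQLITE_NUM);

print_r($types);
print_r($ids);
sqlite_close($db);
```

Reading the keys back with a SELECT, rather than trusting the INSERT, makes it easy to see which values SQLite assigned on its own and which were supplied by the script.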
It is important to note that the datatype cannot be abbreviated to INT as in MySQL. In order to create an auto-incrementing column you must use the datatype INTEGER in SQLite.

Dealing with NULLs in SQLite

Another area in which SQLite differs slightly from MySQL is how NULLs are treated. In fact, for those familiar with other database packages, SQLite treats NULL values identically to Oracle, PostgreSQL, and DB2. For those not familiar with these database packages, the following list shows some common situations, and how SQLite would handle NULLs in those situations:

• Adding NULL to a value evaluates to NULL
• Multiplying by NULL evaluates to NULL
• NULL values are distinct in a UNIQUE column
• NULL values are not distinct in SELECT DISTINCT
• NULL values are not distinct in a UNION
• Two NULL values count as equal when duplicates are eliminated (as in the two cases above), although an ordinary comparison such as NULL = NULL evaluates to NULL
• The binary expression (NULL OR 1) evaluates to true

Multi-threading / Multi-access and SQLite

The SQLite library is considered to be thread-safe, and can be used in a threaded application. SQLite does not run in a client/server model, though, and the fact that SQLite databases are stored in the local filesystem opens up a number of file-locking considerations. When SQLite is used in conjunction with a web server (even a non-threaded one), it is quite possible that a number of different processes will attempt to access the same SQLite database at the same time. Although this is not a problem when both processes are attempting to read from the database, no other instance of SQLite can access that database if one of these processes is attempting to write to it. Although not a serious concern for many applications, this database locking does put limitations on the package’s ability to scale, something which must be considered in large projects. It is also not recommended that SQLite databases be stored in filesystems where locking is either unavailable or unstable, such as NFS filesystems or older Windows platforms such as 95, 98 and ME.

SQLite in PHP 101

Now that I have bored you to tears with all of the technical considerations and concepts behind SQLite, let’s take a look at some real PHP code. As already mentioned, the following code will only work in PHP 5; if you’d like to play with this yourself you’ll need to get a copy of the latest PHP development snapshot from http://snaps.php.net/.
Opening a database connection

The first step in any database operation is of course to open a connection to the database. In SQLite, this is done using the sqlite_popen() function. This function has the following syntax:

    sqlite_popen($filename [, $mode [, &$errmsg]]);
As you can see, unlike MySQL, which accepts a hostname to connect to, SQLite requires that the local filename of the database be specified via the $filename parameter. The optional $mode parameter specifies the filesystem mode to open the database under, while $errmsg is a reference to a variable which stores any error message if something goes wrong when accessing the database. The sqlite_popen() function will either open an existing database, or attempt to create the database if it does not exist. This function returns either a database resource representing the open connection, or false on failure.

It is useful to note that when specifying the filename the special string “:memory:” can be used to create a SQLite in-memory database. Beyond being incredibly useful in example code (I’ll use this for every example in this article), it is also quite useful for creating temporary tables for data analysis, etc.
NOTE: Currently, the $mode parameter is ignored by SQLite – it is intended for future releases.
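As a rough illustration of the connection call and its $errmsg parameter, the sketch below opens the in-memory database and then deliberately tries a path that cannot be opened. It assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension; the failing path is hypothetical and only needs to point into a directory that does not exist.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.

// ":memory:" creates a throwaway in-memory database, as the article does.
$error = '';
$db = sqlite_popen(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
echo "In-memory database opened\n";

// A path inside a directory that does not exist (hypothetical) can be
// neither opened nor created, so the call returns false and fills in
// $badError. The @ operator only hides the PHP warning.
$badError = '';
$bad = @sqlite_popen('/no/such/directory/demo.db', 0666, $badError);
if ($bad === false) {
    echo "Open failed: $badError\n";
}
```

Checking the return value against false, rather than relying on the warning, keeps the failure handling in the script's own hands.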
The sqlite_popen() function opens a persistent connection to the database that will exist between requests from the web server. Once a connection has been opened, subsequent calls to sqlite_popen() will attempt to use an already existing connection to the database before a new one is created. If for whatever reason you would like PHP to close the connection after every request, the sqlite_open() function is available with the same syntax. In either case, the SQLite database connection can be closed manually using the sqlite_close() function, passing it a valid database resource.

Performing queries

SQLite provides a number of methods to query the database once a connection to it has been made. The most generalized method is to use the sqlite_query() function, which has the following syntax:

    sqlite_query($db_r, $query);
$db_r is the database connection resource and $query is the SQL query to perform. Upon successful execution of the query, sqlite_query() returns a result-set resource representing the result set from the query (if any). If, for whatever reason, the query fails, sqlite_query() will return false. An example using what I have discussed thus far is found in Listing 1, which creates a simple table in memory.

When sqlite_query() is called, the provided query is executed against the SQLite database specified. Depending on the type of query being executed, a result set may or may not be created (for instance, during a SELECT statement). These results, if they exist, are then buffered so that they can be accessed in a non-sequential fashion if desired.

Depending on the circumstance, there may be no need to access the results of a query in a non-sequential fashion, and therefore buffering the results is an unnecessary waste of server resources. For these situations, SQLite also provides the sqlite_unbuffered_query() function. This function has a purpose, result, and syntax identical to sqlite_query(), but data returned from a call to sqlite_unbuffered_query() is not buffered.
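The code for Listing 1 is not reproduced in this text. A minimal sketch in the same spirit, creating a simple table in memory and issuing both kinds of query, might look like the following. It assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension; the people table and its contents are hypothetical and are not taken from the original listing.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The table and column names are made up for illustration.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}

// A statement that produces no rows still reports success or failure.
$created = sqlite_query($db, 'CREATE TABLE people (name TEXT, age INTEGER)');
if ($created === false) {
    die("CREATE TABLE failed\n");
}
sqlite_query($db, "INSERT INTO people VALUES ('Ada', 36)");

// A malformed statement makes sqlite_query() return false. The @ operator
// hides the accompanying PHP warning so the script can react on its own.
$broken = @sqlite_query($db, 'CREATE TABEL oops (x)');
if ($broken === false) {
    echo "The malformed statement was rejected\n";
}

// Buffered: the whole result set is read up front.
$buffered = sqlite_query($db, 'SELECT name, age FROM people');

// Unbuffered: rows are produced only as they are requested.
$unbuffered = sqlite_unbuffered_query($db, 'SELECT name, age FROM people');
```

Both calls hand back a result resource that the fetch functions described later can consume.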
Escaping special characters

As is the case with other database packages, data being inserted into the database may contain characters which hold special meaning to either SQLite or SQL in general. Since attempting to insert data containing these raw characters can produce both failures and security risks, measures must be taken to encode data before it is inserted into the database. For this purpose, SQLite provides the sqlite_escape_string() function, which accepts a single parameter (the string to encode) and returns an encoded version of that string which is acceptable to insert into the database. Its syntax is as follows:

    sqlite_escape_string($string);
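To make the escaping step concrete, here is a small sketch that stores and reads back a value containing a single quote. It assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension, uses a hypothetical authors table, and reads the row back with sqlite_fetch_array(), which the article covers further on.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The authors table is hypothetical.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
sqlite_query($db, 'CREATE TABLE authors (name TEXT)');

// Pretend this value arrived from a form; the single quote would otherwise
// end the SQL string literal early.
$name = "O'Reilly";
$safe = sqlite_escape_string($name);

sqlite_query($db, "INSERT INTO authors (name) VALUES ('$safe')");

// The value comes back in its original, unescaped form.
$result = sqlite_query($db, 'SELECT name FROM authors');
$row = sqlite_fetch_array($result, SQLITE_ASSOC);
echo $row['name'], "\n";
```

The escaped string is only meant for building the SQL statement; the script keeps working with the original $name everywhere else.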
As is the case with any database, it is strongly recommended that all data be encoded before being inserted into the database, especially if it has come from a potentially insecure source such as the user or the environment. When reading data back from the database in a result set, SQLite will automatically decode the data; as a developer your only concern is ensuring it is encoded before an insert or update query is performed.

Retrieving result sets

Now that you are familiar with the process of performing queries in SQLite, let’s look at how data can be retrieved from a database after a SELECT statement has been performed. As is the case with the act of performing queries, SQLite provides a number of different methods of retrieving data from a query. We’ll look at some of the most useful of these methods now.

The most basic way to retrieve data from the database is to use the sqlite_fetch_array() function, which will retrieve the current row of the result set in the form of an array. The nature of this array is determined by how the function is called. The syntax is as follows:

    sqlite_fetch_array($result_r[, $array_type [, $decode]]);
Here the $result_r parameter is a result resource returned from sqlite_query() (or similar). The first optional parameter, $array_type, can be one of the SQLITE_ASSOC, SQLITE_NUM, or SQLITE_BOTH constants, indicating the type of array to return (associative, numeric, or an array with both). By default this value will be set to SQLITE_BOTH. The final parameter, $decode, is a boolean indicating whether SQLite should automatically decode the data in the row (the default) or leave the data in its encoded form. In general, this parameter shouldn’t be used, and is only useful if (for instance) moving particular rows from one table or database to another where encoding should be preserved.

This function operates identically to the common mysql_fetch_array() function in the sense that repeated calls will return each row within the result set sequentially until no results remain (in which case the function returns false). To illustrate the use of this function, consider the example in Listing 2, which populates and queries a table in memory.

Note that in Listing 2 both sqlite_query() and sqlite_unbuffered_query() were used. In this case, either query could have been performed using an unbuffered query since the sqlite_fetch_array() function can only return results sequentially. Both were used only to provide an example of use for each.

Often developers prefer to access data from a result set in its entirety within an array. Using what we have learned thus far, this could be accomplished with the following small code snippet:
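Neither Listing 2 nor the snippet survives in this text. A sketch of the general pattern, under the assumption of PHP 5.0 to 5.3 with the bundled SQLite 2 extension and a hypothetical people table, could look like this: populate a table, walk the result set row by row, and collect the rows into an array.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The people table and its rows are made up for illustration.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
sqlite_query($db, 'CREATE TABLE people (name TEXT, age INTEGER)');

$people = array(array('Ada', 36), array('Linus', 33));
foreach ($people as $person) {
    $name = sqlite_escape_string($person[0]);
    $age  = (int) $person[1];
    sqlite_query($db, "INSERT INTO people VALUES ('$name', $age)");
}

// Rows are only read sequentially here, so an unbuffered query is enough.
$result = sqlite_unbuffered_query($db, 'SELECT name, age FROM people ORDER BY age');

// sqlite_fetch_array() returns false once the result set is exhausted.
$rows = array();
while (($row = sqlite_fetch_array($result, SQLITE_ASSOC)) !== false) {
    $rows[] = $row;
}

foreach ($rows as $row) {
    echo $row['name'], ' is ', $row['age'], "\n";
}
```

Comparing against false explicitly avoids stopping early on a row that happens to evaluate as empty.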
To simplify the life of the developer, SQLite provides a single function which executes a query and returns the entire result set in an array of the desired structure. This function is called sqlite_array_query(), and has the following syntax: sqlite_array_query($db_r, $query [, $array_type [, $decode]]);
Where, as expected, $db_r is the database resource, $query is the query to perform, $array_type is the type of array to return, and $decode is a boolean indicating whether the data should be automatically decoded. Another convenience function provided by SQLite is the sqlite_fetch_single() function, which is useful for returning the first column and row within a result set. This function is useful for single-row, single-column result sets which do not require all of the extra code involved with the process of retrieving more complex result sets. Rather, when executed this function simply returns a string representing that single column of the result set. The syntax of this function is as follows: sqlite_fetch_single($result);
$result is the result set to retrieve the result from. If, for whatever reason, this function does not succeed, a boolean false will be returned.
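The two convenience functions can be sketched side by side as follows. This again assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension; the people table and its rows are hypothetical.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The people table and its rows are made up for illustration.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
sqlite_query($db, 'CREATE TABLE people (name TEXT, age INTEGER)');
sqlite_query($db, "INSERT INTO people VALUES ('Ada', 36)");
sqlite_query($db, "INSERT INTO people VALUES ('Linus', 33)");

// One call runs the query and returns every row as an array of arrays.
$rows = sqlite_array_query($db, 'SELECT name, age FROM people ORDER BY name', SQLITE_ASSOC);

// For a single-row, single-column result, sqlite_fetch_single() returns
// just that value (as a string).
$countResult = sqlite_query($db, 'SELECT COUNT(*) FROM people');
$count = sqlite_fetch_single($countResult);

echo "Found $count people, first by name: ", $rows[0]['name'], "\n";
```

sqlite_array_query() trades memory for convenience, so it fits small result sets better than large ones.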
Counting Result Sets

SQLite provides three functions which allow the developer to determine a number of statistics regarding the results of a particular query. The first of these functions is the sqlite_num_rows() function, which as its name implies determines the number of rows in the provided result set. The syntax for this function is as follows:

    sqlite_num_rows($result);
Where $result is the result set to count. SQLite can also tell you how many columns there are in a given result set using the sqlite_num_fields() function, which has a similar syntax:

    sqlite_num_fields($result);
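A short sketch of both counting functions follows, assuming PHP 5.0 to 5.3 with the bundled SQLite 2 extension and a hypothetical people table. It uses a buffered query, since the row count is only known once the whole result set has been read.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The people table and its rows are made up for illustration.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
sqlite_query($db, 'CREATE TABLE people (name TEXT, age INTEGER)');
sqlite_query($db, "INSERT INTO people VALUES ('Ada', 36)");
sqlite_query($db, "INSERT INTO people VALUES ('Linus', 33)");
sqlite_query($db, "INSERT INTO people VALUES ('Grace', 45)");

// Buffered result: all rows are available, so they can be counted.
$result = sqlite_query($db, 'SELECT name, age FROM people');

$rowCount   = sqlite_num_rows($result);    // rows in the result set
$fieldCount = sqlite_num_fields($result);  // columns in each row

echo "$rowCount rows with $fieldCount columns each\n";
```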
For statistics about queries where a result set is not available (such as when an UPDATE or DELETE statement is executed), SQLite provides the sqlite_changes() function. Unlike the two previous functions (which worked on a particular result set), the sqlite_changes() function returns the number of affected rows (rows deleted or updated, for example) in the last query only. The syntax for this function is as follows:

    sqlite_changes($db_r);

Where $db_r is the database resource to retrieve the statistics for.

Retrieving the last ID inserted in an auto-incrementing column

Earlier in the article, I discussed the concept of auto-incrementing columns. Under certain circumstances, it is useful to be able to determine the last integer used during an INSERT statement. This is usually for use with a follow-up query, such as inserting that value into another table. This value can be determined using the sqlite_last_insert_rowid() function, which accepts the database resource as its single parameter, as shown:

    sqlite_last_insert_rowid($db_r);

As is the case with the aforementioned sqlite_changes() function, this function will retrieve the most recent automatically-generated integer primary key for the last query performed against the database.

Dealing with errors when they occur

SQLite provides a standardized method of retrieving information regarding an error which occurred that applies to all SQLite functions except those responsible for opening a database connection, such as sqlite_popen(). When an error does occur opening the database connection, it can be captured by passing a reference to a variable as the third parameter to the connection function. This variable would then contain the relevant error message. For all other functions, SQLite assigns both a numeric error code and a string describing the error which occurred. These error values can be retrieved through the use of two functions, sqlite_last_error() and sqlite_error_string(), respectively. The syntax for sqlite_last_error() is as follows:

    sqlite_last_error($db_r);

Where $db_r is the database resource to retrieve the error for. When executed, this function will return a numeric error code representing the error which corresponds to a built-in constant. A list of these constants and their meanings can be found in Table 1.

Table 1

Constant             Description
SQLITE_OK            No error occurred
SQLITE_ERROR         SQLite error (or database not found)
SQLITE_INTERNAL      An internal SQLite error
SQLITE_PERM          Access permission denied
SQLITE_ABORT         Callback routine aborted
SQLITE_BUSY          The database file is currently locked
SQLITE_LOCKED        A table within the database is locked
SQLITE_NOMEM         SQLite memory allocation error
SQLITE_READONLY      An attempt to write to a read-only database
SQLITE_INTERRUPT     Interrupted operation
SQLITE_IOERR         A file I/O error has occurred
SQLITE_CORRUPT       The specified database is corrupted
SQLITE_FULL          Database is full
SQLITE_CANTOPEN      Could not open database file
SQLITE_PROTOCOL      Database lock protocol error
SQLITE_SCHEMA        The database schema changed
SQLITE_TOOBIG        Too much data for a single row
SQLITE_CONSTRAINT    Abort due to constraint violation
SQLITE_MISMATCH      Data type mismatch
SQLITE_AUTH          Authorization denied
SQLITE_ROW           sqlite_step() has another row ready
SQLITE_DONE          sqlite_step() has finished executing

These constants can be useful for writing scripts which intelligently handle errors that may occur during the execution of your SQLite-based scripts. To the end user, though, these constants are poor at actually describing the error which occurred. For these purposes, SQLite also provides a function which translates these error codes into human-readable error messages suitable for display to the user. This translation is done using the sqlite_error_string() function:

    sqlite_error_string($error_code);

$error_code is one of the constants defined in Table 1.
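The bookkeeping and error functions from the last few sections can be exercised together in one sketch. It assumes PHP 5.0 to 5.3 with the bundled SQLite 2 extension; the tasks table and the deliberately missing no_such_table are hypothetical.

```php
<?php
// Sketch only: assumes PHP 5.0-5.3 with the bundled SQLite 2 extension.
// The tasks table is made up; no_such_table is missing on purpose.
$db = sqlite_open(':memory:', 0666, $error);
if ($db === false) {
    die("Could not open database: $error");
}
sqlite_query($db, 'CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER)');

// The id column is left out, so SQLite assigns it; read it back afterwards.
sqlite_query($db, "INSERT INTO tasks (title, done) VALUES ('write article', 0)");
$firstId = sqlite_last_insert_rowid($db);
sqlite_query($db, "INSERT INTO tasks (title, done) VALUES ('review article', 0)");

// No result set for an UPDATE, so ask how many rows it touched instead.
sqlite_query($db, 'UPDATE tasks SET done = 1');
$changed = sqlite_changes($db);

// A failing query returns false; the code and message explain why.
// The @ operator hides the PHP warning so the script can report the error.
$code = SQLITE_OK;
$message = '';
$result = @sqlite_query($db, 'SELECT * FROM no_such_table');
if ($result === false) {
    $code = sqlite_last_error($db);
    $message = sqlite_error_string($code);
}

echo "First id: $firstId, rows changed: $changed\n";
echo "Error $code: $message\n";
```

Branching on the numeric code and showing only the translated message to the user follows the split the article describes between the constants and sqlite_error_string().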
Closing

Although we have covered a lot of material here, the reality of the situation is that we have barely scratched the surface of the SQLite extension. With what I have discussed today, though, you should be familiar enough with SQLite to get started writing SQLite-based applications. Although PHP 5 hasn’t even been released (outside of beta), SQLite promises to be widely used in the future. If you’d like to learn more about it, you can read about the library itself at http://www.sqlite.org/, or visit the SQLite section of the PHP manual at http://www.php.net/.
About the Author
John Coggeshall is a PHP consultant and author who started losing sleep over PHP around five years ago. Lately you'll find him losing sleep meeting deadlines for books or online columns on a wide range of PHP topics. You can find his work online at O'Reilly Network's onlamp.com and Zend Technologies, or at his website http://www.coggeshall.org/. John has also contributed to Apress' Professional PHP4 and is currently in the process of writing the PHP Developer's Handbook published by Sams Publishing.
Discuss this article at http://forums.phparch.com/60
PRODUCT REVIEW
PHPEclipse (http://phpeclipse.sourceforge.net)
by Eddie Peloke
When the development team where I work was given the go-ahead to take on our first Java project, we were given one vital rule: “It shouldn’t cost anything!” With that we quickly ruled out many commercial IDEs. A few of us had used Eclipse in the past so it was a fairly easy choice to consider it for our editing environment. For those unfamiliar with Eclipse, it is an open source development IDE which touts itself as “an open extensible IDE for anything and nothing in particular”. Eclipse is, at its core, a Java development IDE, yet it has a great capacity for plug-ins which makes it highly customizable. That is where the PHPEclipse plug-in comes into play. Learning Java during the day and doing my own PHP development at night, I was intrigued by the idea of using one IDE for both.

Why Bother?

OK, I see the look on your face. “Why use another PHP IDE?” you ask. “There are now a handful of PHP IDEs out there, and a few of them are free, so why bother?” A valid question, and if you are currently using a feature-rich environment such as Zend Studio or PHPEdit, you may find PHPEclipse under-powered for your needs. The PHPEclipse team, however, feels the benefit lies in “integration”. There’s a huge library of Eclipse plug-ins (the Eclipse plug-ins site boasts 389 at the time of this writing), and more are constantly being added. The idea is that we (developers) can use one IDE for coding, database modeling, and deployment. If you are currently using several different development environments for all the pieces of the development cycle, Eclipse is worth a look.

Warning, Fine Print Ahead...

First, before we jump into PHPEclipse I want to provide my disclaimer. There are, as stated previously, so many Eclipse plug-ins that you can mold it to do just about anything. The PHPEclipse site, for example, provides a Tidy plug-in and there is also a PHPEclipse debugger plug-in provided elsewhere. For the scope of this review, I only installed two plug-ins: the PHPEclipse core and the PHPEclipse help. Now, let’s get started!

Plugging In

In order to run the PHPEclipse plug-in, you obviously need the base Eclipse application installed. The latest version of PHPEclipse requires Eclipse 2.1, so be sure your version of Eclipse is current. Once you have successfully installed Eclipse and the JDK (remember, it’s a Java application), installing the PHPEclipse plug-in is a breeze. Most Eclipse plug-ins are installed by merely unzipping the plug-in code into the Eclipse plug-ins directory, and PHPEclipse is no exception. Once the plug-in has been unzipped into the plug-ins directory, all that is left to do is to restart the Eclipse application. Eclipse will notice the plug-in and should take care of the rest. Within a few minutes of download, the PHPEclipse plug-in should be successfully installed and running.
40
PRODUCT REVIEW Set Up Before coding begins, there are a few configuration options that should be set. If the install was successful, you will notice a new preferences section in Eclipse for PHP. The options allow you to tell Eclipse where PHP, Apache, and MySQL all live and the commands used to start them. Once you are in the PHP perspective of Eclipse, you will notice buttons in the tool bar which allow you to start/stop Apache and start MySQL. I found these very useful. (Note, I did not notice support of other databases other than MySQL.) In the preferences window, you also have the ability to set various coding options such as syntax highlighting color, language preferences, line numbers and so on. Let’s Code! Once you are all set and ready to go, you will notice Eclipse has a nice clean layout. The IDE provides you with the directory structure, a tabbed code editor, an
PHPEclipse
outliner and a console window. The navigator window is nice, it provides you with a quick look at all of your files regardless of type. A quick double-click on any of the files in the navigator quickly opens the code in the code inspector, and an attempt is made to load a preview of the PHP page in the console window (if Apache is running). Eclipse also puts a nice visual representation of the code into the outliner. Right clicking on any navigator file gives you the options, along with the standard Eclipse options, to run the file in the browser, run an external PHP parser, or run a PHP obfuscator. I find the outliner especially useful. If, for example, you have several classes in a script that you are currently editing, the outliner will show each class in a collapsible view. When expanded, the view shows the data members and functions of each class. Double clicking on any of the data members or functions in the outliner will automatically throw you to that line in the code
Figure 1
November 2003
●
PHP Architect
●
www.phparch.com
41
PRODUCT REVIEW inspector. This can be very useful when you are working with a large script with many classes and functions. The code editor is where PHPEclipse falls short of many other PHP IDE’s. It does provide syntax highlighting for *.php, *.php3, and *.php4, but is missing support for the latest PHP5 additions. PHPEclipse is making an attempt to add code help with what it calls “experimental” code shortcuts. With some functions, hitting ctrl+space brings up suggestions on code and pops in the remainder of the function. This is a great idea but it doesn’t work for many of the functions. One powerful feature of PHPEclipse is the ability to set up templates. Templates allow you to define code snippets that can then be re-used throughout your code. Think of it like code complete where you can define the completions. Where Am I? If you have elected to install the PHPEclipse help, you will notice it is accessible via the Eclipse help menu (as you might expect). It provides a good resource for PHP—although it is a little out-dated—but it is of little help with PHPEclipse itself. The PHPEclipse site contains a few helpful tips on getting started, but is also short on good documentation. One nice feature is that you can quickly jump to the help section for a particular function by simply placing your cursor within the
PHPEclipse
function, right clicking and pulling up PHP help. What I Liked There are many things about the general Eclipse platform that I like, and which PHPEclipse inherits. I like the general look and feel of Eclipse. Windows are wellplaced and work well together. The preview window is a nice addition as it allows you to quickly see what your page will look like through the browser. PHPEclipse also gives you the power to start and stop Apache and MySQL which is nice. Eclipse also does a good job of integrating CVS, making working with a team quite easy. My current PHP editor is simply a text editor, so I also don’t mind the light feel of PHPEclipse. What I Didn’t Like PHPEclipse is still not a mature project. Features like full-fledged code completion and syntax highlighting for PHP5 functions would go a long way towards improving its overall functionality. The Eclipse platform itself can sometimes be slow and resource-intensive, which can be troublesome on older machines. PHPEclipse is making an effort with code help but the “experimental” features can sometime prove more irritating than helpful. As mentioned the code help doesn’t work for all functions which can sometimes lead you to hitting ctrl+space several times before giving up and
Figure 2
November 2003
●
PHP Architect
●
www.phparch.com
conceding that the function isn’t working yet. I was disappointed not to find a debugger built into the core of PHPEclipse. I realize that there is one available as another plug-in, and maybe the idea of adding on to the core goes against the theory of allowing users to plug in what they want, but it would still have been nice to have it built in. Last but not least, PHPEclipse could benefit from a huge dose of documentation. I admit, I usually jump right in and build the new entertainment unit or Christmas gift without even opening the instructions, but they can be useful when I have to decide what to do with the leftover parts. PHPEclipse is no different. Many of the features are self-explanatory but a few of them, such as templates, could use a little more information in the help menus.

Conclusion
This is a very young project with a bright future. By using the Eclipse IDE as its base, it benefits from a mass of developers and large companies. As these companies continue to improve the Eclipse platform, PHPEclipse will also improve. The idea of “integration within a standard platform” has a lot of merit. The downside is that if you don’t like the core platform, you probably won’t like many of its pieces. I personally do not find this to be the case with Eclipse. It has a lot of nice features, and its wealth of plug-ins allows you to do just about anything. There is even a plug-in that “aims at making distributed pair-programming happen”, as well as several database-related plug-ins. If PHPEclipse continues to refine and improve its features, it could easily be a major contender in the arena of PHP IDEs. The idea of one environment for everything is both intriguing and powerful. I just don’t have enough time (read: “I’m too lazy”) to learn how to use more than one IDE. Maybe that is why I am still using a text editor.
php|a
Figure 3
Working with PEAR::XML_Serializer
F E A T U R E
by Stephan Schmidt

XML is cool, but working with XML can often be a pain. Or at least it used to be in the dark ages, when you had to use complicated or silly APIs like SAX and DOM to access XML documents, or had to use a legion of echos or string functions to build an XML document. But today a new hero has arrived and he has promised to get rid of all your XML-related problems.

“Is it a bird? Is it a plane?”, you may ask. “No!”, I would reply, “It’s XML_Serializer and his sidekick XML_Util.” Since the dawn of mankind, developers have had to create XML documents; but since the day that sales managers stumbled upon buzzwords like content syndication and data exchange, developers have had to create even more XML documents, often from data that is stored elsewhere or that was just computed.

Creating an XML document manually
Think of your own small website where you publish news. Being a developer, you want to show that your site is up-to-date. To this end you decide to supply an RSS feed for your news so your friends can include your news in their sites. At first glance this may seem simple: you select the data from the database, write a small loop, and implement some sprintf() calls or other string functions. But as you take a closer look, problems arise: remembering to replace characters like “<”, “>” and “&” in your data with entities, ensuring the XML declaration is present and specifies the correct encoding, and so on. When finished you will have 50 or more lines of PHP code and the whole task will have taken at least an hour. And the worst thing is that this script will only create an RSS file. If instead you decide to use PEAR’s XML_Serializer to create your XML documents, you will follow this simple three-step process: fetch the data, let XML_Serializer create the XML document, have a coffee. This article will show you how to use XML_Serializer
to create XML documents from PHP data structures like arrays or objects without knowing anything about XML. It will also show you that it is possible to go the other way, using XML_Unserializer to create PHP data structures from XML documents.

Installing XML_Serializer
XML_Serializer is a PEAR [1] package. That means it can easily be installed with the PEAR installer that has been bundled with PHP since version 4.3. All you have to do is open a shell and type “pear install XML_Serializer”, and you should see the following output:

    -sh-2.05b$ pear install XML_Serializer
    downloading XML_Serializer-0.9.tgz ...
    ...done: 12,592 bytes
    install ok: XML_Serializer 0.9
NOTE: You may have to install the XML_Parser and the XML_Util packages first.
The version number may differ as XML_Serializer is still being developed; you can find a changelog, bug lists and other information on the package homepage [2]. If the include path in your php.ini file is set properly, you should be able to include XML_Serializer in your projects:

    require_once 'XML/Serializer.php';
The API
The API of XML_Serializer is quite simple. Basically, you instantiate a new object, pass some options to the constructor that tell XML_Serializer how the data should be serialized, and then pass the data that should be serialized to XML_Serializer::serialize(). If no error occurs you can fetch the resulting XML document with XML_Serializer::getSerializedData(). A basic example looks like Listing 1.

[Listing 1]
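The call sequence just described can be sketched as follows. This is not the article's Listing 1: the helper name to_xml() and the sample data (apart from the key "foo" with the value 42, which the text mentions) are assumptions, and the sketch assumes serialize() returns true on success. It also assumes the PEAR package is installed; if it is not, the helper simply returns null.

```php
<?php
// Assumption: the PEAR package is installed and on the include path.
// The @ keeps the script running if it is not, so the sketch degrades gracefully.
@include_once 'XML/Serializer.php';

/**
 * Serialize $data to an XML string, or return null on failure.
 * to_xml() is a hypothetical helper, not part of the package.
 */
function to_xml($data, $options = array())
{
    if (!class_exists('XML_Serializer')) {
        return null;                    // package not available
    }
    $serializer = new XML_Serializer($options);
    $result = $serializer->serialize($data);
    if ($result !== true) {
        return null;                    // serialize() reported an error
    }
    return $serializer->getSerializedData();
}

// Sample data: only "foo" => 42 is taken from the article.
$data = array('foo' => 42, 'bar' => array('one', 'two'));
$xml  = to_xml($data);

echo $xml === null ? "XML_Serializer unavailable or failed\n" : $xml . "\n";
```

Wrapping the three calls in one helper keeps the later option experiments short: only the $options array changes from one example to the next.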
If you run Listing 1, the output should be:

    <array> 42 <main>Auth.php DB.php

NOTE: If you are running the script in a web browser you should take a look at the source code of the resulting page, as the rendering engine of the browser will hide the tags.

Let’s take a look at this XML document. The root tag is <array>, which is quite logical as we supplied an array. The next tag is <foo>, the same as the first key in the array. Inside the tag there is “42”, which was the value of “foo” in our array. If you traverse the document, you will see that it follows the same pattern for all the array keys. You can see that by default XML_Serializer will recursively walk through the data and use the keys as tag names and the values as tag content. All of the following examples will look the same, except the part where a new XML_Serializer object is created. This is where you may pass the options for the serialization (in fact, you may also set them at a later point with XML_Serializer::setOption()), and the options are what make XML_Serializer so powerful.

Changing the layout
In the above example the tags have not been indented, so it’s quite hard to view the structure at first glance. To add indentation to the document, you just have to set an option before serializing the document:

    $serializer->setOption("indent", "    ");

This will force XML_Serializer to use four spaces per nesting level to indent the tags, which results in the following XML document:

    <array> 42 <main>Auth.php DB.php

Now the structure can be recognized by just looking at the document. You may supply any string for this option, but in most cases it will be one or more spaces, or maybe a tab character. By default XML_Serializer uses “\n” for linebreaks; you may use the “linebreak” option to change this.

Adding type hints
In the above example we used an integer as well as string values, but it all looked the same in the XML document. It is impossible to determine whether the value inside the tag was the integer value 42, or a string containing “42”. If you need this kind of meta information in your documents, you should enable type hints:

    $serializer->setOption("typeHints", true);

The resulting document with the same input will be:
As you can see, each tag now has an attribute “_type” that contains the type of the value enclosed in the tag. To change the name of the attribute you may set the “typeAttribute” option to any value you like.

Serializing objects and indexed arrays
Until now we have always been serializing a simple array. In this next example I will show you how XML_Serializer is able to serialize more complex structures. Take a look at Listing 2. Here, our data is an object of the built-in stdClass class.

    $data = new stdClass;
    $data->foo = 42;
Furthermore we set a new option that defines a default tag name for properties that have no key:

    $serializer->setOption("defaultTagName", "anyTag");
The rest of the script is the same as Listing 1, since XML_Serializer::serialize() accepts any kind of data structure. The resulting XML should be:

    <stdClass _class="stdClass" _type="object"> 42onetwo
There are three things that need to be explained in this example:
• When the data structure is an object, the root tag will be the class name of the object.
• When serializing an object with type hints enabled, a “_class” attribute will be added to the tags. The name of the attribute can be changed by setting the “classAttribute” option.
• When a key cannot be used as an XML tag name (it is an integer key or contains invalid characters like spaces) a default tag name will be used. If type hints are enabled, the original key will be stored in the “_originalKey” attribute of the tag. The attribute name can be changed with the “keyAttribute” option.

Emulating simpleXML
simpleXML is an extension by Sterling Hughes that allows you to modify XML documents by supplying a native PHP object structure that represents the XML document. It is one of the most hyped extensions of PHP5. The next example shows you how to use XML_Serializer so that its output looks like simpleXML’s. The only thing that has to change is how indexed arrays should be treated. Let us take a look at Listing 3. The data structure is as follows:

    $data = new stdClass;
    $data->item = array("zero", "one", "two", "three");
We have an object with a property “item” that contains four values. Basically, that means we’ve got not only one item, but four items. When serializing the document using Listing 3 we get something like this: <stdClass> <XML_Serializer_Tag>zero <XML_Serializer_Tag>one <XML_Serializer_Tag>two <XML_Serializer_Tag>three
Although it represents the data structure, it is different from the way it looks in PHP. To access the value “two” you would use a path like stdClass.item.XML_Serializer_Tag, which is not what we want as it differs from the original PHP data structure. That’s why I implemented the “mode” option, which can be set to “simplexml”:

    $serializer->setOption("mode", "simplexml");
If you add this line to Listing 3, and run the script again, you will get this XML document: <stdClass> zeroonetwothree
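To make the difference between the two modes concrete, here is a small stand-in written in plain PHP. It is not XML_Serializer's implementation; the function name render() and its parameters are assumptions chosen for illustration. In the default mode each entry of an indexed array gets the default tag inside its parent, while in "simplexml" mode the parent tag itself is repeated once per entry.

```php
<?php
// Simplified stand-in for illustration only; NOT the XML_Serializer source.
// render() and its parameters are hypothetical names.
function render($tag, $value, $mode = 'default', $defaultTag = 'XML_Serializer_Tag')
{
    if (is_object($value)) {
        $value = get_object_vars($value);      // treat properties like array keys
    }
    if (!is_array($value)) {
        return "<$tag>" . htmlspecialchars((string) $value) . "</$tag>";
    }

    // An indexed array has the keys 0..n-1.
    $isList = $value !== array()
        && array_keys($value) === range(0, count($value) - 1);

    if ($isList && $mode === 'simplexml') {
        // simplexml mode: repeat the parent tag once per entry.
        $out = '';
        foreach ($value as $item) {
            $out .= render($tag, $item, $mode, $defaultTag);
        }
        return $out;
    }

    // Default mode: integer keys fall back to the default tag name.
    $children = '';
    foreach ($value as $key => $child) {
        $childTag = is_int($key) ? $defaultTag : $key;
        $children .= render($childTag, $child, $mode, $defaultTag);
    }
    return "<$tag>$children</$tag>";
}

$data = new stdClass;
$data->item = array('zero', 'one');

echo render('stdClass', $data), "\n";
echo render('stdClass', $data, 'simplexml'), "\n";
```

With the second call the path to a value becomes stdClass.item again, which mirrors the original PHP structure.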
A real life example
As mentioned in the first paragraph, a typical usage of XML_Serializer might be creating RSS files. In Listing 4 you will find a simple array structure that contains the data for the RSS document. Furthermore, you can see how XML_Serializer is used to create an RSS document from the array. This example also shows how several options can be set at once by supplying them to the constructor. The biggest part of the work in this example is the creation of the array that contains all information for the RSS, and this part would later be automated, with the data probably fetched from a database. The resulting XML document is shown in Listing 5. In this example you can also see that it is possible to change the root tag name, and even add attributes to the root tag, by using the options “rootName” and “rootAttributes”.

More features
XML_Serializer has even more features. It is possible to specify that scalar values be serialized as attributes instead of nested tags, that linebreaks should be included in tags with several attributes, and so on. Just take a look at the examples included in the PEAR package or the source code, which is well documented.

Turning it around
What use would the PHP function serialize() have if there was no unserialize()? The same applies to XML_Serializer. It would be silly to serialize data to an XML format if there was no way to get the data back from an XML document. That is why XML_Unserializer was developed and included in the XML_Serializer package. It is as easy to use as XML_Serializer, and basically has two modes to operate in:
• It can be used to unserialize an XML document that has been created with XML_Serializer with the type hint option enabled. This way it can be used to replace serialize() and unserialize().
• It can be used to read any XML document and create arrays or objects from it.

Unserializing XML_Serializer results
First I will explain how to unserialize XML documents that have been created with XML_Serializer and type hints enabled. As an example, I will unserialize a document that we created in one of the XML_Serializer examples. This document is shown in Listing 6. All that needs to be done is to include XML_Unserializer, create a new instance and hand over the XML document. Have a look at Listing 7 for the unserializing code.

Listing 5
    0.91"> JLA Distress Calls http://www.dccomics.com
    Doomsday is destroying Metropolis <description>Superman has already reacted but needs help!
    Rogues loot Flash Museum <description>Captain Cold has been sighted in Keystone City

Listing 6
    <stdClass _class="stdClass" _type="object"> integer">42 one two
The output of Listing 7 is as follows:

    stdClass Object
    (
        [foo] => 42
        [bar] => Array
            (
                [0] => one
                [1] => two
            )
    )
Figure 1
If you look back at Listing 2, you’ll see that the above output is exactly the same data we used to create the XML document in the first place. Note that if you used one of the options “keyAttribute”, “typeAttribute” or “classAttribute” in XML_Serializer to change the attribute names for type hints, XML_Unserializer will not be able to restore the original types or keys unless you set the same options for XML_Unserializer. For example, if instead of using “_originalKey” in Listing 6, you had used “myKey”, you would need to set the following in the unserialization script:

    $unserializer->setOption("keyAttribute", "myKey");
The second parameter to XML_Unserializer::unserialize() indicates whether the first parameter should be treated as a filename. If you are creating an XML document on-the-fly, you may pass the complete XML string and just supply false as the second parameter. The drawback of using XML_Serializer instead of the native PHP functions is that creating and parsing XML takes more time than using serialize(). Furthermore, XML_Serializer does not yet support references. This feature is scheduled for a future release. There is one big advantage to replacing the native PHP functions with XML_Serializer: the resulting data can be easily interpreted by humans.

Reading custom XML files
A more interesting feature of XML_Unserializer is that it can be used to read any XML document you like, and in most cases you will get a PHP structure (arrays and/or objects) that you can use in your applications. So it is “Goodbye writing custom XML parsers” and “Hello concentrating on the application” when you are using XML_Unserializer. Let us take a look at an example XML file that contains a listing of (fictitious) php|architect issues, shown in Listing 8.

Listing 8
    8" editor="Joe Somebody"> <articles> <article>PHP vs ASP <article>Dummy article
    9" editor="Jane Somebody"> <articles> <article>PHP vs Java <article>PHPCon 2003

When reading this document with the XML_Unserializer script in Listing 7 (after changing the input XML document to “listing8.xml”), you will get the structure shown in Figure 1. The whole document will be represented by an array. If a tag contains more than just character data (i.e. it is a complex structure), it will also be converted to an array. If a tag contains no tags (i.e. it is scalar), it will be converted to a string. In both cases, tag names will be used as keys in the associative arrays. Furthermore, you will notice that if a tag is used more than once in the same context (an article, for example), an indexed array containing all values will be created. In this example, all attributes (“number” and “editor”) are being ignored. In the XML document the tags can be identified by the “number” attribute, which functions as some kind of primary key. In the result array, this is not possible. This can easily be changed by using the “keyAttribute” option, as follows:

    $unserializer->setOption("keyAttribute", "number");
By setting the “keyAttribute” option to “number”, XML_Unserializer will look for an attribute called “number” and use its value as a key for the array, like so:

    Array
    (
        [8] => Array
            (
                [date] => 2002-08-01
                ...
            )
        [9] => Array
            (
                [date] => 2002-09-05
                ...
            )
    )
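The idea behind “keyAttribute” can be illustrated without the package. The sketch below is a stand-in that uses PHP 5's built-in SimpleXML extension rather than XML_Unserializer; the tag names <issues> and <issue> are assumptions, and only the numbers and dates are taken from the output above.

```php
<?php
// Stand-in using the built-in SimpleXML extension, NOT XML_Unserializer.
// The <issues>/<issue> tag names are assumptions made for this sketch.
$xml = <<<XML
<issues>
  <issue number="8"><date>2002-08-01</date></issue>
  <issue number="9"><date>2002-09-05</date></issue>
</issues>
XML;

$doc = simplexml_load_string($xml);

$byNumber = array();
foreach ($doc->issue as $issue) {
    // Use the value of the "number" attribute as the array key.
    $key = (string) $issue['number'];
    $byNumber[$key] = array('date' => (string) $issue->date);
}

print_r($byNumber);
```

Keying the result by an attribute means an issue can be looked up directly, instead of by its position in the document.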
The “editor” attribute is still being ignored, although it contains some important information. Since version 0.8, XML_Unserializer is able to parse attributes of tags and include the value in the data structure. You can enable this by setting the option “parseAttributes”:

    $unserializer->setOption("parseAttributes", true);
Furthermore, it is also possible to force XML_Unserializer to create objects instead of arrays by setting the “complexType” option to “object”. If you set these options in the example, the output of the script will be as shown in Figure 2. If you had to implement an XML parser that would build a data structure like this from your XML document, it surely would have meant 100 lines or more of code, and at least 2 hours of work. With XML_Unserializer it can be done in 10 lines of code and 5 minutes of work.

Figure 2
    stdClass Object
    (
        [8] => stdClass Object
            (
                [number] => 8
                [editor] => Peter James
                [date] => 2002-08-01
                [articles] => stdClass Object
                    (
                        [article] => Array
                            (
                                [0] => PHP vs ASP
                                [1] => Dummy article
                            )
                    )
            )
        [9] => stdClass Object
            (
                [number] => 9
                [editor] => Peter James
                [date] => 2002-09-05
                [articles] => stdClass Object
                    (
                        [article] => Array
                            (
                                [0] => PHP vs Java
                                [1] => PHPCon 2003
                            )
                    )
            )
    )

Serializing and unserializing objects
By default XML_Unserializer creates arrays from nested tags. As the previous example showed, it is also possible to create objects by changing the “complexType” option. The resulting objects are all objects of the class
“stdClass”. But what use is it to create objects, when these objects do not have any methods that work with the data in your XML document? The last example of this article will show you how to develop objects that can be perfectly serialized and unserialized with XML_Serializer. Listing 9 shows the definitions of two classes: “superhero” and “superheroTeam”. A superheroTeam has a name, an abbreviation, and several members. Each of these members is an instance of superhero.

Listing 9
    <?php
    // The opening lines of this class are reconstructed from the methods below.
    class superhero {
        var $name;
        var $realname;
        var $powers = array();

        function setName($name) {
            $this->name = $name;
        }
        function setRealname($name) {
            $this->realname = $name;
        }
        function setPowers($powers) {
            if (!is_array($powers)) {
                $powers = array($powers);
            }
            $this->powers = $powers;
        }
        function addPower($power) {
            array_push($this->powers, $power);
        }
    }

    /**
     * superhero team
     *
     * @access public
     */
    class superheroTeam {
        var $abbrev;
        var $name;
        var $members = array();

        function setAbbrev($abbrev) {
            $this->abbrev = $abbrev;
        }
        function setName($name) {
            $this->name = $name;
        }
        function setMembers($members) {
            if (!is_array($members)) {
                $members = array($members);
            }
            $this->members = $members;
        }
        function addHero($hero) {
            array_push($this->members, $hero);
        }
    }
    ?>

Listing 10
shows you how to create instances of superheroes and to build a team of them. The superheroTeam object is then passed to XML_Serializer and serialized into an XML document. When serializing the object, all properties will be converted to XML tags with the property values as the content of the tag. The output of the serialization is shown in Listing 11. Before an object is serialized, the magic function __sleep() is called so that you can get rid of properties that do not need to be serialized, or close existing database connections or files. All properties are then fetched and serialized using class names or property names as tag names. The resulting XML document can be easily interpreted or edited by humans.

[Listing 10]

Listing 11
    <superheroteam>
      <abbrev>JLA</abbrev>
      <name>Justice League of America</name>
      <members>
        <name>Superman</name>
        <realname>Clark Kent</realname>
        <powers>Flight</powers>
        <powers>Superhuman strength</powers>
      </members>
      <members>
        <name>The Flash</name>
        <realname>Wally West</realname>
        <powers>Superspeed</powers>
      </members>
    </superheroteam>

Creating your superhero team from XML
Now that you have implemented your classes and tested whether they are correctly serialized, it is possible to create new superheroes or teams from XML documents. Listing 12 shows how to do that. The XML document first has to be created somehow; in this simple example we just assign it to a variable. Along with Superman and The Flash, we added a new hero called Aquaman. Then we create a new instance of XML_Unserializer and pass two options in the constructor:
1. “complexType” will force XML_Unserializer to create objects instead of arrays.
2. “tagMap” can be used to map a tag name to a different name. In our example this means that every time a tag called “members” is encountered it will be treated as if “superhero” had been found in the XML document.

Listing 12
    <?php
    // The opening lines (includes and root tag) are reconstructed from the text.
    require_once 'XML/Unserializer.php';
    require_once 'listing9.php';

    $xml = '<superheroteam>
      <abbrev>JLA</abbrev>
      <name>Justice League of America</name>
      <members>
        <name>Superman</name>
        <realname>Clark Kent</realname>
        <powers>Flight</powers>
        <powers>Superhuman strength</powers>
      </members>
      <members>
        <name>The Flash</name>
        <realname>Wally West</realname>
        <powers>Superspeed</powers>
      </members>
      <members>
        <name>Aquaman</name>
        <realname>Arthur Curry</realname>
        <powers>Telepathic abilities</powers>
        <powers>Commands sea life</powers>
      </members>
    </superheroteam>';

    $options = array(
        "complexType" => "object",
        "tagMap"      => array(
            "members" => "superhero"
        )
    );

    $unserializer = new XML_Unserializer($options);

    $result = $unserializer->unserialize($xml);

    if ($result === true) {
        $team = $unserializer->getUnserializedData();
        print_r($team);
    }
    ?>

Before discussing what happens in the process of unserialization, let’s take a look at the result in Figure 3. As you can see, we get exactly what we wanted: one “superheroTeam” object that has the properties “abbrev” and “name”, as well as “members”, which is a collection of “superhero” objects. But how does this work?
1. When XML_Unserializer unserializes nested
tags to objects, it will check whether a class with the name of the current tag has been defined (which is why we include “listing9.php”). If such a class has been defined, a new instance of this class will be created. Otherwise an instance of “stdClass” will be created.
2. Then it will traverse all child tags and use them as properties of the object. In PHP4 all properties of an object are public, which means they can be set from outside of the object. With PHP5 this will change, and because of that you are able to implement setters for the properties: XML_Unserializer will check for a method set{$tagname}() in the object. If this method exists it will be called with the value of the child tag as the only parameter. If no such method has been implemented, the property is simply assigned. This is why methods like setPowers(), setName(), etc., have been implemented in the superhero object.
3. After all properties have been set, the magic method __wakeup() will be called, if it has been implemented.

By using the tag map to map any tag to any class name, you can create quite complex object structures from your XML documents without implementing a custom XML parser.

Future features
XML_Serializer is still under heavy development, which means that in future releases several features will be added. Currently planned are:
1. namespace support for XML_Serializer and XML_Unserializer
2. the detection of references, as well as resolving references in XML documents by using “id” and “idref”
3. custom serialization and unserialization of objects
4. custom serialization of complex structures to tags with attributes
5. automatic type conversion depending on the tag name and/or the current context
If you are using XML_Serializer and need a feature, feel free to contact me at schst@php.net.

Figure 3
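The three unserialization steps (class lookup, setter-or-assign, __wakeup()) can be sketched in plain PHP. This is an illustrative stand-in, not XML_Unserializer's code: hydrate(), the Hero class and the sample values passed to the second call are hypothetical.

```php
<?php
// Illustrative stand-in for the three steps; NOT XML_Unserializer's code.
// hydrate() and Hero are hypothetical names used only in this sketch.
class Hero
{
    public $name;
    public $powers = array();
    public $ready  = false;

    function setPowers($powers)
    {
        // Accept a single value or a list, like the setters in Listing 9.
        $this->powers = is_array($powers) ? $powers : array($powers);
    }

    function __wakeup()
    {
        $this->ready = true;
    }
}

function hydrate($tagName, array $children)
{
    // Step 1: use a class named after the tag if one is defined.
    $object = class_exists($tagName) ? new $tagName() : new stdClass();

    // Step 2: prefer a setter, otherwise assign the property directly.
    foreach ($children as $child => $value) {
        $setter = 'set' . $child;       // method names are case-insensitive
        if (method_exists($object, $setter)) {
            $object->$setter($value);
        } else {
            $object->$child = $value;
        }
    }

    // Step 3: give the object a chance to finish its own setup.
    if (method_exists($object, '__wakeup')) {
        $object->__wakeup();
    }
    return $object;
}

$hero  = hydrate('hero', array('name' => 'Aquaman', 'powers' => 'Telepathic abilities'));
$other = hydrate('unknowntag', array('name' => 'anything'));

print_r($hero);
print_r($other);
```

Routing values through setters where they exist lets a class normalize its input (here, wrapping a single power in an array) without the caller knowing about it.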
Stephan Schmidt is a superhero comic devotee and web application developer from Karlsruhe, Germany. He works for Metrix Internet Design GmbH (http://www.metrix.de). As well as being a founding member of the open source project PHP Application Tools (http://www.php-tools.net), he is an active contributor to PEAR. He has spoken at several international conferences and regularly contributes to PHP publications.
Implementing Web Server Load Management
by Rodrigo Becke Cabral
F E A T U R E
How many users can your web server deal with? What if too many users decide to go for your services? This article presents a simple way for managing web server load by putting users in a queue and giving an estimate for how long they will have to wait.
Introduction
Small- or medium-sized websites rarely need to worry about load management. Website access is sparse and never crowded. On the other hand, large websites usually have a good infrastructure and staff prepared for all possible contingencies. The operation must never stop, and the website is always ready to answer users’ demands. Scalability is part of the overall web server infrastructure planning, and the IT staff knows beforehand when they should use it. That is in an ideal world. More than once I have faced the problem of server overload. If you haven’t had the pleasure of dealing with this yourself, remember that “there’s a first time for everything”. Unexpectedly, it may affect the web server, the database server, or perhaps some out-of-reach, behind-the-scenes, third-party middle-tier server. Timeout after timeout, interconnected services begin to crumble, and depending on service topology, other applications not related to the web service are also affected.

Understanding the Problem
The dreadful source of web server overload is user behavior. Special events such as promotions, subscriptions, enrollments, surveys or other online offerings may draw more traffic than usual, forcing all servers to work harder. You may underestimate how the public is going to react, and that is when the server overload occurs. Consider 20,000 users attracted to a special two-day offer on your website. You may think you can handle 10,000 of them a day, but what if all of them plus their friends decide to go for it in the first hour on the first day? Will it work? How can you have a contingency plan for that? Common sense dictates that you don’t invest in a super server structure based purely on the threat of a potentially attractive online service.

Let’s get to the specifics. What you really have to be concerned with is the number of concurrent requests, and how complex the processing is that you have to carry out for each. If your entire web application is extremely simple and fast, you are less likely to be worried about processing high numbers of requests. If you expect to have 3 requests in 10 seconds, and each request takes only half a second to be fully processed, the chance of those requests overlapping is minimal. Should each request take more than three seconds to be processed, though, you will almost certainly have your web server processing a minimum of two simultaneous requests in a 10 second time frame. You will also notice that, since your server has to deal with multiple requests, it will preemptively split its capacity amongst them, making each request take longer than usual to complete. Bottlenecks like memory capacity and/or disk I/O will become more evident. Eventually, being the consummate professional that you are, you will call for server load tests to determine
how many users your infrastructure can simultaneously accommodate with a minimum level of quality. The final question is what to do with this testing information. The answer lies in creating a mediation layer to prevent an excessive number of users from struggling for a piece of your pie. Simple as it seems, I will show you how to create an “online queue” to sort out requests should the number of simultaneous users increase beyond predetermined operational levels. Hopefully, you will never have to use it, but at least you can be confident that you will not face the overload issue unprepared.

The Algorithm
Users enter and leave the web site randomly. The web server has a large but finite capacity to support a given number of users. Should the number of simultaneous users accessing the web server escalate beyond safe levels, a mediation algorithm must step up. I recommend using an online queue algorithm to put overflowed requests in line, preventing more users than the web server can support from requesting service. As users in the server leave, other users in the queue may enter and use the service.

For the online queue you will need to know two things: how long a single user occupies the web service (the user service time), and how many users the server can simultaneously handle (the max users). Usually, you will find that those are not deterministic measures, and can be fairly arbitrary. User service time is the time between when the user signs in and begins to use the service, and when he or she leaves. Sampling time from individual requests allows you to get an average and standard deviation for this variable. If you plot a frequency chart of collected data, you will notice that your variable is likely to follow a normal distribution function, as shown in Figure 1. I will use this measurement to define $servicetime as the first parameter in my algorithm. $servicetime is calculated as the average user service time plus some arbitrary safety margin.
I estimate the safety margin based on the standard deviation and on how safe I want to be.
I recommend calculating the safety margin as two times the standard deviation of the user service time. For a normal distribution, that implies that I would be underestimating server usage in only about 2.3% of web requests (the share of values lying more than two standard deviations above the mean). Server load testing is another task you must carry out. For that, you will need to write a “robot” to start requests for all of your user web transactions. Run multiple instances of the robot until you notice that your service time begins to degrade. Again, you will get close to the “average” of how many users the server can safely handle. This is the second parameter I need, defined as $maxusers. I will now approach the problem in three steps:
• Script Basics: first of all, I need to initialize parameters and control variables, and to label web requests; here I’ll lay down the basics for access control.
• Users in the Server: secondly, I need to estimate how many users are in the server, and the $servicetime parameter is the key to that.
• Users in the Online Queue: finally, when the server is working at full capacity, I must send new requests into the online queue, providing information like the current position in line and estimated time for service.
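As a small worked example of the $servicetime calculation described above (average service time plus a margin of two standard deviations), consider the following sketch. The helper name service_time() and the sample durations are assumptions made for illustration; they do not come from loadmgmt.php.

```php
<?php
// Hypothetical helper; the sample durations below are made up.
// Returns mean + $margin * (sample standard deviation), in seconds.
// Needs at least two samples.
function service_time(array $samples, $margin = 2.0)
{
    $n    = count($samples);
    $mean = array_sum($samples) / $n;

    $sumSquares = 0.0;
    foreach ($samples as $sample) {
        $sumSquares += ($sample - $mean) * ($sample - $mean);
    }
    $stddev = sqrt($sumSquares / ($n - 1));   // sample standard deviation

    return $mean + $margin * $stddev;
}

// Three measured visits of 40, 50 and 60 seconds:
// mean = 50, standard deviation = 10, so the result is 50 + 2 * 10 = 70.
$servicetime = service_time(array(40, 50, 60));
echo $servicetime, "\n";
```

A larger $margin makes the estimate more conservative: fewer users are assumed to have left the server early, at the cost of longer estimated waits in the queue.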
NOTE: The entire load management script can be found in loadmgmt.php. The listings that follow are pieces of that larger script, and contain the line numbers in loadmgmt.php where they can be found.
Script Basics
All web requests share some common data needed to control how many users are in the server and in the online queue. This shared space can be easily set up using a unique session amongst all requests, as follows:

    6 session_id( "LOADMGMT" );
    7 session_start();
I also need to initialize the parameters for the algorithm:
    $servicetime = 1*60;  // in seconds
    $maxusers    = 3;     // number of simultaneous users
    $timeout     = ...;   // timeout for queue, in seconds (value missing from this copy)
    $refreshtime = 2;     // refresh time for browser, in seconds
$servicetime and $maxusers are set to one minute and 3 users, respectively. The $timeout and $refreshtime parameters are related to access control. When a user is in the online queue, the web browser refreshes each $refreshtime seconds to inquire about its current position and estimated time for service. This action also works like a
November 2003
●
PHP Architect
●
www.phparch.com
FEATURE
Implementing Web Server Load Management
“keep-alive” signal, maintaining its spot in the queue. If no signal occurs within $timeout seconds, the user is purged from the queue.

The script flow is essentially controlled by eight variables, as shown in Listing 1. $now is the current time for this web request. $sequenceval is simply an incremental counter used to uniquely label incoming requests. $running and $queue hold the users currently in the server and in the online queue, respectively; their sizes give the number of users in each. $freetogo is a flag used to indicate whether service should be provided to a user yet. $myposition and $mywaittime are set when the request is in the queue, and the user must be informed of how many users are waiting before him or her, as well as the estimated wait time in the queue.

The variable $mysequence is obtained from a cookie, and exposes the first restriction of the algorithm: cookies must be enabled in the user’s browser. $mysequence is saved in a cookie when set from $sequenceval in the first access, if the user was put in the queue. It is retrieved from the cookie in each “keep-alive” request to identify the user’s location in the queue.

Notice that $sequenceval, $running and $queue are persistent amongst all sessions of the load management algorithm.

Users in the Server

The variable $running holds information on the number of users in the server. But how is it possible to know how many users are in the server? The answer is that it is not possible, at least not without rewriting the whole application and keeping track of each user request during service; and if the user closes the browser, you still need to somehow expire the user’s connection. Another way is to go down further into the web server software and determine each live connection associated with each user, and so on. But then your code will become server dependent.
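Listing 1

    // loadmgmt.php, lines 29-40
    // (isset() defaults added so that the very first request starts from an empty state)
    $now = time();
    $sequenceval = isset( $_SESSION[ "shared_sequenceval" ] ) ? $_SESSION[ "shared_sequenceval" ] : 0;
    $running = isset( $_SESSION[ "shared_running" ] ) ? $_SESSION[ "shared_running" ] : array();
    $queue = isset( $_SESSION[ "shared_queue" ] ) ? $_SESSION[ "shared_queue" ] : array();
    $freetogo = false;
    $myposition = 0;
    $mywaittime = 0;

    // mysequence is a unique id for a user that accesses the server
    // if mysequence is from a cookie, then the user is in queue
    $mysequence = isset( $_COOKIE[ "mysequence" ] ) ? $_COOKIE[ "mysequence" ] : null;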
I decided to change the approach. Instead, I will use the estimated service duration to make an educated guess about when the user will leave. Will it work? Yes, but I will discuss that later on. For the moment, let’s see what’s in $running.

$running is a FIFO (first-in, first-out) queue. It holds the times at which users are expected to leave the server. Each new entry in $running is created as:

    $running[] = $now + $servicetime;
If $servicetime denotes how long the user will be in the server, adding it to $now will determine when the user is likely to have left. Furthermore, $running[i] will never be later than $running[i+1], because entries are appended in arrival order and $servicetime is constant. When the server’s clock passes the time in $running[i], the entry is removed from $running, and it is assumed that the user left. The end effect is a FIFO behavior. Listing 2 shows the FIFO algorithm.

The expression count( $running ) < $maxusers indicates whether the server is within its capacity. If so, a new entry in $running can be registered. There are two conditions for that to occur: either the $queue is empty, or the current request is next in line to be served. Listing 3 shows the algorithm for admitting a user to the server, which means setting $freetogo to true.

While the server is within capacity, count( $running ) increases with new requests and decreases when entries are due. When count( $running ) reaches $maxusers, access is restricted to the “online queue”.

Listing 3

    // loadmgmt.php, lines 65-88
    // should the request be put in a waitline or is it free to go?
    if (count( $running ) < $maxusers && empty( $queue )) {
        $running[] = $now + $servicetime;
        $freetogo = true;
    }

    // if it is not free to go, then check queue
    if (!$freetogo) {

        if (empty( $myposition )) {
            // if $myposition is empty, consider it a new request to serve
            $sequenceval++;
            $mysequence = "S$sequenceval";
            setcookie( "mysequence", $mysequence );
            $queue[ $mysequence ] = $now;
            $myposition = count( $queue );
        }
        else if (count( $running ) < $maxusers && $myposition == 1) {
            // if current user is next in line, and the server is within capacity
            // then move user from queue to server
            array_shift( $queue );
            $running[] = $now + $servicetime;
            $freetogo = true;
        }
    } // closes "if (!$freetogo)"; the printed excerpt stops one line short of this brace
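The expiry step described above, dropping entries from $running once the clock has passed them, could look like the sketch below. It is an illustration only and not the script's own code: the function name purge_departed() is hypothetical, and treating an entry whose time equals $now as already expired is an assumption.

```php
<?php
// Hypothetical helper (not taken from loadmgmt.php).
// $running holds expected departure times in ascending order, so any
// expired entries are always at the front of the array.
function purge_departed(array $running, int $now): array
{
    while ($running && reset($running) <= $now) {
        array_shift($running); // also re-indexes numeric keys from 0
    }
    return $running;
}

// Example with made-up timestamps: two users are assumed to have left.
$now = 1000;
$running = purge_departed([990, 1000, 1030, 1055], $now);
echo count($running) . " users still assumed to be in the server\n";
```

Because the array stays sorted, the loop stops at the first entry that is still in the future; there is no need to scan the whole array.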
Listing 4

    // loadmgmt.php, lines 74-81
    if (empty( $myposition )) {
        // if $myposition is empty, consider it a new request to serve
        $sequenceval++;
        $mysequence = "S$sequenceval";
        setcookie( "mysequence", $mysequence );
        $queue[ $mysequence ] = $now;
        $myposition = count( $queue );
    }
Users in the Online Queue

The variable $queue stores data about the online queue. If $queue is not empty, it means you have begun managing server load. $queue is also a FIFO queue, structured as an array of $key => $value pairs: $key is a sequence number that uniquely identifies the user’s web request, and $value records the last time the web browser refreshed the queue page:

    $queue[ $mysequence ] = $now;

When a user first requests access and is moved to the $queue variable, $mysequence is derived from $sequenceval and recorded in a cookie, as shown in Listing 4. If a user closes the web browser and gives up waiting in line, the time stamp in $queue[ $usersequence ] begins to age. After $timeout seconds the entry expires and is purged from the $queue array. Listing 5 shows the code that cleans up $queue. While purging old entries, if the current request is in $queue, its entry is refreshed and its position in line ($myposition) is recalculated.

Listing 5

    // loadmgmt.php, lines 52-63
    if ($queue) {
        $newqueue = array();
        foreach( $queue as $usersequence => $lastrefresh )
            if ($now - $lastrefresh < $timeout)
                if ($usersequence == $mysequence) {
                    $newqueue[ $usersequence ] = $now;
                    $myposition = count( $newqueue );
                }
                else
                    $newqueue[ $usersequence ] = $lastrefresh;
        $queue = $newqueue;
    }

As soon as the number of users in the server decreases to below capacity, the next user in line can be served. When this condition is met, and the script receives the refresh request from the candidate user’s browser, the code in Listing 6 takes action. Listing 6 says that for a request in $queue, if the number of users probably running in the server is lower than the server’s capacity and this request is the first request in $queue, do the following:
Listing 6

    // loadmgmt.php, lines 82-88
    else if (count( $running ) < $maxusers && $myposition == 1) {
        // if current user is next in line, and the server is within capacity
        // then move user from queue to server
        array_shift( $queue );
        $running[] = $now + $servicetime;
        $freetogo = true;
    }
• remove the request from $queue
• insert the request’s expected exit time in $running
• mark the request as free to sign in

I can now show you how to estimate the time a user will have to wait should the service request from his or her browser be redirected to the online queue. $running holds the times representing when users in the server are likely to leave. Since $running is a FIFO, $running[0] marks the nearest time a user will leave. Hence, the expression $running[0] - $now gives the number of seconds until entry zero expires. Therefore, the first user in $queue will be served in $running[0] - $now seconds. Likewise, the second entry in $queue will be served in $running[1] - $now seconds, and so on. Once a position in line exceeds the number of entries in $running, the estimate wraps around to the start of $running and adds one full $servicetime for each lap. Listing 7 shows how $mywaittime is calculated.

After processing $running and $queue, and finding $myposition and $mywaittime for the current request (if it’s in $queue), the session data is saved as follows:

    // loadmgmt.php, lines 101-103
    $_SESSION[ "shared_sequenceval" ] = $sequenceval;
    $_SESSION[ "shared_running" ] = $running;
    $_SESSION[ "shared_queue" ] = $queue;
Finally, the flag $freetogo indicates whether the request can enter the server or is in $queue, yielding a service or queue page.

The service and queue pages

Under normal conditions, when the user navigates to your web server, a service page should be displayed marking the beginning of transactions. In a server overload situation, a queue page should be displayed to users, showing their position in line ($myposition) and the estimated time for service ($mywaittime). Listing 8 shows the code necessary for displaying either the service or the queue page. Figure 2 shows a sample queue page.

A load monitor

Listing 9 contains a separate script used for showing the server’s load status. It does not affect any variable in the session, acting merely as a read-only script. It shows $running and $queue in detail. Figure 3 shows a screenshot from the load status script. The screenshot is a result of the settings and events shown in Figure 4.

Figure 2
Listing 7

    // loadmgmt.php, lines 90-97
    if (!$freetogo) {
        $myposition--;  // so far, my position started in 1
        $n = count( $running );
        $mywaittime = $running[ $myposition % $n ] - $now
                      + $servicetime * floor( $myposition / $n );
        if ($mywaittime < 0)
            $mywaittime = 0;
    }
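To see how the Listing 7 formula behaves, it helps to run it on a few made-up numbers. The sketch below wraps the same arithmetic in a function; the name estimate_wait(), the zero-based $position argument, and the guard for an empty $running array are assumptions added for the illustration and are not part of loadmgmt.php.

```php
<?php
// Hypothetical wrapper around the Listing 7 arithmetic.
// $position is zero-based: 0 means "next in line".
function estimate_wait(array $running, int $position, int $now, int $servicetime): int
{
    $n = count($running);
    if ($n === 0) {
        return 0; // assumption: nobody in the server, so no wait
    }
    // Wrap around $running, adding one full service time per lap.
    $wait = $running[$position % $n] - $now
          + $servicetime * intdiv($position, $n);

    return max(0, $wait);
}

// Made-up state: three users expected to leave in 10, 30 and 50 seconds.
$now = 1000;
$running = [1010, 1030, 1050];
foreach ([0, 1, 3, 4] as $position) {
    echo "position $position: ",
         estimate_wait($running, $position, $now, 60), " seconds\n";
}
```

Positions 0 and 1 wait for the first and second departures. Position 3 wraps around to the first slot again and adds one service time on top, which is why the estimate grows in steps of $servicetime per lap.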
Will it work?

The background for the online queue algorithm is inspired by queueing theory. It assumes that a service must be mediated to prevent the overload of your web server infrastructure. The algorithm assumes that the service has a clear beginning, a set of transactions, and an ending. Service time is defined as the duration of the service, and the algorithm assumes that service time has an average and a standard deviation that follow a normal distribution function. The more users there are in the server, the nearer the sample average of all service times will be to the population average. This effect reduces the chances of your estimates being wrong.

How would you determine whether your situation complies with that? I’ve found that you don’t need to be so picky with details. Just think: “I have a customer satisfaction survey. There’s a prize involved, and the survey will last for three days. The survey system is integrated with my company’s ERP, and the ERP team will develop the survey using CGI binaries (what a nightmare!). I trust that I will need to have some prevention for server overload. What now?”

It’s easier than you think. Decide with your team how many customers can be filling out the survey at the same time; that’s the $maxusers parameter. Then take yourself through the survey and come up with a wild estimate for how long it will take a customer to fill it out; that’s the $servicetime parameter. When the survey begins, keep track of your infrastructure indicators: memory usage, processor usage, network throughput, database load, and so on. At any sign of strain, just lower $maxusers or increase $servicetime to give your server some room. On the other hand, if $queue is too long and your server can take more, increase $maxusers or decrease $servicetime. Keep trying until you have tuned both parameters to your situation.

Figure 3

Algorithm variations

This algorithm is for one service, not for many services in a website. That’s because each service will probably have a different user service time, so you will need to adapt the algorithm to deal with multiple service times. Also, think about where exactly your bottleneck is. There is a well-known website that uses a very similar algorithm to limit the number of users simultaneously downloading files from it. For that site, the resource being controlled is network bandwidth. For you, it might be restricting database access, or preventing overflow in your web server.
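Before tuning $maxusers and $servicetime on a live server, it can be useful to try a few values on paper. The sketch below is a deliberately simplified model and not the article's script: it assumes every admitted user stays for exactly $servicetime seconds, ignores the browser refresh interval, and serves arrivals strictly in order. The function name simulate_waits() and the arrival times are invented for the example.

```php
<?php
// Hypothetical simulation: for each arrival time (in seconds), compute how
// long that user would wait before being admitted to the server.
function simulate_waits(array $arrivals, int $maxusers, int $servicetime): array
{
    if ($maxusers < 1) {
        throw new InvalidArgumentException('maxusers must be at least 1');
    }
    sort($arrivals);
    $slots = array_fill(0, $maxusers, 0); // time at which each slot frees up
    $waits = [];
    foreach ($arrivals as $t) {
        $i = array_search(min($slots), $slots); // slot that frees up first
        $start = max($t, $slots[$i]);           // admitted now or when freed
        $waits[] = $start - $t;
        $slots[$i] = $start + $servicetime;
    }
    return $waits;
}

// Five made-up arrivals: four at time 0, one ten seconds later.
$arrivals = [0, 0, 0, 0, 10];
foreach ([3, 4] as $maxusers) {
    $waits = simulate_waits($arrivals, $maxusers, 60);
    echo "maxusers=$maxusers longest wait: ", max($waits), " seconds\n";
}
```

With three slots the fourth user waits a full minute; with four slots only the late arrival waits. A real server will not behave this neatly, so treat the output as a starting point for the trial-and-error tuning described above.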
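Listing 8

    // loadmgmt.php, lines 105-128 (most of the HTML markup did not survive in this copy)
    if ($freetogo) {
        setcookie( "mysequence", false, mktime(0,0,0,10,20,1972) );
        echo "Greetings! You have the server!";
    }
    else {
    ?>
    [lines 111-114: markup lost; only the page title text "Queue Status" remains]
    <meta http-equiv="refresh" content="<?=$refreshtime?>">
    [lines 116-124: markup lost; the last surviving fragment is: (pid: $mysequence)";]
    ?>
    [lines 126-128: markup lost]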
Acknowledgments I would like to thank Professor Rafael Cancian from Univali (Brazil) for helping me test the suitability of the algorithm with simulation software so as to further understand the sensitivity of the “service time” and “max users” parameters.
About the Author
Rodrigo is a Doctor (in Production Engineering) at the Universidade do Vale do Itajaí in Brazil, where he teaches Computer Science undergraduate courses.
Discuss this article: http://forums.phparch.com/63
Listing 9

    [opening markup lost; page title: Load Status]
    <meta http-equiv="refresh" content="2">
    users in the server:
    [...] $usertime )
        echo " [$i] ".($usertime - $now)." seconds to leave ";
    else
        echo " none ";
    ?>
    users in queue:
    [...] $lastrefresh ) {
        $waittime = $running[ $pos % $n ] - $now + $servicetime * floor( $pos / $n );
        echo " [$sequence] ".($now - $lastrefresh)." idle. ";
        echo "using line [".($pos % $n)."]. ";
        echo "$waittime seconds to enter ";
        $pos++;
    }
    else
        echo " none ";
    ?>
Figure 4
PRODUCT REVIEW
PhpED 3.2.1 Published by NuSphere Corporation by Marco Tabini
In the PHP world, Integrated Development Environments (IDEs), or, as the majority of us common mortals refer to them, editors, seem to abound. There are many open-source applications out there of varying quality, as well as a few commercial packages contending for a slice of the proverbial pie. Among the commercial IDEs, the Zend Studio, published by Zend Technologies, and phpED, published by NuSphere Corporation, are perhaps the two most widely used ones. Since we published a review of the Zend Studio in last month’s issue of php|a, it’s only natural that this month we turn our attention to phpED.

A First Look

Like the Zend Studio, phpED is available on multiple platforms (Windows and Linux as of this writing). The CD we received contained both versions, and featured an easy-to-follow installation process. The package also included an integrated installation of Apache, MySQL, and PHP (and Perl), which the IDE uses to debug and profile applications. Once installed, the application asks you to send an e-mail to NuSphere in order to complete your registration and obtain a license to unlock the product. I was
a little concerned about this process at first, but my response e-mail took less than ten minutes to arrive, which is impressive even by my standards.

The main application interface is not unlike what most professional IDEs use, as you can see in Figure 1. The code editor provides syntax highlighting and code completion features (both for pre-defined language elements and for your own code), but no real-time syntax validation, which can be a plus if you’re making changes to a large file. The syntax highlighting, however, is very complete; it even highlights SQL code.

One feature I particularly like here is the concept of “code templates”, which allow you to create pre-defined code blocks and then insert them into your script by simply pressing CTRL-J and choosing them from a pop-up window. For example, one of the default templates is an if-else block. Pressing CTRL-J and typing ifel creates the following snippet of code:

    if () {
    } else {
    }
There are a myriad of different templates, some of which don’t really sound useful until you actually start using them; insertdb, for example, creates a snippet of code for inserting a new row inside a MySQL database table, complete with error checking. If you’re up to it, you’re even free to create your own templates (or edit the existing ones) through the very complete (though somewhat confusing) Options menu.
Additional Features

In keeping with the current trend in PHP IDEs, phpED is well integrated with CVS; it allows for all the traditional CVS operations (which can be customized in the Preferences menu), and, at least in my case, worked without a hitch even through a secure SSH tunnel.

The IDE also includes a complete copy of the PHP Manual, plus several other help guides, which are extremely helpful during the development process. These include an HTML reference, CSS manual, PostgreSQL developer’s guide, and even a CVS reference. Having these at your fingertips really makes a difference, especially in an offline setting.

The “code explorer” provides a breakdown of your current script or your entire project (if you have created one), and can be useful for browsing your code at a high level, or for quickly finding a particular function.

Additionally, and this is particularly good, phpED features the ability to connect to and manipulate either a MySQL or PostgreSQL database (Figure 2). Now, to a shell-addict, this may not make much difference (but then again, he’s probably still using VIM for all we know), but to someone who wants to take full advantage of a shell it’s truly a godsend. Gone are the days of continuously having to switch between the shell and
your editor, or, even worse, having to depend on a web-based management system and all the potential security disasters it can expose.

Debugging and Profiling

phpED features both an integrated and external debugger (which runs on the local copy of Apache installed by phpED). The debugger works very well, allowing for the execution of a single file or an entire project. As with most debuggers, you can set breakpoints, execute your code one line at a time, “watch” any variables, and have the interpreter evaluate expressions for you.

The profiler, which is accessible only once you’ve run a script through the external debugger (which, despite the name I’ve given it here, is completely integrated within the IDE), provides a clear overview of your script’s execution times, and can even produce a nicely formatted graph of the program’s execution results.

What I Liked

phpED is, without doubt, a very tightly integrated environment. Not only does it sport an impressive set of features, but it also puts them all at the developer’s fingertips (where, one might argue, they belong in the first
Figure 1
place). With the exception of actually running the code in a production environment, I think I would be quite happy if phpED were permanently etched in my hard drive as the only application I can use to work on my PHP projects.

What I Didn’t Like

Being the kind of down-to-earth guy I am, I found the user interface a little overwhelming at first, but got used to it quite quickly (and removed the features I didn’t like). Additionally, the Options dialog box is very crowded; while it’s good to be able to customize an application so much, it might have been better to split the dialog box into more manageable chunks. Finally, the version of phpED that I reviewed does not (yet) support PHP5 syntax, although I’m sure that the NuSphere folks will add it very soon.
Supported platforms for NuSphere phpED:
Windows 98, ME, NT, 2000, and XP
Redhat Linux (7.0 and higher on i686)

System requirements:
64 MB RAM (recommended: 128 MB)
80 MB disk space

Price: $298.99 and up

Special offer! Buy phpED from the php|a store today and get 10% off the retail price! http://www.phparch.com/store/phped

Product Homepage: http://www.nusphere.com/products/index.htm
Figure 2
TIPS & TRICKS
Tips & Tricks By John W. Holmes
Logging Validation Errors

An interesting concept was brought up on the PHP mailing list a while ago that involved logging the validation errors your scripts catch. By now, we all should be properly validating any data that the user provides, right? The last couple of issues had articles and tips on the problems malicious users could create if proper validation is not done. So now that we’re all doing that, keeping track of what errors are encountered by your validation scripts could prove rather useful.

By looking for trends in the errors, you may be able to determine if portions of your site are confusing to your users or have instructions that are hard to follow. If you notice that the same input box is frequently left blank and triggering an error, maybe the input box is not properly or clearly marked as being required. You may notice that users frequently input dates in a different format than you are expecting, so you can adapt your program to also accept that format.

This probably isn’t a feature that you want employed all of the time, but as you roll out new applications, or new features to existing applications, a logging feature such as this can help ensure that everything is as clear to the user as you can make it.
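One way to try this out is a tab-separated log file with one line per failed check, plus a small report that counts failures per field. The sketch below is only an illustration of the idea: the function names, the log format, and the field names are all made up, and a real application might just as well log to a database table.

```php
<?php
// Hypothetical helpers for logging validation errors to a flat file.
function log_validation_error(string $logfile, string $field, string $reason): void
{
    // Strip tabs and newlines so each error stays on one line.
    $reason = str_replace(["\t", "\n", "\r"], ' ', $reason);
    $line = date('c') . "\t" . $field . "\t" . $reason . "\n";
    file_put_contents($logfile, $line, FILE_APPEND | LOCK_EX);
}

// Returns an array of field => number of errors, most frequent first.
function count_errors_by_field(string $logfile): array
{
    $counts = [];
    if (!is_file($logfile)) {
        return $counts;
    }
    foreach (file($logfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $parts = explode("\t", $line);
        if (count($parts) < 3) {
            continue; // skip malformed lines
        }
        $counts[$parts[1]] = ($counts[$parts[1]] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}

// Example run against a temporary file.
$log = tempnam(sys_get_temp_dir(), 'val');
log_validation_error($log, 'email', 'left blank');
log_validation_error($log, 'birthdate', 'expected YYYY-MM-DD, got 11/20/2003');
log_validation_error($log, 'birthdate', 'expected YYYY-MM-DD, got 20.11.2003');
$counts = count_errors_by_field($log);
print_r($counts);
unlink($log);
```

A field that dominates the report, like the date field in this made-up run, is a hint that the form's instructions or accepted formats deserve a second look.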
A Little Sleep Does Your Login Good

I picked up this little tip a while ago; I wish I could remember who told me about it. The tip involves using sleep() upon receiving an unsuccessful authentication attempt. The idea is that before any output is sent back to the user, you have your script sleep for a second or two before sending the failure message. In theory, a malicious user attempting to brute force your authentication now has to wait two seconds for each attempt, effectively shutting them down.

Knowing that web servers were designed to serve thousands (hopefully) of pages at the same time, though, there’s nothing stopping the malicious user from trying thousands of passwords at a time and negating the effectiveness of your two-second wait. A malicious user could also realize that if they are not receiving a response after half a second, then they can assume an authentication failure. While they may actually miss some valid login attempts because of network slowdown, they’ll still be able to try more passwords and reduce your two-second wait to a lower number.
So, the resulting solution from all of that is to make everyone wait, valid authentication or not. Requiring a one- or two-second wait may be noticeable to some users, but it would only occur during the initial validation. Doing this would mean that the malicious user would also be required to wait the one or two seconds. Even though they still have the ability to try thousands of passwords at a time, making them wait a few seconds ...

Figure 1

    // Given array of:
    $array = array(1,2,array('a','b','c'));

    print_r() output:
    Array
    (
        [0] => 1
        [1] => 2
        [2] => Array
            (
                [0] => a
                [1] => b
                [2] => c
            )

    )
Returns “valid PHP code”?

I’ve discussed the usefulness of print_r() a few times in this column. If you’ve been reading along in the manual, you’ll notice that there are a couple of “sister” functions to print_r(), namely var_dump() and var_export(). An example of the output from each function is shown in Figure 1.

var_dump() produces slightly more detailed output than print_r(), as you get the size of the arrays and strings included in the output. You can also pass multiple arguments to it, making it easy to display multiple variables. However, var_dump() does not have a way to return the output as a string unless you use the output buffering functions.

If you look at the output of var_export() and read the manual page on the function, you’ll see that it returns “valid PHP code”. The function itself works just like print_r(): you pass it a variable and an optional boolean value that controls whether the result is displayed or returned to a variable. The difference is that var_export() returns “valid PHP code”. This got me wondering what the usefulness of a string formatted in this manner would be.

One use of this function could be for a caching script. Listing 1 shows how you could take the output from var_export() and write it to a file with a little more information. The resulting file could be easily included into your script with include() and the original array would be recreated. The output from var_export() could also be stored all by itself in a file or database column and, when retrieved, the eval() function could be used to recreate the array.

An easy SQL caching system could be built that would retrieve the results of a query into an array and write the array to a file. Then, the file could be simply included or required into your script to recreate the result of the query, without hitting the database.
A scheduled script (such as a cron) could easily clean up files that are over a certain age and if the file does not exist the next time the data is needed, it’s simply recreated with the most recent data from the query. If you have any other uses for the “valid PHP code” returned by var_export(), please let me know!
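The caching idea above can be tried out with a couple of short functions. This is a sketch under stated assumptions rather than the column's own listing: the function names cache_store() and cache_fetch() are hypothetical, the generated file uses a return statement so that include hands the array straight back, and the age check stands in for the scheduled clean-up script.

```php
<?php
// Hypothetical helpers: write an array to a file as PHP code, read it back.
function cache_store(string $file, array $data): void
{
    $code = "<?php\n// generated " . date('c') . "\n"
          . 'return ' . var_export($data, true) . ";\n";
    file_put_contents($file, $code, LOCK_EX);
}

// Returns the cached array, or null if the file is missing or too old.
function cache_fetch(string $file, int $maxAge)
{
    if (!is_file($file) || time() - filemtime($file) > $maxAge) {
        return null;
    }
    return include $file; // include yields the value of the file's return
}

// Made-up "query result" to cache.
$rows = [
    ['id' => 1, 'name' => 'John'],
    ['id' => 2, 'name' => "O'Reilly"],
];
$file = tempnam(sys_get_temp_dir(), 'cache');
cache_store($file, $rows);
$restored = cache_fetch($file, 300);
var_dump($restored === $rows);
unlink($file);
```

When cache_fetch() returns null, the caller would run the query again and call cache_store() with the fresh result. Only include files your own code has written, since anything in them is executed as PHP.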
Catching Duplicate Entries

You’ve probably come across the dilemma of having to insert a certain bit of data into a database and ensure that it’s not a duplicate. A practical example would be a user registration system; you want to insert the username and information into the database, but you need to catch the condition that the username is already being used.

The first thing you should do is make your database maintain the data integrity for you and make your column UNIQUE (or its equivalent in your database) as shown in the example table creation below. This will put the database in charge of ensuring there are no duplicate entries in the “username” column, for example. Now you just need to catch the condition when you attempt to insert a name that’s already present.

    CREATE TABLE Example (
        username VARCHAR(25) NOT NULL,
        UNIQUE(username)
    );

Most “solutions” seen in applications involve first issuing a SELECT query to determine if the name exists. If the name does not exist (no results are returned), then the new username is inserted into the database. If a row is returned from the database it means the username is already in use and the user must choose another. The problem with this method is that a certain amount of time is going to elapse between the SELECT and the INSERT query. It is possible that the username in question could be inserted into the database during this time by another script and can consequently cause the INSERT to fail.

You could fix this by issuing LOCK and UNLOCK queries for the table around the SELECT and INSERT queries. This will prevent any INSERT from occurring between your two queries, but now you are executing four different queries and effectively shutting down the table for the duration of your queries.

Since we already know the database is not going to allow a duplicate name to be inserted, a more practical solution would be to just perform a single INSERT query to add the new username. If the username does not exist, the INSERT query is executed and everything is fine. If the username does exist, though, you’ll need to catch the error that is raised and act accordingly. If you’re using MySQL, the mysql_error() and mysql_errno() functions come in handy for doing just this. Other databases will have similar methods for getting an error message or error number (SELECT @@ERROR AS ErrorCode in MSSQL, for example) that you can apply this concept to, also.

Listing 2 shows how you can use the MySQL functions to check for an error number of 1062 (duplicate key) occurring after your INSERT. If any other error is encountered, then something else was a problem and you must then decide how to handle the error. If no error occurred, then mysql_errno() will return zero, and you can display a “success” message.

Listing 2

    $query = "INSERT INTO Example (username) VALUES ('John')";
    $result = mysql_query($query);
    if($error_number = mysql_errno())
    {
        if($error_number == 1062)
        {
            echo "Duplicate username, please choose another.";
        }
        else
        {
            echo "Unknown error. Message: " . mysql_error();
        }
    }
    else
    {
        echo "Username successfully added";
    }

Figure 2

    mysql> CREATE TABLE Example (
        -> id INT NOT NULL AUTO_INCREMENT,
        -> username VARCHAR(25) NOT NULL,
        -> email VARCHAR(50) NOT NULL,
        -> PRIMARY KEY(id),
        -> UNIQUE(username),
        -> UNIQUE(email)
        -> );
    Query OK, 0 rows affected (0.06 sec)

    mysql> INSERT INTO Example (username, email) VALUES ('John','[email protected]');
    Query OK, 1 row affected (0.01 sec)

    mysql> INSERT INTO Example (username, email) VALUES ('John','[email protected]');
    ERROR 1062: Duplicate entry 'John' for key 2

    mysql> INSERT INTO Example (username, email) VALUES ('Mark','[email protected]');
    ERROR 1062: Duplicate entry '[email protected]' for key 3

You can actually extend this a step further and examine the content of mysql_error() to ensure you’re trapping the correct key having a duplicate. Figure 2 shows a MySQL session where an example table is created that contains three keys. The first key (primary) is the “id” column, the second is a UNIQUE “username” column and the last is another UNIQUE “email” column. You can see that if a duplicate username is inserted into the table, the error number is 1062 and the message references “key 2”. If a duplicate email value is
inserted, the same error number is returned, but the message references “key 3”. So parsing the value of mysql_error() (which will be identical to what’s seen in the MySQL session) will ensure you’re catching a duplicate of the right key and delivering the correct message.

Each database has its own method of returning an error message and/or an error number, so this solution can be adapted to any one of them. It is just a matter of becoming familiar with the error numbers and the syntax of the error messages.

The overall benefit of this, and that’s what we’re really after, is that you’re only executing one query whether or not an error is encountered. Your users do not have to wait for you to do a SELECT (although the time should be minimal) and then an INSERT. You also do not have any conditions where multiple scripts operating at the same time could corrupt the database or deliver confusing errors back to your script and users.

Send Me Your Tips!

Now I know you’ve probably just flipped right to the Tips and Tricks column because it’s your favorite, but if you go back and read the other articles, you’ll notice that there is a huge variety of information and topics covered in each issue of php|architect. You may think you’re the only person using that obscure extension or
the confusing image manipulation functions or that latest IDE release, but you’re not. You’ve no doubt picked up a few little tips and/or tricks of your own as you learned how to use something associated with PHP, also. Now is your chance to share with everyone else and send your tip or trick to [email protected] to be included in our next issue. If your tip is published, you’ll get a free electronic version of the magazine (or one added to your subscription).
About the Author
John Holmes is a Captain in the U.S. Army and a freelance PHP and MySQL programmer. He has been programming in PHP for over 4 years and loves every minute of it. He is currently serving at Ft. Gordon, Georgia as a Company Commander with his wife and two sons.
BOOK REVIEWS
by Peter MacIntyre

Programming PHP
by Rasmus Lerdorf & Kevin Tatroe
Published by O’Reilly
Paperback, 469 pages
$39.95 (US), $61.95 (Canada)
One of the positive aspects of all the buzz and energy surrounding PHP these days is the availability of so many different specialized books that focus on it. The flip side of this increased specialization, however, is that it’s not quite as easy to find a good “generalist” book that treats the language itself, without focusing necessarily on any particular use (or its association with another particular technology, like MySQL). Programming PHP does not get bogged down in the nitty-gritty of the PHP language, but it does help the
average user to generally understand all aspects of the PHP product. From the introduction of the language constructs to the integration of database access and extending PHP through add-on modules, all the “hot topics” are covered to the point that the reader can be productive with PHP in the first few hours of reading. Even advanced topics like array handling, object-oriented development, web security, and XML integration are given a good airing in this book.

The only slight disappointment in this title is that just a few pages are devoted to the use of cookies versus session state management. This topic is part of a continuing debate on the best way to carry data across multiple web pages (as in a multi-page web survey), and it could have been given a little more attention here.

Whatever the topic, if the reader wants more technical information, the authors give ample web addresses or reference other books for further research and reading. If you are just getting into the dynamic web development world, or you are considering migrating from another dynamic web product to PHP, then Programming PHP is the book of choice to get you up, running, and productive in a short time. Programming PHP is a great overall reference for the fledgling dynamic web developer!
Managing & Using MySQL, by George Reese. Published by O’Reilly. Paperback, 402 pages. $39.95 (USA), $61.95 (Canada)
Managing & Using MySQL is the book for beginner MySQL’ers. MySQL is the open source database tool that is taking on even the bigger players like Oracle, IBM, Microsoft, and Sybase, and winning! Thus, it makes sense to get a handle on this product if you work with databases in the Information Technology industry. The authors do a good job of getting the concepts across at a basic, beginner level, as that is the target audience of this offering from O’Reilly. With chapters on installation and initial setup, they start at square one and guide the reader by the hand into the sometimes-scary world of SQL programming.

Since MySQL is the “M” in the open source acronym LAMP, the aspects and inter-relationships of that stack are explained as well. This gives the reader a good grounding in the overall concept and context of the MySQL product and where it fits in the total open source solution. The “P” in “LAMP” can also stand for a few other web programming languages that begin with that letter, like, say, Perl or Python. This book gives some attention to integrating MySQL with these technologies by devoting one full chapter to each of these “P”s.

Managing & Using MySQL also spends some page space on performance tuning and security, both necessary topics since MySQL is heavily used in the development of interactive websites and portals. Having your data served securely and quickly is the best of both worlds, and this book certainly helps show the MySQL developer how to accomplish both.

Besides the above-mentioned features, there are also chapters that cover the data types and the built-in functions of the product. These chapters lend themselves to the programmer who already has a good handle on the SQL language in general and wants to see the subtle changes, if any, in the specifics of the SQL implementation in MySQL. There is even some content on the more advanced topics of using MySQL with either Java or C. Although these are more advanced topics, the authors treat them in terms that are already familiar to the reader. If the reader has been going sequentially through the book, these topics can be taken in stride as a natural progression of what has already been absorbed.

The only topic of the day that is lacking in coverage is how MySQL can be involved with XML. Although that is a more advanced topic, it should have been touched on here, at least at an awareness level.

Managing & Using MySQL is the book of choice for the new user of this open source database tool. It is also valuable to the intermediate DBA who may have taken over some MySQL databases, helping them tune the product and make it secure. This book should be on the shelf as a resource for anyone who needs general MySQL knowledge and coverage.
Parrot

As you may have heard (or seen), the keynote at the recent php{con, given by Sterling Hughes and Thies Arntzen, was on the subject of PHP and Parrot. For those of you wondering what all the big fuss is, let me try to clear the water. PHP currently runs on the Zend Engine. This is an interpreter that works pretty much exclusively with PHP code. Parrot is a virtual machine (VM) that is meant to support multiple languages. Obviously, in order to run on the Parrot VM, PHP must be compiled down to Parrot code; this is not really that different from how the current Zend Engine works, since it also compiles scripts into its own opcodes before executing them.

Parrot, although far from complete or really proven, seems to offer a few benefits over the current Zend Engine, including a much faster runtime, full Unicode support, multiple language interoperability (Perl and CPAN, anyone? Python? Ruby?), and native regular expressions. Imagine, too, what could come from cooperation between PHP, Perl and Python internals developers, rather than competition. This concentration of efforts could lead to less duplication and much more innovation.

Still, as I mentioned, Parrot is far from a reality. It has not been released, finalized, or proven in any situation. There are also significant hurdles to overcome if PHP is to become usable on Parrot. Probably the most notable of those is that none of the Zend extensions that make PHP what it is would run on Parrot as is. Parrot is also not a PHP technology. It has been developed to support Perl 6, and is being developed pretty much exclusively by the Perl community.

So now we wait and see. Sterling and Thies seem to have made some good progress, but much remains to be done. Parrot is a moving target, but could mean some crazy new directions for PHP. To all the internals h4x0rs out there, keep up the good work! You keep things interesting.

Are we Boardwalk or Baltic?

Another thing to come out of php{con was lots of discussion about the presence of a representative from Microsoft, Brian Goldfarb [1]. A number of zealots would surely say that he should’ve been tarred and feathered, but it certainly does speak to PHP’s recognition as a player. Whatever the goal, Brian came, saw, and was impressed.

Over the past few months, Redmond’s been paying a lot of attention to PHP (and open-source in general). First, we have a couple of articles appearing on Microsoft’s site: one comparing PHP to ASP, and one talking about a migration plan from PHP to ASP. Another recent happening was the invitation of a number of PHP and open-source developers to Microsoft’s “ASP.NET Bootcamp” in May. Now, Microsoft is showing up at PHP conferences? Begs lots of questions, doesn’t it? But none so loudly as “What do they want?” [2]

Honeypot

A short time ago, phpkitchen.com posted about a hacking attack that they had been fighting for some time. The attacker turned out to be gaining access through a forum script (http://www.yabbse.org), which exposes a vulnerability allowing arbitrary commands to be executed on the server. With a properly-crafted GET string, the script would execute a set of shell commands locally. While I loathe having to deal with hackers, I tend to find their craft (even that of script kiddies) fascinatingly imaginative. You should check out the entry on phpkitchen.com [3], and make sure that you’re not vulnerable to this same attack. I could go on a rant about security, but I’ll leave that to your system administrators.
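The details of the YaBB SE flaw are not given here, so the following is only a generic sketch of one way to guard against this class of attack: never let a GET value reach the shell unless it is on a fixed list. The function name, the report names, and the command path are all hypothetical.

```php
<?php
// Hypothetical: the only values that may ever be passed on to the shell.
const ALLOWED_REPORTS = ['daily', 'weekly', 'monthly'];

/**
 * Build a shell command from request parameters, or return null when
 * the input is not on the whitelist. $query would normally be $_GET.
 */
function build_report_command(array $query): ?string
{
    $name = $query['report'] ?? '';
    if (!is_string($name) || !in_array($name, ALLOWED_REPORTS, true)) {
        return null; // reject anything not explicitly allowed
    }
    // escapeshellarg() quotes the value so the shell sees a single argument.
    // The program path below is a made-up example.
    return '/usr/local/bin/make-report ' . escapeshellarg($name);
}

// A crafted value such as "daily; cat /etc/passwd" is refused outright.
var_dump(build_report_command(['report' => 'daily; cat /etc/passwd']));
```

Checking against a whitelist first and escaping second means a crafted string is rejected before it can be interpreted, rather than relying on escaping alone.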
EasyWindows

Check out this PHP installer [4] for Windows. It makes it easy to install PEAR and ADOdb, FastCGI for IIS, and Turck MMCache, and does some configuration for you as well. Very nice work, and not just useful for the beginner.
Ever since I started dabbling in PHP, I’ve been a subscriber to several PHP mailing lists. At the beginning, when I thought of PHP as little more than C with a few neat additional features (I have since stopped thinking of it that way, although that’s still my motivational speech when I’m trying to sell an old-timer on it), I thought they would help me get a better grasp of the situation. At that time, the lists were getting no more than forty or fifty messages a day, a number that was both manageable and practical, since, generally speaking, there was a wide variety in the questions that were asked and the answers that were given.

Fast-forward a few years. I recently had a disagreement with my e-mail application, which had decided to reindex my entire Inbox every time I received a new message, that led to my backing up all my mail folders. In the two weeks since my postal apocalypse, my mailing list folder has accumulated some 3,000 new messages, and a cursory read-through indicates that there is an unusual amount of repetition in the questions asked. When wondering why this happens, the typical geek-brained answer is “because people are stupid”. That, of
course, is not the truth (at least in most cases) and, therefore, there have to be other reasons. One of these, in my opinion, is the chronic lack of research facilities for someone who wants to find out whether a question has already been asked on the mailing lists. The PHP website does offer a message-browsing interface at http://news.php.net, but this includes no search options, so your only choice is to scroll through each and every message manually. That is not very convenient when you’re looking for a particular topic in a list that contains over one hundred thousand messages.

The mailing list archives offered by php.net, which are actually part of MARC, are not much more useful. Unlike the news interface (whose original purpose, to be fair, was essentially that of providing an NNTP interface to the mailing lists), MARC does offer a search function, but it is, in my opinion, too limited to be of any use. It is not possible to search multiple mailing lists at once, for example, nor are there many of the features that make web-based search engines great, like keyword highlighting, colour-coding of quoted text and automatic thread presentation. Other search engines on the web (at least those I was able to find) have the same limitations and are, in
some cases, even worse. That leaves the PHP enthusiast with only one truly viable choice: general-purpose search engines. The problem here is that most search engines are unable to distinguish between the word “PHP” (as in “I need help with this PHP script”) and the extension PHP (as in “this_script.php”). This “small” issue can be the source of incredible amounts of frustration, particularly when you search for something like “PHP crack” (referring to the “crack” library) and end up on a website (written in PHP) that sells illegal drugs.

Driven by general despair, therefore, we set out to write our own archive and search engine dedicated exclusively to the PHP mailing lists, which you’ll find at http://phparch.com/mailinglists. It is by no means perfect, but it does have a few features that should make searching it for information a bit more pleasant. First of all, it is entirely powered by PHP, and the back-end database is an optimized MySQL database on which we have built close to ten different indexes to ensure the highest possible search speed. When you have close to 750,000 messages in roughly 80 different newsgroups/mailing lists, that’s not quite as easy as you may think.

The system also threads messages automatically and shows the contents of the thread each message belongs to. This is one area where we still have some work to do, because the threading mechanism provided by the e-mail RFCs is a bit spartan and (unsurprisingly) not well supported by all mail clients. If you view a message as part of a search result set, your search keywords will be highlighted, and in all cases quoted text (indented with a “>” character) is coloured so that you can more easily navigate the message.

I, for one, find that our archives work quite well. If you think of the most frequently asked questions about PHP and search for any one of them, the results are usually very relevant and well sorted.
The highlighting and colouring features make the messages easier to read, and the threading works well enough to let you navigate an entire discussion with little or no effort. If this helps a few people find an answer to their questions quickly, then I think we’ve accomplished our mission. php|a
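To give an idea of what the keyword highlighting and quote colouring described above involve, here is a minimal sketch. It is not the code behind our archive; the function name render_line() and the CSS class names are made up for illustration, and it handles one line of plain text at a time.

```php
<?php
/**
 * Hypothetical sketch: turn one line of a plain-text message into HTML,
 * highlighting search keywords and marking quoted lines.
 */
function render_line(string $line, array $keywords): string
{
    // Drop empty keywords and escape the rest for use in a regex.
    $alternatives = [];
    foreach ($keywords as $keyword) {
        if ($keyword !== '') {
            $alternatives[] = preg_quote($keyword, '/');
        }
    }

    if ($alternatives) {
        // With DELIM_CAPTURE, odd-numbered parts are the matched keywords.
        $pattern = '/(' . implode('|', $alternatives) . ')/i';
        $parts = preg_split($pattern, $line, -1, PREG_SPLIT_DELIM_CAPTURE);
    } else {
        $parts = [$line];
    }

    $html = '';
    foreach ($parts as $i => $part) {
        // Escape after splitting so keywords never match inside entities.
        $escaped = htmlspecialchars($part, ENT_QUOTES);
        $html .= ($i % 2 === 1) ? '<b class="hit">' . $escaped . '</b>' : $escaped;
    }

    // Lines that start with ">" are quoted text from an earlier message.
    if (preg_match('/^\s*>/', $line)) {
        $html = '<span class="quoted">' . $html . '</span>';
    }
    return $html;
}

echo render_line('> try mysql_error() first', ['mysql']), "\n";
```

Splitting the raw text before escaping it avoids a subtle problem: if the line were escaped first, a keyword such as “amp” would also match inside the “&amp;” entity and corrupt the markup.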