APRIL 2003
VOLUME II - ISSUE 4
php|architect The Magazine For PHP Professionals
Migrating Web Applications to PHP Computer Science Concepts With PHP Writing A Parser And Expression Evaluator
Conquer JpGraph Advanced Features Revealed
www.phparch.com
XML Transformation With PEAR Using PEAR::XML_Transformer
Form Validation From the Outside In A New And Interesting Perspective
Practical Web Services with PHP Plus: Book Reviews, Product Reviews and much more..
Introducing the php|architect Grant Program

As PHP’s importance grows on the IT scene (something that is happening every day), it’s clear that its true capabilities go well beyond what it’s being used for today. The PHP platform itself has a lot of potential as a general-purpose language, and not just a scripting tool; just its basic extensions, even discounting repositories like PEAR and PECL, provide a high-quality array of functionality that most of its commercial competitors can’t offer without expensive external components.

At php|a, we’ve always felt that our mission is not limited to trying our best to provide the PHP community with a publication of the highest possible quality. We think that our role is also that of reinvesting in the community that we serve in a way that leads to tangible results. To that end, this month we’re launching the php|architect Grant Program, a new initiative that will see us award two $1,000 (US) grants to PHP-related projects at the end of June.

Participating in the program is easy. We invite all the leaders of PHP projects to register with our website at http://www.phparch.com/grant and submit their applications for a grant. Our goal is to provide a financial incentive to those projects that, in our opinion, have the opportunity to revolutionize PHP and its position in the IT world.

In order to be eligible for the Grant Program, a project must be strictly related to PHP, but not necessarily written in PHP. For example, a new PHP extension written in C, or a new program in any language that lends itself to using PHP in new and interesting ways, would also be acceptable. The only other important restriction is that the project must be released under either the LGPL, the GPL or the PHP/Zend license. Thus, commercial products are not eligible.

Submit Your Project Today! Visit http://www.phparch.com/grant for more information.
TABLE OF CONTENTS

php|architect

Departments
EDITORIAL RANTS: php|architect: A New Community
NEW STUFF
REVIEW: PostgreSQL Manager and MySQL Manager From Electronic Microsystems
BOOK REVIEWS
• MySQL
• Administering and Securing the Apache Server
exit(0); PHP For Suits: The Neverending Saga, By Marco Tabini

Features
Advanced Features in JpGraph, By Jason E. Sweat
Form Validation From the Outside In: A New Perspective on Form Validation, By Peter James
The Realization of Freedom: Migrating From Proprietary Tools to PHP, By Dave Palmer
Practical Web Services With PHP and XML-RPC, By Stuart Herbert
Using The PEAR::XML_Transformer, By Bruno Pedro
When A Meets B: Writing a Parser and Expression Evaluator in PHP, By Marco Tabini

April 2003 · PHP Architect · www.phparch.com
The designers of PHP offer you the full spectrum of PHP solutions
Serve More. With Less. Zend Performance Suite Reliable Performance Management for PHP
Visit www.zend.com for evaluation version and ROI calculator
Technologies Ltd.
EDITORIAL RANTS
EDITORIAL
php|architect: A New Community

Since I first came on board as an editor here (only days after the launch of php|architect’s first issue), I’ve had the distinct pleasure of forming relationships with some of the greatest minds in the PHP community. As a budding Editor in Chief, it has become quite clear to me that our ability to form synergistic, collaborative relationships with our authors is something of an anomaly in the magazine publishing industry. Lucky for me, it’s one that is a welcome change to the authors I’ve interacted with. I’m proud to have a part in creating this apparently new paradigm in publishing, and this seemingly new perspective on the editor-author relationship.

I like to think of the editorial process as the creation of what I am now dubbing ‘A New Community’: a community of people from different walks of life, having vastly different views on how different technologies should work together, different experiences to share, and from disparate cultures and geographic regions of the Earth. A community bonding on the common goals of furthering both the development of PHP as a complete, stable, mature development platform, and the progress of other PHP developers.

The end result of this highly iterative, interactive process is what is ultimately delivered to you by php|architect every month. Not just a magazine, but a somewhat interactive experience in itself. Let’s not forget that in addition to working hard to produce the magazine each month, the members of our editorial staff are also consumers of this magazine! As a result, we experience along with all of you the feeling that we are really being taken by the hand through what seems like a tour of a particular project undertaken by a particular developer. At the end of the tour, we’re handed the bits of code which are out in production somewhere, making things tick. As a consumer, I find this amazingly useful. I hope anyone reading this agrees, or will send me email telling me how we can make things even better.

In addition, I invite those who might currently be passive consumers of information to get involved! If you’re reading this, you’re quite likely a PHP developer. Chances are also good that you have a unique opinion or perspective on some aspect of technology that touches us here in the PHP community. We invite you to participate more actively in the ‘New Community’ by sharing your knowledge, thoughts, and ideas with the rest of us. We’ve just released a new version of our author’s guidelines, which can help get you started. Anything that this leaves you wondering about can be addressed by asking our editorial staff at [email protected].

It could very well be that you haven’t participated yet because you haven’t seen a product offered by php|architect that looks like a good fit for your own work. Maybe you want to write a book, or start a new project of unforeseeable dimensions. I urge anyone in this position to write to [email protected] with your concerns in this area. It’s quite possible that we have a project in the works that could benefit from your input.

The plain and unequivocal fact is that php|architect is growing quite rapidly. This growth stems from the ideas of those collaborating on the direction of php|architect as a diversified company, not just a magazine. For example, php|architect has recently been translated and redistributed in Japanese for that market. This has been a successful venture for all of those involved, as well as the community at large, and we urge any other parties interested in localized versions of php|architect to contact us at [email protected].

What’s more, php|architect will soon see the release of our first full-fledged book, with more work in this area ongoing. I won’t use this space to divulge the details now, but stay tuned to my monthly editorial for more as things progress. In the wake of Wrox’s disappearance from the publishing world (a result of their parent company’s apparent insolvency), we at php|architect feel some obligation to at least partially fill the gap that will inevitably result in the area of authoritative PHP coverage. As someone who owns several Wrox books, I can sincerely say that their work is well appreciated, and will be sorely missed. We can only hope to do our best to keep up with the needs of a constantly growing developer base. Luckily, we don’t have to have the breadth of coverage in our books that Wrox had in theirs (I have no desire to work on a Java book!).

These are just a couple of the ideas we’re currently pursuing. Since no entity can claim monopoly ownership over all good ideas, it’s only natural that we reach out and make it known that we are a company which fosters and values ideas. If you have an idea for which an outlet has yet to be identified, we hope you’ll consider contacting us for help in discovering how best to pursue your goals.

Now on with the show! I’m excited by what lies between this editorial and Marco’s exit(0) (actually, I kinda like the editorial and exit(0) as well). I hope you can find enlightenment and inspiration from the articles which follow. Let us know your thoughts: [email protected]

php|architect
Volume II - Issue 4
April, 2003

Publisher: Marco Tabini
Editor-in-Chief: Brian K. Jones, [email protected]
Editorial Team: Arbi Arzoumani, Brian Jones, Peter James, Marco Tabini
Graphics & Layout: Arbi Arzoumani
Administration: Emanuela Corso
Authors: Stuart Herbert, Peter James, Dave Palmer, Bruno Pedro, Jason E. Sweat, Marco Tabini

php|architect (ISSN 1705-1142) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 3342, Markham, ON L3R 6G6, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibility with regard to the use of the information contained herein or in all associated material.

Contact Information:
General mailbox: [email protected]
Editorial: [email protected]
Subscriptions: [email protected]
Sales & advertising: [email protected]
Technical support: [email protected]

Copyright © 2002-2003 Marco Tabini & Associates, Inc. All Rights Reserved
NEW STUFF
What’s New!
PHP Architect Wo Yomitai! We’re happy to announce the introduction of the Japanese edition of php|architect! Published by Asial Corporation, the best PHP company in Japan, the publication is called PHP Programmer’s Magazine and it provides all the great content of php|a, plus many localized features specific to the Japanese market. Asial (http://www.asial.co.jp) is a Tokyo-based company that specializes in the production of PHP systems and the localization of software and documentation for the Japanese market. You can find the PHP Programmer’s Magazine website at http://www.phppro.jp If you are interested in localizing php|a in your language, don’t hesitate to drop us a note at
[email protected].
ionCube Announces Special Offer

The ionCube standalone PHP encoder is a high performance and complete PHP encoding solution, employing the technique of compiled code encoding to maximize both runtime performance and code protection. Encoded scripts can be run by the free Loader product with a standard PHP server in one of two ways. The Loader can be installed in php.ini, which delivers the best performance and is compatible with safe mode. For users with no access to php.ini, on many systems the Loader can be installed “on demand” by the scripts themselves. This requires no php.ini edits or server restart. The Base Edition comes with a single user license and, as with all current ionCube products, full support and upgrades are included for free. To download an evaluation copy or to purchase the standalone encoder, you can visit the ionCube website at http://www.ioncube.com.

The PHING is Loose!

PHING (PHing Is Not Gnumake) is a make tool written in PHP and based on the ideas and concepts of Apache Ant. It can do basically anything you could do with a build system like Gnumake, but uses XML for storing the targets and commands. Using XML, Phing avoids problems like the “Space before tab problem” in Gnumake. Current features include processing files while copying (e.g. token replacement, XSLT transformation, etc.) and various file system operations. Phing is written in the PHP scripting language and designed to be extensible. Thus, you can easily add the behaviour you need for your project by adding classes (known as tasks). Additionally, Phing is platform independent and runs on Unix-like systems and Windows. Phing is currently used to build binarycloud, a PHP Application Framework, and is maintained by its developers. For more information, or to download Phing, visit its website at http://binarycloud.com/phing/.
eZ publish 3 Released

eZ publish 3 is a professional open source content management system and development framework. As a CMS its most notable feature is its revolutionary, fully customizable and extendable content model. This is also what makes it suitable as a platform for general web development. Its standalone libraries can be used for cross-platform, database independent PHP projects. eZ publish is also well suited for news publishing, e-commerce (B2B and B2C), portals, and corporate web sites, intranets, and extranets. For more info, visit the eZ publish website at http://ez.no.
PHP 4.3.2RC1 Released The first public release candidate of the latest version of PHP was posted for download on the PHP.Net website earlier this month. The new version includes several bug fixes and a few new features compared to version 4.3.1, which was released in February in response to a CGI-related security issue. For more information, visit the PHP website at http://www.php.net.
phpOpenTracker 1.1.1 Is Unleashed!

phpOpenTracker is a framework solution for the analysis of Web site traffic and visitor behaviour. It features a logging engine that, either invoked as a Web bug by an HTML image tag or embedded with two lines of code into your PHP application, logs each request to a Web site into a database. One installation can track an arbitrary number of Web sites. Through its API, you can easily access the gathered data and perform complex operations on it (for instance, the analysis of your visitors’ click paths). For more information, visit the phpOpenTracker website at http://www.phpopentracker.de/.
MySQL AB Launches MySQL 4.0, Announces Certification Program MySQL AB, producers of the popular MySQL database management system, have announced the release of version 4.0 of their flagship product, which is now officially ready for production. Meanwhile, they have started development of version 4.1, which will include such long-awaited goodies as subqueries. Through the MySQL certification program, MySQL software developers can earn one or more formal credentials that validate their knowledge, experience and skill with the MySQL database and related MySQL AB products. The MySQL certification program consists of several unique certifications. The first, which is now generally available, is called the MySQL Core Certification. The Core Certification provides MySQL users with a formal credential that demonstrates proficiency in SQL, data entry and maintenance, data extraction for reporting and more. The MySQL Professional Certification, which will be available in Q3 of this year, is for the more experienced MySQL user who wants to certify his or her knowledge in MySQL database management, installation, security, disaster prevention and optimization. MySQL Core certification will be a prerequisite for taking the Professional Certification exam. MySQL also plans to offer a MySQL PHP Certification by the end of the year, which is designed for the MySQL and PHP developer who wants to simultaneously certify his knowledge of MySQL and of the PHP Web development tool. In addition, a MySQL DBA Certification, a top-level certification for the most accomplished MySQL gurus, will be offered in 2004. If you are interested and want to know more, check out the MySQL website at http://www.mysql.com. php|a
FEATURES
Advanced Features in JpGraph
By Jason E. Sweat This article originated as a case study in the newest book in the Wrox Handbook series: PHP Graphics (http://www.wrox.com/books/1861008368.htm). This material was omitted in the final publication of the book, and was modified for presentation here.
JpGraph (http://www.aditus.nu/jpgraph/) is a PHP class library that easily enables you to generate professional quality graphs using a minimal amount of code. This article is a case study illustrating some of JpGraph’s advanced features; specifically, it covers the following:
• a generalized methodology for JpGraph script development
• the evolutionary process of developing a graph (in contrast to presenting only the final product)
• the use of server-side caching with JpGraph for performance
• the use of Client Side Image Maps (CSIM) to implement “drill-down” functionality in your graphs
Installation and Environment

The easiest way to get started with JpGraph is to download the source, available at http://www.aditus.nu/jpgraph/jpdownload.php. Next, unpack the source archive into a directory in PHP’s include path. Then modify the paths to your installed fonts, as well as to the cache directory, to match your system. These settings are found in jpgraph.php. To verify that your installation is working, view the testsuit.php file in the examples directory. This page generates over 200 example graphs using JpGraph, and allows you to review the code for each of them.

If you’re just learning JpGraph and exploring its capabilities, you will find the manual very handy. It’s available at http://www.aditus.nu/jpgraph/jpdownload.php, and has both a narrative text and an excellent class reference. You might also want to visit the JpGraph support forum at http://jpgraph.fan-atics.com/.

All of the scripts in this article were developed and tested using PHP 4.3.0 (with the built-in GD2 library) running as a module under Apache 1.3.27 on RedHat Linux 7.2. The code in this article was developed using MySQL (http://www.mysql.com/) as a database, and ADOdb (http://php.weblogs.com/adodb) as a database abstraction layer. To this end, all of the scripts include a common file called phpa_db.inc.php, shown in Listing 1.

REQUIREMENTS
PHP Version: 4.0.4 minimum, 4.1 recommended
O/S: Any
Additional Software: JpGraph, GD Enabled PHP
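A quick way to confirm that the unpacked library really is reachable through PHP's include path is sketched below. This is only an illustration: library_on_include_path() is a hypothetical helper, not part of JpGraph, and it relies on stream_resolve_include_path(), which exists only in PHP releases much newer than the 4.x versions this article targets. The temporary directory stands in for wherever you unpacked the archive.

```php
<?php
// Hypothetical helper (not part of JpGraph): report whether a library file
// can be found through PHP's include_path.
function library_on_include_path($file)
{
    // stream_resolve_include_path() returns the resolved path, or false.
    return stream_resolve_include_path($file) !== false;
}

// A throwaway directory stands in for the unpacked JpGraph source directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'jpgraph_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . DIRECTORY_SEPARATOR . 'jpgraph.php', "<?php\n");

// Prepend the directory, as you would for the real library location.
set_include_path($dir . PATH_SEPARATOR . get_include_path());

echo library_on_include_path('jpgraph.php') ? "found\n" : "missing\n";
```

If the check reports the file as missing, the include_path setting (in php.ini or via set_include_path()) is the first thing to review.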
Listing 1: phpa_db.inc.php

db_inc.php

This include file creates an ADOdb connection object named $conn which is used to access the database throughout the rest of the scripts. You'll want to change the Connect() call's parameters to reflect your setup, including the name of the database you create for the examples. ADOdb needs to be installed in a directory in PHP's include path.

For readers unfamiliar with the ADOdb API, I'll give a very brief overview of some of the basic functionality. In ADOdb you can fetch the results of an SQL statement using

    $rs = $conn->Execute($sql_statement, $bind_array);

If your query is successful, this method will return an ADOdb resultset object. If not, the method will return false. Two resultset methods that are useful are GetArray() and GetAssoc(). GetArray() will return a vector of rows where each row is an associative array of 'COLUMN' => 'VALUE'. The GetAssoc() method will return an associative array, instead of a vector, where the index of the returned array for each row is the value of the first column.

The examples below assume that you have the truetype 'Arial' font installed. If this is not the case, JpGraph will emit an error to this effect. An easy way to get around this is to change all of the references to the FF_ARIAL font constant to FF_FONT1. FF_FONT1 is a built-in system font, and while it won't look as pretty as FF_ARIAL, it will allow you to view the examples.

That should be enough to get you through the examples ahead. Let's take a look at the case study.
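To make the difference between the two resultset shapes described above concrete, here is a plain-PHP emulation. It does not call ADOdb at all: as_vector() and keyed_by_first_column() are hypothetical helpers, the sample rows are invented, and the two-column behaviour shown (the remaining value stored directly) reflects my understanding of GetAssoc(), which you should verify against your ADOdb version.

```php
<?php
// Plain-PHP emulation of the two result shapes; no ADOdb involved.
// $rows stands in for what a query on a hypothetical region table might return.
$rows = array(
    array('region_id' => 'NE', 'region_desc' => 'Northeast'),
    array('region_id' => 'SW', 'region_desc' => 'Southwest'),
);

// Shape 1: a zero-indexed vector of associative rows (GetArray()-like).
function as_vector(array $rows)
{
    return array_values($rows);
}

// Shape 2: keyed by the value of each row's first column (GetAssoc()-like).
// With two columns the remaining value is stored directly; with more columns
// the rest of the row is kept as an array.
function keyed_by_first_column(array $rows)
{
    $out = array();
    foreach ($rows as $row) {
        $key = array_shift($row);   // remove and return the first column
        $out[$key] = (count($row) === 1) ? reset($row) : $row;
    }
    return $out;
}

print_r(as_vector($rows));
print_r(keyed_by_first_column($rows));
```

The keyed shape is convenient for lookups such as "is this region valid?", while the vector shape suits row-by-row iteration.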
Case Study

Our case study will look at the sales data for the ABC Company, a fictitious manufacturer of widgets. ABC makes everything from the economical $12.99 Widget B through to the ultra-deluxe $1,499.50 Widget E. This study focuses on the sales in the continental United States, in which the company is divided into four regions. The company sells to resellers via three channels: the Internet, a call center and various retail outlets.

You are asked to create a graph depicting the sales in units and dollars. You'll need to further split this graph by catalog item for each sales region, and make a comparison to the forecasted sales. ABC also requires that you create a graph showing the year-to-date sales by channel for each region. They need to be able to navigate quickly between the two graphs.

Database Design

Six tables compose ABC's data model. The central table is abc_sales. This table tracks information about sales, including the time, the sales channel, the location, the item, the quantity sold, and the revenue generated.

The other tables making up the data model are abc_catalog, abc_channel, abc_forecast, abc_region, and abc_state_region. The abc_catalog table provides a surrogate key for particular items customers can purchase, a description of the item, and the current unit price of the item. The abc_channel table provides a surrogate key, name and description of the market channel an item can be purchased through (web, phone or retail in this study). The abc_forecast table stores a forecasted sales plan by item, channel, region and month. For each of these "slices" of data, the quantity and revenue expected are also stored. The abc_region table stores the surrogate key and description for each of the company's sales regions. The abc_state_region table is a mapping between state abbreviation code and the sales region to which the state belongs.

The SQL used to create the tables, along with 'INSERT' statements to populate the smaller tables, is included in this article's source directory in the mysql_ddl.sql file. As for the other tables, the sales and sales forecast data for the year can be simulated using the scripts abc_gen_sales.php and abc_fcst_ins.php, respectively. abc_gen_sales.php needs to be run first, as abc_fcst_ins.php depends on the data it creates.
Listing 2: Generating a graphical error

    // Validate the 'region' request parameter; show an error image if it is bad.
    $region_id = check_passed_region('region');
    if (!$region_id) {
        graph_error('region parameter incorrect');
    }

    // Return $_GET[$parm] if it is a key of the global $regions array, else false.
    function check_passed_region( $parm )
    {
        global $regions;

        if (array_key_exists($parm,$_GET)) {
            $val = $_GET[$parm];
            if (array_key_exists($val, $regions)) {
                return $val;
            }
        }
        return false;
    }

    // Output $msg as bold red text on an otherwise empty image, then stop.
    function graph_error($msg)
    {
        $graph = new CanvasGraph(WIDTH, HEIGHT);

        $t1 = new Text($msg);
        $t1->Pos(0.05, 0.5);
        $t1->SetOrientation('h');
        $t1->SetFont(FF_ARIAL, FS_BOLD);
        $t1->SetColor('red');
        $graph->AddText($t1);

        $graph->Stroke();
        exit;
    }

region_grapherror.php
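The whitelist check in Listing 2 can also be written without the global array, which makes it easy to exercise on its own. The following is a minimal sketch under that assumption: pick_region() is a hypothetical variant, not the article's function, and the region keys and names are invented for illustration (the real ones come from the abc_region table).

```php
<?php
// Hypothetical variant of check_passed_region(): same whitelist test, but the
// request array and the valid regions are passed in explicitly.
function pick_region(array $request, $parm, array $regions)
{
    if (!array_key_exists($parm, $request)) {
        return false;                   // parameter missing
    }
    $val = $request[$parm];
    // Only strings and integers can be array keys; reject e.g. ?region[]=1.
    if (!is_string($val) && !is_int($val)) {
        return false;
    }
    return array_key_exists($val, $regions) ? $val : false;
}

// Illustrative data only.
$regions = array('1' => 'East', '2' => 'West');

var_dump(pick_region(array('region' => '2'), 'region', $regions));
var_dump(pick_region(array('region' => '9'), 'region', $regions));
var_dump(pick_region(array(), 'region', $regions));
```

In a real script you would pass $_GET as the first argument and call graph_error() whenever the function returns false.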
Note: These scripts were originally written in December of 2002. The data is simulated as if the current date is December 15, 2002 in order to show graphs with nearly complete data for the year. Because it is now 2003, the scripts have been modified to query and show the prior year.
Sales vs. Forecast Graph

Our first task is to create a dynamic graph comparing the number of units sold and the revenue of those sales to the forecasts made for each region. This information is considered proprietary by the company, and is not intended for public distribution. They would like the chart to indicate this.

In developing PHP scripts to generate graphs using JpGraph, I have found the following four-step process useful:

1. Retrieve and manipulate the data for plotting
2. Create the Graph object and set general graph properties (like colors and plot axes)
3. Create the Plots to add to the Graph object
4. Finalize the Graph object and output the graph

Following this process for graph development, you first need to retrieve the sales and forecast data. To see how you use ADOdb to retrieve the data for these graphs, please review lines 26-122 of the abc_reg_sales_graph.php file. The focus of this article is on graphing, so the database retrieval and graph data construction will not be covered in detail.

Because the graphs being produced are by region, your script will need to accept and validate a parameter for the region. If the value passed is a valid region, assign it to the variable $region_id. If it is not valid, generate an error. There could be a little problem here. Since the output of this script is a graph, your site will most likely refer to this script from an HTML image tag, with the script's URL as the tag's 'src' parameter. If your script outputs a text message (like a PHP error), this would result in an invalid image, and the only thing a user will see is the missing image symbol where your graph was supposed to be.

The code shown in Listing 2 validates the region, and shows how to create a "graphical" error message. This code assumes you have already queried the database for valid regions, and stored them in a global array called $regions. It is also assumed that you have included [...]
[...] suited for graphing. This often means making a series of zero-indexed arrays of single data series. Rather than have different global arrays for each series, I prefer to have a single associative array named $graphData that contains indices naming each of the individual arrays. The code in Listing 3 assembles these pieces into the $graphData array, which will be used for graphing.

Listing 3: Composing a single $graphData array

    $graphData['f_qty'] = array();
    $graphData['labelX'] = array();
    for ($i=0,$j=count($salesData); $i [...]

Now let's take a look at some of the data by drawing your first graph. This graph will compare the number of units moved to the forecasted number of units that should have moved. Listing 4 shows the code for this.

    [...]
    $graph->SetScale('textlin');

    $b1 = new BarPlot($graphData['qty']);
    $l1 = new LinePlot($graphData['f_qty']);

    $graph->Add($b1);
    $graph->Add($l1);
    $graph->Stroke();

first_graph.php

Figure 1: Your first graph

This code highlights the lightweight nature of the JpGraph API. As you enhance the appearance of your graphs, the code will expand, but the lines of code required to produce a functional graph really are minimal. These few lines of code complete the four-step process outlined earlier. In step 2 you create and configure the Graph object.
In step 3, you create a BarPlot object and a LinePlot object. You complete the process in step 4 by adding the plots to the graph and using the Graph::Stroke() method to output the graph. Calling Graph::Stroke() with no arguments will make JpGraph stream the image directly back to the browser from the PHP script. The resulting graph is shown in Figure 1.

The regular style line graph does not really seem appropriate for this graph, since the data is not really changing over the course of the month (forecasts are fixed for the entire month). The forecasts might be better represented using the 'step' line graph. Let's incorporate this change and, while we're at it, take a look at the revenue numbers instead. You can do this by changing the two plots as shown in Listing 5. The output is shown in Figure 2.

The next step is to look at both units and revenue on the same graph. Management is more concerned with meeting the revenue forecast, so forecast revenue should be the only line graph superimposed on the chart (as opposed to forecasted units). There are two challenges here. First, by looking at the two charts, you can see that units and revenue are definitely on different scales. Second, it would be nice to figure out a way to have the last step of the line graph plot over the December bar.

To represent units and revenue you will need to use the second Y-axis feature of JpGraph. Ideally you would use a grouped bar graph. JpGraph, however, does not allow us to add bars from different scales to the same grouped bar plot. What you can do is use a little deception to accomplish this. Make them two different grouped bar plots, and then trick JpGraph into believing that each is next to another bar plot on the same scale by adding a plot with all zero values to each. The effect here is that one scale has the zero-value plot on the right, pushing the real plot to the left, while the other scale has the zero-value plot on the left, pushing the real plot to the right. When they are finally combined on the graph, they will appear to be a single grouped bar plot, but will actually be scaled on their respective axes. You will need some new data to accomplish this effect, which we can get in Listing 6.

Listing 5: Plots to make the graph in Figure 2.

    $b1 = new BarPlot($graphData['rev']);
    $l1 = new LinePlot($graphData['f_rev']);
    $l1->SetStepStyle();
    $l1->SetColor('darkgreen');
    $l1->SetWeight(3);
second_graph.php
Figure 2: Using a step-style line plot
Listing 6: Creating a zero-value plot.

    for ($i=0,$j=count($graphData['labelX']); $i [...]

    $graph->y2axis->SetLabelFormatCallback('y_fmt_dol_thou');

    function y_fmt_dol_thou($val)
    {
        return '$'.number_format($val/1000);
    }
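The two small pieces of data handling used for the dual-scale trick, an all-zero series and a label formatter that shows dollars in thousands, can be tried in isolation. This is a sketch, not the article's code: zero_series() and fmt_dollars_in_thousands() are hypothetical names, and array_fill() is my own choice for building the zeros, since the body of the loop in Listing 6 is not available here. The formatter applies the same rule as y_fmt_dol_thou().

```php
<?php
// Build an all-zero series with one entry per x-axis label.
// array_fill() is an assumption; Listing 6 uses a for loop instead.
function zero_series(array $labels)
{
    return count($labels) > 0 ? array_fill(0, count($labels), 0) : array();
}

// Same formatting rule as the y_fmt_dol_thou() callback: the value is divided
// by 1000, formatted with thousands separators, and prefixed with '$'.
function fmt_dollars_in_thousands($val)
{
    return '$' . number_format($val / 1000);
}

$labels = array('Jan', 'Feb', 'Mar');
print_r(zero_series($labels));
echo fmt_dollars_in_thousands(1250000), "\n";
```

Because the zero series always has the same length as the label array, it lines up with the real bar series when both are placed in one grouped plot.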
Listing 8: Code to generate the graph in Figure 3

    $graph->SetY2Scale('lin');
    $graph->SetY2OrderBack(false);

    //generate the individual plots
    $b1 = new BarPlot($graphData['qty']);
    $b2 = new BarPlot($graphData['rev']);
    $b2->SetFillColor('lightgreen');
    $b1z = new BarPlot($graphData['zero']);
    $b2z = new BarPlot($graphData['zero']);
    $l1 = new LinePlot($graphData['f_rev']);
    $l1->SetStepStyle();
    $l1->SetColor('darkgreen');
    $l1->SetWeight(3);

    //create the grouped plots
    $gb1 = new GroupBarPlot(array($b1, $b1z));
    $gb2 = new GroupBarPlot(array($b2z, $b2));

    //add the plots to the graph object
    $graph->Add($gb1);
    $graph->AddY2($gb2);
    $graph->AddY2($l1);

format_callback.php
third_graph.php

Figure 3: Grouped bar plot on different scales

[...] the plot area, axis titles and location of the legend is needed. Also, we need to make sure that viewers are aware this graph is for internal purposes only. One technique for marking an image as proprietary is to add a background image to it. In this case, we'll use the string "ABC Co. Proprietary" turned diagonally and repeated. Note the color is much darker than you would normally want to see on a "watermark". You can adjust this with the Graph::AdjBackgroundImage() method, which can adjust the brightness, contrast and saturation of the image prior to use in the graph. This can save you the effort of doing this externally in an image-processing program. See the img/abc-background.png file in this article's source directory for an example of what this background image could look like.
“Coupling JpGraph with PHP's database access capabilities provides you with a powerful toolset for the generation of dynamic graphs on the web.”
Note: There is a conflict with the Graph::AdjBackgroundImage() method and the GD2 library that comes bundled with PHP 4.3.0. If you have upgraded to this version, you will have to fall back to the method of adjusting the image with an editor until the conflict is resolved.
You can use this image as the background by using the code shown in Listing 10. You can use the code in Listing 11 to add the finishing touches like graph and axis titles, and to adjust the legend placement. The output of this code is shown in Figure 5. For performance reasons, you decide to implement the JpGraph image-caching mechanism on this graph. This caching is accomplished by saving a copy of your graph as a file on the server. Instead of generating the image on-the-fly, this cached file is streamed back (if the cached copy is still valid). Note that the web server must have write access to the directory in which the cached images are saved. Instead of creating the Graph class instance with just the width and height parameters, you will need to also pass in a name for the cached image, a timeout value in minutes (how long the image is valid), and a final parameter telling JpGraph to continue to stream the images. This means that you will still use the PHP script itself as the image tag's 'src' parameter. The code necessary to implement caching is shown in Listing 12.
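The decision at the heart of this kind of caching (stream the saved file if it is younger than the timeout, otherwise regenerate) can be sketched in a few lines. This illustrates the idea only and is not JpGraph's implementation: cached_image_is_valid() is a hypothetical helper, and the temporary file merely stands in for a cached graph image.

```php
<?php
// Sketch of the caching decision only; JpGraph's own code may differ.
// $timeoutMinutes mirrors the "timeout value in minutes" described above.
function cached_image_is_valid($file, $timeoutMinutes, $now = null)
{
    if (!is_file($file)) {
        return false;                       // nothing cached yet
    }
    $now = ($now === null) ? time() : $now;
    $ageSeconds = $now - filemtime($file);
    return $ageSeconds < $timeoutMinutes * 60;
}

// A temporary file stands in for a cached graph.
$file = tempnam(sys_get_temp_dir(), 'graph');
touch($file, time() - 600);                 // pretend it was written 10 minutes ago
clearstatcache();                           // make sure filemtime() sees the change

var_dump(cached_image_is_valid($file, 60)); // checked against a 60-minute timeout
var_dump(cached_image_is_valid($file, 5));  // checked against a 5-minute timeout
unlink($file);
```

The write-access requirement mentioned above follows directly from this scheme: the web server has to create and refresh the cached file itself.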
Listing 9: Code to generate stacked bar graphs in Figure 4.
$colors = array('pink', 'orange', 'yellow', 'lightgreen', 'lightblue');
$abqAdd = array();
$abrAdd = array();
for ($i=0, $j=count($items); $i<$j; $i++) {
    //part of the original listing was lost at this point: the assignment of
    //$key (the $graphData index for the current item) is missing, and the
    //next line is reconstructed by analogy with $b2 below
    $b1 = new BarPlot($graphData[$key]['qty']);
    $b1->SetFillColor($colors[$i]);
    $b1->SetLegend($items[$i]['item_desc']);
    $abqAdd[] = $b1;
    $b2 = new BarPlot($graphData[$key]['rev']);
    $b2->SetFillColor($colors[$i]);
    $abrAdd[] = $b2;
}
$ab1 = new AccBarPlot($abqAdd);
$ab2 = new AccBarPlot($abrAdd);
$b1z = new BarPlot($graphData['zero']);
$b2z = new BarPlot($graphData['zero']);
$gb1 = new GroupBarPlot(array($ab1, $b1z));
$gb2 = new GroupBarPlot(array($b2z, $ab2));
$graph->Add($gb1);
$graph->AddY2($gb2);
fourth_graph.php
Figure 4: Using stacked bars
Listing 10: Code to use and adjust the background image.
if (USING_TRUECOLOR) {
    $graph->SetBackgroundImage('img/abc-background_prefade.png', BGIMG_FILLFRAME);
} else {
    //AdjBackgroundImage only works with GD, not GD2 true color
    $graph->SetBackgroundImage('img/abc-background.png', BGIMG_FILLFRAME);
    $graph->AdjBackgroundImage(0.9, 0.3);
}
background_image.php
Listing 11: Code to finalize the graph in Figure 5.
$graph->title->Set(date('Y')." Sales for {$regions[$region_id]} Region");
$graph->title->SetFont(FF_ARIAL, FS_BOLD, 12);
$graph->SetMarginColor('white');
$graph->yaxis->title->Set('Left Bar Units Sold');
$graph->yaxis->title->SetFont(FF_ARIAL, FS_BOLD, 10);
$graph->yaxis->SetLabelFormatCallback('y_fmt');
$graph->yaxis->SetTitleMargin(48);
$graph->y2axis->title->Set('Right Bar Revenue ( $ 000 )');
$graph->y2axis->title->SetFont(FF_ARIAL, FS_BOLD, 10);
$graph->y2axis->SetTitleMargin(45);
$graph->y2axis->SetLabelFormatCallback('y_fmt_dol_thou');
$graph->xaxis->SetTickLabels($graphData['labelX']);
$graph->legend->Pos(0.5, 0.95, 'center', 'center');
$graph->legend->SetLayout(LEGEND_HOR);
$graph->legend->SetFillColor('white');
$graph->legend->SetShadow(false);
$graph->legend->SetLineWeight(0);
fifth_graph.php
Now, if the same graph for the same region is requested more than once within 24 hours (our timeout value of 60 minutes * 24 hours), the cached version is streamed back to the browser, and no code after the 'new Graph()' line is executed. This means that, to maximize the gains from caching, you will want to move the Graph object instantiation ahead of any expensive database queries.
Listing 12: Code to implement graph caching.
define('GRAPH_NAME', 'abc_reg_sales');
$graphName = GRAPH_NAME.$region_id.'.png';
$graphTimeout = 60*24;
$graph = new Graph(WIDTH, HEIGHT, $graphName, $graphTimeout, true);
cache_graph.php
Figure 5: Completed regional graph
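To make the timeout rule concrete, the sketch below shows the kind of freshness check that any file-based image cache has to perform. It is only an illustration of the idea and not JpGraph's own implementation; the function name isCachedImageFresh, its parameters and the sample file name are hypothetical.

```php
<?php
// Illustrative only: a file cache treats an image as valid while its age is
// below the timeout. JpGraph performs its own check internally.
function isCachedImageFresh(string $file, int $timeoutMinutes, ?int $now = null): bool
{
    clearstatcache(true, $file);        // make sure we see the current mtime
    if (!is_file($file)) {
        return false;                   // nothing has been cached yet
    }
    $now = $now ?? time();
    $ageSeconds = $now - filemtime($file);
    return $ageSeconds < $timeoutMinutes * 60;
}

$timeout = 60 * 24;                     // 24 hours, in minutes, as in Listing 12
// Hypothetical cache file name, following the GRAPH_NAME.$region_id pattern.
var_dump(isCachedImageFresh('abc_reg_sales1.png', $timeout));
```

A check of this kind is the reason the Graph object should be created before the database work: when the cached copy is still fresh, nothing else needs to run.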
Region-by-Channel Graph
The second type of graph you were asked to create shows the sales for each region by channel, and needs to provide an easy way of navigating to the first graph you constructed. This type of report means viewing the information in proportions, so a pie graph may be effective. Again, please review lines 22-82 of the abc_map_graph.php file in this article's source directory to understand the database queries and array creation for the following graphs. Listing 13 shows the code necessary to generate the pie graph shown in Figure 6.
You can expand on the use of background images that was introduced in the first set of graphs, and add additional information to your graph. Consider a map of the United States showing the divisions of each region for the ABC Company. If you use this map image as a background for your graph, you can actually place the pie charts on each of the regions to make it clear what region each pie chart represents. See the img/abc-regions.png file in this article's source directory for the example background image used here. To use this example, you will need to add a couple more lines to the $graphData construction loop to allow for dynamic placement of the pie charts for each region:
$graphData['r'.$rIndex]['map_x'] = $regionData[$i]['map_x'];
$graphData['r'.$rIndex]['map_y'] = $regionData[$i]['map_y'];
Now let's look at the entire code, shown in Listing 14, to generate the graph in Figure 7.
The company's final request was to be able to drill down from these pie charts to the regional sales data graphs you constructed earlier. You can implement this feature using Client Side Image Maps (CSIM). CSIM is an HTML technology allowing you to specify regions of an image to associate with a hyperlink. To implement CSIM for this chart, you will need to make the CSIM targets (hyperlinks) and image alts (tips for the user). First we'll define a constant containing most of the link to drill down to.
define('DRILL_GRAPH', 'abc_reg_sales_graph.php?region=');
Now in the $graphData loop we'll populate the targets and alts:
$graphData['r'.$rIndex]['targets'][] = DRILL_GRAPH.$regionData[$i]['region_id'];
$graphData['r'.$rIndex]['alts'][] = "Click for more information regarding {$regions[$rIndex]['region']} sales.";
Listing 13: Code to generate the pie chart in Figure 6.
$sliceColors = array('lightgreen', 'pink', 'lightblue');
$graph = new PieGraph(WIDTH, HEIGHT);
$graph->title->Set($regions[$region]['region'] .' Region');
$graph->subtitle->Set('Sales by Channel since ' .GRAPH_START);
$p1 = new PiePlot($graphData[$pickRegion]['rev']);
$p1->SetLegends($graphData[$pickRegion]['label']);
$p1->SetSliceColors($sliceColors);
$graph->Add($p1);
$graph->Stroke();
simple_pie.php
Figure 6: Simple pie chart
Listing 14: Code to generate region graph in Figure 7.
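$graph = new PieGraph(WIDTH, HEIGHT);
$graph->SetBackgroundImage('img/abc-regions.png', BGIMG_FILLFRAME);
//part of this loop was lost in the original listing: the loop bound, the
//assignment of $pickRegion and the creation of $p1 are reconstructed from
//the surrounding text and from Listing 13; $regionCount is a placeholder
for ($i=0; $i<$regionCount; $i++) {
    $pickRegion = 'r'.$i;
    $p1 = new PiePlot($graphData[$pickRegion]['rev']);
    $p1->SetCenter($graphData[$pickRegion]['map_x'], $graphData[$pickRegion]['map_y']);
    $p1->SetSize(PIE_SIZE);
    $p1->SetLabels($graphData[$pickRegion]['revFmt']);
    $p1->SetSliceColors($sliceColors);
    if (!$i) {
        $p1->SetLegends($graphData['label']);
    }
    $graph->Add($p1);
}
$graph->legend->Pos(0.9, 0.85, 'center', 'center');
$graph->Stroke();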
region_pie_graphs.php
Because there is now HTML information (the CSIM) in addition to the binary image content, you can't simply stream the result back to the browser as you normally would: the image map and the image depend on each other. To make this work, you need to cache the image. This allows you to output the image map, as well as the image tag used to fetch the cached image. Instead of using the JpGraph cache outlined above, we'll use another form of image caching and store the image in a place where we can request it directly. To accomplish this you will need to create a directory called img immediately below the script directory. This directory must be writeable by the web server. When you are creating the graph, treat it as if you were going to stream the image. In the pie chart loop, add the CSIM information like so:
$p1->SetCSIMTargets(
    $graphData[$pickRegion]['targets'],
    $graphData[$pickRegion]['alts']
);
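For readers who have not used client-side image maps before, the sketch below builds the two pieces of markup by hand: a map element whose area elements pair each target with its alt text, and an image tag whose usemap attribute points at that map. JpGraph generates the map for you, so nothing here is JpGraph code; the helper buildImageMap, the circle coordinates and the "Western" sample text are made up for illustration.

```php
<?php
// Hypothetical helper: pairs targets (links) with alts (tool tips) the way a
// client-side image map does. Each coords entry is an "x,y,radius" circle.
function buildImageMap(string $mapName, array $coords, array $targets, array $alts): string
{
    $html = '<map name="' . htmlspecialchars($mapName) . '">' . "\n";
    foreach ($targets as $i => $href) {
        $html .= sprintf(
            '  <area shape="circle" coords="%s" href="%s" alt="%s" title="%s">' . "\n",
            htmlspecialchars($coords[$i]),
            htmlspecialchars($href),
            htmlspecialchars($alts[$i]),
            htmlspecialchars($alts[$i])
        );
    }
    return $html . '</map>';
}

$map = buildImageMap(
    'ABC_Region_Drill',
    ['120,80,40'],                                  // invented coordinates
    ['abc_reg_sales_graph.php?region=1'],
    ['Click for more information regarding Western sales.']
);

// The usemap attribute is what ties the image to the map.
$img = '<img src="img/abc_channel_graph.png" usemap="#ABC_Region_Drill" alt="Sales by channel">';
echo $map, "\n", $img, "\n";
```

Because the image tag refers to the map by name and to the image by URL, both have to be available at the same time, which is why the image must exist as a file rather than as a one-off stream.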
To output the graph, use the code in Listing 15. This code instructs JpGraph to output the image to a file called img/abc_channel_graph.png. You then fetch the image map generated by the graph into the $imgMap variable. The last print statement should be incorporated into a larger valid HTML document, but here you can see the coupling between the image map and the image tag. The image tag specified allows the image to use the generated map.
Listing 15: Code to handle CSIM's.
define('IMG_DIR', 'img/');
$graphName = IMG_DIR.'abc_channel_graph.png';
$graph = new PieGraph(WIDTH, HEIGHT);
//the rest of the graph code...
$graph->Stroke($graphName);
$mapName = 'ABC_Region_Drill';
$imgMap = $graph->GetHTMLImageMap($mapName);
print
Figure 7: Using a background to show regions
Key Concepts
The following concepts, related to graphing and JpGraph, were introduced or emphasized in this case study:
• the use of the JpGraph Bar, Line and Pie plot types
• the use of stacked and grouped bar graphs
• the use of an alternative scale on the Y axis
• the use of a callback function to perform formatting of labels
• the generation of error messages in an image
• the creation of a "watermark" using a background image
• the use of the JpGraph image caching, both for performance and to facilitate use of the CSIM feature
• the use of a background image as part of the chart's information content (pie chart location on the sales by region chart)
• the use of CSIM for drill-down capability on graphs
Conclusion
JpGraph is a lightweight API that allows you to quickly generate professional looking graphs. Coupling JpGraph with PHP's database access capabilities provides you with a powerful toolset for the generation of dynamic graphs on the web. This article has introduced you to some of the more advanced features of JpGraph like caching, background images and Client Side Image Maps. Hopefully you are now familiar enough with these technologies to consider using PHP and JpGraph for your next data mining project.
About The Author
Listing 11: Third namespace example script output. THU MAR 13 00:08:28 2003
Using PEAR::XML_Transformer
As a result of output buffering, everything after the call to the constructor is treated as input to the transformer. This means that you need to be careful of how you use this driver because there’s no way to take control after you call the constructor. This driver makes it possible for both the script and the XML document to reside in the same file. You can also include the XML document from within your PHP script, or generate XML content based on some other data. Listing 12 demonstrates the use of this driver, and Listing 13 shows the output.
Listing 12: OutputBuffer driver example script.
Cache output driver
The bundled Cache driver improves transformation performance by caching the results. You can change the cache behavior using the constructor parameters, which are then passed to the cache engine. This driver uses PEAR::Cache_Lite, so please read its documentation to see which parameters affect its behavior. You’ll also, obviously, need to make sure that it’s installed. If not, you can install it using the PEAR installer, like so:
pear install Cache_Lite
The cache’s lifetime is based on an ID that can be specified as the second parameter of the [transform()] method. When this ID is not specified, the driver builds one from an MD5 checksum of the entire XML document. What the above means is that if you change the XML document, the cache is expunged and the document is re-transformed. It also means that you can control the cache lifetime by changing the XML document in a specific way, like inserting a time stamp, or simply by changing the ID that you pass in.
This driver is excellent if you are transforming files for a large audience, like a web site. If content is not supposed to change at every transformation, you can rely on this driver to improve performance. Please note that disk space usage increases if you are transforming many different documents. For every document, the driver needs to save its cache on disk, usually in some temporary directory.
Take a look at Listing 14 to understand how to use the Cache driver. This example uses the XML document from the previous “Hello World” example, and caches its output. The document is transformed only the first time the script is called. After that, the result is obtained from its cache. The output is shown in Listing 15.
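The ID rule described above can be pictured with the short sketch below. It is not the Cache driver's code: transformWithCache, the in-memory $cache array and the upper-casing callback are stand-ins chosen for illustration, whereas the real driver stores its results on disk through PEAR::Cache_Lite.

```php
<?php
// Illustrative stand-in for the driver's behaviour: use the caller's ID if
// one is given, otherwise derive it from an MD5 checksum of the document.
function transformWithCache(string $xml, callable $transform, array &$cache, ?string $id = null): string
{
    $key = $id ?? md5($xml);
    if (!isset($cache[$key])) {
        $cache[$key] = $transform($xml);    // cache miss: do the real work once
    }
    return $cache[$key];
}

$cache = [];
$calls = 0;
$upper = function (string $xml) use (&$calls): string {
    $calls++;                               // count how often we really transform
    return strtoupper($xml);
};

transformWithCache('<p>hello</p>', $upper, $cache);
transformWithCache('<p>hello</p>', $upper, $cache);    // same document: cache hit
transformWithCache('<p>hello!</p>', $upper, $cache);   // changed document: new checksum
echo $calls, "\n";                                      // 2
```

Passing an explicit ID decouples the cache entry from the document's content, which is what lets you decide for yourself when a document counts as changed.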
Listing 13: OutputBuffer driver example script output
Hello Bruno
Listing 14: Cache driver example script
Listing 15: Cache driver example script output
Hello Bruno
Transforming files on-the-fly
What if you want to transform files without needing to intervene? I will show you two ways to make a browser request to an XML file (with a .xml extension) and output its transformation’s result, instead of the original document. This is a good solution when you are deploying a transformation solution to an audience of XML content creators and you don’t want to bother them with the details behind your code.
By using PHP’s auto_prepend_file configuration directive you can set a script to be automatically run before the script being requested is run. If this script is using the OutputBuffer driver described earlier, all you have to do is start the transformation engine (by calling the class constructor) and wait for the content to be read. This process can be automated using a custom Apache configuration, so that:
1. Files with the .xml extension are treated as PHP files.
2. PHP’s auto_prepend_file is set to the script responsible for the transformations.
The example shown in Listings 16, 17 and 18 demonstrates this behavior. The “.htaccess” file contains all Apache specific configurations, the “.transform.php” file contains the transformation features, and the “example.xml” file is our example document. Both “.htaccess” and “.transform.php” start with a dot (“.”) on purpose. This way, these files appear as “hidden” to the browser. Listing 19 shows the output.
Note: If we don’t include the short_open_tag ini directive in the .htaccess file, and your server allows the use of short tags (
Listing 16. .htaccess file.
AddType application/x-httpd-php .xml
php_value auto_prepend_file .transform.php
php_value short_open_tag off
Listing 17: Prepended example script.
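The mechanism that both the OutputBuffer driver and the auto_prepend_file setup rely on is PHP's output buffering. The sketch below shows the general idea with a plain ob_start() callback; it is not the prepended script from the listings, and the greetingFilter function and its <greeting/> placeholder are invented for the example and merely stand in for the transformer.

```php
<?php
// Invented stand-in for the transformer: replaces a placeholder tag.
function greetingFilter(string $buffer): string
{
    return str_replace('<greeting/>', 'Hello Bruno', $buffer);
}

// A prepended script would only need this call; everything the requested
// file outputs afterwards is collected and handed to the callback.
ob_start('greetingFilter');

echo "<p><greeting/></p>\n";    // plays the role of the requested document

ob_end_flush();                 // PHP also flushes open buffers at script end
```

Since the callback only runs when the buffer is flushed, the prepended script never regains control in between, which matches the caveat about not being able to take control after the constructor has been called.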
Bruno Pedro is a systems engineer with ten years' experience in database-related applications. Since 1995 he has been developing web applications. He can be contacted at
[email protected].
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=16
Listing 22: Continued from Page 59
function genXML($html, $namespaces, $tagName, $xmlHeader = '') {
    if (!empty($html)) {
        $xml = htmlentities($html);
        foreach ($namespaces as $namespace) {
            $xml = preg_replace('/<' . $namespace . ':(.*?)>/e', 'strtr("", array_flip(get_html_translation_table(HTML_ENTITIES)))', $xml);
        }
        $xml = $xmlHeader . '' . $xml . '';
        return $xml;
    }
    return null;
}
$transformer = new XML_Transformer();
$transformer->overloadNamespace('greeting', new Greeting, false);
$usedNamespaces = array_keys($transformer->_callbackRegistry->overloadedNamespaces);
$html = implode('', file('example.html'));
$xml = genXML($html, $usedNamespaces, 'htmlMask');
$result = $transformer->transform($xml);
if (!empty($result)) {
    echo preg_replace('//', '', $result);
}
?>
When A Meets B
Writing a Parser and Expression Evaluator in PHP By Marco Tabini "Let's have a high-level look at some of the more interesting concepts underlying the creation of an entirely new programming language, starting with a brief tour through the creation of a simple parser and expression evaluator."
Perhaps it’s the freshness in the air, or the fact that five consecutive months of snow outside will drive just about anyone crazy, but springtime always brings new ideas to my mind. Granted, not all of them are good (like, for example, applying that super-strong fertilizer to my lawn last year and then having to mow twice a week for the rest of the season), but at least they’re new. Every once in a while, I manage to actually have a good idea, from which something reasonably useful results. This year, as reported in my weblog at http://blogs.phparch.com, I came up with the idea of writing a PHP-to-C compiler in PHP. I know I’m not the only one who has decided to do this; I have, in fact, spoken frequently with at least a couple of people who have been researching the same kind of application, and I’m happy to report that we are all pursuing completely different solutions. (If you care to know, I’m happy because that means we’ll be able to tackle the problem from very different angles, which will ultimately provide the best solution possible. Yes, it’s called competition, although it’s entirely friendly in this case.)
In my approach, I have decided to use PHP5 as the interpreter for various reasons, not the least of which is that objects are finally passed by reference rather than by value, which makes programming with them a lot easier. Mostly, however, I chose to use PHP because it already includes a tokenizer that is capable of breaking down a PHP source file into its essential components (e.g.: variables, operators, and so forth). This can be a rather complicated task, and, in my opinion, it’s best left to the PHP interpreter.
REQUIREMENTS
PHP Version: 5.0
Additional Software: None
Parsing Source Code
This doesn’t mean, however, that writing a parser and expression evaluator is not fun. In fact, I’ve always thought of it as one of the most pleasant forms of programming in existence. Because of their nature, expression evaluators in particular are very elegant and take advantage of many “interesting” concepts of computer science, such as recursive algorithms, without actually abusing them.
This article illustrates a simple (but practical) approach to writing a PHP script capable of parsing user input in an arbitrary format and executing commands that span multiple lines of code. In short, I’ll be showing you how to write a simple interpreter for a language that is completely arbitrary in nature (rather than being tied to a particular format, like C or PHP).
First of all, we’ll divide our interpreter into three parts:
1. The tokenizer
2. The parser
3. The executor
The role of each portion is extremely clear-cut. The tokenizer (also called a lexer or scanner), as I mentioned earlier, takes on the task of breaking down the source code into its fundamental bits. As far as I’m concerned, this is usually the most complex part of the interpreter, because it involves a lot of regular expressions (which are never fun to write) and a lot of careful thought.
The parser reads the tokenizer’s output and turns it into a set of instructions that are sorted according to the exact order in which they must be executed, normally called bytecode source. This is where the real translation between what humans can write and what a computer can understand takes place.
It should be noted that tokenizers and parsers are usually generated using compiler-compiler tools such as Lex and Yacc (or their open-source counterparts Flex and Bison), which can take a formal description of the language (its tokens and its grammar) and turn it into a set of functions (usually written in C) that can “automagically” perform the translation between source and bytecode. These tools dramatically cut down the time required to write a compiler, and usually produce a highly optimized lexer/parser library.
Finally, the executor (or the interpreter proper) is responsible for reading the output of the parser and executing the commands it contains. Separating the parser from the executor accomplishes two goals.
First of all, it is possible to parse a source file once and execute it many times, so that the overhead created by the parser can be taken out of the picture and the overall performance of the interpreter can be increased (sometimes substantially). This is, for example, the approach taken by PHP accelerators, such as those made by Zend or ionCube. In addition, a separate parser means that the executor’s code can be extremely focused on simply performing operations, without having to worry about the intricacies of the language at a semantical level.
The Problem With Humans
When trying to understand how compilers work, the first step is always to realize that compilers act in a substantially different way compared to the human brain. This results in the need for techniques that are also substantially different from the ones humans normally apply.
A “program” is, in the end, nothing more than a sequence of operations. My use of the term “operation” here is not limited to what people usually consider operations, like addition, subtraction and the like. It includes function calls, if-then-else statements, logical operations, bitwise operations, and so forth. The problem is that humans are used to writing operations in a format that is not self-contained, but that assumes a great many rules that have been defined a priori. For example, consider the following expression:
3 + 5 * 2 = ?
Because of the way we’ve learned to evaluate expressions, we know that it is not sufficient to simply execute the operations as they appear when reading the input from left to right. If we did that, we’d come up with something like the following:
3 + 5 = 8
8 * 2 = 16
Clearly, unless you’re into Hollywood accounting, the result of this calculation is completely wrong. That’s because, by convention, we apply an established set of rules of precedence which clearly define the order in which each operation must be performed. In simple arithmetic, for example, the multiplication operation takes precedence over addition, so that the expression above should really be interpreted as follows:
5 * 2 = 10
3 + 10 = 13
Notice how the order of the operands in the second line must remain the same. 3 + 10 is equal to 10 + 3 because addition is commutative, but if we were performing a subtraction or a division, changing the order of the operands would have also affected the final result of the evaluation. Naturally, we can also use parentheses to force the execution of operations in an order that differs from what is normally dictated by the rules of precedence:
(3 + 5) * 2 = 16
How do you explain this to a computer? First of all, you create a notation that a computer can understand.
“...expression evaluators are very elegant and take advantage of many interesting concepts of computer science.”
The Polish Connection
While working on symbolic logic, Polish mathematician Jan Lukasiewicz decided that his chosen field of study was complicated enough without having to deal with operator precedence and pointless parentheses. He came up with a different notation, commonly referred to as the Polish Notation or PN, that simply places the operators in front of their operands rather than between them. For example:
+ * 5 2 3
This literally means “add the result of multiplying five by two to three”, which is exactly what the expression above indicated. The main difference, however, is that the Polish Notation is much easier for a computer to understand. In fact, we could rewrite the expression, yet again, as follows:
add (multiply (5, 2), 3)
This looks dangerously close to a set of function calls that would be very easy to execute for an interpreter. Because we have completely removed any and all ambiguities from the expression, there is no need for establishing any rules of precedence, and we have completely eliminated the need for parentheses.
There is, however, still one problem: it is impossible to easily convert the standard notation to PN by reading one character at a time from an expression written using the former. In fact, you still need to be able to see the whole expression before you can start reorganizing the operations in PN form. As a result, although a major step forward in making things easier for our interpreter, the PN is far from being perfect.
This inadequacy must not have escaped Australian mathematician (and philosopher) Charles Hamblin, who, in 1957, came up with a novel idea: to turn the Polish Notation around, so that the operators appear at the end rather than at the beginning:
3 2 5 * +
Such a simple change, which gave birth to the Reverse Polish Notation or RPN, has very significant repercussions, because it is now really easy to rewrite an expression using a tool that computers know how to manipulate really well: the stack.
“...Reverse Polish Notation has very significant repercussions, because it is now really easy to rewrite an expression using a tool that computers know how to manipulate really well: the stack.”
Evaluating Expressions
Transcribing an expression in RPN is a simple and elegant process. First of all, you must establish the precedence rules that govern your operations, as the parser must know about them. In our case, we use the simple rules shown in Figure 1 to handle addition, subtraction, multiplication, division and assignment (since we’ll be using variables in our mini-language).
Figure 1: Operator precedence in our language (highest precedence first)
*, /    Multiplication and division
+, -    Addition and subtraction
=       Assignment
Next, the algorithm works by using two stacks. The operator stack is used to store operators as they are encountered, until the relationships between them can be unambiguously determined in accordance with the precedence rules. The result stack is used to store data elements as they are encountered in the expression, followed by the operators once their position is known. The algorithm follows these steps:
1. If the input is empty, pull all the remaining operators from the operator stack and push them on the result stack, then exit.
2. Retrieve the next input token.
3. If the token is an operator:
   A. While the operator at the top of the operator stack is not an open parenthesis and has a higher precedence than the current operator (or the same precedence, when the two group from left to right, as in 8 - 3 - 2), pull it from the operator stack and push it on the result stack.
   B. Push the current operator on the operator stack.
4. If the token is an open parenthesis, push it on the operator stack.
5. If the token is a closed parenthesis, pull all the operators from the operator stack and push them on the result stack until you encounter an open parenthesis. Pull the open parenthesis from the operator stack and discard both parentheses.
6. Otherwise, the token is an operand: simply push it on the result stack.
7. Go back to step 1.
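The steps above can be sketched in a few lines of PHP. This is not the article's Listing 1, only a minimal illustration that assumes well-formed input already split into tokens; the function name toRpn and the numeric precedence values are my own choices. One detail is made explicit in the code: operators of equal precedence that group from left to right (as in 8 - 3 - 2) are also moved to the result stack, whereas assignment groups from right to left and stays put.

```php
<?php
// Converts a list of infix tokens to RPN using an operator stack and a
// result stack. Sketch only: no error handling for malformed input.
function toRpn(array $tokens): array
{
    // Precedence table following Figure 1 (a larger number binds tighter).
    $precedence = ['*' => 3, '/' => 3, '+' => 2, '-' => 2, '=' => 1];
    $operators = [];
    $result = [];
    foreach ($tokens as $token) {
        if (isset($precedence[$token])) {
            while ($operators) {
                $top = end($operators);
                if ($top === '(') {
                    break;
                }
                // '=' groups right to left, so equal precedence is not popped.
                $pop = $token === '='
                    ? $precedence[$top] > $precedence[$token]
                    : $precedence[$top] >= $precedence[$token];
                if (!$pop) {
                    break;
                }
                $result[] = array_pop($operators);
            }
            $operators[] = $token;
        } elseif ($token === '(') {
            $operators[] = $token;
        } elseif ($token === ')') {
            while ($operators && end($operators) !== '(') {
                $result[] = array_pop($operators);
            }
            array_pop($operators);          // discard the open parenthesis
        } else {
            $result[] = $token;             // operand
        }
    }
    while ($operators) {                    // input is empty: flush the rest
        $result[] = array_pop($operators);
    }
    return $result;
}

echo implode(' ', toRpn(['3', '+', '4', '*', '5'])), "\n";   // 3 4 5 * +
```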
Let’s look at an example. Consider, yet again, this expression:
3 + 4 x 5 = ?
If we apply the algorithm above to this, the first step is to move the operand ‘3’ to the result stack:
Input: + 4 x 5
Operator stack: (empty)
Result stack: 3
Next, we find the operator ‘+’, which we simply push on the operator stack, since it is empty:
Input: 4 x 5
Operator stack: +
Result stack: 3
The next token is ‘4’, which goes straight on to the result stack:
Input: x 5
Operator stack: +
Result stack: 3, 4
We now come across ‘x’. Since the only operator on the operator stack has a lower priority, we do not move it to the result stack. We do, however, move the current operator to the operator stack:
Input: 5
Operator stack: +, x
Result stack: 3, 4
Finally, we come across the token ‘5’, which we push onto the result stack. Since the input queue is now empty, we pull the operators from the operator stack and push them on the result stack:
Input: (empty)
Operator stack: (empty)
Result stack: 3, 4, 5, x, +
Note how the operations appear reversed on the result stack because they have been pulled from the operator stack using a LIFO operation. In order to execute these commands, we simply have to apply a recursive algorithm that pulls information from the result stack and applies whatever operation is necessary to it:
1. Pull an operator from the stack.
2. Pull the right operand from the stack.
3. If the operand is an operator, then recurse to step #1 and put the result in the right operand slot. Otherwise, simply put the operand in the right operand slot.
4. Pull the left operand from the stack.
5. If the operand is an operator, then recurse to step #1 and put the result in the left operand slot. Otherwise, simply put the operand in the left operand slot.
6. Execute the operation and return the result.
It’s important to notice that, because of the way things are arranged on the result stack, the first operand that is pulled out of it goes in the “right” slot, rather than in the “left” slot. Once again, this makes little difference when you’re dealing with commutative operations, but it will as soon as you start using divisions and subtractions. Additionally, note that we assume here that all operations have two operands. This is not always true; for example, the unary minus (for example a = -b) only has one operand (although it can cunningly be rewritten as a binary operation, e.g.: a = 0 - b), and the short-form if-then-else statement (a ? b : c) can be written as a ternary operation (but is usually treated as a special case).
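As a rough illustration of the six steps above, the following sketch evaluates a result stack recursively. It is not the article's execute function: evaluateRpn is a hypothetical name, and the sketch only handles numeric operands and the four arithmetic operators, leaving out variables and assignment.

```php
<?php
// Recursively evaluates a result stack, consuming it from the top (the end
// of the array). Arithmetic only; variables and '=' are left out.
function evaluateRpn(array &$stack)
{
    $token = array_pop($stack);
    if (!in_array($token, ['+', '-', '*', '/'], true)) {
        return $token + 0;                  // operand: numeric string to number
    }
    $right = evaluateRpn($stack);           // the first value pulled is the RIGHT operand
    $left  = evaluateRpn($stack);
    switch ($token) {
        case '+': return $left + $right;
        case '-': return $left - $right;
        case '*': return $left * $right;
        default:  return $left / $right;    // '/'
    }
}

$stack = ['3', '4', '5', '*', '+'];         // 3 + 4 * 5
echo evaluateRpn($stack), "\n";             // 23
```

Swapping the two recursive calls would go unnoticed with this particular example, but would turn 8 - 2 into 2 - 8, which is why the right-then-left order matters.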
"When trying to understand how compilers work, the first step is always to realize that compilers act in a substantially different way compared to the human brain."
Defining Our Own Little Language
Before being able to parse our language, we must obviously define it. Although there is a mathematical way to do so, it’s really as easy as this:
1. Commands are separated by semicolons.
2. ‘Whitespaces’ are defined as the space character, the tab character and the newline character. Whitespaces are ignored.
3. An ‘Operator’ is one of the character combinations defined in Figure 1.
4. An ‘Integer’ is a sequence of numeric characters between ‘0’ and ‘9’ delimited on both sides by either a Whitespace or an Operator.
5. A ‘Float’ is at least one numeric character, plus the decimal dot character ‘.’ and at least one numeric character, delimited on both sides by either a Whitespace or an Operator.
6. An ‘Identifier’ (our variable) is a string of alphanumeric characters (plus the underscore) of arbitrary length that does not start with an underscore or a numeric character.
7. The “echo” special syntactical element consists of the word “echo” followed by an expression.
A very important aspect of this description is that it is entirely lexical in nature. While we have identified all the elements that make up the language and the basic syntactical relationships between them, we did not describe the function that each element performs. That function is actually left to the parser.
Identifying Tokens
The job of identifying the tokens that make up our language is left, as I mentioned earlier, to the lexer. A general-purpose lexer works by running the current input string through a set of regular expressions and returning the token type associated with the one that returns the longest match against the input queue.
In our case, however, I will cheat a little and actually break up the entire input source according to a set of regex based on the various language elements. Next, we run each token returned by the breakup through each individual regular expression and stop at the first one that matches. We can get away with this because our language is extremely simple in nature; a more complex one would require the lexer to simply identify the basic elements (operators, identifiers and numbers) and then leave it to the parser to actually try to understand what they are.
When looping through the various regular expressions, it’s important to establish the precedence which each syntactical element has. Since we stop at the first regular expression that matches our input, we must ensure that, for example, we check for the ‘echo’ token before we look for an identifier.
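The two-step approach just described (break up the source, then classify each piece with the first pattern that matches) might look like the sketch below. The patterns and token type names are my own reading of the language rules rather than the ones used in Listing 1, parentheses are accepted as operators on the assumption that the parser handles them, and unknown characters are silently skipped because the sketch has no error handling.

```php
<?php
// Classifies each piece of the source with the first pattern that matches.
// The order matters: 'echo' is tested before the more general identifier.
function tokenize(string $source): array
{
    $patterns = [
        'echo'       => '/^echo$/',
        'float'      => '/^[0-9]+\.[0-9]+$/',
        'integer'    => '/^[0-9]+$/',
        'identifier' => '/^[A-Za-z][A-Za-z0-9_]*$/',
        'operator'   => '/^[-+*\/=()]$/',
        'separator'  => '/^;$/',
    ];

    // Step 1: break the source into raw pieces; whitespace is dropped.
    preg_match_all(
        '/[0-9]+\.[0-9]+|[0-9]+|[A-Za-z][A-Za-z0-9_]*|[-+*\/=();]/',
        $source,
        $matches
    );

    // Step 2: run each piece through the patterns and stop at the first hit.
    $tokens = [];
    foreach ($matches[0] as $piece) {
        foreach ($patterns as $type => $regex) {
            if (preg_match($regex, $piece)) {
                $tokens[] = [$type, $piece];
                break;
            }
        }
    }
    return $tokens;
}

foreach (tokenize('abc = (a - c) * b; echo abc;') as [$type, $text]) {
    echo $type, ': ', $text, "\n";
}
```

If the identifier pattern were listed first, the word echo would be reported as a variable name, which is exactly the ordering problem the last paragraph warns about.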
Putting It All Together

Finally, it’s time to write some code. Listing 1 (included in the package) shows you the source code of the simple lexer/parser/interpreter that I have built. As you can see, it can be roughly divided into four parts. First of all, the script defines the syntactical and lexical elements of the language. Next, the tokenize function breaks the source code up into its individual elements and passes them on to evaluate(), which is our parser. The parser automatically breaks up the input into lines; although this is not necessary, it means that the interpreter only has to worry about running commands, without dealing with statements that break across lines. Finally, control is handed over to the interpreter, stored in the execute function, which simply goes through the stack returned by evaluate() for each line of code and performs each operation in sequence.

Embedded in the source of the script is also a simple program in our language designed to show you how things work. I have also left a “debug” command on line 250 that outputs the operations as they are executed by the interpreter. This can be useful for understanding exactly what happens internally as our source code is being executed. If you run the script above through the CLI version of PHP, you should receive an output similar to the following:

OPERATION: a = 10
OPERATION: b = 100
OPERATION: c = 2
OPERATION: a - c
OPERATION: 8 * b
OPERATION: abc = 800
800
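To make the interpreter step more concrete, here is a minimal sketch of a stack-based execute routine. It assumes that the parser hands over each line as a list of tokens in postfix order; that layout, the function names resolve() and execute_line(), and the sample program are assumptions made for this sketch, and Listing 1 may organize its stack differently.

```php
<?php
// Hypothetical interpreter step. The postfix layout of $ops is an assumption
// made for this sketch, not the format used by Listing 1.
function resolve($operand, array $vars)
{
    // Numbers stand for themselves; anything else is looked up as a variable.
    if (is_numeric($operand)) {
        return $operand + 0;
    }
    return $vars[$operand] ?? 0;   // unknown variables default to 0 here
}

function execute_line(array $ops, array &$vars, bool $debug = false): void
{
    $stack = [];
    foreach ($ops as $op) {
        if ($op === 'echo') {
            // Print the value of whatever the expression left on the stack.
            echo resolve(array_pop($stack), $vars), "\n";
            continue;
        }
        if (!in_array($op, ['+', '-', '*', '/', '='], true)) {
            $stack[] = $op;        // operand: push it and move on
            continue;
        }
        $right = array_pop($stack);
        $left  = array_pop($stack);
        if ($debug) {
            echo "OPERATION: $left $op $right\n";
        }
        $r = resolve($right, $vars);
        if ($op === '=') {
            $vars[$left] = $r;     // the left side is the variable name
            continue;
        }
        $l = resolve($left, $vars);
        switch ($op) {
            case '+': $stack[] = $l + $r; break;
            case '-': $stack[] = $l - $r; break;
            case '*': $stack[] = $l * $r; break;
            case '/': $stack[] = $l / $r; break;
        }
    }
}

// Sample program, already in postfix form: abc = (a - c) * b; echo abc
$vars = [];
$program = [
    ['a', '10', '='],
    ['b', '100', '='],
    ['c', '2', '='],
    ['abc', 'a', 'c', '-', 'b', '*', '='],
    ['abc', 'echo'],
];
foreach ($program as $line) {
    execute_line($line, $vars, true);
}
```

With debugging switched on, this sketch prints one OPERATION line for each assignment or arithmetic step, followed by 800 from the final echo. Keeping variable names unresolved on the stack until an operator consumes them is what allows the debug line to show the name rather than the value.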
Where To Go From Here

The script I presented in this article provides only minimal error handling and syntactical analysis. In other words, if you make a mistake in your source code, chances are that you won’t get an error message; the script will simply behave in an unexpected manner or crash. Clearly, that’s not acceptable behaviour for a user-grade application, although adding error checking (at least on a basic level) is a trivial exercise.

If you remain interested in parsers and compilers, you may want to check out Alan Knowles’ web site at http://www.akbkhome.com. Like myself and John Coggeshall (who is reporting his progress on his website at http://www.coggeshall.org), Alan has been doing some research into the feasibility of PHP-based compilers, and he has actually developed a lexer and parser generator in PHP.
About The Author
Marco is the Publisher of (and a frequent contributor to) php|architect. When not posting bogus bugs to the PHP website just for the heck of it, he can be found trying to hack his computer into submission. You can write to him at
[email protected].
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=17
BOOK REVIEWS
For Your Reading Pleasure
MySQL
by Michael Kofler
Published by Apress
$39.95 (US)
659 Pages

After reading through a few of their books, I’ve come to the conclusion that Apress doesn’t like to publish ordinary books. My review of Jason Gilmore’s PHP book last month left me with perhaps the most positive impression of a beginner’s PHP handbook yet. This month’s subject is MySQL, a book written by Austrian Michael Kofler and translated into English by David Kramer (interestingly enough, Jason Gilmore seems to have participated in the creation of this book as well, as
he is mentioned as the editorial director).

While most of the MySQL-related books I’ve seen try to introduce the reader to the database system (often stopping on the way to explain what a database system is and how it works), this book introduces MySQL to a reader who already has a clear understanding of what databases are and how they work. Therefore, you will find no half-hearted attempt at explaining the theory of database design (beyond the needs of establishing some ground rules used throughout the book), which cannot possibly all be discussed in a single book, let alone in a book whose focus is already that of describing the intricacies of a particular platform. Don’t get me wrong: there are plenty of good database design introductions in many books out there. However, proper database design is a complex matter that is difficult to tackle in the few pages that the author of a book on a specific database system can afford. Thus, I’m happy to say that the author of MySQL made the decision of simply staying
with the eponymous topic and focusing on it as much as possible.

The book is roughly divided into three parts. The first is a short introduction that outlines the features and limitations of MySQL. In keeping with the Apress tradition, this portion of the book also describes the test environment used for the numerous examples offered through the remaining chapters.

The second part of MySQL, entitled “Fundamentals”, deals with the day-to-day usage of MySQL: security, monitoring, database creation and data extraction through the use of SQL queries. In addition, the author provides two chapters made exclusively of working examples in PHP and Perl, and one chapter that focuses on MyODBC.

The final section of the book introduces a number of administrative concepts, such as backups, database migration, logging and data import/export. In addition, the last chapter of the book focuses on advanced topics, like replication, transactions, full-text search functions and the different types of tables available to MySQL users.

At the end of the day, MySQL is one of the best MySQL books that I have seen, mostly because it presents its topics clearly and in a completely unpretentious fashion, placing great emphasis on the value of the examples it provides.
Administering and Securing the Apache Server
by Ashok Appu
Published by Premier Press
422 pages
$49.99 (US) $77.95 (Canada) £36.99 (U.K.)

As you may know, Apache is (and has been for many years) the world’s most popular web server. And being the world’s most used web server has its advantages, like, for example, plenty of research into how to make your configuration as solid as possible, both from a performance and security perspective. Administering and Securing the Apache Server attempts to fit itself into both categories, although with somewhat mixed results.

On the surface, the book contains all the elements that are required to get to know Apache up close and personal: after a three-chapter introduction to web servers in general (and to Apache in particular), the author proceeds to tackle the typical Apache-related topics that we have all gone through while learning how to manage “the beast”: installation, basic configuration (both by direct manipulation of httpd.conf and through the usage of the ApacheConf utility), access control, virtual hosts, CGI scripts and modules. A third portion of the book deals with “advanced” topics, such as improving performance and security, using Server-Side Includes and URL mapping.

Despite its picture-perfect layout, however, Administering fails, in my opinion, to properly address its intended audience of intermediate-to-advanced developers. The level of depth with which each topic is tackled (and, in some cases, the topics themselves) would be more appropriate for a beginner-level reader who has never had any experience with running any web server (let alone Apache) before. In particular, the author often takes a didactic approach to illustrating the various concepts that makes the book feel more like a reference than a how-to manual (the problem being, of course, that there are plenty of references for Apache available already).

For example, the book dedicates only four pages to mod_rewrite, essentially listing the basic concepts of its usage and the configuration directive needed to include it in an Apache configuration. As anyone who has ever tried to use mod_rewrite will testify, however, that’s only a minimal part of what is really required to make URL rewriting possible. In fact, the really fun part comes when you start trying to give each of your rewriting rules the proper precedence, all the while making sure that they do not rewrite themselves (unless you want them to) and make Apache go into an infinite redirection loop.

Similarly, the section on performance improvement gives several useful tips, but fails to address the strategic aspect of performance planning: benchmarking and load testing, using multiple-server configuration instead of virtual servers, and so forth.

Overall, Administering works well for a beginner who is just learning the ropes of Apache and wants a more systematic approach to understanding its functionality. However, if you’ve been banging your head against the wall for a while trying to make Apache work better, rather than simply “work”, and are looking for an advanced resource on creating the ultimate Apache configuration, this book may not be for you.
exit(0);
PHP For Suits: The Neverending Saga
By Marco Tabini
A few days ago, after my return from the PHP Conference held in Montreal (and organized by the very capable PHP Quebec team, whom I’d like to thank for inviting us), I posted a news item on my weblog (available here: http://blogs.phparch.com/index.php?m=200303#101) with a few thoughts on the status of PHP from a business perspective.

Technology-wise, I think PHP is doing very well. PHP5 is being developed at a nimble pace, and the new feature set is shaping up as the harbinger of major changes in the way we’ll develop PHP code in the near future. Several people, including myself, are starting to work on applications that would once have seemed strange, like the integration of PHP within the .NET framework and several attempts at writing a true PHP compiler.

From a business perspective, my point of view is that PHP needs a big shot in the arm. Businesspeople find no compelling reason to bet their enterprise vision on PHP; the reduced costs are somewhat offset by a lack of strategic direction and an “anything goes” approach that makes it all but impossible to tell where the language will be even a few months down the road.

Naturally, PHP insiders know that this vision of our platform is not entirely true. The advantages of using PHP go well beyond its cost, and what looks like a maverick approach to its development on one hand makes the language dynamic and efficient on the other. Nonetheless, even though PHP is possibly one of the most popular web development platforms in existence, you’ll rarely find big businesses (and publicly-traded companies in particular) advertising their use of it.

Unfortunately, when it comes to justifying any strategic decision to your
board of directors, you need business reasons. You’ll likely be sitting in front of bankers, lawyers and academic types who do not understand (and do not want to understand) technology, so that any choice made at a technical level must be backed by a good amount of businesstalk.

Alas, businesstalk is best provided through marketing materials, which PHP sorely lacks. As I’ve said more than once before, this is simply the result of having no large backer that is ready to pour out the money needed to produce it. Look at all of the other technologies, open-source or not, currently popular within the enterprise industry and you’ll always find a big name associated with each: Sun and Java, Microsoft and .NET. Even Linux has only become popular in big business thanks to the likes of IBM and other huge firms who have taken an interest in it.

Somewhat to my surprise, my blog entry generated a bit of buzz; several people wrote to me to share their thoughts, and some others logged their comments on their respective weblogs. The most humorous comment came from Michael Kimsal, whose idea that PHP is not as popular in the enterprise as it could be because its name starts with the letter P (like Perl and Python) is almost as eerily realistic as it is funny.

John Lim, whose PHP Everywhere weblog (available at http://php.weblogs.com/) is perhaps one of the best resources available for PHP enthusiasts, thinks that PHP is still too immature to worry about business support. My thought here is that PHP has been around for quite some time now, and it will be at its best when PHP5 comes out. Perhaps it would be a mistake to simply wait for these things to “just happen.”

It’s very hard to find a solution to this problem, and I’m sorry to say that I do not have one to pull out of my hat (at least currently). Possibly, a consortium funded by PHP-centric companies might be able to raise the business profile of PHP. Given how fragmented and competitive the PHP market currently is, however, this would be an arduous task, particularly considering that anything produced by the consortium would have to be released back into the community, thus giving its backers no direct return for their investment in it.

If you have any ideas, I’m sure we’d all like to hear them.