PHP Unit Testing Agile software development with PHPUnit
Industrial strength MVC Building a reusable development framework with Open Source tools
Getting a grip on LDAP A coder's introduction to working with directory services
Implementing search with Lucene Integrating a Java search engine API into your PHP site
Use OOP to manage your forms
www.phparch.com
Implementing object-oriented form libraries that promote uniformity and reusability
The WURFL project Extreme cross-platform WAP development with PHP
Plus: Tips & Tricks, Product Reviews and much more...
Introducing the php|architect Grant Program
This is the LAST month to submit your project for approval. The deadline is June 18th! Hurry and submit your proposal.
As PHP’s importance grows on the IT scene (something that is happening every day), it’s clear that its true capabilities go well beyond what it’s being used for today. The PHP platform itself has a lot of potential as a general-purpose language, and not just a scripting tool; just its basic extensions, even discounting repositories like PEAR and PECL, provide a high-quality array of functionality that most of its commercial competitors can’t match without expensive external components.
At php|a, we’ve always felt that our mission is not limited to trying our best to provide the PHP community with a publication of the highest possible quality. We think that our role is also that of reinvesting in the community that we serve in a way that leads to tangible results. To that end, this month we’re launching the php|architect Grant Program, a new initiative that will see us award two $1,000 (US) grants to PHP-related projects at the end of June.
Participating in the program is easy. We invite all the leaders of PHP projects to register with our website at http://www.phparch.com/grant and submit their applications for a grant. Our goal is to provide a financial incentive to those projects that, in our opinion, have the opportunity to revolutionize PHP and its position in the IT world.
In order to be eligible for the Grant Program, a project must be strictly related to PHP, but not necessarily written in PHP. For example, a new PHP extension written in C, or a new program in any language that lends itself to using PHP in new and interesting ways, would also be acceptable. The only other important restriction is that the project must be released under either the LGPL, the GPL or the PHP/Zend license. Thus, commercial products are not eligible.
Submit Your Project Today! Visit http://www.phparch.com/grant for more information
TABLE OF CONTENTS
php|architect

Departments
EDITORIAL RANTS: Rant Mode: On - the PHP/MySQL 'Platform'
NEW STUFF
REVIEW: SourceGuardian Pro By Peter James
REVIEW: PHPEdit By Peter James
TIPS & TRICKS By John W. Holmes
exit(0); Worlds Apart By Marco Tabini

Features
Industrial Strength MVC: Building a reusable development framework with open source tools By Jason E. Sweat
Agile Software Development With PHPUnit By Michael Hüttermann
Lucene: Integrating a Java search engine API into your PHP site By Dave Palmer
Tailoring W@P sites with WURFL By Andrea Trasatti
Getting a grip on LDAP By Brian K. Jones
Object-oriented Form Management With PHP By Marco Tabini
June 2003 · PHP Architect · www.phparch.com
EDITORIAL RANTS
Rant Mode: On - the PHP/MySQL 'Platform' Recently, it has come to my attention that there are some informational channels that have taken to calling ‘PHP/MySQL’ a ‘platform’, in the same vein as ASP.NET, J2EE and the like. This, in my opinion, is nothing short of a travesty. I will not name names (for the most part) regarding individual culprits, because it would only give them publicity. However, there is a certain developer’s website which fails to list PHP without MySQL by its side. It has ‘PHP & MySQL Tips and Tutorials’, ‘PHP & MySQL Apps and Reviews’, and a couple of other departments devoted to the PHP/MySQL ‘platform’, without a single hint that PHP can be used in other ways. In addition, documentation for a certain large company’s commercial IDE also refers to this popular duo as a ‘platform’. Probably the most surprising offenders in the perpetuation of this stereotype are the PHP conference organizers. MySQL plays such a prominent role in the talks and tutorials at PHP conferences that, if you go to one with no experience in PHP, you would leave thinking that PHP is primarily an interface to the MySQL database! This is unfortunate, to put it lightly. Truthfully, it disgusts me. Unfortunately, there are a number of conditions which currently exist in the world of PHP that could arguably be used to justify the actions of these groups. For example, have you done a search for ‘PHP MySQL’ at Amazon lately? I did this recently and
counted 16 books devoted solely to the use of PHP with MySQL as a data source!
Here is the very real truth: MySQL is NOT the only data source PHP is capable of working with. There are scores of developers who need to know this. Pass it on. Quite honestly, I’m tired of downloading applications from Freshmeat and elsewhere which require MySQL, only to find that they also contain PHP code to implement some feature that should be offloaded to (or better, built into) the database! This extra code introduces bugs, makes the application more difficult to maintain, and inevitably slows it down, and in the process makes PHP look unnecessarily slow and bulky.
Note that this is not a rant targeted at people who are using MySQL because they have properly evaluated their needs and found it to be the best tool for the job. Nor is it aimed at newbies using MySQL as an introduction to the world of data-driven development. I’m simply trying to enlighten some poor souls who might think that MySQL is the only choice they have when it comes to using PHP for their development needs. PHP has native support for Sybase, Oracle, DB2, Informix, MS SQL Server, and other databases (yes, there are other databases). Aside from databases, PHP has native support for alternative sources of data such as SNMP agents and LDAP directories. In short, PHP is a very capable development platform without the help of MySQL.
In this month’s issue of php|architect, you can capture a glimpse of a couple of these different data sources in action. As promised, I’ve written a coder’s overview of using PHP with LDAP. It’s a very high-level, gentle discussion, light on code and long on cold, hard facts you’ll need to know if a client’s environment ever forces you to code against an LDAP directory. In addition, Jason Sweat returns this month with a look at using different ‘ready made’ open source tools and frameworks to lighten the load on enterprise application developers. In the article, you’ll be able to get a feel for the kinds of things you can do with some of the other databases out there (Jason’s article uses PostgreSQL, in particular).
As always, php|architect will strive to bring you the information you’d expect from any publication of our kind. This will include MySQL, of course. However, we’ll also try to debunk any myths or misstatements regarding PHP that exist out in the wild, like the erroneous labeling of PHP and MySQL as some sort of unified ‘platform’.
As always, your opinions on this and anything you find in the pages of php|architect are of great interest to all of us here, so make your voice heard in our inboxes or the forums at the php|architect website. Enjoy!

Editorial Team: Arbi Arzoumani, Brian Jones, Peter James, Marco Tabini
Graphics & Layout: Arbi Arzoumani
Administration: Emanuela Corso
Authors: Andrea Trasatti, Brian K. Jones, Dave Palmer, Jason E. Sweat, Marco Tabini, Michael Hüttermann, Peter James
php|architect (ISSN 1705-1142) is published twelve times a year by Marco Tabini & Associates, Inc., P.O. Box 3342, Markham, ON L3R 6G6, Canada. Although all possible care has been placed in assuring the accuracy of the contents of this magazine, including all associated source code, listings and figures, the publisher assumes no responsibilities with regards to use of the information contained herein or in all associated material.
NEW STUFF
What’s New!
PHP and Java?
From the rumor mill department: we've heard that something big is about to happen between Java and PHP, and that it will be announced at the JavaOne Conference on June 9th in San Francisco. We do not yet know what the announcement will be about, but rest assured that we have unleashed our hounds. Keep an eye on our website on June 9th for more information!
MySQL Beta Certification Exam
MySQL AB announced this month that the beta version of the MySQL Professional Certification exam is available. With successful completion of the Professional Certification beta exam, you can earn a valid MySQL Professional Certification (the most advanced MySQL credential) to demonstrate strong proficiency in working with the MySQL database. Through the MySQL certification program, MySQL software developers can earn one or more formal credentials that validate their knowledge, experience and skill with the MySQL database and related MySQL AB products. This program now includes two certifications: MySQL Core Certification and MySQL Professional Certification. The MySQL Core Certification provides MySQL users with a formal credential that demonstrates proficiency in SQL, data entry and maintenance, data extraction for reporting and more. The MySQL Professional Certification is for the more experienced MySQL user who wants to certify his or her knowledge in MySQL database management, installation, security, disaster prevention and optimization. For more information, visit MySQL.com.
PEAR Info Package
PEAR announced the release of a new info package. This package generates a comprehensive information page for your current PEAR install. The format for the page is similar to that for phpinfo(), except that it uses PEAR colors. The output has complete PEAR Credits (based on the packages you have installed) and will show if there is a newer version than the one presently installed (and what its state is). Each package has an anchor in the form pkg_PackageName, where PackageName is a case-sensitive PEAR package name. Visit PEAR.php.net to download the new package.
Mozilla Firebird
Mozilla.org has announced the release of Firebird 0.6. Mozilla Firebird is a redesign of the Mozilla browser component, similar to Galeon, K-Meleon and Camino™, but written using the XUL user interface language and designed to be cross-platform. This latest version includes: a new theme, a redesigned Preferences window, improved privacy options, improved bookmarks, Talkback enabled, automatic image resizing, smooth scrolling, Mac OS X support and much more. For more information, or to download, visit Mozilla.org.
FEATURES
Industrial Strength MVC Building a Reusable Development Framework With Open Source Tools By Jason E. Sweat
In the May issue, “An Introduction to MVC Using PHP” showed you the general background and a simple demonstration script of the Model-View-Controller pattern. This article aims to take you to the next step: applying these principles in a realistic application.
Introduction
This article assumes that you have either read the aforementioned “An Introduction to MVC Using PHP” article, or that you are already somewhat familiar with the MVC pattern and OO programming in PHP, and have at least looked at the Phrame examples. The previous article highlighted proper use of the MVC pattern, with business logic in the Model classes, presentation logic in the View classes, and application flow directed by the Controller classes (ActionController, Action, ActionForms and ActionForwards in Phrame). Where the previous article only stored data in the session, this article steps it up a notch towards the “real world” by making extensive use of a database.
The Application
To give this article a little more “real world” flavor, I would like to start with a hypothetical set of requirements for the application. The application is a management system for hyperlinks. The people who commissioned the application have identified three key sets of requirements: users, administrators and infrastructure.
User:
• The user will access this application as a web site
• The list of links will be organized into groups
• The main link list will contain all of the links in the application on a single page, so they can all be printed at once
• Each page of the application will contain the current date
• The user will be able to view a summary of all the link groups, and will be able to jump directly to that group on the main listing of links
Administrator:
• The Admin will be able to maintain the links using the web site
• The site will be able to detect the Admin, and display a link to the editing pages as appropriate
• The Admin will be able to add, modify or delete both link groups and links. The Admin can change the sequence in which both groups and links within groups are presented in the application; ease of use is also important
• Security should be in place to prevent unauthorized users from becoming the application administrator, or from using the administrative functions without being authorized as the administrator
Infrastructure:
• Security is a serious concern; in particular, credentials used by PHP scripts to access the database should have the bare minimum rights required to perform their tasks (in case the web site code is ever compromised)
• The application needs to be “future proof”; specifically, this web application might not be the only client and/or the only source/editor for links in this application
• This application will transition to other resources for maintenance, so it is important that it is well structured for both flexibility and ease of maintenance
• The application should be designed so HTML designers can alter the appearance of the application without changing any source code
• The data should never become corrupted, i.e. links should never refer to a group that does not exist
• To assist in debugging problems, the system should track the date and time at which groups and links are both created and modified

REQUIREMENTS
PHP Version: 4.0.6
O/S: Any
Database: PostgreSQL 7.3
Additional Software: Phrame, ADOdb, Smarty, Eclipse
A quick review of these requirements can tell us a few things. The fact that the application is basically a web site means that PHP is certainly a leading candidate for implementation. The requirement for transitioning to other resources to maintain the application, and the desire for a robust and flexible framework, push us in the direction of implementing an MVC-style application. The fact that “each page” must have a date stamp, and possibly a link to the editing pages, tends to indicate we should establish some sort of site rendering framework. This is often implemented with headers and footers in templates. Finally, with website-independent business logic, strict referential integrity, queries across multiple tables (at least if we model links and groups in separate tables) and date columns to be modified on each SQL request, it looks like we have moved somewhat beyond MySQL’s sweet spot of fast retrieval for simple queries, and another RDBMS will be required.
Developing the Application
To review the development of this application, I think it is appropriate to take a look at the overall infrastructure first, which dovetails into Models. Next, we will review how the Controller (Phrame) implements application flow in this application, and lastly how the Views are implemented.
Infrastructure
The first decision is what language the web application will be constructed in. Since this is a magazine devoted to PHP, I think that is an appropriate choice ;) The choice of PHP allows us to pull out our now familiar bag
of tricks: Phrame for MVC, ADOdb for database abstraction and Smarty for templating.
The second decision is what the persistent data store for this application will be. I don’t think there would be much room for disagreement in saying a database is the most appropriate technology here. The project requirements indicate that referential integrity, triggers, views and stored procedures will be needed in the database. There are a variety of databases that have these capabilities: Oracle, Microsoft SQL Server, Sybase and SAPdb, to name just a few. To make this article more accessible to readers, I am going to use the most popular open source database that supports these features: PostgreSQL.
There is an interesting requirement to have applications other than this PHP web application be a possible source and/or consumer of these links. This implies that if we coded the business logic for this application in PHP, it would have to be reimplemented in whatever language the other application ends up being developed in. This could also lead to differences in the implementation logic, and would be a bad design choice overall. Fortunately, there is an alternative available to us: code the business logic for the system into the database itself. This means the business rules are implemented in a single location, accessible by any application needing to view or modify this data. By making data available to client applications only through views, and allowing modifications to the data only through stored procedures, we can also address some of the security requirements.
For long term maintenance, performance and flexibility, we will use ADOdb as a database abstraction layer. Continuing to build on what we learned regarding the Phrame implementation of MVC in PHP, we will again use Phrame for this example application.
We can implement several of the “look and feel” design requirements by adopting Smarty templates as our views, and by including common “header” and “footer” templates for shared elements. These details will be covered further in the section of the article dealing with Views.
With all of these infrastructure decisions in place, we can now visualize our application as a “stack” of technologies. Viewed from this perspective, our application can be depicted as illustrated in Figure 1. This figure is useful to help reinforce some of the concepts in our design. The figure identifies the conceptual building blocks of the application in blue, the implementing technology in green and the specific project, library or application used in this example in yellow. The portion of the Model-View-Controller design pattern with which each application block is most closely associated is shown on the left.
Figure 1
Moving from the bottom of the stack up, the technology implementing our first block is the database. You should note that the majority of the business logic is implemented in the database, thus extending the Model portion of our framework into the database itself.
The next block up in our application stack is the database abstraction layer. In this case, I have implemented this project in ADOdb, but you could easily substitute PEAR::DB, Eclipse, DBX or whatever else is your favorite db abstraction layer (or even code using the native PHP db calls, giving up the abstraction layer’s benefits of long term portability, simplified calling conventions and overall flexibility). The green bar also denotes the shift in implementing technology from the database to our scripting language, PHP.
NOTE: A developer concerned with long term flexibility, or with a desire to completely isolate model business logic (perhaps because the Model can be used in multiple applications), might choose to implement the business logic in a web service. In this case, the application model classes would be implemented as web service clients.
Your Model classes make use of the database. Remember from the previous article that only Model classes should access your persistent data store. The Model classes are also where you can implement data validation, error handling, and other rules of your business logic.
The application flow is directed by the Controller, which coincidentally is the middle of our application stack. This project is implemented in Phrame, but you could substitute any of the other projects mentioned in the prior article, or roll your own Controller. The chief role of the Controller is to delegate the user’s choice of actions to the appropriate Models or Views. In this application, we have also implemented security at this level, as each restricted action requires validation that the user is in fact an administrator prior to performing the action.
Views then perform the task of interacting with the application Models to extract the data required for the user. Views may need to transform this data in order to make it fit with the presentation technology used in your application. For this example, we continue to make use of the Smarty template engine, as we did in the prior article.
The last application block is the HTML that is transmitted to the user’s browser via HTTP. This really is part of our application’s View logic as well, because your application has no functionality without the pages being rendered. At this point the green bar on our figure depicts the change in the implementing technology from PHP to the user’s browser.
Models
Given the decision to place a good deal of our business logic in the database, this is a good starting point for the section on models. There are some preliminary items I should cover. First of all, the database this was implemented on is PostgreSQL 7.3. I created a user named linkdbo, with the ability to create databases, who will be the database owner for our links database. I created another user called linkuser, who will have minimal rights, and will be the user accessing our data from PHP. Two groups were created, links_admin and links_user. Groups are the Postgres equivalent of roles, and are a convenient way to assign rights to groups of users. It is a good database programming habit to always implement your security through roles.
NOTE: To prepare your Postgres db for this example, login as linkdbo to the links database and run these scripts from the code bundle in the following order:
    link_group_ddl.sql
    link_ddl.sql
    link_views_ddl.sql
Let’s start with the tables. In our application, we want to track links, and have them organized into groups. In a normalized database design, this implies that we need two tables, one for the links and one for the groups they belong to. This first table is for the link groups.
    DROP TABLE link_group CASCADE;
    CREATE TABLE link_group (
      link_group_id serial PRIMARY KEY,
      group_name varchar(50) UNIQUE NOT NULL,
      group_desc varchar(255) NULL,
      group_ord integer NULL,
      date_crtd timestamp(0) with time zone DEFAULT CURRENT_TIMESTAMP,
      date_last_chngd timestamp(0) with time zone DEFAULT CURRENT_TIMESTAMP
    );
    GRANT ALL ON link_group TO GROUP links_admin;
    GRANT ALL ON link_group_link_group_id_seq TO GROUP links_admin;
For readers who are not familiar with the Postgres syntax, there are a few nuances to pay attention to here. First of all, the link_group_id field is declared as type serial with a constraint of PRIMARY KEY. The serial type is a shortcut for creating a sequence in the database, and selecting the next value from the sequence as the default to populate the field when performing an insert operation. The PRIMARY KEY constraint enforces that the field must be unique and not null. The next item of interest is the date_crtd field with a constraint of DEFAULT CURRENT_TIMESTAMP. This constraint means that any time a record is inserted, and this field’s value is not specified, it will instead be created with the current date and time. The last two GRANT statements designate our security. What is most interesting here is that which is conspicuous by its absence: the links_user group has no rights at all (not even SELECT) to the link_group table. This fact is an important consideration to remember as we address function security later on.
    DROP TABLE link CASCADE;
    CREATE TABLE link (
      link_id serial PRIMARY KEY,
      link_group_fk integer REFERENCES link_group
        ON UPDATE CASCADE
        ON DELETE NO ACTION
        NOT NULL,
      name varchar(50) NOT NULL,
      url varchar(255) NOT NULL,
      link_desc varchar(255) NULL,
      link_ord integer NULL,
      date_crtd timestamp(0) with time zone DEFAULT CURRENT_TIMESTAMP,
      date_last_chngd timestamp(0) with time zone DEFAULT CURRENT_TIMESTAMP
    );
    GRANT ALL ON link TO GROUP links_admin;
    GRANT ALL ON link_link_id_seq TO GROUP links_admin;
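The serial and DEFAULT behaviour described for link_group can be checked from a psql session. The following is only a sketch: it assumes the link_group table has been created as above, that the session is logged in as linkdbo, and that the group name 'Example group A' (made up for illustration) is not already in use.

```sql
-- Assumption: run as linkdbo, the owner of link_group.
-- Only group_name is supplied; every other column falls back to its default.
INSERT INTO link_group (group_name) VALUES ('Example group A');

-- link_group_id was taken from the sequence behind "serial";
-- date_crtd and date_last_chngd were filled in by DEFAULT CURRENT_TIMESTAMP.
SELECT link_group_id, group_name, date_crtd, date_last_chngd
  FROM link_group
 WHERE group_name = 'Example group A';

-- The sequence itself can be inspected too (same name as in the GRANT above).
SELECT last_value FROM link_group_link_group_id_seq;
```

Running the same INSERT a second time should be rejected, because group_name is declared UNIQUE.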
In the link table, a new type of constraint is introduced: REFERENCES. This constraint is how Postgres implements referential integrity. In this case, we have specified that this field will match the primary key from the link_group table. With just this portion of the constraint alone, you will never be able to insert rows into the link table without an appropriate value for the link_group_fk (fk stands for foreign key). We have also qualified this constraint to further clarify the expected behavior of this relationship. ON UPDATE CASCADE means that if the link_group_id changed for any reason on a row in the link_group table that was referenced in the link table, all of the associated links would also change (we have no intention of doing this in the application, but it does not hurt us either). ON DELETE NO ACTION means that the database will reject any SQL statement that tries to delete a row from link_group that is referenced by one or more links. Herein lies the power of referential integrity: the database is doing housekeeping for us. As PHP programmers, we can focus on manipulation and presentation of the data without worrying about corrupting the data model with our SQL statements.
Security on the link table, as with the link_group table, grants no SELECT privileges to the links_user group. How is it that we will be able to query the database for this data? The answer is views, which are basically pre-defined SELECT statements that appear as if they were another table of data you can query. The following SQL statements define a view to retrieve information regarding link_groups.
    DROP VIEW groups;
    CREATE VIEW groups AS
    SELECT
       lg.link_group_id
      ,lg.group_name
      ,lg.group_desc
      ,lg.group_ord
      ,count(l.link_id) AS link_cnt
      ,max(l.date_crtd) AS link_add
      ,max(l.date_last_chngd) AS link_upd
    FROM link_group lg
      LEFT JOIN link l ON (lg.link_group_id = l.link_group_fk)
    GROUP BY
       lg.link_group_id
      ,lg.group_name
      ,lg.group_desc
      ,lg.group_ord
    ORDER BY lg.group_ord;
    GRANT ALL ON groups TO GROUP links_admin;
    GRANT SELECT ON groups TO GROUP links_user;
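To see the foreign key constraint and the view at work, the statements below can be tried one at a time in psql. This is a sketch under stated assumptions: the session runs as linkdbo, the tables and the groups view exist as defined above, and the group name, link values and the id -1 are made up for illustration.

```sql
-- Assumption: run as linkdbo, statement by statement (no surrounding transaction).
INSERT INTO link_group (group_name, group_ord) VALUES ('Example group B', 1);

-- A link must point at an existing group; the id is looked up by name.
INSERT INTO link (link_group_fk, name, url, link_ord)
SELECT link_group_id, 'php|architect', 'http://www.phparch.com', 1
  FROM link_group
 WHERE group_name = 'Example group B';

-- Expected to be rejected with a foreign key violation,
-- assuming no group with id -1 exists.
INSERT INTO link (link_group_fk, name, url)
VALUES (-1, 'Orphan link', 'http://example.com/');

-- Expected to be rejected as well: the group is still referenced
-- by a link, and the constraint says ON DELETE NO ACTION.
DELETE FROM link_group WHERE group_name = 'Example group B';

-- The view returns one row per group together with its link summary.
SELECT group_name, link_cnt, link_add, link_upd
  FROM groups
 WHERE group_name = 'Example group B';
```

On a first run, the final query should report a link_cnt of 1 for the example group, since only the valid link was stored.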
Here we have now granted SELECT rights to links_user, so this view is available to query in our PHP scripts. This view also provides some summary information regarding links associated with each link group by doing a LEFT JOIN (selecting all link_groups, and links where they match) and using aggregate functions like count() and max().
The requirements we reviewed earlier specified having fields to capture timestamps for both the creation and the last update times for each row. We saw how the DEFAULT CURRENT_TIMESTAMP constraint could be used to automatically populate the date_crtd field, but how can you have the database automatically update the date_last_chngd field when rows are updated? The answer is to use a database trigger.
    DROP FUNCTION trig_upd_dates() CASCADE;
    CREATE FUNCTION trig_upd_dates() RETURNS TRIGGER AS '
    BEGIN
      new.date_last_chngd := now();
      RETURN new;
    END;
    ' LANGUAGE 'plpgsql';
    CREATE TRIGGER link_group_upd
      BEFORE UPDATE ON link_group
      FOR EACH ROW EXECUTE PROCEDURE trig_upd_dates();
In Postgres, the creation of a trigger involves two steps: creating a function, and setting the trigger to use the function. In this case, the trig_upd_dates() function changes the value of the date_last_chngd field to be the current timestamp (the result of the now() function) in the row to be updated. The CREATE TRIGGER statement then implements the function for each row that is updated.
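A quick way to confirm that the trigger is doing its job is to update a row without mentioning date_last_chngd and then look at the timestamps. The sketch below assumes a session logged in as linkdbo, the link_group table and link_group_upd trigger as defined above, and a made-up group name.

```sql
-- Assumption: run as linkdbo, statement by statement.
INSERT INTO link_group (group_name) VALUES ('Example group C');

-- The UPDATE never mentions date_last_chngd; the BEFORE UPDATE
-- trigger overwrites it with now() for each affected row.
UPDATE link_group
   SET group_desc = 'Description added later'
 WHERE group_name = 'Example group C';

-- date_crtd keeps the insert time; date_last_chngd reflects the update.
SELECT group_name, date_crtd, date_last_chngd
  FROM link_group
 WHERE group_name = 'Example group C';
```

Because both columns are declared timestamp(0), the two values only differ visibly when the update happens at least about a second after the insert.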
NOTE: Postgres has another kind of stored procedure that is activated like a trigger, called a rule. Rules are used when the trigger needs to interact with another table. An example of this kind of functionality might be to have an audit table tracking changes to an important base table, in which the rule on the base table inserts values into the audit table as updates take place.
Having this kind of programmatic logic in the database frees the PHP developer from having to implement much of the data oriented business logic in the scripts. Now that we have seen how to view data, and how the database itself tracks some of our data requirements, the question remains: how do we modify the data without rights to the table? The answer is to use functions, and in particular, to take advantage of the ability to define security for a function so that it executes as the person who created the function, rather than the user of the function. We can walk through one example of a function that modifies data in the link table:
    DROP FUNCTION chgrp_link(INTEGER, INTEGER);
    CREATE FUNCTION chgrp_link(INTEGER, INTEGER) RETURNS INTEGER AS '
    DECLARE
      ch_link_id ALIAS FOR $1;
      ch_group_id ALIAS FOR $2;
      max_ord INTEGER;
      linkrec link%ROWTYPE;
      grouprec link_group%ROWTYPE;
    BEGIN
      -- make sure the link exists
      SELECT INTO linkrec * FROM link WHERE link_id = ch_link_id;
      IF FOUND THEN
        IF linkrec.link_group_fk = ch_group_id THEN
          RAISE NOTICE ''link % is already in group %'', ch_link_id, ch_group_id;
          RETURN 0;
        END IF;
        -- make sure the target group exists
        SELECT INTO grouprec * FROM link_group WHERE link_group_id = ch_group_id;
        IF FOUND THEN
          -- move the link to the end of its current group first
          SELECT INTO max_ord count(1) FROM link
            WHERE link_group_fk = linkrec.link_group_fk;
          IF linkrec.link_ord < max_ord THEN
            PERFORM ord_link(ch_link_id, max_ord);
          END IF;
          -- then append it to the end of the new group
          SELECT INTO max_ord count(1) FROM link
            WHERE link_group_fk = ch_group_id;
          UPDATE link
            SET link_group_fk = ch_group_id
               ,link_ord = max_ord + 1
            WHERE link_id = ch_link_id;
          RETURN 1;
        ELSE
          RAISE EXCEPTION ''no group with id % found'', ch_group_id;
          RETURN 0;
        END IF;
      ELSE
        RAISE EXCEPTION ''no link with id % found'', ch_link_id;
        RETURN 0;
      END IF;
    END;
    ' LANGUAGE 'plpgsql' SECURITY DEFINER;
June 2003 · PHP Architect · www.phparch.com
Industrial Strength MVC
This one is going to take some explanation, so here we go! First of all, near the bottom we see LANGUAGE 'plpgsql', so the language this function is written in is plpgsql. This procedural SQL language is distributed with Postgres, and you can read the Postgres documentation for instructions on how to install and use plpgsql (http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=plpgsql.html). The CREATE FUNCTION statement defines the name of the function and the parameters it takes; in this case, the function accepts two integer values as parameters. (Postgres supports function overloading, that is, multiple functions with the same name but different input parameters, but don't worry, I didn't use any.)

NOTE: The remainder of the function definition is enclosed in single quotes. This means that if you want to use quotes in your function, you have to remember to escape them by doubling them!

In the DECLARE section, we create the local variables we want to use (plpgsql is statically typed, so unlike PHP, you must declare a variable and its type prior to use). We can also create more useful names for the input parameters than “$1”, as evidenced by the use of ALIAS. By looking at the DECLARE section we can see that the first integer parameter is the id of the link we want to change, and the second is the id of the group we want to move the link to.

The rest of the function is in the statement delimited by BEGIN and END. The first step in our function is to validate that the link we were asked to change actually exists. We perform this step by attempting to select the row from link with the user-specified id into the variable linkrec. The next statement checks whether a record was found. If it was, we move on to the next step; otherwise, if you look near the bottom of the code where the ELSE branch for that check is, we RAISE EXCEPTION with an error message (quotes escaped, as noted above). By raising an EXCEPTION, the SQL statement will result in an error and no result set will be returned.
We can use this fact to trap for errors in the PHP code, and this will be covered later in the article. Now that we know that the link with the correct id
exists, the next thing we check is whether we are being asked to change the group to the value it already has. If so, we RAISE NOTICE that we were asked to essentially do nothing, and RETURN 0. Because we raised only a NOTICE, a result set will still be returned as a result of this function call.

Assuming the link group we are changing to is not the same as the link's existing group, the next step is to validate that the requested link group exists. This is performed in a similar fashion to checking for the existence of the link. If we do not find the link group, it is time to RAISE EXCEPTION again.

Once that all checks out, we are almost ready to update. But first, we need to see if this link is the last link in its group. If not, we move it to the end (using another function we have already defined, ord_link). This is done so that the sequence of links within a group does not get a gap in it. We also determine the end position in the new group, so we can position the link there. Then we perform the actual UPDATE statement, and RETURN 1, indicating success.

This may all look like a lot of work. Is it really worth it? Remember this is all part of the requirements for data integrity in our project. Consider this from the perspective of coding in a PHP script. Would you want to code and run all of the above logic in PHP, or simply execute SELECT chgrp_link(1,2)?

The last line is SECURITY DEFINER, which specifies that the function should run with the security of linkdbo, the user who created it, rather than that of the user executing the function. This is what allows us to log in through PHP as link_user, have no access to the base tables, and yet still modify the data.

NOTE: This feature was added in the 7.3 release of PostgreSQL. The function should run on an earlier version of Postgres with minor modifications; however, you will have to grant SELECT, UPDATE, INSERT and DELETE privileges to link_user, defeating the security purpose of these functions and views.
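To make the comparison concrete, here is a minimal sketch of what the PHP side of such a call might look like, including trapping a raised EXCEPTION. It is not taken from the code bundle: StubConnection is a hypothetical stand-in that only mimics the Execute() and ErrorMsg() calls the model classes make on the ADOdb connection, the constant LINKS_CHGRP_SQL and the function change_link_group() are invented for illustration, and the syntax assumes a current PHP release rather than PHP 4.

```php
<?php
// Hypothetical stand-in for the ADOdb connection object. It mimics only
// Execute() and ErrorMsg(), and pretends that link 1 lives in group 1 and
// that groups 1 and 2 exist.
class StubConnection
{
    private $lastError = '';

    public function Execute($sql, $bind = array())
    {
        list($linkId, $groupId) = $bind;
        if ($linkId !== 1) {
            // RAISE EXCEPTION: the statement fails, no result set comes back
            $this->lastError = "no link with id $linkId found";
            return false;
        }
        if ($groupId !== 1 && $groupId !== 2) {
            $this->lastError = "no group with id $groupId found";
            return false;
        }
        $this->lastError = '';
        // RAISE NOTICE (same group) still yields a result set, with value 0
        return array(array('chgrp_link' => ($groupId === 1) ? 0 : 1));
    }

    public function ErrorMsg()
    {
        return $this->lastError;
    }
}

// Illustrative constant: the SQL stays fixed, the ids are bound at runtime.
define('LINKS_CHGRP_SQL', 'SELECT chgrp_link(?, ?)');

// Returns array('ok' => bool, 'error' => string).
function change_link_group($conn, $linkId, $groupId)
{
    $rs = $conn->Execute(LINKS_CHGRP_SQL, array((int)$linkId, (int)$groupId));
    if ($rs === false) {
        // the database rejected the change; pass its message along
        return array('ok' => false, 'error' => $conn->ErrorMsg());
    }
    return array('ok' => true, 'error' => '');
}
```

All of the validation and reordering stays in the database; the PHP code only has to distinguish "a result set came back" from "the statement failed".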
The rest of our database API is similarly defined with functions, and is summarized in Table 1. You can review the scripts in code/mvc/sql for the implementation of all of the tables, triggers, views, functions and sample data used in this application.

Table 1: Database API Functions

Data Access (views)

View            Description
--------------  ---------------------------------------------------------------------------------------
links           View providing details of individual links and the groups that they are associated with.
groups          View providing details of link groups, including summary data regarding associated links.

Data Manipulation (functions)

Function        Description                                    Parameters
--------------  ---------------------------------------------  ----------------------------------------
add_link_group  adds a new link group to the database          Varchar - group name
                                                               Varchar - group description
upd_link_group  updates an existing link group                 Integer - link group id
                                                               Varchar - group name
                                                               Varchar - group description
del_link_group  remove an existing link group                  Integer - link group id
ord_link_group  change the sequence of an existing link group  Integer - link group id
                                                               Integer - new order sequence
add_link        add a new link to the database                 Integer - link group id to associate
                                                               Varchar - name for the link
                                                               Varchar - url for the link
                                                               Varchar - description for the link
upd_link        update an existing link in the database        Integer - link id to modify
                                                               Varchar - name for the link
                                                               Varchar - url for the link
                                                               Varchar - description for the link
del_link        remove an existing link from the database      Integer - link id to remove
ord_link        change the sequence of an existing link        Integer - link id
                within all links associated in the group       Integer - new order sequence within the group
chgrp_link      change the group a link is associated with     Integer - link id
                                                               Integer - new link group id

With all that database work out of the way, we can finally get back to the subject we all know and love: PHP! Let's move up the application stack a block or two and dig into the Model classes, which will actually be accessing the database code we just developed. The classes that access the database break down pretty easily in this application into two classes: Links and Groups. In addition, we will want to model the user of our system, primarily for security (to determine if this particular user is an application administrator). Lastly, we will have a model for Errors, similar to the previous article.

An excerpt from Groups.php can be found in Listing 1. I like to create constants for the SQL statements I intend to use in the class. To avoid potential name space conflicts, I generally prefix the constants with the name of the class (or an abbreviation of the class name if it is long). The heredoc syntax is used for additional clarity on the multi-line SQL statements.

Listing 1: Excerpt from Groups.php

    define('GROUPS_INFO_SQL', <<<EOS
    SELECT *
    FROM groups
    WHERE link_cnt > 0
    EOS
    );
    define('GROUPS_DETAIL_SQL', <<<EOS
    SELECT *
    FROM groups
    EOS
    );
    define('GROUPS_ADD_SQL', 'SELECT add_link_group(?, ?)');
    // ... additional SQL statements defined

    class Groups
    {
        // ... constructor and vars defined

        function GetInfo($pbEmpty=false)
        {
            global $go_conn;
            // include empty groups only when asked to
            $s_sql = ($pbEmpty) ? GROUPS_DETAIL_SQL : GROUPS_INFO_SQL;
            $o_rs = $go_conn->Execute($s_sql);
            if ($o_rs) {
                return $o_rs->GetArray();
            } else {
                trigger_error(DB_OOPS."\n".$go_conn->ErrorMsg());
                return false;
            }
        }

        function Add($psName, $psDesc)
        {
            global $go_conn;
            // values are bound, in order, to the '?' placeholders
            $a_bind = array($psName, $psDesc);
            $o_rs = $go_conn->Execute(GROUPS_ADD_SQL, $a_bind);
            if ($o_rs) {
                return true;
            } else {
                trigger_error(DB_OOPS."\n".$go_conn->ErrorMsg());
                return false;
            }
        }

        // ... additional functions defined
    }

One item to consider is how to deal with values that change at runtime (substituting values like dates or ID fields into the statement). How can you make a constant flexible
enough to handle this? There are two easy approaches I have used: format the constant for processing with sprintf(), or use ADOdb bind variables. The latter method is shown in the example code and described below.

Next we define the Model class itself. Groups::GetInfo() and Groups::Add() are representative examples of model methods. Each uses a global ADOdb connection object. This database connection is established in the application setup file (links_setup.php). Groups::GetInfo() selects the appropriate SQL statement based on how it was called, then executes the SQL and stores the results in a result set object. Next, we check for valid execution. If the statement succeeded, we return the result set as an array; otherwise, we trigger an appropriate error message.

Groups::Add() is similar, but adds the concept of a bind array. Each '?' in the SQL statement will be substituted, in order, with a value from the array. This is an example of the handling of dynamic runtime data with a constant SQL statement mentioned above.

The Links model is similar to the Groups model. I encourage you to review the code bundle's code/mvc/app/models directory for the full PHP scripts.

The concept of the User model is essential to understanding the security within this application, so the whole User.php script is presented in Listing 2. This Model class again defines some constants to be used in the class definition. The User class has three methods: User::IsAdmin(), User::SetAdmin() and User::ValidateAdmin(). The User::IsAdmin() method checks for whatever conditions we determine qualify a user as an administrator, and returns a boolean value based on the result of these checks. In this case, I have implemented logic that says an administrator is anyone who:
• is browsing from a particular subnet,
• is browsing from the localhost, or
• has passed a cookie to the application with the name 'c_links_admin' and a particular hash value.
The User::ValidateAdmin() method makes use of the IsAdmin() method to check the current user. If the user is not an administrator, we trigger an error and redirect to a safe location in our application. This method can now be used anywhere in our application where this type of validation is necessary.

The User::SetAdmin() method essentially implements a password check. If the correct password is passed to this method, it will drop a cookie with the correct name and value to pass the IsAdmin() method checks. This method, coupled with the AdminLogin Action, allows us to have a “backdoor” entry into the system as an administrator using a url like http://example.org/links/links.php?action=AdminLogin&pw=letMeIn. You could also code in a login page and use the posted password to pass to this method.

Listing 2: User.php

    <?php
    define('USER_LOCAL_SUBNET', '192.168.10.');
    define('USER_LOCAL_HOST', '127.0.0.1');
    define('USER_ADMIN_PASS', 'letMeIn');
    define('USER_ADMIN_VAL', md5('links application administrator'));
    define('USER_ADMIN_COOKIE', 'c_links_admin');

    class User
    {
        function IsAdmin()
        {
            $s_user_ip = $_SERVER['REMOTE_ADDR'];
            $b_admin = false;
            // admin if on the local subnet, on localhost, or holding the cookie
            if (USER_LOCAL_SUBNET == substr($s_user_ip, 0, strlen(USER_LOCAL_SUBNET))
                || USER_LOCAL_HOST == substr($s_user_ip, 0, strlen(USER_LOCAL_HOST))
                || (array_key_exists(USER_ADMIN_COOKIE, $_COOKIE)
                    && USER_ADMIN_VAL == $_COOKIE[USER_ADMIN_COOKIE])
            ) {
                $b_admin = true;
            }
            return $b_admin;
        }

        function SetAdmin($psPassCheck)
        {
            if (USER_ADMIN_PASS == $psPassCheck) {
                //Set Cookie for 30 days
                SetCookie(USER_ADMIN_COOKIE
                    ,USER_ADMIN_VAL
                    ,time()+30*24*3600
                    ,''
                    ,$_SERVER['HTTP_HOST']
                );
                return true;
            } else {
                appl_error('Invalid Administrator Password');
                return false;
            }
        }

        function ValidateAdmin($psMsg='You have requested an action reserved for application administrators. Access denied.')
        {
            if (!User::IsAdmin()) {
                appl_error($psMsg);
                header(ERROR_VIEW);
                exit;
            }
        }
    }
    ?>

Controller

This application makes use of Phrame, and is therefore, in many respects, very similar to the example presented in the previous article. One main difference is the implementation of a “default action”. In my experience, if no explicit action is specified, then the “default action” to show a view is implied. The revised bootstrap file (links.php) reflects this (Listing 3).

Listing 3: Revised bootstrap file (links.php).

    //application setup
    require_once 'links_setup.php';

    //set default action if none specified
    if (!array_key_exists(_ACTION, $_REQUEST)) {
        $_REQUEST[_ACTION] = 'ShowView';
    }

    //create Phrame controller
    $go_controller = new ActionController($go_map->GetOptions());

    //release control to controller for further processing
    $go_controller->Process($go_map->GetMappings(), $_REQUEST);

Our revised bootstrap file is now down to four active lines of code. require_once 'links_setup.php'; includes the libraries, establishes global variables and defines functions used in the application. The next 'if' statement implements the “default action” discussed above. If no action is currently defined, it is explicitly set to “ShowView”. Next we create our global controller. Finally, since we are now always processing an action, we always call the ActionController::Process() method.

One thing I really liked about adding the ShowViewAction was the elimination of all the procedural code to determine which view to show. This action class is covered in more detail in the section of the article dealing with Views in Listing 4.

Another important piece of code to review is this application's extension of the MappingManager class (introduced in the previous article). This class is defined in the code bundle's code/mvc/app/LinkMap.php file. The content of the LinkMap class's constructor function is shown above. This class uses the default options from the MappingManager class. We define three forms for the application. The links form is a pure instance of the ActionForm class, and is therefore similar to the form I showed you in the previous article's example application. In this application, we have some more significant work to do in processing form data, and both the link editing and group editing pages have a specific extended ActionForm class devoted to them.

The first mapping defined is for the default “ShowView”. No forwards are required because this action will terminate in the generation of HTML for the client anyway. The next mapping shows an example of an action with multiple forwards. The first, “index”, has no forward path specified, so it will use the mapping default of APPL_BASE.'index'. The second, “edit”, specifies APPL_BASE.'groupedit' as the forward path.
These are used in the action based on success or failure of the login action. On success, you would forward to “edit” and allow the administrator to edit the application; otherwise, you should just forward the user to the index page with an error message indicating the failed login attempt. Listing 5 is the actual code for the LoginAction::Perform() method that executes what I just described.

The rest of the mappings defined are fairly typical of what I see in most applications developed with this methodology: Actions are associated with a specific view in the application, and they generally have just a single forward that returns the user to the view that originated the action. As a matter of style, I tend to group all of the mappings associated with a single view together, as shown for both the group editing and the link editing actions.

The last subject to be covered in relation to the Controller is the customized form classes. The simplest way to make an editing page is to have it work “record at a time”, i.e. you might go to the “editgroup” view and pass a parameter of group_id=1. The “editgroup” view would have all the fields you can modify on the record available as inputs in a form, and you would typically have a hidden input with the group_id of the record being edited. Under this style of application, the user would have to go back to a listing of groups and select another group to edit in order to make multiple changes.
Listing 5: Login action.

    function &Perform(&$poActionMapping, &$poActionForm)
    {
        $s_password = $poActionForm->Get('pw');
        if (User::SetAdmin($s_password)) {
            $o_action_forward =& $poActionMapping->Get('edit');
        } else {
            $o_action_forward =& $poActionMapping->Get('index');
        }
        return $o_action_forward;
    }
To make things easier for the user, you can implement “table at a time” editing, which is what is shown with the GroupForm (code/app/GroupForm.php) and UpdGroupAction (code/app/UpdGroupAction.php) classes in Listing 6. Instead of a single hidden input for group_id, you will make a hidden input array (named groups in Listing 6) that is populated with all of the group_ids for the table as you iterate over them in the edit view. Instead of having an input named simply group_name, you will code the group_id into the name of each input field, so that the name field for the group with id 1 is named group_name1.
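As an illustration of this naming scheme, the sketch below renders the inputs for a list of groups and then rebuilds one row per group from a posted request array. It uses plain PHP instead of Smarty and Phrame; render_group_rows() and collect_group_rows() are hypothetical helpers that do not appear in the code bundle. The field names groups[], group_nameN and group_descN follow what GroupForm::PutAll() reads in Listing 6, and the syntax assumes a current PHP release.

```php
<?php
// Render the hidden id plus the editable fields for every group.
// The group id is baked into the name of each editable input.
function render_group_rows(array $groups)
{
    $html = '';
    foreach ($groups as $group) {
        $id = (int)$group['link_group_id'];
        $html .= '<input type="hidden" name="groups[]" value="' . $id . '">' . "\n";
        $html .= '<input type="text" name="group_name' . $id . '" value="'
               . htmlspecialchars($group['group_name'], ENT_QUOTES) . '">' . "\n";
        $html .= '<input type="text" name="group_desc' . $id . '" value="'
               . htmlspecialchars($group['group_desc'], ENT_QUOTES) . '">' . "\n";
    }
    return $html;
}

// Rebuild one array of values per posted group id.
function collect_group_rows(array $request)
{
    $rows = array();
    if (!isset($request['groups']) || !is_array($request['groups'])) {
        return $rows;
    }
    foreach ($request['groups'] as $rawId) {
        $id = (int)$rawId;
        $nameKey = 'group_name' . $id;
        $descKey = 'group_desc' . $id;
        $rows[] = array(
            'link_group_id' => $id,
            'group_name' => isset($request[$nameKey]) ? $request[$nameKey] : '',
            'group_desc' => isset($request[$descKey]) ? $request[$descKey] : '',
        );
    }
    return $rows;
}
```

Because the hidden groups[] array lists every id on the page, the receiving code never has to guess which numbered fields were posted.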
All other inputs will be named similarly. You will want to create an easy way to iterate over these inputs in your action, and use a model class update method for each of the different groups posted.

Listing 6: “table at a time” editing classes.

    class GroupForm extends ActionForm
    {
        var $_moUpdList;

        function PutAll($paIn)
        {
            parent::PutAll($paIn);
            $a_list = array();
            $a_loop = $this->Get('groups');
            if (is_array($a_loop)) {
                // build one array of values per posted group id
                for ($i=&new ArrayIterator($a_loop); $i->IsValid(); $i->Next()) {
                    $i_upd_key = (int)$i->GetCurrent();
                    $a_add = array(
                        'link_group_id' => $i_upd_key
                        ,'group_name' => stripslashes($this->Get('group_name'.$i_upd_key))
                        ,'group_desc' => stripslashes($this->Get('group_desc'.$i_upd_key))
                    );
                    $a_list[] = $a_add;
                }
            }
            $this->_moUpdList =&new ArrayList($a_list);
        }

        function &GetList()
        {
            return $this->_moUpdList->ListIterator();
        }
    }

    class UpdGroupAction extends Action
    {
        function &Perform(&$poActionMapping, &$poActionForm)
        {
            User::ValidateAdmin('You must be an administrator to Update Groups');
            $o_group =& new Groups;
            $o_list = $poActionForm->GetList();
            while ($o_list->HasNext()) {
                $a_vals = $o_list->Next();
                $o_group->Update($a_vals);
            }
            if (!$o_group->IsChanged()) {
                appl_error('Please change a value before updating.');
            }
            $o_action_forward =& $poActionMapping->Get('edit');
            return $o_action_forward;
        }
    }

The Phrame controller will “load” your form class with the $_REQUEST array. It does this using the ActionForm::PutAll() method. This is the method overridden in the GroupForm class above, in which a Phrame ArrayList object is created and stored in the GroupForm object. A Phrame ListIterator over this list is then retrieved using the GetList() method. You can see this ListIterator being used in the 'while' statement in the UpdGroupAction::Perform() method. While the ListIterator still has values, we extract the next value as $a_vals and use this array of values as a parameter to the Groups::Update() method. It is important to note that we cannot call this method statically, because we are tracking in a class variable whether any of these updates actually changed the database. This is checked in the statement if (!$o_group->IsChanged()) so we can warn the user if they are wasting our time submitting an update form with no changes!

NOTE: You should also note that some of the security for this application is implemented in this action. The first statement in the Perform() method is User::ValidateAdmin('You must be an administrator to Update Groups');. This statement will trigger an error message and redirect to a public view if the user is not an administrator. You can be confident that any code after this statement will only be used by the administrator of the application. Any actions you want similarly secured should contain a call to User::ValidateAdmin() as the first line of your Perform() method.
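The reason Update() needs an instance rather than a static call can be shown with a small sketch. TrackedGroups below is a hypothetical stand-in for the Groups model: it keeps its rows in memory instead of calling the database, and the way it decides that something changed is an assumption, since the real Groups::Update() is not reproduced in this article. The syntax assumes a current PHP release.

```php
<?php
// Hypothetical stand-in for the Groups model, holding rows in memory.
class TrackedGroups
{
    private $rows;            // group id => array of current values
    private $changed = false; // set once any Update() alters a row

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Apply one posted row; remember whether anything was really altered.
    public function Update(array $vals)
    {
        $id = $vals['link_group_id'];
        if (!isset($this->rows[$id])) {
            return false;
        }
        $new = array(
            'group_name' => $vals['group_name'],
            'group_desc' => $vals['group_desc'],
        );
        if ($this->rows[$id] != $new) {
            $this->rows[$id] = $new;
            $this->changed = true;
        }
        return true;
    }

    public function IsChanged()
    {
        return $this->changed;
    }
}
```

The flag accumulates across every Update() call made in the loop, which is only possible because all of the calls share one object.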
Views

The View component of the MVC architecture is the area that has changed the most from the example presented in the previous article. I have done significant refactoring over several iterations of the application, and what I am presenting here is what I have arrived at as a very workable solution for integrating Smarty into Phrame. In a nutshell, there is a ShowViewAction::Perform() method (shown in Listing 7) initiated for every page a user will view. This application implements the Factory Pattern to retrieve a specific subclass of a View base class. The same method creates a Smarty object, initializes the view with both Smarty and the Action's Form object, checks security, and assigns global values for the application. The View::Render() method is then executed to load view-specific values and generate output for the user.

Listing 7: ShowViewAction::Perform()

    function &Perform(&$poActionMapping, &$poActionForm)
    {
        global $gb_debug;

        $o_view_factory =& new LinksViewFactory;
        $o_smarty =& new Smarty;
        $o_smarty->autoload_filters = array(
            //'pre' => array('trim', 'stamp'),
            'output' => array('trimwhitespace'));
        $s_view = strtolower($poActionForm->Get('view'));
        $o_view =& $o_view_factory->Build($s_view);
        $o_view->Init($o_smarty, $poActionForm);

        //security check
        switch (get_class($o_view)) {
        case 'indexview':
        case 'listview':
            $b_restricted = false;
            break;
        default:
            $b_restricted = true;
        }
        if ($b_restricted) {
            User::ValidateAdmin('You must be an administrator to view this portion of the application');
        }

        //any default assignments
        $o_smarty->Assign(array(
            'view' => $s_view
            ,'view_link' => APPL_BASE
            ,'action_link' => APPL_ACTN
            ,'action' => _ACTION
            ,'admin' => User::IsAdmin()
            ,'debug' => ($gb_debug && User::IsAdmin()) ? true : false
        ));

        //render the template
        $o_view->Render();
        exit;
    }

The view Factory Pattern is implemented pretty much by the book (“Design Patterns”, that is). What we want at runtime is a specific subclass of the View class. There is a ViewFactory class that you must extend in your application to make a concrete view factory. You need to override the ViewFactory::_GetViewClass() method. This method takes a single argument, the requested view, and must return a valid view subclass name. The easiest way to implement this is a switch statement, with a default to your “index” or “main” view. The only other assumption made by the View Factory is that the subclass is defined in a file in the views subdirectory, named with the class name and the php extension. The _GetViewClass() method for LinksViewFactory is shown in Listing 8.

Listing 8

    class LinksViewFactory extends ViewFactory
    {
        function _GetViewClass($psView)
        {
            switch(strtolower($psView)) {
            case 'list':
                $s_ret = 'ListView';
                break;
            case 'groupedit':
                $s_ret = 'GroupEditView';
                break;
            case 'linkedit':
                $s_ret = 'LinkEditView';
                break;
            case 'index':
            default:
                $s_ret = 'IndexView';
            }
            return $s_ret;
        }
    }
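The base class side of this arrangement is not reproduced in the article, so the following is only a sketch of how a Build() method could use _GetViewClass() and the file naming convention just described. SketchViewFactory, DemoViewFactory and the empty view classes are hypothetical names, the error handling is an assumption, and the syntax assumes a current PHP release.

```php
<?php
// Hypothetical base factory: subclasses only decide which class name to use.
abstract class SketchViewFactory
{
    protected $viewDir = 'views';

    // Map a requested view name to a view class name.
    abstract protected function _GetViewClass($psView);

    public function Build($psView)
    {
        $class = $this->_GetViewClass($psView);
        if (!class_exists($class, false)) {
            // convention: views/<ClassName>.php defines the class
            $file = $this->viewDir . '/' . $class . '.php';
            if (is_file($file)) {
                require_once $file;
            }
        }
        if (!class_exists($class, false)) {
            throw new RuntimeException("View class $class not found");
        }
        return new $class();
    }
}

// Two empty views defined inline so the sketch runs without a views directory.
class IndexView {}
class ListView {}

class DemoViewFactory extends SketchViewFactory
{
    protected function _GetViewClass($psView)
    {
        switch (strtolower($psView)) {
            case 'list':
                return 'ListView';
            case 'index':
            default:
                return 'IndexView';
        }
    }
}
```

An unknown view name falls through to the default case, so a mistyped or malicious view parameter still produces the index view rather than an arbitrary class.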
NOTE: Both the ViewFactory and View classes are only referenced from the ShowViewAction class, and are therefore not really a part of Phrame. I include them in the Phrame lib directory because they are abstract enough to use for multiple projects, and therefore are useful to have in the common library directory.

How do the view subclasses work? The View::Init() method takes the Smarty object and the ActionForm object, both by reference, and assigns them to class vars. This is important, especially in the case of Smarty, because assignments made to the Smarty object after initialization are still present in the $this->_moTpl var when it is used later in the View::Render() method. The Render() method calls a Prepare() method (where each subclass will assign view-specific data), then handles errors, and displays the subclass's Smarty template. There are only two things to do in each subclass of View to make another view for your application: assign the template to the $_msTemplate var, and implement a Prepare() method. Listing 9 is a sample view class for the groupedit view.

You might want to take a look at how the templates in this application are organized. Each view-specific template calls {include file="header.tpl"} as the first statement and {include file="footer.tpl"} as the final statement. These give the site the common “look and feel”, with the header.tpl handling the site title and errors, and the footer.tpl handling the timestamp, navigation and some debugging code. This style of layout allows you to easily add common elements like site navigation. Remember that any common template variables can be assigned in the ShowViewAction::Perform() method.
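The Init(), Prepare() and Render() flow described above can be sketched as follows. This is not the View class from the code bundle: SketchView and HelloView are hypothetical, StubTemplate stands in for Smarty and only mimics an assign() and a fetch() call, and the error handling step that the real Render() performs is left out. The syntax assumes a current PHP release.

```php
<?php
// Stand-in for the template engine: collects values, returns a string.
class StubTemplate
{
    public $vars = array();

    public function assign(array $values)
    {
        $this->vars = array_merge($this->vars, $values);
    }

    public function fetch($template)
    {
        return $template . ':' . json_encode($this->vars);
    }
}

// Hypothetical base view: subclasses set a template and implement Prepare().
abstract class SketchView
{
    protected $_moTpl;
    protected $_moForm;
    protected $_msTemplate = '';
    protected $_mbPrepared = false;

    public function Init($tpl, $form)
    {
        // objects are held by handle, so later assignments made on $tpl
        // by the caller are visible here as well
        $this->_moTpl = $tpl;
        $this->_moForm = $form;
    }

    abstract public function Prepare();

    public function Render()
    {
        if (!$this->_mbPrepared) {
            $this->Prepare();
        }
        return $this->_moTpl->fetch($this->_msTemplate);
    }
}

class HelloView extends SketchView
{
    protected $_msTemplate = 'hello.tpl';

    public function Prepare()
    {
        $this->_moTpl->assign(array('title_extra' => 'Hello'));
        $this->_mbPrepared = true;
    }
}
```

A value assigned by the caller after Init() but before Render() ends up in the same template object as the values assigned in Prepare(), which is the behavior the article relies on for its application-wide defaults.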
Listing 9

    define('GROUPEDIT_VIEW_TEMPLATE', 'groupedit.tpl');
    require_once 'models/Groups.php';
    require_once 'models/Links.php';

    class GroupEditView extends View
    {
        var $_msTemplate = GROUPEDIT_VIEW_TEMPLATE;

        function Prepare()
        {
            $a_groups = Groups::GetInfo(true);
            $a_links = array();
            // collect the links for each group, in group order
            for($i=&new ArrayIterator($a_groups); $i->IsValid(); $i->Next()) {
                $a_group = $i->GetCurrent();
                $a_links[] = Links::GetByGroup($a_group['link_group_id']);
            }
            $this->_moTpl->Assign(array(
                'title_extra' => 'Editing Groups'
                ,'group' => $a_groups
                ,'link' => $a_links
                ,'group_opt' => Groups::Options()
                ,'test' => var_export(Groups::GetInfo(true), true)
            ));
            $this->_mbPrepared = true;
        }
    }

Debugging Phrame Applications

It is worthwhile to note some of the debugging tools I have left in the code. I used these techniques in developing this example, and they might help you in developing your own Phrame based applications. The first habit I try to enforce is to code all my debugging routines in such a way that they will not take effect in production. This way, if I forget to remove the debugging code when I migrate my source to the production location, I do not have to re-migrate. The second effect I try to achieve is to have reasonable looking output to work with (in some of my CSS2 absolute positioning layouts, a simple echo statement in the wrong location can get hidden behind other divisions).
The easiest way I have found to achieve these results is to determine dynamically at runtime whether we should be in debug mode. In the links_setup.php script, the global variable $gb_debug can be set to (strpos($_SERVER['SCRIPT_FILENAME'], 'public_html')>0) ? true : false; to detect whether the script is running from a user's public web directory (a sign the script is in development in my environment). The same variable can be hard-coded to false to simulate the production environment. All debugging output should be conditional on this boolean, i.e. if ($gb_debug) { var_dump($foo); }.

Another very simple means of viewing the state of variables in your system is to trigger the appl_error() function by hand. If you want to see the state of a simple variable (number or string), you can write something like if ($gb_debug) appl_error('foo='.$foo);. This technique is useful because the message shows up in a convenient location (the application error box) and the information can be captured in the processing of an Action::Perform() method and displayed after the forward to the appropriate view.

Sometimes you may want to dump a larger variable, for example, one of the data arrays you retrieve from a model. These can sometimes be hard to look at in the error box, so an alternative is to assign the var_export($array, true); value to a template variable named test, and then in the footer.tpl, detect if we are in debugging mode and output <pre>{$test}</pre>. At this same point, I often enable the Smarty debugging console. I recommend reviewing this handy feature in the Smarty project documentation.

One final debugging comment. The user defined error handling is very powerful, and absolutely required for this framework, where error messages must be queued across multiple browser requests (as in any action -> forward sequence).
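The debug switch and the conditional dump described above can be condensed into two small helpers. This is a sketch rather than code from links_setup.php: detect_debug_mode() and debug_dump() are hypothetical names, and the check uses strpos() !== false, a slightly stricter variant of the >0 comparison shown in the text that also matches when the marker sits at the very start of the path. The syntax assumes a current PHP release.

```php
<?php
// True when the script path contains the development marker directory.
function detect_debug_mode($scriptFilename, $marker = 'public_html')
{
    return strpos($scriptFilename, $marker) !== false;
}

// Return a <pre> block describing $value, or nothing outside debug mode.
function debug_dump($debug, $label, $value)
{
    if (!$debug) {
        return '';
    }
    $text = $label . '=' . var_export($value, true);
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES) . '</pre>';
}
```

In the application, the first helper would be fed $_SERVER['SCRIPT_FILENAME'] once at setup time, and every later debugging statement would consult the resulting boolean.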
While this mechanism is nice, it has one major problem: if you get a PHP fatal error, you will end up with a blank page rather than the default PHP error message. (The PHP manual clearly states that custom error handlers will not handle fatal errors, but apparently it passes them anyway...?) To alleviate this problem, I added support for a constant named DISABLE_PHRAME_ERROR_HANDLER. Modifications were made to the Phrame ActionController class to detect whether this constant is defined and not set to the boolean false. When this is the case, the normal application error handling will not be enabled and PHP fatal errors will be visible as normal. If you end up with this “blank page” phenomenon, rather than doing “Zen Debugging”, define the above constant as a test, in case you have accidentally introduced a fatal error somewhere in your scripts. You should note that if this constant is defined, output is always generated, thus disabling the application's ability to process and then forward.
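The check on the constant can be sketched like this. The modified ActionController is not shown in the article, so everything except the constant name DISABLE_PHRAME_ERROR_HANDLER is hypothetical: handler_disabled(), install_error_handler(), sketch_error_handler() and the ga_error_queue global are invented for illustration, and the handler simply queues messages in memory. The syntax assumes a current PHP release.

```php
<?php
// In-memory queue standing in for the application's error store.
$GLOBALS['ga_error_queue'] = array();

// Custom handler: queue the message instead of printing it.
function sketch_error_handler($errno, $errstr)
{
    $GLOBALS['ga_error_queue'][] = $errstr;
    return true; // tell PHP the error has been dealt with
}

// True when the constant is defined and is anything but boolean false.
function handler_disabled()
{
    return defined('DISABLE_PHRAME_ERROR_HANDLER')
        && DISABLE_PHRAME_ERROR_HANDLER !== false;
}

// Install the custom handler unless it has been switched off.
function install_error_handler()
{
    if (handler_disabled()) {
        return false; // leave PHP's own error reporting in place
    }
    set_error_handler('sketch_error_handler');
    return true;
}
```

Adding define('DISABLE_PHRAME_ERROR_HANDLER', true); before the controller is created would make install_error_handler() return false, leaving PHP's normal error output visible.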
Future Directions

Where can you go from here in modifying this application? Well, first of all, the table list of links is pretty boring; perhaps you could edit the links.tpl file and generate a nicer looking layout (perhaps with some CSS positioning). You might want to extend the groups data model to include an image source for a more graphical flair to the list. In this case, you are altering something pretty fundamental to the application, so you would need to make sure you hit all the blocks in the application stack where it is affected: alter the link_group table; add img_src to the add_link_group and upd_link_group plpgsql functions; add the column to the groups view so the PHP database user can query the data; extend the Groups Add and Update methods to handle processing of the new field; extend the Add and Update actions to process it; and add it to the groupedit.tpl forms so we pass the value. Lastly, add the img tag to the links.tpl file to display the image for the user.

This might sound like you are altering a significant portion of the system, but remember that your code is now well organized into compact, function-oriented blocks: you need to store the data somewhere, you need to be able to securely access and modify the data, you need to be able to edit the data as an administrator, and you need to retrieve and display the data for the user. Each of these tasks now has a location within your framework, and you can make a modification like this, which essentially amounts to a new application requirement, without breaking any of the previously implemented functionality and requirements.

What else could be altered? You might want to create a “link popularity” feature, i.e. measuring the number of times users have followed the links. How can this be accomplished? First of all, you can't link directly to the sites, because you would have no way of knowing when the user clicks on a link.
Instead, you would create a “ViewLink” action that would bump your count for the link and then redirect the user to the link. You might add an admin mode that would check for broken links. You might add a “Submit a link” form, giving your end users the capability to add links. This feature might further require you to change the data model to include a “pending” status flag so the administrator could approve submitted URLs. If performance were a consideration, you might want to investigate Smarty's caching capabilities. You would definitely want separate cache ids for regular and admin users. You might also consider writing unit tests for your code, especially your model classes and the Action::Perform() methods you have implemented.
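The core of the “ViewLink” idea, counting the click and then redirecting, can be sketched in a few lines. Nothing here comes from the code bundle: view_link() is a hypothetical function, the links and counts are held in plain arrays rather than in the database, and a real action would send the returned string with header() and exit. The syntax assumes a current PHP release.

```php
<?php
// Bump the counter for a link and return the redirect header to send.
// Returns null (and counts nothing) when the link id is unknown.
function view_link(array &$counts, array $links, $linkId)
{
    $linkId = (int)$linkId;
    if (!isset($links[$linkId])) {
        return null;
    }
    $previous = isset($counts[$linkId]) ? $counts[$linkId] : 0;
    $counts[$linkId] = $previous + 1;
    return 'Location: ' . $links[$linkId];
}
```

Because the redirect target is looked up by id rather than passed in the request, the action cannot be used to bounce visitors to arbitrary URLs.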
Summary
What I have tried to present is the foundation for an enterprise-strength PHP application architecture. Building on the strengths of the MVC design pattern by implementing Phrame, we have fortified this with a good database design, database abstraction in the PHP Model classes, and Views implemented using Smarty templates. Once familiar with this kind of application architecture, you can deploy effective web applications by writing rock-solid Model classes, Action::Process() and View::Prepare() methods, and Smarty templates. Deploying MVC-based PHP applications helps meet many common requirements: applications that are robust, flexible, maintainable and secure. These two articles and the example code provided have been a whirlwind tour of PHP features, some covered in depth and others just mentioned or touched on briefly (or even assumed). Here is a selection of some of the PHP features, functions and concepts we have applied in this article and example:
• the MVC design pattern
• practicing separation of business logic, application flow and presentation logic
• the Phrame PHP implementation of the Jakarta Struts MVC controller
• Object Oriented programming in PHP
• creating abstract base classes
• using static methods of classes
• using the PostgreSQL database
• coding in plpgsql, a procedural SQL language
• using a database abstraction layer (ADOdb)
• practicing good security habits
• using templates to separate presentation logic (Smarty)
• writing custom Smarty variable modifiers
• using PHP’s session to store data
• using cookies to store data
• applying the Factory design pattern (ViewFactory)
• array manipulation
• HTTP redirection
• using web standards (well-formed XHTML, valid CSS)
If you have the luxury of having people on your development team with SQL, PHP and HTML coding skills, I think you can see how the MVC design pattern nicely breaks down into areas that suit each developer’s skill set. On the other hand, if you are solely responsible for an application from start to finish, perhaps following this example of coding the database, PHP and templates will allow you to adjust your own mental framework as you change hats during the development of the project. When developing your own applications, I hope the application stack diagram from this article, the MVC technology figure from “An Introduction to MVC Using PHP”, and the examples provided in these articles will give you the tools necessary to design and implement your own MVC web application. Happy Coding!
About The Author
Jason has been an IT professional for over ten years. He is currently an application developer and intranet webmaster for a Fortune 100 company. He has written several tutorials and articles for the Zend website, and has recently contributed to the Wrox “PHP Graphics” handbook. He resides in Iowa with his wife and two children. Jason can be contacted at [email protected].
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=24
June 2003 · PHP Architect · www.phparch.com
Agile Software Development With PHPUnit
By Michael Hüttermann
Are you a responsible project manager who feels depressed due to failed projects? Are you a developer frustrated with defective applications and project stress? Perhaps agile software processes are the cure you’ve been waiting for.
Introduction
It is unfortunate that so many software projects are not successful. There can be many different reasons for this and, of course, some circumstances cannot be prevented. By relying on the experiences of other people, however, many common problems in the software development process can be mitigated. Agile software processes are “best practices” that have been identified through experience. In this article I want to introduce the agile approach and its benefits for PHP developers. I focus primarily on patterns and examples from extreme programming, which is one of the most prevalent agile methods. With a little background, we’ll set out to discuss unit testing in detail. We’ll look at what unit testing is, what the advantages are, and how to implement unit testing in the PHP world.
The problem
Process models are used to manage software development. Without some sort of model, software development is chaotic. The bigger a project and its risks are, the more necessary a sound process model becomes. One of the most popular models is the “waterfall” model. In the waterfall model, developers step through each phase successively. Planning and analysis come first, then implementation, and so on.
Although this model has many derivatives and implementations, a pure waterfall model would theoretically forbid planning or design once the implementation phase has begun. This has led many developers to deem the waterfall approach cumbersome and inert for many projects. It is fundamentally inflexible, which generally makes late change requests and new features very difficult to integrate.
The solution
Generally speaking, agile approaches are lightweight process models that focus on the result and on the customer. They allow you to directly profit from the experience gained through years of successful and not-so-successful software development. One of the key deliverables of agile methods is that changes are always welcome, ensuring customer acceptance. There are a number of agile methodologies, including “Scrum”, “Crystal”, and “Extreme Programming”. Extreme Programming, or XP, was introduced by Kent Beck and has definitely received the most attention. XP is a process model focusing on small incremental releases and iterative development. Over several iterative cycles more and more features are added to the product, but even the first iteration contains real functionality. Customers are able to run through mini acceptance tests, and can offer feedback very early on. This incremental release cycle prevents misunderstandings, and keeps projects on the right track. There are a number of best practices that XP promotes. Some of these include pair programming, simple design, continuous integration, and test-driven development. We’ll explain each of these briefly, and then delve into the last one in depth.
REQUIREMENTS
PHP: version 4.3+
PHPUnit: version 0.5+
Code Directory: agilemethods
Pair programming
XP identified that information exchange between developers is very important. Pair programming is a very extreme way of achieving this exchange, but it has a number of advantages. One advantage, of course, is continuous knowledge transfer. This knowledge transfer means that other developers are able to fix and extend code in any module (also known as collective code ownership). Another advantage of pair programming is sanity checking. While one person is coding, the other is looking at and checking the code being produced. They discuss strategies, have fun, and are more productive than working alone.
Simple design
Another XP practice is to maintain simple designs. This means only implementing the features we currently want, and only in the easiest way. This way, we place strict focus on the functionality requested by our customers, and don’t lose ourselves in trying to anticipate complex future enhancements. Along with this, we should not try to reinvent the wheel. In the case of PHP web development, for example, it may be simpler and result in a better quality end product to use the Smarty template engine or existing PEAR packages, rather than trying to roll our own templating system.
Continuous integration
Let’s assume we are using the waterfall model.
The coding begins and proceeds in a more or less uncoordinated manner while developers create their modules. Shortly before final code freeze they are asked: “Are you finished? Does your code work?” “Sure,” they answer, “I implemented the template engine here, and there is the database abstraction. Also, the business logic is complete.” At that time all single modules are frozen and integrated, resulting in a big bang. The single modules may work, but the interaction between them doesn’t. And this may happen shortly before release! The solution for this is continuous integration. We freeze our code as often as possible, and integrate. Small releases and pieces are more manageable. The best case is that the result of each integration cycle is a runnable version. The worst case is that bugs prevent the integration. At least we know about them now and can fix them, rather than finding out about them at the end of the cycle. Above all, we learn by integrating the product. It will not be a single event we are afraid of; it will be routine. We get a good feeling for our application, and no big surprises await us at the end of the project.
Test-driven development
Now that we’ve introduced some of the patterns used in XP, the remainder of this article will focus on arguably the most important pattern: test-driven development. As developers code their modules, they test (hopefully!). Usually, this becomes more debugging than real testing. Using PHP’s echo or die statements manually takes a lot of time and is really bug hunting, not testing. Sure, we may use DBG or the Zend Studio Debugger to lessen the burden, but again this is not really testing. Another problem with this “echo or die” type of manual testing is that we often have to add extra code to our module in order to test it. Thus, we change the very module we want to test. An even worse case is that testing is skipped completely. Integrating these untested modules then results in the big bang I mentioned earlier. How can we prevent all of this? One approach is to apply the “decorator” pattern to protect our unit (module) code and encase it with the tests. “Decorator” is a design pattern discussed by Erich Gamma et al. in the landmark Design Patterns book. In the decorator pattern, an object (or unit) is extended with additional functionality. Instead of coding the new functionality inside the unit, though, we leave the unit unchanged and add a wrapper around it, which adds the new functionality. This approach has the advantage that the underlying unit is left unchanged, basic, and re-usable. Only the additional functionality is special for this use case.
The decorator can also add further re-usable modules, such as debugging or logging. In our case, we’ll decorate our unit with the test functionality, and refer to this functionality as “unit testing”. Unit tests are informal functional (black box) tests normally executed by the developers of code. They are often quite low-level and test the behavior of special software components such as classes, modules, functions, and so on. We use unit tests while practicing test-driven development. Test-driven development means that we code our unit tests first. No unit code is written before its test. Units are as finely-grained as makes sense. We may write a unit test for a single method, for a whole module, or for any other kind of component. The smaller the component is, the better. Returning to the PHP templating system example, you might write a set of tests for the template engine. This “unit” would likely be much too functionally broad to properly test. A better unit granularity might be each page component, such as headers or footers. What advantages does test-driven development offer? The first benefit is that we must think about the module before starting to write its code. This ensures self-reflection about the unit, which is sure to improve quality. Another benefit of test-driven development is timely bug discovery. Developers very rarely deliver bug-free code. The later code defects are discovered, the more time it will take to find the bug responsible. Fixing bugs at late stages is often costly in terms of time and effort. If we test during or directly after development, the code is still fresh in the developer’s mind, and changes are easy to make. Test-driven development also allows us to spend less time testing. This may sound counter-intuitive, but the extra effort in the beginning pays off in the long term. Unit tests are generally automated and repeatable, which is very different from the traditional “echo or die” approach. Repeating an automated unit test many times is comfortable and fast. Doing this manually would cause much stress, especially when we are under time constraints - and we are always under time constraints! Another big advantage of test-driven development is that we do not need to touch our unit code. Although the unit test is generally highly coupled to the unit, the actual code of the unit and the unit test are separated. We can feel secure about the fact that we are testing the actual unit, not a modified testing version. Test-driven development may sound uninteresting and boring, but developers can actually feel challenged to write sophisticated unit tests for the modules. This could be a satisfying and stimulating experience in itself. Thinking about the test also improves the module’s design. The coding of the unit and its test is an iterative process.
This iteration happens because it is hard to anticipate the whole test environment from the beginning. The unit and its test code should not be treated separately. The test code is part of the package. If writing the complete test classes before the module seems daunting, you may start by developing them in parallel. The units are integrated once all of the necessary unit tests are passed. No integration can start if any unit test fails. If this is enforced in the first and successive integrations, it will minimize the number of bugs found during integration. Bugs that appear during integration can be harder to track down because it may be difficult to determine where they originated. Whether you write your unit tests from scratch, or use a framework, test-driven development is a must.
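The decorator idea introduced earlier in this section can be made concrete with a small sketch. The classes Slugger and LoggingSlugger are invented for this illustration and the code targets current PHP rather than PHP 4; the point is only that the wrapper adds behaviour (here, a call log) while the unit itself stays untouched.

```php
<?php
// Hypothetical unit: left completely unchanged by the decorator.
class Slugger
{
    public function slug(string $title): string
    {
        return strtolower(str_replace(' ', '-', trim($title)));
    }
}

// Decorator: offers the same method as the unit, plus a log of every call.
class LoggingSlugger
{
    private $inner;
    public $log = [];

    public function __construct(Slugger $inner)
    {
        $this->inner = $inner;
    }

    public function slug(string $title): string
    {
        $result = $this->inner->slug($title);          // delegate to the real unit
        $this->log[] = "slug('$title') => '$result'";  // the added behaviour
        return $result;
    }
}

$decorated = new LoggingSlugger(new Slugger());
echo $decorated->slug(' Hello World '), "\n"; // hello-world
print_r($decorated->log);
```

Because Slugger never learns about the logging, the same class can be shipped as-is while the wrapper is used only where the extra behaviour is wanted.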
Testing frameworks
Once you are sold on the idea of writing unit tests, you should consider adhering to a standard. This is especially true among groups of developers. In this case, the usage of a testing framework might make sense. But what is a “framework”? Let me (technically) define a framework as an object model which can be extended (normally by inheritance) to suit the custom application’s needs. A testing framework provides guidelines and best practices in order to write and run tests smoothly. Writing tests is easier because the general software infrastructure is already available. Testing frameworks have the following benefits:
• consistency: within a framework, all unit tests generally work the same way.
• maintenance: frameworks should be more or less bug-free and supported.
• break-in time: frameworks usually enable new developers to get up to speed quickly.
• automation: frameworks are usually able to run tests automatically.
PHPUnit
The framework we’ll use to demonstrate unit testing is called PHPUnit, and is part of the PEAR project. PHPUnit is an instance of XUnit, which is a general framework enabling module authors to write repeatable tests for their modules. Kent Beck, XUnit’s creator, defined a basic approach for unit testing, including four basic patterns. The first pattern is the creation of a common “test fixture”. A test fixture is a configuration for the test. It does the setup and teardown of any entities (variables, temporary databases, etc.) needed to perform our test. This is like preparing a sandbox for our test to play in, then raking it over again when we’re finished. The second pattern is creating a “test case”. A test case stimulates a fixture in some predictable way. The third pattern, the “check”, tests for these predicted results. Our test cases are aggregated into the fourth pattern, which is the “test suite”. The test suite contains a set of test cases that are all run together.
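The four patterns can be shown without any framework at all. The sketch below is not PHPUnit: the class MiniTestCase, the example StackTest and the function run_suite() are invented here purely to show where fixture, test case, check and suite sit relative to each other, and the code assumes current PHP.

```php
<?php
// Invented for illustration; this is not the PHPUnit API.
abstract class MiniTestCase
{
    public $failures = [];

    public function setUp(): void {}     // fixture: prepare the sandbox
    public function tearDown(): void {}  // fixture: rake it over again

    // check: record a failure when the predicted result does not hold
    protected function check(bool $ok, string $message): void
    {
        if (!$ok) {
            $this->failures[] = $message;
        }
    }
}

class StackTest extends MiniTestCase
{
    private $stack;

    public function setUp(): void    { $this->stack = [1, 2]; }
    public function tearDown(): void { $this->stack = null; }

    // test case: stimulate the fixture in a predictable way
    public function testPop(): void
    {
        $top = array_pop($this->stack);
        $this->check($top === 2, 'pop should return the last element');
    }

    public function testCount(): void
    {
        $this->check(count($this->stack) === 2, 'fixture should hold two items');
    }
}

// suite: run every test* method, each one against a fresh fixture
function run_suite(MiniTestCase $case): array
{
    foreach (get_class_methods($case) as $method) {
        if (strpos($method, 'test') === 0) {
            $case->setUp();
            $case->$method();
            $case->tearDown();
        }
    }
    return $case->failures;
}

$failures = run_suite(new StackTest());
echo count($failures) === 0 ? "all passed\n" : implode("\n", $failures) . "\n";
```

Rebuilding the fixture before every test method is what keeps testCount() unaffected by the element that testPop() removed.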
Let’s look more deeply at PHPUnit, and how you can use it in your applications.
Installation
PHPUnit’s source code and documentation can be found at the PEAR website (http://pear.php.net/package-info.php?pacid=38). Let’s assume an Apache and PHP configuration with PHP 4.3 or higher. By using PHP 4.3+ we benefit from the fact that PEAR is a stable part of the official PHP distribution, and we may also use the PEAR installer. To retrieve and install this package, we browse to the PEAR installer executable (called “pear”) in our file system (if it is not already part of our PATH). This could be, for example, under /usr/local/lib/bin beside the PHP executable. If we call the PEAR installer like so:
pear install PHPUnit
we should fetch and install the PHPUnit package (see Listing 1).
Note: You may need root access on your machine to install PEAR packages. If you receive an error about PEAR_CONFIG_SYSCONFDIR, simply run the following PHP script:
The example
In order to illustrate the features and use of PHPUnit, let’s go over a simple example. We have to generate a complex ASCII file automatically, and we know the content it must have. A good test for the success of this operation might be to compare the generated file against a template file or string. This can easily be done using PHPUnit. In our simple case the file (and the template) consists of one short string. Since we are practicing test-driven development, the first step is to write the test class. Our example test class is shown in Listing 2. What does the script do? First, it includes the PHPUnit PEAR package. Then it defines the test class CompareTest, which extends PHPUnit_TestCase. PHPUnit_TestCase is a fundamental PHPUnit class that provides us with testing functionality. It contains methods for running the test, for building the result object, and abstract methods for setting up and tearing down the fixture. In PHPUnit a “test case” is a box consisting of tests sharing the same fixture. So what does our subclass do? The constructor (CompareTest()) defines the new PHPUnit test case.
The setUp() method does some setup work for our fixture. All test methods should use our fixture. If a method does not use our fixture, it probably doesn’t belong in this class. Our fixture simply reads our template file into a member variable. Our next method, tearDown(), is called after the test methods are executed and simply unsets our fixture variable. Next, the test methods follow. Each method whose name begins with “test” is a test method. In our case we have two methods, testCompare() and testConcatenate(). The first one checks if the generated file is correct, and the second one verifies a concatenation. It is important to understand that all test methods should be independent of the others. This means that we can execute each test method separately from the others.

Listing 2
<?php
// Load the PEAR PHPUnit package
require_once 'PHPUnit.php';

class CompareTest extends PHPUnit_TestCase {

    // Constructor: defines the new PHPUnit test case
    function CompareTest($name) {
        $this->PHPUnit_TestCase($name);
    }

    // Fixture setup: read the template file into a member variable
    function setUp() {
        $file = "template.txt";
        $handle = fopen($file, "r");
        $template = fread($handle, filesize($file));
        $this->template = trim($template);
        fclose($handle);
    }

    // Fixture teardown: discard the template
    function tearDown() {
        unset($this->template);
    }

    function testCompare() {
        // Stands in for the output of the generation process
        $this->generated = "1234-4321";
        $this->assertTrue($this->template == $this->generated, "generation not equal to template");
    }

    function testConcatenate() {
        $this->con_template = $this->template . "whatever";
        $this->con = "1234-4321-whatever";
        $this->assertEquals($this->con, $this->con_template);
    }

}
?>

Let’s look at testCompare() a little closer. In the interests of brevity, $this->generated simulates the output of our complex generation process. The method assertTrue() is an assertion provided by PHPUnit. It simply verifies that a given boolean condition delivers TRUE. Other assertions provided by PHPUnit are shown in Listing 3. assertFalse() checks a given boolean condition to be FALSE. assertNull() ensures a variable is NULL, and assertNotNull() does the opposite. assertEquals() checks that a variable is equal to another value. assertSame() ensures a variable is pointing to an expected object. assertRegExp() checks that the value of a given variable is matched by a regular expression. But what happens with our test now? We need another script to use the test class. Listing 4 shows how triggering our test might look. A PHPUnit_TestSuite object is created. The PHPUnit_TestSuite class is a container for grouping different test cases into one logical unit and providing access to the test results. Once the object is created, we add the test to it explicitly. Now we can run the test. While the tests are running, the results are put into a PHPUnit_TestResult object, which we display as a string. The PHPUnit_TestResult class contains the basic functionality to store test successes and test failures, and to provide access to these results.

[Listing 4: the script that creates the PHPUnit_TestSuite, adds the CompareTest test to it, runs it, and prints the PHPUnit_TestResult as a string]

We run our test and receive:
TestCase comparetest->testCompare() passed
Great! We also want to check if the concatenation is equal to our template value. We need to add this second method to our test suite. Adding the second test manually could be done with:
$suite->addTest(new CompareTest('testConcatenate'));
Applying several different test methods through this manual approach could quickly become tedious. Let’s try it a different way. Take a look at Listing 5. If we create a PHPUnit_TestSuite instance while passing a test case class name, PHPUnit will automatically collect all test methods available and execute them. In our case the name of the test case class is CompareTest. We have also specified HTML output rather than string, which makes the output more browser friendly. Now, when we call the script, we’ll see this output (displayed in the browser):
TestCase comparetest->testcompare() passed
TestCase comparetest->testconcatenate() failed: expected 1234-4321-whatever, actual 1234-4321whatever
[Listing 5: the script that creates a PHPUnit_TestSuite from the class name CompareTest, runs it, and prints the result with toHtml()]
Listing 3
Function | Description
assertTrue($condition, $message) | verifies that the boolean $condition is TRUE, otherwise failing with $message
assertEquals($expected, $delivered, $message) | verifies that the expected value is equal to the delivered value
assertSame($expected, $delivered, $message) | verifies that $expected is pointing to $delivered
assertRegExp($expected, $delivered, $message) | verifies that $delivered matches the regular expression $expected
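The differences between the assertions in Listing 3 come down to which comparison is applied. As a rough guide only, they can be approximated with plain PHP operators. The helper names below are invented for this sketch and are not PHPUnit functions, and the exact comparison rules inside PHPUnit may differ from this approximation.

```php
<?php
// Invented helpers that approximate the checks; not the PHPUnit API.
function is_equal($expected, $delivered): bool
{
    return $expected == $delivered;   // same value after type juggling
}

function is_same($expected, $delivered): bool
{
    return $expected === $delivered;  // same type and value; for objects, the same instance
}

function matches_regexp(string $pattern, string $delivered): bool
{
    return preg_match($pattern, $delivered) === 1;
}

$a = new stdClass();
$b = new stdClass();

var_dump(is_equal('10', 10));   // bool(true)
var_dump(is_same('10', 10));    // bool(false)
var_dump(is_equal($a, $b));     // bool(true)
var_dump(is_same($a, $b));      // bool(false)
var_dump(is_same($a, $a));      // bool(true)
var_dump(matches_regexp('/^\d{4}-\d{4}$/', '1234-4321')); // bool(true)
```

The two stdClass objects compare as equal because they have the same class and properties, yet they are not the same instance, which is the distinction the "same" check is meant to catch.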
Oops, what is that? The output reports a failure in our second test method, testConcatenate(): the two strings are not equal. If we look into the code (from Listing 2) we can see that we added an extra dash to the expected value. The corrected line now looks like this:
$this->con = "1234-4321whatever";
Okay, let’s try it again. We call the test suite testing both test methods, and we get:
TestCase comparetest->testcompare() passed
TestCase comparetest->testconcatenate() passed
Our test suite finished successfully!
Further considerations
If we decide to refactor our application in the future, we can easily perform a regression test. Refactoring and regression are also major patterns in the XP world. Refactoring describes changing (and/or extending) the design of existing code without changing its functionality. On the outside, the refactored module still has the same functionality, but inside the code may be completely reorganized and (hopefully) better designed. A regression test is a way to ensure that the old functionality of a module or application is still the same, and is still available without errors after a new version is released. Over time, we can build up a collection of unit tests which ensures the bug-free functionality of our software, regardless of the changes that are made to it. We saw that a group of test cases can be bundled into a test suite. This makes it very easy to set up a large suite of tests (tens, hundreds, thousands, or more), and enable a fast regression check. This can also allow us to examine interdependencies between units very quickly. If we change or extend a module and run a regression test, we may detect a bug in our software exposed by our changes. Perhaps we didn’t expect this new bug in a seemingly unrelated part of our application. This allows us to better understand how the software works, which is especially valuable when you didn’t write it.
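As a small illustration of refactoring under the protection of a regression check, consider two versions of the same unit. Everything here is hypothetical (the functions mirror_id_v1(), mirror_id_v2() and regression_failures() are invented for the sketch, which assumes current PHP); the point is that one fixed set of expectations is run unchanged against both implementations.

```php
<?php
// Hypothetical unit, original version: reverses the digits by hand.
function mirror_id_v1(string $digits): string
{
    $reversed = '';
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $reversed .= $digits[$i];
    }
    return $digits . '-' . $reversed;
}

// Refactored version: same behaviour, simpler implementation.
function mirror_id_v2(string $digits): string
{
    return $digits . '-' . strrev($digits);
}

// Regression check: the same expectations are applied to any implementation.
function regression_failures(callable $unit): array
{
    $expectations = ['1234' => '1234-4321', '7' => '7-7', '' => '-'];
    $failures = [];
    foreach ($expectations as $input => $expected) {
        // Numeric array keys come back as integers, so cast to string first.
        $actual = $unit((string) $input);
        if ($actual !== $expected) {
            $failures[] = "input '$input': expected $expected, actual $actual";
        }
    }
    return $failures;
}

var_dump(regression_failures('mirror_id_v1') === []); // bool(true)
var_dump(regression_failures('mirror_id_v2') === []); // bool(true)
```

If the refactored version had changed the result for any input, the returned list would name that input, which is exactly the early warning a regression suite is meant to give.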
Agile processes provide many patterns which may help improve our development process, but they do not claim to have invented something new. They simply identify best practices and try to maximize their potential. Many of these methods you might already know under another name or already apply implicitly. Agile processes are really just a pool of best practices which can be used and combined where necessary and useful. You do not have to switch your entire process to XP or Scrum, but upon detecting weak points you might do well to consider using some agile methods. Test-driven development is one of those agile methods that fits almost any situation and environment. If you do not want to go all out with PEAR PHPUnit or other alternatives, you should still consider developing some easy test cases yourself. My experience is that the use of these methods ensures the efficient development of (nearly) bug-free applications. Who can resist that?
Conclusion
Both traditional and agile approaches have their advantages. On the one hand, a waterfall model offers a more traditional and structured approach. On the other hand, agile approaches provide flexibility and a better “time-to-customer” value. In this article I’ve introduced some best practices of XP, and how PHP development may profit from them. Agile processes exist in every part of the software lifecycle. Therefore, the parts I’ve introduced are only a small sampling of the plethora of available agile tools.
References
http://www.agilealliance.org
http://www.extremeprogramming.org
http://www.controlchaos.com
http://www.refactoring.com
http://pear.php.net/package-info.php?pacid=38
http://sourceforge.net/projects/phpunit/
E. Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995
http://c2.com/cgi/wiki?TestingFramework
http://www.xprogramming.com/testfram.htm
About The Author
Over the last few years, Michael has designed and developed B2C web applications on different platforms. He is now busy with enterprise application development. His email is [email protected].
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=25
Integrating a Java search engine API into your PHP site
By Dave Palmer
Tired of writing a new search engine every time you start a web project? Lucene, the open-source Java search engine API, and PHP may be the solution you've been waiting for.
Ignoring the counsel of friends, family and loved ones, and turning a blind eye to the continuous threats of physical, mental and emotional violence by the publishers of this fine magazine, I am back to spread more joy and wisdom. I'm joking of course… I don't have any friends! Okay, all kidding aside, I would like to start this article off with a proclamation:
SELECT foo FROM bar WHERE foobar LIKE '%bla%'
is stupid and useless.
What am I talking about? I'm talking about rolling our own search engine (which, for me, is always an afterthought: "oh yeah, we need a way to search this stuff"). How many times have you, in an attempt to just get a "search engine" implemented, hacked up some variant of the above SQL statement, knowing full well that it would never satisfy the requirements you have for a search feature? Writing search engine functionality is not easy, as you are dealing with a lot of ambiguities and almost exclusively with human (freeform) input. A lot of care and attention should be taken when developing search functionality, as the process of searching textual information is inexact and fraught with pitfalls such as:
• Determining what is searched
• How to search through large volumes of information
• How to deal with search terms
• How to optimize a search
• How to rank results
Just dealing with these core problems can (and does… take it from me, I've gone through this) balloon the scope of a search feature into an all-consuming project unto itself. Now, I don't know of too many professional developers who are big fans of re-inventing the wheel, so let me introduce you to Lucene (http://jakarta.apache.org/lucene/docs/index.html).
Hello, Lucene
Lucene is a member of the Apache Jakarta Project. Jakarta is an umbrella project that is host to the Apache Software Foundation's Java projects (such as Ant and Tomcat).
REQUIREMENTS
PHP Version: 4.2+ with Java Extensions & MySQL
O/S: Any
Additional Software: Lucene Search Engine API
Code Directory: lucene
Doug Cutting originally developed Lucene in his spare time during 1997 and 1998, and is now assisted by a whole team of volunteers. Lucene is an open-source indexing and search engine API. It's written in Java and boasts a rather rich set of features, including support for indexing of static files as well as database queries. With Lucene, one can index any type of static file, assuming that your indexing application is capable of parsing the file's content. I know you are all scratching your heads and thinking to yourself "hey, isn't this a magazine about PHP?". Of course it is, but there's nothing more powerful than a PHP application supported in the middle tier or backend by Java. With this article I will show you how you can implement Lucene with your PHP applications and provide your end-users with a feature-rich search engine. I'll also show you how to do this all in a fraction of the time it would've taken to build something from scratch. The first thing to establish is that Lucene is NOT an "out-of-the-box" product (such as Verity). Lucene is an API. This means that the thing you download provides your "client" application with an interface into Lucene's internal workings. It is up to the developer to actually implement the two main components required in order to leverage Lucene's capabilities: the indexer and the searcher. The indexer is the component that is responsible for creating the index, or catalog of things that will be searched. When a search is performed, it's never performed against the actual documents or database objects, but against the index of those things needing to be searched. This not only vastly improves performance of the search, but it also decouples what is searched from the actual targets of a search. The indexing "engine" (as it is commonly known) contains a series of calls to Lucene's Document object where "references" to your data (whether it's a query row or a text file) are stored. A repository on the file system is created, representing the index. Lucene's Document object is best thought of as a container. It's an object that represents an "entity" in the Lucene universe. Every item that is indexed "resides" in a Document object. The searcher is the actual search engine or the "thing" that accepts user input (via a search phrase), and performs a search on a specified index. The search engine is the front-end component that parses the search phrase, performs the query, and returns the results back to the search interface. If you were to be brave (or foolish) enough to actually try to write the searcher and indexer from scratch, you would embark on a journey fraught with peril. The functions required to do this work are numerous, and an amazing amount of testing is required in order to deploy these components. Lucene fills this niche rather nicely. Yes, you should have some experience with Java, as Lucene's API is written in Java, but don't let that deter you from getting to know (and love) Lucene. Hopefully with the help of this article you'll just be able to take my hacking and make it your own! Before we go any further, let's review the environment and other requirements you'll need to satisfy in order to use Lucene on your server. You'll need PHP 4.2 or greater with Java support. This support can be either compiled in (using the --with-java=DIR directive when running 'configure' on Linux), or the php_java.dll enabled on Windows. In addition to enabling the Java PHP extension, you'll also have to make configuration changes in your php.ini file (configuring Java support under the Java directive). You will also, of course, need the Lucene JAR file (http://jakarta.apache.org/builds/jakarta-lucene/release/v1.2).
Lucene comes packaged as a Java JAR file, which you simply need to download and store someplace on your local file system. Once you have the JAR file saved, you need to add its location to the java.class.path setting in your php.ini file.

Essentially, we will be developing an indexing and search engine using Lucene by coding two Java classes: an Indexer class and a Searcher class. The goal here is to give you, the intrepid reader, the basic knowledge of how to integrate Lucene into any application that requires a search engine. For the sake of this example I created a fictional "links" database where URLs are stored. Please see Listing 1 for the SQL creation script.

Listing 1

    # Host: obione
    # Database: PHP_Articles
    # Table: 'links'
    #
    CREATE TABLE `links` (
      `lid` int(11) NOT NULL auto_increment,
      `name` varchar(50) NOT NULL default '',
      `url` varchar(50) NOT NULL default '',
      `description` varchar(100) NOT NULL default '',
      PRIMARY KEY (`lid`)
    ) TYPE=MyISAM;
The indexer

The first component we will implement is the indexer. This is the piece that you will use to index the content you wish to make available to your search engine. Lucene's indexing API makes this rather simple, but you need to have a basic understanding of Lucene's methods and the context in which they are used. In our example, the IndexEngine (see Listing 2) is a Java class that runs on the command line. Use the included batch files (for Windows) or shell scripts (for *nix) to compile the classes and run the indexer on the command line.

Listing 2

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    /**
     * @author Dave Palmer
     */
    public class IndexEngine {

        public static void main(String[] args) throws Exception {
            System.out.println("Preparing to index links database...");
            index(getConnection());
            System.out.println("Index complete");
        }

        private static void index(Connection conn) throws Exception {
            String sql = "select lid,name,url,description from links";
            String indexPath = "/path/to/index/file";
            Analyzer analyzer = new StandardAnalyzer();
            // The third argument (true) tells Lucene to create a new index
            IndexWriter writer = new IndexWriter(indexPath, analyzer, true);
            PreparedStatement pStmt = conn.prepareStatement(sql);
            System.out.println("Executing query...");
            ResultSet rs = pStmt.executeQuery();
            int count = 0;
            int interval = 250;
            long timeout = 50;
            System.out.println("Preparing to build index...");
            while (rs.next()) {
                // Pause briefly after every batch of rows
                if (count == interval) {
                    java.lang.Thread.sleep(timeout);
                    count = 0;
                } else {
                    count++;
                }
                System.out.println("Adding link: " + rs.getString("name"));
                // One Document per row; each column becomes a field
                Document d = new Document();
                d.add(Field.Text("lid", rs.getString("lid")));
                d.add(Field.Text("name", rs.getString("name")));
                d.add(Field.Text("url", rs.getString("url")));
                d.add(Field.Text("description", rs.getString("description")));
                writer.addDocument(d);
            }
            writer.close();
        }

        private static Connection getConnection() throws Exception {
            Class.forName("com.mysql.jdbc.Driver").newInstance();
            String url = "jdbc:mysql://db_host/db_name";
            String user = "db_username";
            String pass = "db_password";
            System.out.println("Preparing connection with URL: " + url);
            System.out.println("database user: " + user);
            return DriverManager.getConnection(url, user, pass);
        }
    }

There are several things you need to consider prior to creating an index: which "fields" you want to include in your index, what "media" you are indexing (a database query, text files, PDFs, MS Word documents, etc.) and, finally, how you want each "field" to be indexed. Lucene provides several ways to index the fields you specify. For example, you may decide to index a field, but not actually store its content with your index in order to control the size of your index. One of the goals of creating an index is to keep it lean, mean and efficient. This means not overburdening it with too much data. It's really a balancing act, and no one solution fits all. Determining what should be indexed and which fields you should include in your index may take a bit of the old trial and error.

For the sake of simplicity, this example will index content living in a MySQL database. This will show you the power of Lucene, and how you can truly optimize the querying of your databases through Lucene. Let's take a brief look at how the indexer works. It first creates a JDBC connection to the database, and executes a simple query:

    SELECT lid, name, url, description FROM links
This says I want all links and all columns present in my result set. With this line: ResultSet rs = pStmt.executeQuery();
we create a new ResultSet object. In Java parlance, a ResultSet is very much like an array of rows holding the results of a successfully executed database query. What we need to do is loop through each row of the ResultSet and create a new Lucene Document object. The Document object, if you remember, is the primary object used to represent a single item of content that will be included in the index we are creating. To this Document object I add the columns in the ResultSet.

It is important to note that Lucene affords the developer a lot of flexibility in how columns are indexed. For example, you may opt to use Lucene's "UnStored" method of indexing a column. This means that the column will be included in the index, but its actual content is not stored there, so you can index large amounts of content without storing any of it in your index. You'll notice these lines in the ResultSet loop:

    d.add(Field.Text("name", rs.getString("name")));
    d.add(Field.Text("description", rs.getString("description")));

This is the important bit in creating an index. Both lines here basically say we want to store the given column from the database in the index. The "Field" class in Lucene has several methods to designate how a column will be indexed. The method "Text" tells Lucene that the column is to be indexed and stored in the index. Now, let's say our "description" column is meant to hold a large amount of textual data. If this is the case, we might not want to store that data twice (once in our actual database where it lives, then a second time in our index). With Lucene we don't have to store the actual contents of our column with the index. We could have said this:

    d.add(Field.UnStored("description", rs.getString("description")));

The last thing we need to do in our ResultSet loop is add the Document object we created to what is called a "writer." The "writer" is the object in Lucene that generates the index on the file system. One of the stronger benefits of Lucene's indexing capabilities is that you do not need to destroy or "purge" the index in order to re-index (Verity, for example, requires this). Simply run the index engine again, and the index is rebuilt (Listing 2 passes true as the third IndexWriter argument, which tells Lucene to create the index afresh on each run).

The Searcher

Now that we have an index generated, it sure would be nice to be able to search this index! This is where PHP comes in, with Java's helping hand. If you know me, and I'm sure 100 percent of those reading this don't, you know that I like to decouple any application code from the web tier that is not specifically responsible for display. In keeping with this multi-tiered approach, we'll construct a small Java class that will serve as our searcher object (the thing that queries the index). PHP will be used to display the search form and the results. Take a look at Listing 3.

Listing 3

    import java.util.Hashtable;
    import java.util.Vector;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    // WddxSerializer comes from a Java WDDX library; add the import
    // that matches the library you use.

    /**
     * @author Dave Palmer
     */
    public class SearchEngine {
        protected IndexSearcher searcher = null;
        protected Query query = null;
        protected Hits hits = null;

        public SearchEngine() {}

        public String search(String index, String matchType, String queryString)
                throws Exception
        {
            try {
                if (index == null || index.equals(""))
                    throw new Exception("Index cannot be null or empty!");
                if (matchType == null || matchType.equals(""))
                    throw new Exception("matchType cannot be null or empty!");
                if (queryString == null || queryString.equals(""))
                    throw new Exception("query string cannot be null or empty!");

                searcher = new IndexSearcher(IndexReader.open(index));

                Analyzer analyzer = new StopAnalyzer();

                // Search the same phrase in every indexed column
                StringBuffer qStr = new StringBuffer();
                qStr.append("name:\"" + queryString.trim() + "\" " + matchType + " ");
                qStr.append("url:\"" + queryString.trim() + "\" " + matchType + " ");
                qStr.append("description:\"" + queryString.trim() + "\" ");

                query = QueryParser.parse(qStr.toString(), "name", analyzer);
                hits = searcher.search(query);

                int count = hits.length();
                if (count == 0) {
                    // Closing tags added so that the packet is balanced XML
                    return "<wddxPacket version='1.0'><string>No matches found for: "
                            + queryString + "</string></wddxPacket>";
                } else {
                    Hashtable results = new Hashtable();
                    Hashtable metaData = new Hashtable();
                    metaData.put("hits", new Integer(count).toString());
                    metaData.put("query", queryString);

                    results.put("meta_data", metaData);
                    Vector rows = new Vector();
                    for (int i = 0; i < count; i++) {
                        Document doc = hits.doc(i);

                        // One Hashtable per hit, keyed by column name
                        Hashtable row = new Hashtable();
                        String score = "";
                        score = new Float(hits.score(i)).toString();

                        row.put("score", score);
                        row.put("lid", doc.get("lid"));
                        row.put("name", doc.get("name"));
                        row.put("url", doc.get("url"));
                        row.put("description", doc.get("description"));
                        rows.addElement(row);
                    }
                    results.put("rows", rows);
                    WddxSerializer ws = new WddxSerializer();
                    java.io.StringWriter sw = new java.io.StringWriter();
                    ws.serialize(results, sw);
                    return sw.toString();
                }
            }
            catch (Exception ex) {
                throw new Exception("SearchEngine.search >> exception: " + ex.toString());
            }
        }
    }

Once our PHP frontend hands over the keywords from our search interface, the first thing we do is build our search string. Lucene's syntax is a bit on the complex side, but with that complexity we are provided with a query parser that is very powerful. For the sake of this example we'll just use something simple and understandable. You'll notice these lines in our Searcher class:

    StringBuffer qStr = new StringBuffer();
    qStr.append("name:\"" + queryString.trim() + "\" " + matchType + " ");
    qStr.append("url:\"" + queryString.trim() + "\" " + matchType + " ");
    qStr.append("description:\"" + queryString.trim() + "\" ");
Here we build the query string. As part of good Java programming, we never concatenate strings, and instead, build a StringBuffer object. The idea here is to specify what columns we want to search, coupled with our query term (keywords) and the type of match (AND exact, OR loose). Yes, there are lots of more complex ways to build a query string, but this is a good way to get your feet wet. The basic syntax is as follows: [index column name]:"[search phrase]" [match type]
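The syntax above can be sketched as a small stand-alone helper. This is only an illustration under a few assumptions: the class and method names (QueryStringSketch, buildQuery) are hypothetical, nothing here calls Lucene, and the phrase is not escaped, so a search phrase containing a double quote would need extra handling before being handed to the query parser.

```java
import java.util.Arrays;
import java.util.List;

public class QueryStringSketch {

    /**
     * Builds a string of the form
     *   field:"phrase" MATCH field:"phrase" ...
     * The field names and match type are supplied by the caller.
     * (Hypothetical helper, not part of Lucene.)
     */
    public static String buildQuery(List<String> fields, String phrase, String matchType) {
        StringBuffer q = new StringBuffer();
        String trimmed = phrase.trim();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                // The match type (AND / OR) goes between the field clauses
                q.append(' ').append(matchType).append(' ');
            }
            q.append(fields.get(i)).append(":\"").append(trimmed).append('"');
        }
        return q.toString();
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "url", "description");
        // prints: name:"foo bar" OR url:"foo bar" OR description:"foo bar"
        System.out.println(buildQuery(fields, " foo bar ", "OR"));
    }
}
```

Passing the field list in, rather than hard-coding three append calls, makes it easier to widen or narrow the search without touching the string-building logic.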
We can append as many columns to this query as we need in order to broaden our search.

Because we are dealing with human-readable strings, and computers obviously aren't human, we need to translate our search string into something a computer can deal with. One way to do this is to "tokenize" the string. Tokens are individual elements of a string, such as a word, a space, or a character. In the land of Lucene, we use things called "Analyzers" to tokenize our search string. There are several prepackaged analyzers that come with Lucene. These analyzers can be used to tokenize the query string in different ways in order to satisfy different types of searches. For the sake of simplicity I use the StopAnalyzer. The StopAnalyzer is useful for filtering out "stop" words (words typically not very useful for searching). It also builds on the LetterTokenizer, which splits text on non-letter characters, and normalizes text to lower case.

So, back in Listing 3, we have our Analyzer, which breaks up our strings into computer-readable fragments and is fed into the QueryParser along with our actual query string. Once the parser has given us a Query object, we can execute our search using this line:

    hits = searcher.search(query);

The "Searcher" object has a method called "search" which accepts a Query object as a parameter and returns a "Hits" object. The "Hits" object contains the records found for this query. We can use the "length" method on our Hits object to decide if we should proceed to the next step: building a result object to give back to our PHP client.
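To make the tokenizing step described above more concrete, here is a rough sketch of the idea behind a stop-word analyzer: split on non-letters, lower-case each token, and drop stop words. It is not Lucene's implementation. The class name, the tokenize method and the tiny stop list are all assumptions made for illustration; Lucene ships its own, longer list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordSketch {

    // Hypothetical, deliberately tiny stop list (not Lucene's own list)
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    /** Splits on non-letter characters, lower-cases tokens, drops stop words. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        StringBuffer current = new StringBuffer();
        // Run one step past the end so that the final token is flushed too
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                String token = current.toString();
                if (!STOP_WORDS.contains(token)) {
                    tokens.add(token);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: [history, php, java]
        System.out.println(tokenize("The History of PHP-4 and Java"));
    }
}
```

Note how the digit in "PHP-4" disappears entirely: a letter-based tokenizer treats anything that is not a letter as a separator, which is worth keeping in mind when your content contains version numbers or product codes.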
Assuming we have results to work with, we'll just go right to the interesting bit. Because PHP and Java can't really share complex data types, we need to use WDDX serialization. WDDX is an XML-based format that enables disparate programming languages to share complex data structures across platforms; find out more at http://www.openwddx.org. The result object that we'll pass back to PHP contains a Vector of rows, each of which is an associative array (Hashtable). Each row holds one search result, with the result's fields as the entries of the Hashtable. We also create a "meta data" Hashtable that will contain our hit count and the query we were given. Here's an illustration of what this object may look like:

    Search Results : Hashtable
      Meta-data : Hashtable
        Query: foo bar
        Hits: 3
      Rows : Vector
        Row 1 : Hashtable
          URL: http://foobar.com
          lid: 100
          name: foo bar
          description: this is foo and bar
          Score: 0.990000
        Row 2 : Hashtable
          URL: http://bla.com
          lid: 101
          name: Bla dot com
          description: this is bla
          Score: 0.980000
        Row 3 : Hashtable
          URL: http://more.foo.com
          lid: 102
          name: More Foo
          description: this is a lot of foo
          Score: 0.850000
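The nested structure described above can be sketched with nothing but java.util classes. The class and method names below (ResultStructureSketch, row, results) are hypothetical and the values are the sample data from the illustration; in the real searcher they would come from Lucene's Hits and Document objects.

```java
import java.util.Hashtable;
import java.util.Vector;

public class ResultStructureSketch {

    /** Builds one result row; every value is kept as a string. */
    public static Hashtable<String, String> row(String lid, String name, String url,
                                                String description, float score) {
        Hashtable<String, String> row = new Hashtable<String, String>();
        row.put("lid", lid);
        row.put("name", name);
        row.put("url", url);
        row.put("description", description);
        row.put("score", Float.toString(score));
        return row;
    }

    /** Wraps the rows together with a meta-data table (hit count and query). */
    public static Hashtable<String, Object> results(String query,
                                                    Vector<Hashtable<String, String>> rows) {
        Hashtable<String, String> meta = new Hashtable<String, String>();
        meta.put("hits", Integer.toString(rows.size()));
        meta.put("query", query);

        Hashtable<String, Object> results = new Hashtable<String, Object>();
        results.put("meta_data", meta);
        results.put("rows", rows);
        return results;
    }

    public static void main(String[] args) {
        Vector<Hashtable<String, String>> rows = new Vector<Hashtable<String, String>>();
        rows.add(row("100", "foo bar", "http://foobar.com", "this is foo and bar", 0.99f));
        rows.add(row("101", "Bla dot com", "http://bla.com", "this is bla", 0.98f));

        Hashtable<String, Object> results = results("foo bar", rows);
        System.out.println("meta_data: " + results.get("meta_data"));
        System.out.println("first row name: " + rows.get(0).get("name"));
    }
}
```

One design point to be aware of: Hashtable rejects null values, so a field that was indexed without being stored (and therefore comes back as null from the Document) would need a default such as an empty string before being put into a row.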
In order to build our result object, we loop through our Hits object. The Hits object is indexed, so we can just pull out a Lucene Document object by using our loop index. Once we have a Lucene Document, we can then pull out the interesting bits for our result object. As we said before, Lucene uses its Document object to represent content, whether it's content being indexed or content being returned from a search. Lucene's Document object, for the sake of oversimplifying it, is really like an associative array. It contains "keys" with "values." The keys represent the columns you included in your index, and the values represent the actual content. So, in our index, we created a column called "name". In our Document object we would have a key called "name", and its value would be the name of our link.

Once our object is complete, we serialize it. Serializing simply means converting a data structure into something that can be transported from one platform to the next. The end result of this serialization will be a WDDX "packet". A WDDX packet is simply an XML representation of our data structure (an object in this case). This way, we simply pass a string back to PHP.

Displaying the results

Now that our PHP script has a WDDX packet containing our search results, we can work on their display. If you look at Listing 4, the code is pretty straightforward.

Listing 4

    [...]
    This is a sample front-end for a Lucene search engine implementation.
    This PHP front-end instantiates a Java object, then executes the
    search() method and returns a WDDX packet which can
    then be deserialized by PHP and display a search results page.
    [...]
    $result = $obj->search($index,"OR",$query);

    $rs = wddx_deserialize($result);
    $meta = $rs["meta_data"];
    $rows = $rs["rows"];

    ?>
    Your query of: [...]
    [...] $name $desc ";
    }
    ?>
    [...]

We create a new Java object so that we can actually execute the search. The Java object we create is an instance of our SearchEngine class (Listing 3). We then call our "search" method, passing in three parameters: index, match type, and search phrase. The "index" parameter contains the full path to where we store the index we created earlier using the IndexEngine, and lets us choose which index we want searched. This is handy if we keep several indices. The "match type" contains either "AND" or "OR". Finally, the "search phrase" parameter contains the search phrase entered by the user. You'll notice that when we call our search method:
    $result = $obj->search($index,"OR",$query);

we are passed back a WDDX packet. Before we can use our search results, we need to deserialize the WDDX packet into a native PHP data structure, like so:

    $rs = wddx_deserialize($result);

Now that we have our native PHP structure, we can display our search results to our user! This is just a matter of looping over the rows and displaying each one.

You'll also notice we deal with the "score" column. Lucene's relevance score is a floating point number that can be multiplied by 100 in order to get a percentage. Not a bad way to rank the relevance of each row.

Wrapping up

Well, not too bad, eh? Lucene enables us lazy developers to implement a very powerful search engine and relegate the "SELECT * FROM foo WHERE bar LIKE '%foobar%'" statement to soon-to-be-forgotten history. Lucene's clean API, powerful query parsing, and flexibility mean that creating a search engine no longer needs to be a project by itself and can be easily integrated into just about any application.

This article has really only scratched the surface of Lucene's capabilities, but perhaps that's one of Lucene's strengths. You don't need to know every feature, every nook and cranny, in order to use it in a real application. I've used other products like Verity, and have become fed up with the limited functionality. With Lucene there are no limitations and no compromises made. You index your data the way you want it indexed, and you display the results the way you need them displayed. No more constraints and no more ridiculous licensing entanglements.
About The Author

Dave is a professional geek specializing in Java/J2EE, PHP (naturally), and Perl development, which is just a cover for his real passion for spending large sums of money on home recording and musical equipment and generally making a nuisance of himself. It should also be noted that his /. karma is currently "positive", which will surely fall.

Discuss this article: http://www.phparch.com/discuss/viewforum.php?f=26
REVIEW

SourceGuardian Pro
By Peter James

Quick Facts

Price:
    SourceGuardian: obfuscation for US$150
    SourceGuardian Pro: obfuscation and portable encryption for US$250
Trial Edition: Free time-limited demos (not quite fully functional)
    http://sourceguardian.com/downloads/index.php
Description: The GUI and command-line tools run on Windows, but the code runs on Windows, Linux, and FreeBSD
Homepage: http://www.sourceguardian.com
Although there are a handful of products on and off the market for PHP source protection, I found SourceGuardian particularly interesting. SourceGuardian is one part obfuscation, and one part encryption. There are two things that separate it from the pack. One I’ve already mentioned: obfuscation. The second is portable encryption. In other words, the encryption does not require any changes to your PHP installation. This is unheard of.

Obfuscation is the art of creating distracting noise. Obfuscating code is not about encrypting it, or making it impossible to reverse engineer. It’s about making it very difficult to understand what the code is doing. With equal amounts of determination and coffee, any obfuscation can be at least partially reversed. If it couldn’t, it would be encrypted.

Obfuscation generally involves function and variable renaming. These names are usually almost indiscernible from each other. Another technique used to obfuscate code is to remove all whitespace. This is very weak at best, but when used in combination with lots of other weirdness, can assist in the effort. Yet another technique is to add red herring code to the application. This code means nothing and does nothing, but the would-be code cracker doesn’t know that.
Encryption is the science of hiding information. Encryption, as it applies to PHP source code protection, is not a new technology. Generally, PHP source code protection requires an extension to be installed on the server. This extension is used to help the Zend Engine read the encrypted code, since it no longer looks anything like PHP. This encryption (or “compiling”) technology has recently been exploited to give PHP developers the ability to offer trial versions, or time-limited software, as well as the ability to otherwise control the use of their software. Let’s take a look at how SourceGuardian stacks up.

Installation

The GUI only runs on Microsoft Windows, so that’s what I used. Installation was painless. Just download the zipfile, unpack it, and run the installer. Product registration required an Internet connection, which is unusual, but was very quick. In order to use the encryption part of this product, you must have compiled PHP with dynamic library support (which is the default), or have access to the php.ini file and the extension directory. All existing encryption products require the latter, so this is not surprising.

Using the application

The GUI application part of SourceGuardian is pretty decent. It appears to be attempting to be a wizard, while still offering flexibility in movement through a tabbed window. It’s mostly successful in both.

On the first tab, the file selection tab, we meet a friendly Windows Explorer-type file navigation interface. Here you select the files and directories that you wish to affect. This window (actually the whole application) makes use of the “select the entries on this side of the window, then click on the arrow to transfer them to the other side” paradigm. I have mixed feelings on this, because I often find that it works well in fewer places than it’s used. Here it works well, though.

One of the biggest things that bugged me about this application appears on this page. Unfortunately, and I really do mean “unfortunately”, it would be pretty easy to miss. Because you are obfuscating and encrypting source code files, you are permanently and irreversibly changing them. The application offers two options, kept in a small radio button group down at the bottom of this first window, as to how you’d like to handle the original source files. One option (the recommended one) is to move the originals to a backup directory, thus saving them for later. The other option (the default one) offers to overwrite the original files for you. You can surely understand the implications of this second option. Granted, most good developers keep their source code in version control systems such as RCS or CVS, but this is no justification for this very poor design decision.

Figure 1

Obfuscation

The obfuscation tab offers a few options, including the removal of whitespace, name substitution, and the use of MD5 hashes for name creation (longer and uglier names). After changing options, the files must be analyzed. This allows you to see the variables and functions that SourceGuardian is planning on changing, as well as what they are being changed to (if anything). This view of the variables and functions lets you “fix” any mistaken identities. As I tried out SourceGuardian on a sample point-of-sale web application, I found that my client-side form validation was broken. This was because SourceGuardian thought that my JavaScript function was a PHP function, and changed the name for me. By marking this function’s name as reserved, I forced SourceGuardian not to obfuscate it, and the validation returned to my application.

Encryption

The encryption tab allows you to specify whether the file should be encrypted or not, as well as whether to place any other controls on its use. SourceGuardian offers the ability to place timeout restrictions on your PHP applications, as well as the ability to bind the application to a single IP address. Allowing only one IP address is too restrictive: you may be selling a PHP application to an enterprise that wants to run it on a load-balanced system.

This tab also shows you what systems your encrypted files can run on. Currently supported systems are Windows/PHP4.2.1 – 4.3.1, Linux/PHP4.0.4 – 4.3.1, and FreeBSD/PHP4.0.6 – 4.3.1. Nice to see the FreeBSD support in there; many commercial products ignore it.

Figure 2

Making it so

The final tab offers the opportunity to verify what you’re about to do, and to do it. It is nice that when you click “GO” on this last page, a modal dialog pops up to get one last confirmation. Of course, once this pops up a few times, you ignore it anyway.

How did it work?

I already mentioned that I tested this out with a small point-of-sale system. This system contained a handful of PHP scripts, and was based on frames. It used sessions and a PostgreSQL backend, with a hint of JavaScript on the frontend.

I first tried to obfuscate it with all of the options. My first attempts were not successful. I had to make a handful of variables reserved (including $_SERVER, $_SESSION, $_REQUEST, etc). This fix allowed me to get past the first page of my application. After some more investigative work, I realized that it wasn’t a good idea to try and obfuscate plain HTML pages (such as my frameset page). This was causing my frames to act strangely. Next I had the problem I mentioned above with the JavaScript form validation. At each step, I had to make more variables reserved. Although at the end I only ended up with about 10 variables reserved out of about 75, this was still 10 variables whose names were not changed.

I then tried encryption. The encryption option worked painlessly. I didn’t need to go in and edit the php.ini file, or add another extension to the extensions directory.
In fact, no modifications at all were necessary to my PHP installation. I simply placed the appropriate .pxp file in a directory alongside my application, and the encrypted application worked perfectly. My simple attempts to get database passwords out of the encrypted file failed, which is nice.

The benchmark

I decided that there might be a price to pay for some of this obfuscation and portability, so I did a mini-benchmark. I wrote a simple factorial function, and computed 10000! a total of 1000 times. My benchmark code is shown in Listing 1. The results are shown in Figure 2. As you can see, and might have expected, there is a small performance hit. This tiny penalty should be negligible for most applications.

Listing 1

    [...]
    print "Total time of execution: {$total_time} \n";
    print "Average time per iteration: {$average_time_per} \n";

    function factorial($n)
    {
        if ($n == 0)
        {
            return 1;
        }
        return ($n * factorial($n-1));
    }

    ?>

What I liked

I liked a few things about this product. I really like the portable encryption. That feature alone makes this product worth its weight. The ability to specify time-limited trials and bind applications to an IP address is also very attractive for those seeking code protection and licensing tools.

I like the presence of a command-line tool. This makes the production of trial versions possible on the fly. It also means that you can automate the encryption of your sensitive files as you push them out to the production environment.

Although I had a marginal experience with the code obfuscation, I was able to get my code working. Given that most discussions of open-source PHP obfuscation products I’ve seen talk about the need to tweak and play around, I think I managed pretty well.

What I didn’t like

I didn’t like that I had to fight to get my code to work in the obfuscator. This makes me wonder if I really caught everything, and what obscure bugs were left in the application. If there was ever a reason to build a test suite...

I didn’t like that I had to fight on my own to get my code to work in the obfuscator. There could have been more hints as to what might cause problems, and where possible solutions might be.

I didn’t like that the obfuscation went so far as to break into my JavaScript and mess it up. That was frustrating, and should be easily avoidable.

I didn’t like the defaulted option for overwriting your code. In my opinion, this is one of the most dangerous and negligent things I’ve ever seen. I can just see the look on my face now, as I suddenly realize what I just did.

I didn’t like how tedious it was to reset my code every time I checked it. Because of the problems I had, I had to run the obfuscation a number of times. This meant that each time I had to copy the code back into the directory, overwriting the obfuscated files. This could surely be automated for you.
In conclusion

I would buy this product on the basis of the encryption and licensing capabilities, but I think that the obfuscator needs more work to be a contender. This product gets a 3 out of 5.
FEATURES

Tailoring W@P sites with WURFL
By Andrea Trasatti

Introduction

It was 1999 when I first heard of WAP (Wireless Application Protocol). Nokia announced that year that they were going to release the 7110 handset around June, and that it would be able to surf the Internet using WAP. Like many other people, I started thinking about browsing the web as I walked down the street, just as if I was sitting at my desk. I started thinking of the many useful services that I might like to access.

I began reading the WAP specifications in early October, and my first application was ready around the end of the month. This was just in time to test it out on my new Nokia 7110. Sadly, it was a disappointment. The documentation I had been reading described many nice tags that could be used in WAP, but the device in my hands didn’t support most of them. Only the basic tags had been implemented. The device didn’t meet any of my expectations and, most of all, its output bore little resemblance to that of the emulator I had been using until then.

This was not web browsing. It was a tiny screen with only a few words per line. It displayed images in black and white, didn’t support pop-ups, and it was not a multiple window system. I quickly realized that it wasn’t the web as I knew it.
This experience taught me a few things. Depending on the screen size, each device would support a different image size, number of lines of text, and number of characters per line. Also, because the screen is small, the keyboard uncomfortable, and the air time costly, WAP sites needed to be fast and easy to browse. Basically, it became apparent that WAP sites needed to be tailored to each mobile device accessing them.

REQUIREMENTS
PHP Version: PHP 4.0.6+ and XML extension (register_globals must be turned on)
O/S: Any
Additional Software: N/A
Code Directory: wurfl

The problem

I started thinking of all the data that I would need to build pages dynamically, while still tailoring them to the device. I started taking note of things like the user agent, the screen size, and the maximum deck size (a WML page is composed of one or more cards, making up a deck; only one card at a time is shown onscreen). My list of properties became unmaintainable when my first project (a free site that let people read email, chat, and build their own WAP pages) went online. More and more devices started visiting my pages. Even though the Nokia 7110 was the standard, I wanted to support as many devices as I could, so as to offer the best possible service. I was very aware of the problems users would face if they were using unsupported devices, and so I tried to do my best.

The first time I worked on a service for a customer, I had my “big list” of devices ready. The customer, a big carrier, had its own list of devices that it wanted to support. They didn’t care about many of the devices that I had collected, and I obviously had to adapt my list to their needs. With all those new devices it simply wasn’t possible to keep the list updated anymore, especially considering that more WAP devices kept coming out.
FEATURES
Tailoring W@P sites with WURFL
The solution
Since the time I started working with WAP devices, I have been reading a mailing list called wmlprogramming (http://groups.yahoo.com/group/wmlprogramming). Last year I read about a new project being started. It was called WURFL (Wireless Universal Resource File). The idea was to identify a set of “capabilities” that could be used to describe any device. This would allow the developer to tailor WAP pages to the device. I read a few messages and decided to join the team. Since WURFL had to be accessible on as many platforms as possible, and be usable by as many programming languages as possible, XML was the logical solution. We started by listing the screen resolution, the lines-on-screen, the deck size, and other basic information.
Let’s take a look at some of the tags we used. The first tag we needed was a tag to identify a device. The logical name for this tag was, as you might have guessed, “device”. The “device” tag has 3 attributes. The “user_agent” attribute defines the consistent part of the user agent, needed to identify as precisely as possible a device hitting your site. The “id” attribute uniquely identifies a device in WURFL. The “fall_back” attribute defines the ID of the device to inherit from. This means that any capability that is not listed on the current device will be taken from the referred device, which helps to keep the file’s size manageable as new devices are added. The “fall_back” attribute is a must for all devices except the generic device and, of course, it must refer to an existing device ID.
Inside the device tag we have a tag called “group” that is mainly for human readability. The “group” tag is used to group a list of capabilities that are related in some way. It has an attribute called “name” that defines the name of the group. Inside a group we can have as many “capability” tags as needed. “capability” tags have two attributes: “name” and “value”. The “name” attribute is a string that defines the name of the capability (we cannot have two capabilities with the same name). The “value” attribute can be a string, a boolean, or an integer depending on the kind of capability.
Once we had the basic set of capabilities that we wanted to list, we created a root device called “generic”. The generic device has the absolute minimum default values for all the capabilities, and is at the end of all the fall-back chains. This ensures a minimum set of capabilities for all devices.
You might be a little bit confused now, but I’m sure that an example will clear everything up. Taking a look at Listing 1, we see the capabilities for a Siemens SL45i. The device tag for the SL45i refers to (has a “fall_back” attribute pointing to) an “uptext_generic” device. The “uptext_generic” device refers to the “generic” device. This means that the SL45i has all of the capabilities listed in the “generic”, “uptext_generic”, and the SL45i devices. Any capability in a lower level (like “generic”) is overridden by similar attributes in a higher level (like “uptext_generic”).
Listing 1
<device fall_back="root" id="generic" user_agent="">
<device user_agent="UP.Browser/4" fall_back="generic" id="uptext_generic">
<device user_agent="SIE-SLIK/3.1 UP/4" fall_back="uptext_generic" id="sie_sl45i_ver1">
All of this works very well because most of the manufacturers don’t make up new devices every day. A lot of the time, new devices are just updates. For instance, maybe the browser software is the same, and only the screen resolution has been modified. We can simply define the new screen size and an appropriate “fall_back” ID, and the device will inherit all the information from the fall-back device. This lets us keep the file small. By small I mean that we now have definitions for about 1700 user agents, with more than a hundred capabilities, and the file is about 300KB.
Who guarantees the integrity of the data? The community. Since this is an open-source project, anyone who would like to contribute to the project is very welcome. If you find anything that is not correct, you should point it out. We will look into why it was specified that way, and whether it really was just a mistake. The generic device guarantees that any device has at least the basic capabilities, but this doesn’t mean that they’re the correct ones for a device. Also, don’t forget that it’s an XML file, and so you can apply any modification you like for your project.
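The fall-back mechanism described above can be sketched in a few lines of PHP. This is a minimal illustration and not the code from the article's listings: the $devices array, the resolve_capabilities() function, and the capability names and values (rows, columns, colors) are hypothetical stand-ins for data parsed from WURFL. Only the three device IDs and their fall_back links are taken from Listing 1.

```php
<?php
// Hypothetical capability values, for illustration only (not real WURFL data).
$devices = array(
    'generic' => array(
        'fall_back'    => 'root',
        'capabilities' => array('rows' => 3, 'columns' => 10, 'colors' => false),
    ),
    'uptext_generic' => array(
        'fall_back'    => 'generic',
        'capabilities' => array('columns' => 15),
    ),
    'sie_sl45i_ver1' => array(
        'fall_back'    => 'uptext_generic',
        'capabilities' => array('rows' => 5),
    ),
);

function resolve_capabilities(array $devices, $id)
{
    $chain = array();
    // Walk from the requested device towards "generic"; stop at an unknown
    // ID (such as "root") or if a device shows up twice (a broken chain).
    while (isset($devices[$id]) && !isset($chain[$id])) {
        $chain[$id] = $devices[$id]['capabilities'];
        $id = $devices[$id]['fall_back'];
    }

    $merged = array();
    // Merge the most generic device first so that more specific devices
    // override its values (array_merge keeps the later value for string keys).
    foreach (array_reverse($chain, true) as $capabilities) {
        $merged = array_merge($merged, $capabilities);
    }
    return $merged;
}

$sl45i = resolve_capabilities($devices, 'sie_sl45i_ver1');
```

Here $sl45i takes its rows value from the SL45i entry, columns from uptext_generic, and colors from generic, which mirrors the override order described in the text.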
Working with WURFL
The cool thing about WURFL is that it’s made by developers. These are people that work with WAP and mobile devices every day. The concept was born on a mailing list that has been up since the beginning of WAP, and people who worked with WAP for years have contributed to the project. As soon as I heard about WURFL, I realized that it was just what I needed. I added as much information as I could to all the devices that I found, from any specification document that I had.
NOTE: Due to the length of the code listings for this article, they are not listed in the magazine. They can be found in this month’s package.
WURFL has been a good opportunity for me to start working with the functions that PHP offers to manipulate XML files. The parser for WURFL (shown in Listing 2) was the first XML parser that I ever wrote. I began by reading php.net (http://www.php.net) and zend.com (http://www.zend.com), took some examples, and put them to work. Please don’t consider this a general purpose XML parser, as it is only an ad hoc library.
The core of the parser is the startElement() function, a handler called by the parser whenever it finds an element’s start tag. In this function I identify all the tags in a switch statement, and act differently depending on the kind of data that I want to store. As I said before, the tags needed to define a device and its capabilities are “device”, “group”, and “capability”. In the startElement() function I basically store the rest of the tags as they are, while I pay a little more attention to these three main tags. In the “device” case, I start an array structure with the device name and the attribute or tag I am looking for as keys. The “group” and “capability” cases simply extend the array we started in the “device” case. Notice that I use two variables to always know the device and group name that we’re in: $curr_device and $curr_group. Once I’ve parsed WURFL, I cache it for performance reasons (I have to admit that in PHP 4.1 and above XML parsing has gotten MUCH better). It is only re-parsed if the timestamp on the WURFL file changes.
Once I was done with my little parser, I posted the sources on the mailing list and waited for first impressions. As it turned out, other developers were working on tools for Perl and Java. I took this a little like a challenge. I wanted to show them that I could write efficient PHP tools that would do the same.
My first target was building a class to manage the data extracted from WURFL (shown in Listing 3). First I needed to identify the device’s ID from just the user agent (the only information I know about the device when it first hits my site). I wrote the _GetDeviceCapabilitiesFromAgent() method. At first, I simply checked the known user agents in WURFL, but I soon realized that I also needed to handle user agents not listed in WURFL (it’s not possible to keep them 100% updated), so my function works by cutting off the last character of the user agent until it finds one that fits well. To speed up this cycle I use an array that lists each device’s ID and the associated user agent.
Once we’ve got the device ID, we need to get all of its capabilities. This involves reading the current device’s capabilities, and reading the capabilities of all parent devices and merging them, making sure that capabilities are overridden correctly. This is done using the _GetFullCapabilities() method. The _GetFullCapabilities() method uses another method, _GetDeviceCapabilitiesFromId(), which reads a device’s capabilities from its ID. This works well because I have the device’s ID, and in the “fall_back” attribute I have the fall-back device’s ID, and so on. I get all the capabilities (in reverse order, from the current device up to the “generic” device) and then merge them with the PHP function array_merge().
After a little experimenting and benchmarking, I identified a bottleneck in the device ID search. I wrote two methods to cache the best fitting device ID for any user agent that hits my site: _WriteFastAgentToId() and _ReadFastAgentToId(). This means that we don’t have to go through the whole process of trying to match a user agent to a device ID every time. Another step might be to limit the growth of the cache file, which is currently not implemented. The only public method other
than the constructor is getDeviceCapability(), which expects a capability name as a parameter and will return the capability value for the current device. While getDeviceCapability() is useful when you want to check a single capability, there are some bits of information that are always needed. These are set as actual properties of the object. $browser_is_wap is a boolean and tells you if the current device is a WAP device or not. $GUI is a boolean and refers to OpenWave’s GUI extensions (these look nicer than the basic implementation). $id is the current device’s ID. $user_agent is the current user agent. Finally, $wurfl_agent is the best fitting user agent that was found in WURFL. Before implementing the getDeviceCapability() method, I was using the $capabilities property extensively, which is the associative array of all the capabilities of the device.
While these may not be enterprise-level libraries, the parser and the data access class do make a full set of tools to access WURFL and to manage the extracted data. A little after writing the class I finally had the chance to use WURFL in a real project. The first project that used WURFL was a ringtone download service. You would log in with your mobile device and, depending on the capabilities of your device, you could download a ringtone with a standard HTTP GET, Download Fun, or a Nokia Smart Message. You could also send a ringtone to a friend. You just pick his device, and he will either receive a WAP Push message (if his device supports it) or a plain old SMS message with the URL of the download. Even if this first project didn’t unleash all the power of WURFL, I was pretty satisfied with the results. The first real use also gave me a chance to find a couple of bottlenecks (the aforementioned searching for the proper device ID from the user agent, for example) and one or two small bugs. The PHP libraries are still in development; I’ve recently released a new patch, and I have a couple of enhancements in mind that I hope to be able to implement soon.
What if WURFL doesn’t meet all of your needs? No problem. It’s plain XML, and you can extend it very easily. The mailing list is also working on a standard “patch file” to extend or modify WURFL. This means that you will be able to install the WURFL file from the official site (or CVS) and easily apply the patches you need using a patch file that you will keep and maintain on your own. This feature is not yet implemented in the PHP libraries.
The perfect use for WURFL
To unleash the true potential of WURFL, the best use would be a portal. Having the chance to build an entire wireless portal based on WURFL would offer the possibility of building WAP pages perfectly tailored for any device visiting your site. WURFL has lots of information about a large number of devices and, from my experience, nobody supports that many devices all at once. To try to demonstrate this I prepared a demo (see Listing 4) that shows how easy it would be to write a page with a set of links depending on the device visiting. Note that this source only works with “register_globals” turned on. As you can see, the code is quite simple. Load the needed libraries, create the WURFL object by passing in the user agent, set the proper headers, and check the capabilities you care about with getDeviceCapability(). In this demo all the values checked are booleans. Don’t forget, though, that you could also have capabilities that use strings and numbers. This is a tiny demo, but it demonstrates just how easy it would be to build a site using WURFL. Figure 1 shows a set of simulator screenshots of how the demo might look on different devices. These are not taken from real devices, but rather show how the demo site would look, depending on the device.
Figure 1
Figure 2
An alternative use: WURFL emulator
While developing WAP pages I encountered many problems in getting my hands on all the devices that I was asked to support. Once again, WURFL was of great help. I built a PHP script that takes the data from WURFL and simulates a WAP browser. This way I can see the exact WML printed. The basic idea is quite simple. If WURFL knows a lot about a device, it will be the
best one to simulate it. The target of this emulator is the developer, not the end user. When you are working on a site that needs to support many devices, you will need to have them all handy and maybe move your SIM card from one device to another. This can be tedious, if not impossible. With this emulator you can simulate each device and see what would happen. This, of course, is not the real device, and you don’t have a WAP gateway in the middle. You shouldn’t take it as the final truth, but it can help tremendously. The emulator lets you pick a device from WURFL (you will also have a list of the most recently used devices), send custom headers (including cookies), click on links (with a little help from JavaScript) and more. Figure 2 shows the top half of an action screenshot of the emulator. As you can see, the WML is shown, and you can click on links.
I used to test my sites with wget from the command line, but I had to copy and paste the URLs all the time. WAP also requires you to use “&amp;” instead of “&” in query strings, so this meant I had to manually edit the links all the time. My first step towards the emulator was writing a script with Awk to parse WML pages and print out the links ready to copy and paste. This was not really usable, though, and was instead an emergency tool.
Listing 5 shows a code snippet from the WURFL emulator that shows how it creates the links. I consider this the smartest part of the software at this time. Syntax coloring is on my TODO list. I first found and adapted a regular expression that could parse out as many links as possible. I then created a function to add the necessary JavaScript. The JavaScript is necessary because of what is shown in Figure 3 (the bottom half of an action screenshot of the emulator). Figure 3 shows a form that gives the user the chance to check and maybe modify the URL they clicked on. The JavaScript fills the URL field. If everything is fine, then the user will click on “Submit” to fetch the page. 95% of the time you will not need to modify the URL, but sometimes I wanted to check it, copy it to the clipboard, or do something else with it. This feature has been useful.
The emulator is still not perfect; there are many cool features that could be added, such as checking the validity of the WML, colors, and a better layout to make it more readable. Nevertheless, it’s probably the most useful testing tool that I have ever used for WAP sites. With the WURFL emulator I can test all the devices that I need to support. Any device that is listed in WURFL can be simulated. Another useful feature that I’m planning to implement is better support of the headers that the device would send to the web server. As of today I only simulate the user agent and have statically implemented the accept header, but there are many other details about the devices that are not being sent to the server.
Online projects
As of today, I have installed WURFL in about four online applications. They are all related to multimedia downloads. WURFL has been very useful in recognizing the devices visiting my sites and in knowing what content could be delivered, and how. Siemens is currently working on a commercial service that should be available around the beginning of June. The service gives users the opportunity to read teletext over a huge variety of devices: web browsers, PDAs and WAP phones (both WML 1.x and XHTML-MP). The entire application is based on WURFL.
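The user agent lookup described under “Working with WURFL” (cutting off the last character until a known user agent matches) could look roughly like the sketch below. The function name find_device_id(), the $agent_to_id map, the sample user agent strings, and the choice to return “generic” when nothing matches are all assumptions made for illustration; they are not taken from the _GetDeviceCapabilitiesFromAgent() method in Listing 3.

```php
<?php
// Hypothetical map of user agents known to WURFL and their device IDs.
$agent_to_id = array(
    'UP.Browser/4'      => 'uptext_generic',
    'SIE-SLIK/3.1 UP/4' => 'sie_sl45i_ver1',
);

function find_device_id(array $agent_to_id, $user_agent)
{
    $candidate = (string) $user_agent;
    // Try the full user agent first, then ever shorter prefixes of it.
    while ($candidate !== '') {
        if (isset($agent_to_id[$candidate])) {
            return $agent_to_id[$candidate];
        }
        // Drop the last character; the cast guards against substr()
        // returning false on very old PHP versions.
        $candidate = (string) substr($candidate, 0, -1);
    }
    // Assumption: fall back to the root device when no prefix is known.
    return 'generic';
}

$id = find_device_id($agent_to_id, 'SIE-SLIK/3.1 UP/4.1.19i');
```

Because the loop runs once per trimmed character, caching the result per user agent, as the article does with its fast agent-to-ID methods, avoids repeating the search on every request.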
The future
The members of the mailing list are always active and working on possible new tags. As more people get in touch with the project, we receive lots of comments and requests. The WURFL project, like any good open-source project, is open to everyone’s ideas and we always consider any suggestion. These days I am preparing a new set of tags that were proposed and approved in recent weeks. The next release candidate, due out at the beginning of May, will include information about XHTML, i-mode and more. More and more developers are starting to play with WURFL and discovering its advantages. Many of them have already worked with similar projects and always have a lot of suggestions and comments to make it better. If you are interested, you can read more about it on our web page (http://wurfl.sourceforge.net). If you would like to offer your contributions, please contact us on wmlprogramming (http://groups.yahoo.com/group/wmlprogramming).
About The Author
Andrea Trasatti started his career as a SYSOP for the second BBS in Italy to offer internet access. As the internet grew, he integrated his experience with the development of web applications. Now he specializes in the development of multichannel applications. He is an active member of the open-source community. Some of his projects are the leading value-added services for one of the biggest mobile carriers in the world.
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=27
Figure 3
FEATURES
GETTING A GRIP ON LDAP By Brian K. Jones
In this article, I will cover the basics of what LDAP is and why it is useful. In the process, I hope to also dispel some common myths regarding LDAP and what it is used for and to enlighten the reader as to how to make effective use of a directory service. I will follow up with a lightning-fast intro to getting the data you need from a directory server using PHP, as well as a quick overview of a couple of tools available to help make your interactions with PHP and LDAP a little easier.
The Setup
So, it’s finally happened. You were brought in on a web development project for the HR department of a huge, multinational firm. The client has been told that you are nothing less than a PHP and web development genius. You almost start to believe this yourself; you have a few jobs under your belt, you can code with your eyes closed, and you freak if you see the letters ‘PHP’ on a license plate. Then it all comes crashing down; on your first day you ask for the information you’ll need in order to make a database connection to the HR database, only to be told ‘this project only uses data from our LDAP server’. Ugh. Your thoughts immediately turn to grungy, grey unemployment lines or flipping rat-burgers for the local ‘Grease Kingdom’ diner, or – well, you get the picture.
The fact is that LDAP is now a fairly common way of storing data which might otherwise be found in NIS maps, system files, or user or inventory databases. Furthermore, LDAP is becoming increasingly popular and is being used to store a broader range of data every day. There are many reasons for companies to migrate to LDAP, and there are many diverse tasks for which LDAP appears to be the perfect tool (at least for now). To illustrate this point, here’s a quick list of software with native support for LDAP:
• Apache web server’s mod_auth_ldap allows Apache to authenticate users against an LDAP directory.
• Sendmail can retrieve mail routing information, as well as maps and aliases, from a directory. Other mail servers, such as Qmail and Exim, also support LDAP.
• Most email clients (Netscape, Eudora, Evolution, Kmail, Entourage, the list goes on) at least have support for auto-completion of email addresses based on LDAP searches.
• Samba can use an LDAP directory as a back end for authentication and roaming profile storage for Windows users.
• The nss_ldap and pam_ldap modules allow most UNIX variants (including Linux) to perform password, group and host lookups against a directory. There is also a standardized way of using LDAP as a (theoretical) plug-in replacement for NIS.
REQUIREMENTS
PHP Version: N/A
O/S: N/A
Additional Software: N/A
Code Directory: ldapintro
There is plenty I’ve left out here for the sake of brevity; you’ll need a stronger grasp of the basics before you go venturing down the road of storing things like Java objects and automounter maps in a directory. The real point I’m making here is that LDAP is becoming more ubiquitous. LDAP is being used to store more and more data (and not less), and that data is generally what drives any web application (or, indeed, any application) worth writing. Therefore, it behooves you to have at least a basic understanding of what makes LDAP tick, how you can come to terms with it, and what tools exist to help you out.
I will not spend hours poring through the dry, RFC-like data which actually defines the inner workings of LDAP. After all, that’s why the RFCs exist – to serve the masochistic and, to some extent, provide useful information to developers. Instead, I will focus on high-level concepts and try to drill down into how you as a PHP developer will interface with it. I will also explore the challenges and benefits it offers someone who has only ever developed against a database back end.
LDAP Is Just a Database, Right?
LDAP stands for Lightweight Directory Access Protocol. Technically, it is only a protocol for interacting with data. LDAP is not, in and of itself, a data storage application of any kind. In fact – it’s not an application at all! However, the LDAP protocol really doesn’t have much use without some storage mechanism behind it, so for all practical purposes, it isn’t unreasonable (or uncommon) to refer to an LDAP implementation’s three main pieces collectively as ‘an LDAP server’. You can have a look at those three components in Figure 1, and I’ll briefly discuss each of them:
1. The LDAP Protocol
For all intents and purposes, the protocol is really just an agreed-upon way of asking for or performing actions upon the data stored in a directory. The nice thing about the protocol is that it supports both querying and manipulating data at the protocol level. While this still doesn’t afford you things like the ability to roll back transactions (i.e., you can’t send a ‘BEGIN’ and ‘END’ over the wire), it does make applications easy to port from one LDAP implementation to another. Generally speaking, applications using standard LDAP calls will work against Novell’s eDirectory as well as OpenLDAP, SunOne Directory and possibly even Microsoft’s Active Directory.
2. The Directory Service Agent (DSA)
The DSA is the actual binary executable daemon which listens on a port, throws errors, performs logging and does all the rest of the stuff that we think a server process should do. It receives LDAP requests, parses them, performs security checks, manages data indices, grabs whatever data we need and ships it out. It’s the middle man (and, indeed, gatekeeper!) between the protocol itself and the back end storage mechanism, which we’ll get to now...
3. The back end storage mechanism
See? I told you we were getting to it! Contrary to what you might be thinking right now, this is actually the component of LDAP that you’ll be the least concerned with as a PHP developer, unless you’re also setting up the LDAP server from scratch, in which case it’s still not a very big deal. Another common assumption is that the back end storage mechanism is a relational database. This is not necessarily – or even usually – true. The truth is that the back end storage can take many forms: a flat-file database (like BerkeleyDB – which is recommended with the current OpenLDAP server offerings), a relational database (I believe IBM’s SecureWay uses IBM’s DB2 RDBMS as a back end – you can also use MySQL with OpenLDAP), regular text files (such as /etc/passwd, /etc/group and the like), or even arbitrary external programs.
Figure 1
In addition, since DSAs just parse protocol requests and nobody has to know any better, you can set up a DSA to simply act as a proxy, which answers calls by (transparently) grabbing the data from another LDAP directory! This is why the link in Figure 1 connecting the DSA to the storage doesn’t specify a communication protocol or library call of any kind. It could be anything!
From a distance (and that’s all we’re concerned with, really), an LDAP server looks like just another box, serving up just another service – which is really okay with us. LDAP is generally the domain of a system administrator or, more specifically, a directory administrator. You and I just need to know how to get the data we want out of this beast. Doing something useful with LDAP data will inevitably involve understanding how data is structured within the directory. Although LDAP shares some similarities with a traditional database, there is a decisive fork in the path where data structure is concerned, and this will affect your interaction with it.
The Essence of LDAP: Objects
LDAP stores data in what amounts to a hierarchical collection of objects. These objects form a tree-root-like structure. I say ‘tree-root’ because visually an LDAP tree would appear to start at the trunk, with the child objects of the trunk spreading out underneath it like roots from a tree. The fact that each individual entity in
an LDAP directory has its own object entry should make some of the differences between LDAP and a traditional database quite clear. Instead of having a user record, like you would find in a database, LDAP will store an object entry for each user. Let’s have a look at Figure 2, which shows a very simple LDAP tree. This is enough to get a feel for the hierarchical nature of an LDAP directory. At the top (or ‘trunk’) of the directory structure is what is known as the ‘Base Distinguished Name’ or ‘basedn’. It generally maps to whatever represents the ‘all-encompassing’ entity in an organization. In our case, everything that my directory will ever know about will be within the ‘dc=linuxlaboratory,dc=org’ realm.
Figure 2: a simple LDAP tree, with the basedn dc=linuxlaboratory,dc=org at the top and two entries beneath it, ou=People,dc=linuxlaboratory,dc=org and ou=Hosts,dc=linuxlaboratory,dc=org.
Underneath our basedn are a couple of ‘organizationalUnit’ objects (ou’s). I have one ou which is a parent to all of the ‘People’ objects and another which is in charge of all of the ‘Host’ objects. The actual People and Host objects are represented as those little circles sitting underneath their respective ou parents. Figure 3 shows an example of what one of those People entries, represented by the little circles, looks like up close.
NOTE: Figure 3 is an incredibly simple entry that is commonly found supporting simple applications like a personal address book. The entry for a person in a corporate directory may very well have 30, 40, or more attributes associated with it!
These ‘leaf’ objects like people and hosts are generally the subjects of your LDAP searching. Armed with these basics, let’s take a look at some of the LDAP functionality in PHP and how you can use it to get what you need from an LDAP directory.
Getting What You Need From LDAP
Part I: Making Contact
In order to search an LDAP directory, you first have to establish a connection to it. This is a trivial matter: calling the built-in PHP function ldap_connect() with a host and an optional port argument returns a link resource identifier. The second operation you must perform before sending an actual search query along is called ‘binding’ to the directory. This is the act of sending your identity and password (if necessary) along to the DSA. The DSA will use this information to perform both authentication (are you who you say you are) and authorization (what operations are you permitted to perform) procedures on the back end. This is all done with one simple call to ldap_bind(), which takes the link identifier from our previous call to ldap_connect(), along with an optional username and password. Assuming ldap_bind() succeeds, you’re free to start the real work! Here’s a snippet of code showing an anonymous bind:
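As a minimal sketch of these two steps, assuming the PHP LDAP extension is loaded: the function name anonymous_bind() and the host ldap.example.org are placeholders chosen for illustration, and requesting protocol version 3 is an assumption about the target server. ldap_connect() is given an ldap:// URI here, which it accepts in place of a separate host and port.

```php
<?php
// Sketch only: connect to a directory and bind anonymously.
// Returns the link identifier on success, or false on failure.
function anonymous_bind($uri)
{
    // ldap_connect() validates its argument and returns a link identifier;
    // depending on the underlying library it may not contact the server yet.
    $conn = ldap_connect($uri);
    if ($conn === false) {
        return false;
    }

    // Assumption: the directory speaks LDAPv3, as most current servers do.
    ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);

    // Passing only the link identifier performs an anonymous bind.
    if (!@ldap_bind($conn)) {
        ldap_unbind($conn);
        return false;
    }
    return $conn;
}
```

A call such as anonymous_bind('ldap://ldap.example.org:389') would then return a link identifier that can be passed to the search functions, or false if the server refuses anonymous binds or cannot be reached.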
NOTE: Whaddya mean ‘optional username and password’? If you call ldap_bind() with nothing but a connection identifier, you’ll be performing an ‘anonymous bind’, which will usually afford you some ‘default access’, like the ability to search all or part of the directory and nothing more. Some directories forbid anonymous binds.
Clearly, it would benefit you to replace the server name I’ve used with your own. Additionally, it’s worth pointing out that you can replace the hostname with an IP address. The rule basically is that you can put the hostname there as long as the resolution of that name to an IP address doesn’t rely on LDAP. Usually, DNS is called upon for this, so it isn’t a problem. The astute reader will also notice that we’re performing an anonymous bind here. This is because we’re not going to alter any of the data yet. There’s plenty of ground to cover in basic searching and returning results! If there is interest in going further with this material, please visit the forum for this article, and let me know – I’m happy to extend this tutorial into more advanced operations. For now, let’s search!
Figure 3
Part II: Simple Searches
If you’re looking for a WHERE clause to use in your searches, you can forget it. It doesn’t exist at the protocol level, which is to say that ‘LDAP doesn’t have a WHERE clause’. Gasp! Then how do you specify what you want and narrow your search results? The answer to this question introduces three new terms, which I’ll cover briefly here:
• search base – this is how you tell the directory which part of the tree to start searching from. For example, in our simple tree, it’s probably safe to assume that if I’m searching for ‘People’, I don’t need to search the ‘Hosts’ section of the tree. Telling the DSA to use ‘ou=People’ as the starting point of the search will cause the server to search only things that sit below this parent object. If I told the server to use ‘dc=linuxlaboratory,dc=org’ as the base, however, I would be searching the entire directory.
• search scope – this tells the DSA how many levels of the tree you want to search. For example, if I start my search at the root of the tree (dc=linuxlaboratory,dc=org in my case), and tell it to return all objects, but only give it a scope of ‘ONELEVEL’, the search will only return the two ou’s underneath the base. If I perform the same search with a scope of ‘SUBTREE’, the entire directory will be traversed, and every object in the directory will be returned.
• search filter – searching operations are implemented at the protocol level, so there’s a simple syntax that LDAP uses in order to allow for the narrowing of search results. This is a bit of an adjustment for SQL users, but it’s certainly not hard by any stretch of the imagination. To cover it all here would take up too much space, but you’ll be able to get a good grasp of how the syntax works from the examples coming up!
Now that we have the searching terminology under our belts, let’s have a look at two of PHP’s LDAP searching functions which return a search result resource: ldap_list() and ldap_search(). (A third, ldap_read(), searches with a scope of ‘BASE’ and so only ever examines the base entry itself.) The only difference between the two functions is that ldap_list() always searches with a scope of ‘ONELEVEL’, while ldap_search() searches with a scope of ‘SUBTREE’. If you know that your search will only pertain to a particular part of the tree, and you don’t want the search to continue on down through the depths of your directory, ldap_list() can save you a lot of time. This is usually more useful in larger enterprise directories or badly designed personal address books. Both are likely to have a directory structure that is several levels deep. The difference is that this is far more easily (though not always) justified in an enterprise directory deployment, which may have an organizational unit for each office under which resides a ‘People’ ou, which is further subdivided into departments, as required by some LDAP-enabled application (or politics).
In the code tarball that came with this issue, there’s a file called ‘dirview.php’. In it, you’ll get your first look at the ldap_list() function in action. What I’ve accomplished with this small bit of code is a quick ‘inspector’, which shows me all of the organizational units that sit just under the basedn. For each ou, I’ve
also listed out the names (called ‘common name’ or ‘cn’ in LDAP) of any objects that sit below the ou. Let’s walk through it together – starting at the top:
Here, I’ve just defined everything I’ll need to make a connection to the directory. This is about as simple a connection as one can make. Remember that there are more complex ways of binding to a directory, using other functions such as ldap_set_option() and ldap_start_tls(). We’ll just stick to a simple local bind operation here. Now to get some data!
$searchbase = 'dc=linuxlaboratory,dc=org';
Here, I’ve passed my connection resource ($conn), my search base, and a very, very simple search filter (‘ou=*’) to the ldap_list() function. Note that if there were ou objects nested three levels deep in my directory information tree (or DIT), ldap_list() would not tell me about them, due to the aforementioned search scope of the function. If someday I add some layers to my tree, I can simply replace the words ‘ldap_list’ above with ‘ldap_search’, and things will just work. They take exactly the same arguments, in exactly the same order. There are plenty of other arguments to pass to these functions, so check them out in the PHP manual.
Why not just migrate to a database? Why LDAP?
Good question. There are a number of answers:
• Much of the data design work is done for you in an LDAP deployment. There are published, tried and true schemas out there, which are widely used and supported (many of which will be delivered with whichever LDAP product you deploy). This means that there’s a decent chance that you can write a directory-enabled application in one environment and easily port it to another. It also means that applications support a standardized way of getting at the data they’re after. An example is found in the case of the many email programs which support LDAP. Every single one of them (that I’ve seen) supports looking up email addresses by requesting the ‘mail’ attribute associated with each user object entry. The end user knows nothing of this request – they just see the autocompletion functionality, which, to them, ‘just works’.
• The protocol makes the back end storage completely transparent. If you decide that your BerkeleyDB back end isn’t working out for you, you can easily switch the back end to something else – another database, some other random external program, or another directory server. Since the applications and the server communicate using a standard protocol, there’s nothing to change in the client application.
• An object model changes all the rules! Believe it or not, storing more than one value in any one field of a database tuple is bad, bad, bad design. That means if you have a user table in a database which has a column for ‘email’, you can only store one email per user. You could, of course, have multiple records with the same uid, but that leaves you with a lot of data redundancy for what amounts to one unique tuple of data. You could also have something akin to an email lookup table, but it’s a bit clunky – doing a join just to associate a user’s name with their email. LDAP allows you to store multiple instances of an attribute with a particular object entry. So if a user has six email addresses, you have only one object entry for the user, with six instances of the ‘mail’ attribute for that particular user. Other users may only have one or two.
• LDAP allows for more granular security than your average database. The equivalent security in a typical RDBMS would basically allow you to secure at the tuple level. This does not exist, to my knowledge, though it can be implemented in the application code that asks for the data – but that’s more code! LDAP does this for you on the back end by making sure that the user who performed the bind operation is allowed to access ‘attribute x of entry y’. It’s a big time saver for application developers and an added layer of defense for administrators.
Once we get our result resource back from ldap_list(), we can define the array of entries and start looping through to extract the elements we’re interested in. This is done by calling ldap_get_entries(), which returns a multidimensional array. Let’s walk through the code that deals with the array:
for ($i = 0; $i < $oulist["count"]; $i++) {
    $ou = $oulist[$i]["dn"];
    echo "Organizational Unit: " . $ou . " \n";
“A common assumption is that the back end storage mechanism is a relational database. This is not necessarily – or even usually – true.”
Here, notice our first reference to the $oulist array is to an element that PHP builds in for us called ‘count’, which, oddly enough, contains the number of entries returned by our search. Next, we define an $ou as the distinguished name (“dn”) of the current entry (marked by $oulist[$i]). With the ou now printed out, we can move on to see what objects belong to that ou:
$retattrs = array('cn');
$objresult = ldap_list($conn, $ou, 'objectclass=*', $retattrs);
Here, we’re going to do another ‘list’ operation, but with two small twists thrown in. First, instead of returning the entire entry matching our search criteria, we only want the common name returned (the ‘cn’). This is the optional fourth argument to ldap_list(), and must be in the form of an array, even if there’s only one value in it. This will speed up searches in larger directories by limiting the number of attributes returned for each entry. The second twist we’ve thrown in is that we’ve changed the search base to be the current value of $ou. Finally, our filter in this case is ‘objectclass=*’. Since every object in an LDAP directory is required to have an ‘objectclass’ attribute, this is functionally equivalent to saying ‘every entry’. In SQL, this is a lot like saying “SELECT cn FROM $ou”. The difference is that it’s not really safe to make the assumptions about a database’s structure that we can about the structure of a DIT (Directory Information Tree).
Moving on to the end of the script, we see more of the same things that we saw closer to the top of the script. We get back our entries, loop through them, and grab what we want:
    $objlist = ldap_get_entries($conn, $objresult);
    for ($o = 0; $o < $objlist["count"]; $o++) {
        echo "Found object: " . $objlist[$o]['cn'][0] . " \n";
    }
} // closes the outer loop over $oulist
?>
There is one interesting note to make here, though: how is it that ‘cn’ wound up being an array itself – requiring us to specify that we want the first element ([0])? Well, what we’re really doing here is saying “if there’s more than one value for ‘cn’, just hand back the first one”. This is a result of LDAP being able to store an arbitrary number of instances of certain attributes for each entry.
If LDAP Is So Cool, Why Bother With a Database?
Another great question, and again, there are several answers:
• Databases (well, the full-featured ones) support the notion of an atomic unit of work, which can be defined by the user. This means that if you want to update 1000 user records, but roll back any changes if there’s even a single problem, you want to stick with a database. LDAP won’t do this, though it is supported at the application layer by some of the more mature LDAP administration tools.
• LDAP is not relational. While the nature of its data model allows for some data interactions that emulate a relational database, LDAP is not truly relational. There’s no such thing as a ‘JOIN’ in LDAP. In fact, LDAP doesn’t have a query language of any kind at all, as the protocol itself supports the querying and modification of the data on the back end.
• LDAP servers (DSAs) are generally designed to provide data that is ‘read-intensive’. This makes it great for things like user account information, where attributes change infrequently. Since the directory is optimized for read operations on the data, writes are even more expensive than they would be in a typical database, so you’re not going to see the back ends of major NYSE trading floor applications dropping their RDBMS in favor of LDAP.
NOTE: What’s all this ‘cn’ and ‘ou’ business? The formatting for submission and presentation of data in LDAP is called LDIF, which stands for LDAP Data Interchange Format. Figure 3, which you saw earlier, shows the details of an object entry in LDIF format. ‘cn’ and ‘ou’ are attributes of the object identified by its ‘dn’ (distinguished name). If you were doing complex LDAP operations (changing attributes or moving objects from one tree to another, bulk updates, etc), you would traditionally write out everything to an LDIF file and then run a tool like ‘ldapmodify’ in UNIX/Linux, telling it to read the LDIF file for its instructions.
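To make the ‘count’ elements and the multi-valued ‘cn’ from the walkthrough more concrete, here is a minimal sketch. It is not the dirview.php file from the tarball: the function names attribute_values() and list_common_names(), the example.org host in the comments and the sample data are all assumptions made up for illustration. Only ldap_connect(), ldap_set_option(), ldap_bind(), ldap_list() and ldap_get_entries() are real PHP functions, and the hand-built array simply mimics the shape that ldap_get_entries() is documented to return.

```php
<?php
// Sketch only: helper names, host and sample data are illustrative assumptions.

/**
 * Collect every value of one attribute from an ldap_get_entries() result.
 * ldap_get_entries() lowercases attribute names and stores a 'count'
 * element alongside the numerically indexed values.
 */
function attribute_values(array $entries, string $attr): array
{
    $attr = strtolower($attr);
    $values = [];
    for ($i = 0; $i < $entries['count']; $i++) {
        if (!isset($entries[$i][$attr])) {
            continue; // this entry does not carry the attribute
        }
        for ($j = 0; $j < $entries[$i][$attr]['count']; $j++) {
            $values[] = $entries[$i][$attr][$j];
        }
    }
    return $values;
}

/**
 * Return the cn values of everything exactly one level below $base.
 * Needs the ldap extension and a reachable server, so it is only
 * defined here, not called.
 */
function list_common_names($conn, string $base): array
{
    $result = ldap_list($conn, $base, '(objectclass=*)', ['cn']);
    if ($result === false) {
        return [];
    }
    $entries = ldap_get_entries($conn, $result);
    return $entries === false ? [] : attribute_values($entries, 'cn');
}

// Hand-built data in the shape ldap_get_entries() returns.
$sample = [
    'count' => 2,
    0 => [
        'cn' => ['count' => 2, 0 => 'Brian Jones', 1 => 'B. Jones'],
        0 => 'cn',
        'count' => 1,
        'dn' => 'cn=Brian Jones,ou=People,dc=linuxlaboratory,dc=org',
    ],
    1 => [
        'cn' => ['count' => 1, 0 => 'Jon Doe'],
        0 => 'cn',
        'count' => 1,
        'dn' => 'cn=Jon Doe,ou=People,dc=linuxlaboratory,dc=org',
    ],
];

print_r(attribute_values($sample, 'cn'));

// Against a live directory (hypothetical host), usage might look like:
// $conn = ldap_connect('ldap://ldap.example.org');
// ldap_set_option($conn, LDAP_OPT_PROTOCOL_VERSION, 3);
// if ($conn && ldap_bind($conn)) {   // no DN or password: anonymous bind
//     print_r(list_common_names($conn, 'ou=People,dc=linuxlaboratory,dc=org'));
// }
```

Walking every index below ‘count’, rather than taking only [0], is a design choice: it returns all instances of an attribute, which matters for entries that carry several ‘cn’ or ‘mail’ values.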
LDAP-Related Tools and Applications
The world of LDAP is a unique one, indeed. To help you along, there are plenty of applications to aid in administration or to give you a start in development. In my daily work with LDAP, I wear the hats of both administrator and developer, so I’ve done some of the discovery for you.
The first thing you’re going to need if you’re interested in LDAP to any great degree is, well, an LDAP directory. While it’s true that you can get free copies of Novell’s eDirectory or Sun’s SunOne directory to play with (and they’re interesting products), I found in getting started that they actually put a little too much functionality into the hands of the end user who isn’t already intimate with the workings of an LDAP directory, how schemas work, how objectclasses and matching syntaxes are defined, etc. Lucky for us, the open
Part III: More Complex Filters
Discriminating among the sea of objects in your DIT can seem like an uphill battle from where we stand right now. However, there is a very simple syntax for creating more complex search filters that is easy to grasp and at the same time very powerful. To me, it looks a bit like Algebra 101. Let me show you what I mean:
(&(sn=jones)(roomNumber=101B))
As in almost every other language and context, the ampersand (‘&’) is an ‘and’ operator. The operator comes first and applies to every parenthesized term that follows it; the attribute=value pairs themselves are not quoted. When applied in a search, this filter will match any entry which has a ‘sn’ attribute value of ‘jones’ AND a ‘roomNumber’ attribute of ‘101B’. In addition to the logical ‘AND’ operator, you can use the logical ‘OR’ operator (a pipe – or ‘|’), as well as the logical ‘NOT’ operator (‘!’). Furthermore, you can string these things together and make a rather more complex search filter, like this:
(&(|(givenname=Brian)(givenname=Jon))(objectclass=person)(!(sn=jones)))
This will return all ‘person’ entries where the surname (‘sn’) is NOT ‘jones’, AND the first names (givenname) are either ‘Brian’ OR ‘Jon’. At first glance, this may look intimidating, but as you start to work with the filter syntax, it all becomes quite easy. One tip I’ve found useful is to assign your filter to a variable instead of just including the whole thing in the argument list for one of the search functions. It makes debugging and reading your code a lot easier.
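As one way of following the tip about keeping filters in variables, the sketch below assembles the second filter from small helper functions. The helpers (filter_escape(), filter_eq(), filter_and(), filter_or(), filter_not()) are hypothetical names invented for this example and are not part of PHP’s ldap extension. The escaping follows the filter string rules in RFC 4515, and in the finished filter string the attribute=value pairs carry no quotes of their own.

```php
<?php
// Hypothetical helpers for building LDAP search filters; not part of ext/ldap.

/** Escape the characters that are reserved inside a filter value. */
function filter_escape(string $value): string
{
    return strtr($value, [
        '\\' => '\\5c',
        '*'  => '\\2a',
        '('  => '\\28',
        ')'  => '\\29',
        "\0" => '\\00',
    ]);
}

/** Build a single (attr=value) equality term. */
function filter_eq(string $attr, string $value): string
{
    return '(' . $attr . '=' . filter_escape($value) . ')';
}

/** Prefix operators: the operator goes first, the terms follow. */
function filter_and(string ...$parts): string
{
    return '(&' . implode('', $parts) . ')';
}

function filter_or(string ...$parts): string
{
    return '(|' . implode('', $parts) . ')';
}

function filter_not(string $part): string
{
    return '(!' . $part . ')';
}

// Person entries named Brian or Jon whose surname is not jones.
$filter = filter_and(
    filter_or(filter_eq('givenname', 'Brian'), filter_eq('givenname', 'Jon')),
    filter_eq('objectclass', 'person'),
    filter_not(filter_eq('sn', 'jones'))
);

echo $filter, "\n";
```

The resulting $filter string can be passed as the third argument to ldap_list() or ldap_search(). On PHP 5.6 and later, the built-in ldap_escape() with the LDAP_ESCAPE_FILTER flag could stand in for the hand-written filter_escape() when the ldap extension is loaded.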
Resources
• The OpenLDAP website: http://www.openldap.org
• YoLinux Tutorials: http://www.yolinux.com/TUTORIALS/LinuxTutorialLDAP.html
• Novell’s ‘AppNotes’ repository is a valuable source of information for both developers and administrators. The information there regarding LDAP and eDirectory can often times be applied to any LDAP-compliant deployment. Check out this one regarding using Perl, Python or PHP to access their eDirectory LDAP product: http://developer.novell.com/research/appnotes/2003/may/04/a030504.htm
• Various resources linked from http://www.ldapman.com
• The most thorough overview (that I know of) of LDAP, OpenLDAP, and applications which use LDAP: ftp://kalamazoolinux.org/pub/pdf/ldapv3.pdf
• “Understanding and Deploying LDAP Directory Services” 2nd Edition, Copyright 2003, Addison Wesley Publishing.
• “LDAP System Administration” by Gerald Carter, Copyright 2003, O’Reilly & Associates.
source community provides an enterprise-strength LDAP server in the form of OpenLDAP (http://www.openldap.org). What’s more, O’Reilly has just published their ‘LDAP System Administration’ book by Gerald Carter, which could very well have been titled ‘OpenLDAP Administration’, since all of the examples and concepts assume you’re using OpenLDAP. Besides this wonderful resource, the OpenLDAP site’s Administrator’s guide should be enough to get you going.
Once you have a directory daemon of some sort running on some machine somewhere, you’ll probably want to start performing some basic operations like adding, deleting and modifying entries. Since we’re all sort of ‘web minded’, and more specifically, we’re all PHP savvy, I’ve picked out a couple of browser based PHP apps to tickle your fancy. The first one is called ‘DaveDap’, which you can see a shot of in Figure 4. I happen to like DaveDap. It has a decent interface and makes doing simple things simple. Pretty much everything you need to put together a basic directory server, using any LDAP-compliant directory implementation, is here at your fingertips. However, in doing a ‘grep’ of the code tree for version 0.8.0pre1 for the words ‘ssl’ and ‘tls’, I came up dry. This basically means that binding to your directory
server via DaveDap passes your password in clear-text. This is probably ok for a small, ‘localhost’ directory, but I cannot use it in this form when I head to the office – even in a testing environment!
The second tool I’ve found is a very simple administrative interface called ‘YALA’ (Yet Another LDAP Administrator). The interface is quite similar to DaveDap, as you can see in Figure 5 (showing off YALA’s ‘create new object’ interface). I like YALA for two simple reasons. First, it has some functionality that makes my admin duties easier (like being able to create a template for creating certain objects quickly). Second, the code is so mind-numbingly easy to read and understand that it’s really simple to extend! Though the project maintainer doesn’t know it was me (unless he’s reading this), I’ve actually submitted patches to YALA so that it will support the two big features I need in an administrative tool: TLS support and bulk operations. I imagine these will be rolled into the application once they’re thoroughly tested.
These two applications are fairly standard representations of the web-based administrative tools out there to help you get a directory running and populated with meaningful ‘stuff’. In addition, if you’re running Linux and administering an LDAP directory, you should certainly not be without a tool called ‘GQ’ (http://biot.com/gq). It’s a GTK application which supports most things you could ever want in a simple administrative interface, including many things I haven’t even covered here.
Once you have some cool things in your LDAP directory, you can get to work writing applications to use that information or have a look at the many useful tools and applications which already include LDAP support. Typing ‘LDAP’ into a Freshmeat.net search returns about 200 applications, many of which are groupware, calendaring, email and other end-user applications that are tremendously useful.
Figure 4
In Closing
LDAP is an oft misunderstood topic by newbies and IT professionals alike. The notion that LDAP is ‘just a database’ is a perfect example. By now, I hope that you realize how silly this notion really is, and you can begin to understand that the differences between LDAP and a typical RDBMS make them inherently suited to different tasks. There are a couple of papers at LDAPzone.com (http://www.ldapzone.com), which further discuss the differences between an RDBMS and LDAP and which one you might consider using for which tasks. Many a horror story has been born from one overzealous consultant’s itch to use LDAP to solve a problem which LDAP was not even meant to address. While LDAP is extremely adept at handling a particular class of problems, stepping outside of the boundaries of what it was designed to do well will quickly lead to misery.
By now I hope you’re excited about some of the useful things that can be done with LDAP and PHP. At the very least, I hope I have armed you with enough information here that you won’t shudder in fear when your next consulting client mentions the term ‘LDAP’. If I have failed to do this in some way, I urge you to visit the forum for this article and ask questions. I’ll do my best to answer them, or at least point you in the right direction.
About The Author
Brian Jones is editor-in-chief of php|architect magazine, founder of Linuxlaboratory.org, and a contributor to several open source projects. During the daytime hours, he works as a systems/network admin and PHP developer in the computer science department at Princeton University. He is rarely idle. You can flame him directly at [email protected]
Click HERE To Discuss This Article http://www.phparch.com/discuss/viewforum.php?f=28
Figure 5
REVIEWS
PHPEdit
REVIEW
By Peter James
Quick Facts
Price: FREE!
License: QPL
Download: http://www.phpedit.net/products/PHPEdit
Description: PHPEdit is an IDE (Integrated Development Environment) under Windows to work with PHP.
Homepage: http://www.phpedit.net
I’m a GVIM user. I don’t generally go for integrated development environments, and I dislike pretty much any editor that requires me to take my fingers too far away from the center of the keyboard (to the arrow keys, for example). Believe it or not, my less-than-glowing review of Macromedia Dreamweaver’s PHP support last month actually piqued my interest in PHP IDEs. This month I set out to find one that wasn’t just more of the same, and it wasn’t too long before I ran across PHPEdit.
PHPEdit’s massive feature list was what really caught my eye. All of the usual suspects were present, but there were a few things that really stood out to me. The plugin API, the built-in code beautifier, the keyboard templates, and the QuickMarks features were all very enticing, just to mention a few. Let’s put PHPEdit to the test, and see what falls out.
Installation and configuration
PHPEdit installed easily, and offered the choice to install from the Internet or from local files. If you install from the Internet, you are given the option of saving the installation files for later local installations. On start-up you are presented with the option to associate PHPEdit with all relevant file types (.php, .html, .xml, etc). This is handy, but if you choose to not see it again, there doesn’t seem to be any way (inside of PHPEdit) to change these associations again. This should probably be a preferences item.
In order to use the interactive debugger, you must install DBG (http://dd.cron.ru/dbg). Installing this on my Windows box was actually pretty easy, although setting it up in PHPEdit was a little confusing. The PHPEdit manual had some slightly different installation procedures from the DBG install.txt file, which was a little frustrating. It also took me a while to realize that I had to restart PHPEdit before the debugger support would work.
Editing
The editor in PHPEdit is amazing. At its root, it’s just a plain old Windows editing environment, offering all of the standard Windows shortcut keys. Beyond that, it is a highly-configurable editing environment offering full shortcut key customization, code completion, keyboard templates, bracket matching, positional marking, and contextual syntax highlighting.
The code completion abilities in PHPEdit are excellent. At any time you can bring up the options available to you with Shift-Space. The completion pop-up contains context-sensitive variable names, functions, constants, and more. It is very thorough, and offers a small description of PHP functions as well.
Keyboard templates in PHPEdit are snippets of code that you shouldn’t always need to type. An example of this is typing “fri”, followed by a space. This inserts the following idiom:
for($i = 0; $i < ; $i++){
} // for
and places the cursor right before the second semicolon in the ‘for’ statement. You can design your own keyboard template to suit any situation. Very cool.
Any type of bracket or quotation mark is automatically closed when started. This can help prevent some of those more elusive parse errors. A related feature is automatic bracket matching. If your insertion cursor is positioned next to a bracket, it and its match are both underlined in red. This should help track down even more errors.
The QuickMarks are a really cool feature. Normally, IDEs might let you set bookmarks (as PHPEdit does), allowing you to quickly jump from one part of your code to another. QuickMarks take that one step further to define your actual position on a line. This might be particularly useful when refactoring code. You can stop editing your new function, drop a mark down, navigate to where you want to copy some code from, copy it into the clipboard, then hit Shift-Escape. This will take the code you just copied, and replace your last QuickMark with it. Now that’s efficient!
As a web developer, you will often have mixed content in your pages. PHPEdit’s contextual syntax highlighting dims out the HTML content when you’re editing the PHP code, and dims out the PHP code when you’re editing the HTML. This can remove distractions, and help maintain your focus on what you are currently editing.
Other features
PHPEdit is completely scriptable. All operations in the application and its plugins are defined by commands. These commands can be mixed together and assigned to new toolbar buttons or shortcut keys. This means that you can completely tailor the application to your liking, as well as create complex operations to run at the click of a button.
Figure 1
As a nice bonus, PHPEdit comes with an integrated code beautifier. This little beauty, which also can ship as a separate application, is very useful in dusting off those old scripts, and keeping your new ones up to scratch. It comes with a number of configuration options, and works very nicely.
PHPEdit also comes with a full plugin API for both compiled and, get this, PHP plugins. This means that you can extend PHPEdit using PHP! How cool is that?
Any supported help document can be integrated into PHPEdit. This means that you can add just about any structured software documentation you want to the help window. PHP, PostgreSQL, and MySQL documentation can all live harmoniously in the same place, even though one might be a CHM file, one might be DocBook, and one might be an indexed helpfile. Adding a documentation type is as simple as editing three lines in an XML file. Very cool.
What I liked
Well, just about everything, actually.
What I didn’t like
Not too much. [ Editor’s note: It only runs under Windows. :-( ] I did not have great success with the debugger. This may have been due to my setup, but I tried it on two different platforms with little success. Regardless, there should be more documentation on the setup of the debugger on different platforms for PHPEdit.
In Conclusion
I must admit that the version of PHPEdit that I tested was a development version (0.7.1.129), and did crash occasionally on my Windows XP box. I felt, however, that I’d get a better idea of the direction this application was taking by testing it over the stable version (0.6 – released in late 2001). Even though it was not a stable release, the functional behavior was very solid. All of the features I mentioned above (with the exception of the debugger) worked fine, with few surprises.
I would enthusiastically recommend this editor to anyone, including myself (the purist). It has a feature-set that can make newbies look like pros, and pros look like gods. With a working debugger and a stable release, this gives a number of the commercial editors out there something to think about. I’m giving PHPEdit a 4.5 out of 5.
php|a
FEATURES
Object-oriented Form Management With PHP (and an Eye on PHP5)
By Marco Tabini
While a library for creating forms may add too much overhead to a high-traffic website and leave too little room for customization, management sites offer a perfect scenario for implementing object-oriented form libraries that promote uniformity and reusability.
If you regularly create web pages, chances are that you also regularly create HTML forms. Personally, I find forms one of the least fun parts of creating web pages. It’s not that I really like writing HTML code in general, mind you, but forms are definitely at the top of my Top 10 HTML Hate List. On top of the normally unpleasant syntax of HTML, forms involve actually soliciting information from the user. This usually requires a good amount of attention to detail in order to ensure that someone using your website, either on purpose or by mistake, will not wipe out half your database at the click of his mouse. Believe me, it’s happened, and it will happen again.
When you’re writing forms for a high-traffic website, you really have no choice but to create your HTML code by hand. Each page often has a different look, and the performance constraints of high traffic loads make it difficult to justify the creation of any aiding tool, such as a library. A possible alternative, naturally, would be to create a library that generates the HTML code based on your specifications, but that would still be complex and impractical.
Backend systems, on the other hand, are a completely different story. Most of the time, they do not suffer from the same bandwidth and design constraints
as their frontend counterparts. For one thing, chances are that only a very limited number of people will be using a backend system (which is often secured through a username/password authentication scheme exactly to ensure that only the right people can access it). Additionally, where a frontend website requires diversity and creativity in design, backend systems actually promote linearity, simplicity and consistency in their design.
REQUIREMENTS
PHP Version: Any
O/S: Any
Code Directory: forms
A Form Library For the Ages
A library for managing HTML forms can bring a number of benefits to your application:
• Consistency: if all the forms are generated by the same library, they will all look similar and behave in a similar way (unless you need them to be different).
• Reusability: if you plan things the right way, you will have to write your code only once and then reuse it in all your scripts.
• Safety/Ease of Maintenance: a single codebase means fewer errors, and they only have to be fixed in a single location.
• Extensibility: you can build on your existing code to implement additional functionality throughout your entire codebase.
Clearly, a library also has some limitations. First of all, you won’t be able to create a completely unique design for each of your forms, although, as we’ll see later on, a properly designed library codebase can offer a good level of customization without compromising the uniformity of the interface. Additionally, no matter how efficient your code is, a library will introduce a certain amount of overhead that you have to take into consideration. It might not be enough for you to worry about if you’re working on your online store’s backend, but it will be there nonetheless.
Although I am not a big fan of object-oriented programming (OOP), I cannot deny that there are several applications for which it is a perfect fit. In my experience, form management is one of them; at my company, we’ve been using the same forms library for close to two years now. Although we’ve made all sorts of improvements, the basic layout has pretty much remained unchanged from the beginning, resulting in a solid codebase that has grown to encompass such esoteric user controls as WYSIWYG HTML editors, spin editors, and so forth.
Another good reason to use objects, where they make sense, is that PHP5 is becoming more and more a reality and, with its introduction, the level of support for OOP will increase significantly. Where right now making an object-oriented application work requires a bit of patience and a few counterintuitive hacks, with PHP5 everything will become much easier and, to a certain extent, provide slightly better performance.
Object-oriented Problems
In PHP4, objects are treated like any other variable.
This causes a series of rather peculiar behaviours that, to the programmer who has had the opportunity to work with OOP before, may look rather odd. As an example, take a look at the script shown in Listing 1. Following the normal rules of OOP, one would expect that when the script is executed, the value “test” will be loaded in the $value member of the a class and then printed out. This is, indeed, what happens if you run the script through PHP5. If you execute the code through PHP4, however, the interpreter will print “Not changed”.
If you are familiar with OOP and have never worked with it in PHP4, this is bound to drive you insane, as there is apparently no logical explanation for this behaviour. You’d probably end up either discarding object-oriented PHP, or filing a bug report with the PHP Team, only to find out that this is not a bug at all, but a completely expected “feature” of the language.
Because an object is just another variable, when you are inside the foreach loop, the interpreter creates a copy of each object found in the array and lets your script manipulate it. When you alter the contents of each object, you’re really working on a copy, actually a completely separate instance of your class that is discarded once the foreach loop ends. Thus, you never really get to touch the object that is actually stored in the array, and that’s why the $value member doesn’t change.
Solving this problem is actually pretty easy: you just need to manipulate the array directly, for example by using this approach:
$keys = array_keys ($b);
foreach ($keys as $c)
    $b[$c]->value = 'test';
You will encounter a similar problem when calling functions, since any objects you pass by value will be copied before being handed over to the function’s code. In this case, however, all you need to do is pass your objects by reference. Passing by reference means that, instead of creating a copy of your variables for use within the function, you require the interpreter to pass a reference (something akin to a pointer) to them. As a result, they will be manipulated directly by the function’s code.

Passing a variable by reference is very easy: all you have to do is prepend an ampersand (&) to its name when you declare your function. The first declaration below receives a copy, the second a reference:

    function f ($obj)
    function f (&$obj)
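As a minimal sketch of the difference, the two hypothetical functions below are identical except for the ampersand in the signature. An array is used instead of an object because arrays are still copied when passed by value on current PHP versions, which mirrors what PHP4 did with objects.

```php
<?php
// Hypothetical helpers: the only difference is the ampersand in the signature.
function set_by_value($field)
{
    $field['value'] = 'test';    // changes the function's private copy
}

function set_by_reference(&$field)
{
    $field['value'] = 'test';    // changes the caller's variable
}

$field = array('value' => 'Not changed');

set_by_value($field);
$afterByValue = $field['value'];

set_by_reference($field);
$afterByReference = $field['value'];

echo $afterByValue, "\n", $afterByReference, "\n";
```

Note that the ampersand belongs in the function declaration; the call sites look the same in both cases.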
Listing 1

    <?php
    class a
    {
        var $value;

        function a()
        {
            $this->value = 'Not changed';
        }
    }

    $b = array (new a);
    foreach ($b as $c)
        $c->value = 'test';
    echo $b[0]->value;
    ?>
Designing Your Dream Form Manager

Before examining any code, it’s a good idea to decide what features we want our form manager to support. First of all, even though we have already decided to sacrifice design flexibility in exchange for code reusability, it would be nice to still have at least some level of control over how things are displayed. For example, we can decide that every element has a text description associated with it, and that we want to have control over how the following pieces of HTML are rendered:
• The code before the form begins
• The code before and after the text associated with an element
• The code before and after the element itself
• The code after the form ends

How do we provide this functionality? Easy enough: we use some external (global) functions that provide it. My version of these functions, which is extremely simple, is shown in Listing 2; as you can see, it wraps the form in a table, and then puts each element in its own row.

Building the Form Class

The next step consists of designing the class that will actually represent the form manager. Its responsibilities include holding each of the elements of the form, handling values and validation, and managing the overall rendering process (that is, making sure that the entire form is sent to the browser).

Validation is a particularly important process that can significantly shorten your development time. Throughout most of your scripts, you will likely have to perform common validation operations on your controls. For example, you’ll have to ensure that a “required” field has been filled out, or that a text value entered by the user is a valid integer. With a bit of forethought, most of these operations can be built into your form management framework, so that from then on they are handled automatically by your classes rather than manually in each script.
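The two checks mentioned above could be sketched as small helper functions. This is only an illustration of the kind of logic worth centralizing: the function names are hypothetical and are not part of the library presented in this article, and the integer rule (an optional minus sign followed by digits) is an assumption about what “valid integer” should mean.

```php
<?php
// Hypothetical helper names; neither function is part of the article's library.

// A "required" field passes when something other than whitespace was entered.
function is_filled_in($value)
{
    return trim((string) $value) !== '';
}

// Accepts an optional minus sign followed by digits only (assumed rule).
function is_integer_text($value)
{
    return preg_match('/^-?[0-9]+$/', trim((string) $value)) === 1;
}

$samples = array('42', ' -7 ', '4.2', 'abc', '');
foreach ($samples as $sample) {
    printf("%-8s required:%s integer:%s\n",
        var_export($sample, true),
        is_filled_in($sample) ? 'ok' : 'missing',
        is_integer_text($sample) ? 'ok' : 'invalid');
}
```

Once checks like these live in one place, an element class only needs to know which of them apply to it.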
The form class, which you can see in Listing 3, contains only a minimal amount of code, since it essentially acts as a container and dispatch point for working on the individual elements. A proper OOP design would require an interface for accessing and manipulating the elements through CForm. However, the PHP4 OOP framework, which lacks the concept of member protection, does not really encourage this approach and, therefore, our class will simply allow outside callers to access the $elements array directly as needed.

Note that the items of the CForm::$elements array are keyed by the name of each control as their insertion takes place. This way, it will be possible to easily address them from the outside, for
Listing 2 (the HTML inside the echo strings was stripped during extraction and is shown as '...'; the listing breaks off inside render_pre_element)

    function render_pre_form(&$form)
    {
        echo '...';
    }

    function render_post_form(&$form)
    {
        echo '...';
    }

    function render_pre_element_text(&$form, &$element)
    {
        if ($element->type !== 'submit')
            echo '...';
    }

    function render_post_element_text(&$form, &$element)
    {
        if ($element->type !== 'submit')
            echo '...';
    }

    function render_pre_element(&$form, &$element)
    {
        if ($element->type === 'submit')
            echo '...
example after a form has been submitted and you want to retrieve the resulting values.

The Render() method, which is used to actually create the HTML code needed to display the form, automatically adds a hidden element to the form whose name is based on the form’s name and a combination of characters that, while maintaining full compatibility with the HTML specifications, is unlikely to occur as a hand-written element name. This mechanism is used to determine whether the form has been submitted; while there are apparently easier ways to perform the same operation (such as checking whether there are values in the $_REQUEST or $_POST arrays), they all have their limitations. For example, suppose you have two forms in the same page, a situation not all that uncommon. Since only one of them can actually be submitted, it is
Listing 3

    <?php
    class CForm
    {
        function CForm($action, $method='post', $name='form')
        {
            $this->elements = array();
            $this->name = $name;
            $this->submitted = isset ($_REQUEST["____{$this->name}_submitted"]);
            $this->action = $action;
            $this->method = $method;
            $this->error = false;
        }

        function AddElement (&$element)
        {
            if (isset ($this->elements[$element->name]))
            {
                user_error ("Duplicate element {$element->name} added to form {$this->name}");
                return false;
            }

            $this->elements[$element->name] = $element;
        }

        function Load()
        {
            $keys = array_keys ($this->elements);
            foreach ($keys as $key)
                $this->elements[$key]->Load();

            if ($this->submitted)
                $this->Validate();
        }

        function Validate()
        {
            $keys = array_keys ($this->elements);
            foreach ($keys as $key)
            {
                if (!$this->elements[$key]->Validate())
                    $this->error = true;
            }
        }

        function GetElementValue ($name)
        {
            if ($this->submitted)
                return (!strcasecmp ($this->method, 'post') ? $_POST[$name] : $_GET[$name]);
            else
                return null;
        }

        function Render()
        {
            render_pre_form ($this);
            echo "";    // the markup echoed here was stripped during extraction
            render_post_form ($this);
        }
    }
    ?>
necessary that the form object be capable of determining whether the information posted to the script belongs to it or not.

Creating a Base Element Class

It’s now time to actually start designing the various elements that we’ll be using in our forms. This is where the beauty and elegance of object-oriented programming come into play: since PHP supports inheritance, we can safely create a “base” element class that performs the operations common to all the elements, and then subclass it to support the specific needs of each individual control type.

Many of the features that CForm supports, such as rendering and validation, also have to be supported by CElement. In the specific case of rendering, however, the base class cannot provide any functionality, since each individual element is rendered in a different way. However, CForm expects that all elements will provide a Render() method and, therefore, we will create an abstract method that, unless a child class overrides it, will output an error:

    function Render() // abstract
    {
        user_error ("Abstract function Render() called for element {$this->name}");
    }
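To see the simulated abstract method at work, the sketch below sets up a tiny hierarchy and calls render() on two subclasses, one that overrides the method and one that forgets to. The class names are hypothetical, the code uses PHP5-and-later syntax so that it runs on a current interpreter, and the error handler is there only to collect the warning instead of printing it.

```php
<?php
// Hypothetical classes, written in PHP5+ syntax so the sketch runs today.
class SketchElement
{
    public $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    // Simulated abstract method: reaching this body means a subclass
    // did not supply its own render().
    public function render()
    {
        trigger_error("Abstract function render() called for element {$this->name}", E_USER_WARNING);
        return '';
    }
}

class SketchTextElement extends SketchElement
{
    public function render()
    {
        return '<input type="text" name="' . htmlspecialchars($this->name) . '">';
    }
}

// Does not override render(), so calls fall through to the base class.
class SketchBrokenElement extends SketchElement
{
}

// Collect warnings instead of printing them, to make the effect visible.
$warnings = array();
set_error_handler(function ($level, $message) use (&$warnings) {
    $warnings[] = $message;
    return true;
});

$text = new SketchTextElement('nickname');
$broken = new SketchBrokenElement('mystery');

$textHtml = $text->render();
$brokenHtml = $broken->render();

restore_error_handler();
```

The drawback of the simulation is visible here: the mistake in SketchBrokenElement is only reported when render() is actually called, not when the class is defined.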
An abstract method essentially acts as a “placeholder” for a method that any child class of your base class will have to implement in order to function properly. Abstract methods ensure that a certain level of consistency is maintained across an entire class hierarchy. When a subclass does not redeclare its own version of an abstract method of its base class, the interpreter should print out an error and refuse to execute the script.

In PHP4, which does not support abstract methods, we can only “simulate” this behaviour by creating a method in the base class that outputs an error. If a subclass omits the Render() method, the interpreter will revert to the base class, which will print out an error. In PHP5, which supports abstract methods natively, we could have simply declared Render() as abstract, rather than having to create an actual method that adds a bit of overhead.

The base CElement class, which is shown in Listing 4, also provides a basic validation mechanism to ensure that an element marked as “required” contains a value once the form has been submitted. This one function alone has the potential to drastically reduce the amount of code that you would otherwise have to manually write for each of your controls.

Listing 4 (lines 1-17, which held the class declaration and the start of the constructor, were lost in extraction)

            $this->name = $name;
            $this->formname = ($formname ? $formname : $name);
            $this->text = $text;
            $this->defaultvalue = null;
        }

        function Render() // abstract
        {
            user_error ("Abstract function Render() called for element {$this->name}");
        }

        function RenderError()
        {
            if ($this->error)
            {
                render_pre_error ($this->form, $this);
                echo $this->error;
                render_post_error ($this->form, $this);
            }
        }

        function Load()
        {
            $this->value = trim ( $this->form->GetElementValue ( $this->formname));
        }

        function Validate()
        {
            if (($this->required) && (!strcmp ('', $this->value)))
            {
                $this->error .= 'This value must be specified ';
                return false;
            }
            else
                return true;
        }

        function SetValue($value)
        {
            $this->value = $value;
        }

        function SetForm (&$form)
        {
            $this->form = $form;
        }
    }

Another important feature of the base class is the rendering of errors. Each element has its own $error member, which contains a string with the various error messages that are associated with it. These will typically be filled in during the validation phase, but could very well be set by external routines that perform specialized tasks (for example, verifying that a user’s nickname is
unique while he is signing up). This way, you’ll be able to output errors in the most topical position possible in the form (for example, right below the field that caused them), making it easier for the user to understand what was wrong with his input.

Creating Elements

Listing 5 shows the code to handle text and password
controls. As you can see, much of the work is taken care of automatically by simply inheriting from the CElement class. In fact, the only differences are in the rendering of the HTML code (which has to be provided by each element class separately) and in the initialization of the class. CPasswordElement contains even less code, since password controls behave in a way that is essentially equivalent to text boxes.

Text elements, however, are at the low end of HTML forms as far as complexity goes. For now, let’s move on to something a bit more juicy: drop-down lists. The problem here is that we’ll have to pass three pieces of information:
1) An array of arrays containing all the information to be listed in the drop-down
2) A value that lets us know where in each inner array we can find the option’s value
3) A value that lets us know where in each inner array we can find the text associated with that value.
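The three pieces of information listed above can be pictured with a small stand-alone function. This is a hedged sketch rather than the article’s CDropDownElement, whose code (Listing 6) may differ: the function name, parameter names, the optional $selected argument and the sample data are all assumptions made for illustration.

```php
<?php
// Hypothetical function; the article's CDropDownElement (Listing 6) may differ.
// $rows     : array of arrays holding the data to list
// $valueKey : key under which each row stores the option value
// $textKey  : key under which each row stores the visible text
function sketch_render_dropdown($name, $rows, $valueKey, $textKey, $selected = null)
{
    $html = '<select name="' . htmlspecialchars($name) . '">' . "\n";
    foreach ($rows as $row) {
        $value = (string) $row[$valueKey];
        // Mark the option whose value matches the current selection, if any.
        $mark = ($selected !== null && $value === (string) $selected) ? ' selected' : '';
        $html .= '  <option value="' . htmlspecialchars($value) . '"' . $mark . '>'
               . htmlspecialchars((string) $row[$textKey]) . "</option>\n";
    }
    return $html . '</select>';
}

// Sample data: the same shape a database query would typically return.
$countries = array(
    array('code' => 'ca', 'label' => 'Canada'),
    array('code' => 'it', 'label' => 'Italy'),
);

$html = sketch_render_dropdown('country', $countries, 'code', 'label', 'it');
echo $html, "\n";
```

Passing the two keys separately, instead of hard-coding them, is what lets the same control display rows from any source without reshaping the data first.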
Despite the higher level of complexity, the code for the CDropDownElement (Listing 6) is barely fifty lines long. You could take this even further and, for example, provide a version of CDropDownElement that interfaces directly with MySQL to grab the control’s contents from your favourite database. All you would really have to change is the constructor and, perhaps, the Load() method (unless you wanted to store all the information in a local array and use the same format as the one I have provided here).

Using the Library

Since we wrote the library to make things easier in the first place, using it is extremely simple. As you can see from Listing 7 (next page), we start by instantiating CForm and pointing it at the current script, so that we can perform everything through a single file. Next, we start to create the various elements. This clearly
takes more effort than simply writing HTML files, but it pays off when the form is submitted, since we have to do almost no work at all beyond our specialized code, which in this example consists only of checking that the username and password provided by the user are correct. When it comes to actually rendering the HTML code associated with the form, a single line of PHP code will do the trick.