elements with various combinations of
class names (#1). Then we define our class name checking function, which accepts as parameters the class name to check for and the element type to search within. First, we collect all the elements of the specified type (#2). Then we set up our regular expression (#3). Note the use of the new RegExp() constructor to compile a regular expression based upon the class name passed to the function. This is an instance where we can’t use a regex literal, because the class name we will search for isn’t known in advance. We construct (and hence compile) this expression once, up front, rather than once for every element we test, to avoid frequent and unnecessary recompilation. Because the contents of the expression are dynamic (based upon the incoming className argument), handling the expression this way can yield significant performance savings.
The regex itself matches either the beginning of the string or a whitespace character, followed by our target class name, followed by either a whitespace character or the end of the string. Notice the use of a double escape (\\) within the string passed to new RegExp(): \\s.
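To see what the boundary terms buy us, here is a minimal sketch that builds the same kind of pattern with new RegExp() and tests it against a few class attribute strings. The helper name classPattern and the sample class names are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: compile a pattern that matches className only as a
// whole, whitespace-delimited word (start-or-space, name, space-or-end).
function classPattern(className) {
  return new RegExp("(^|\\s)" + className + "(\\s|$)");
}

var re = classPattern("ninja");

console.log(re.test("ninja"));          // true: the whole string
console.log(re.test("samurai ninja"));  // true: preceded by a space, ends the string
console.log(re.test("ninja samurai"));  // true: starts the string, followed by a space
console.log(re.test("ninjas"));         // false: "s" is neither whitespace nor the end
console.log(re.test("a-ninja"));        // false: "-" is neither whitespace nor the start
```

The sketch concatenates className into the pattern as-is; a class name containing regex metacharacters (such as ".") would need to be escaped first.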
When creating literal regular expressions with terms that include a backslash, we only have to provide the backslash once. However, because we’re writing these backslashes within a string, we must double-escape them. This is a nuisance, to be sure, but one that we must be aware of when constructing regular expressions from strings rather than literals. Once the regex is compiled, collecting (#4) the matching elements is a snap using its test() method (#5).
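To make the escaping rule concrete, here is a small sketch comparing a pattern built from a string with the equivalent literal, plus what happens when the backslash is not doubled. The variable names and the "ninja" class name are illustrative only.

```javascript
// In a string literal, "\\s" is two characters: a backslash and an "s".
var str = "\\s";
console.log(str.length); // 2

// Handing that string to RegExp() yields the same pattern as the literal form.
var fromString = new RegExp("(^|\\s)ninja(\\s|$)");
var literal = /(^|\s)ninja(\s|$)/;
console.log(fromString.source === literal.source); // true

// Forgetting to double the backslash: "\s" in a string is just "s",
// so this regex looks for a literal "s" instead of whitespace.
var broken = new RegExp("(^|\s)ninja(\s|$)");
console.log(broken.source);                 // (^|s)ninja(s|$)
console.log(broken.test("samurai ninja"));  // false
```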
Preconstructing and precompiling regular expressions so that they can be reused time and time again is a recommended technique that can yield significant performance gains. Almost any situation in which a complex regular expression is applied repeatedly can benefit from this technique. Back in the introductory section, we mentioned that parentheses in regular expressions not only group terms for operator application, but also create what are known as captures. Let’s find out more about that.
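One way to apply this when the same dynamic patterns are needed again and again is to cache each compiled RegExp by the input it was built from. The sketch below is a hypothetical extension, not part of the preceding listing: getClassPattern() and classPatternCache are made-up names, and a plain object serves as the cache.

```javascript
// Hypothetical cache of compiled class-name patterns, keyed by class name,
// so each distinct pattern is constructed only once and then reused.
var classPatternCache = {};

function getClassPattern(className) {
  if (!Object.prototype.hasOwnProperty.call(classPatternCache, className)) {
    classPatternCache[className] = new RegExp("(^|\\s)" + className + "(\\s|$)");
  }
  return classPatternCache[className];
}

// Both calls hand back the very same RegExp object.
console.log(getClassPattern("ninja") === getClassPattern("ninja")); // true

var classes = ["ninja", "samurai", "ninja samurai", "ronin"];
var count = 0;
for (var i = 0; i < classes.length; i++) {
  // The lookup is cheap; no new RegExp is built inside the loop.
  if (getClassPattern("ninja").test(classes[i])) count++;
}
console.log(count); // 2
```

Note that these patterns have no g flag, so reusing them with test() carries no leftover search position from one call to the next.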
7.4
Capturing matching segments
Regular expressions are at their most useful when we capture the results that are found so that we can do something with them. Simply determining whether a string matches a pattern is an obvious first step and often all that we need, but determining what was matched is also useful in many situations.
7.4.1
Performing simple captures
Take a situation in which we want to extract a value that’s embedded in a complex string. A good example of such a string is the manner in which opacity values are specified for Internet Explorer. Rather than the conventional opacity rule with a numerical value employed by the other browsers, IE8 and earlier use a rule such as: filter:alpha(opacity=50);
©Manning Publications Co. Please post comments or corrections to the Author Online forum: http://www.manning-sandbox.com/forum.jspa?forumID=431
In the example of Listing 7.4, we extract the opacity value from this filter string.
Listing 7.4: A simple function for capturing an embedded value
<div id="opacity" style="opacity:0.5;filter:alpha(opacity=50);"></div>   <!-- #1 -->
<script>
  function getOpacity(elem) {
    var filter = elem.style.filter;
    return filter ?                                                  //#2
      filter.indexOf("opacity=") >= 0 ?
        (parseFloat(filter.match(/opacity=([^)]*)/)[1]) / 100) + "" :
        "" :
      elem.style.opacity;
  }

  window.onload = function() {
    assert(
      getOpacity(document.getElementById("opacity")) == "0.5",
      "The opacity of the element has been obtained.");
  };
</script>
#1 Defines test subject
#2 Decides what to return
We define an element that specifies both styles for opacity (one for standards-compliant browsers and one for IE) to use as a test subject (#1). Then we create a function that returns the opacity value as the standards-defined value between 0.0 and 1.0, regardless of how it was defined. The opacity parsing code (#2) may seem a little confusing at first, but it’s not too bad once we break it down. To start with, we need to determine whether a filter property even exists for us to parse. If not, we access the opacity style property instead. If the filter property is present, we need to verify that it contains the opacity string that we’re looking for. We do that with the indexOf() call; if the string isn’t there, the function returns an empty string.
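To experiment with this decision logic outside a browser, we can pull it into a function that takes the two style strings directly. The function name opacityFrom and its parameters are hypothetical; the sketch mirrors the branches of getOpacity() but swaps the element for plain string arguments.

```javascript
// Hypothetical DOM-free version of getOpacity(): filterValue stands in for
// elem.style.filter, opacityValue for elem.style.opacity.
function opacityFrom(filterValue, opacityValue) {
  if (!filterValue) {
    // No filter to parse: fall back to the standard opacity property.
    return opacityValue;
  }
  if (filterValue.indexOf("opacity=") < 0) {
    // A filter exists, but it carries no opacity information.
    return "";
  }
  // Capture everything after "opacity=" up to ")" and scale 0-100 to 0.0-1.0.
  var percent = parseFloat(filterValue.match(/opacity=([^)]*)/)[1]);
  return (percent / 100) + "";
}

console.log(opacityFrom("alpha(opacity=50)", "")); // logs 0.5
console.log(opacityFrom("", "0.5"));               // logs 0.5
console.log(opacityFrom("blur(2px)", ""));         // logs an empty line (the empty string)
```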
At this point we can get down to the actual extraction of the opacity value. The String match() method returns an array containing the match and its captured values if a match is found, or null if no match is found. In this case we can be confident that there will be a match, as we already established that with the indexOf() call.
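Because match() returns null rather than an empty array when nothing matches, indexing into its result without a prior check (such as the indexOf() test) would throw a TypeError. A minimal sketch of a null-safe alternative follows; captureOpacity is a hypothetical name, not part of the listing.

```javascript
// Hypothetical helper: returns the captured value, or null when the
// pattern does not match at all, instead of throwing.
function captureOpacity(filterValue) {
  var m = filterValue.match(/opacity=([^)]*)/);
  return m ? m[1] : null;
}

console.log(captureOpacity("alpha(opacity=50)")); // 50
console.log(captureOpacity("blur(2px)"));         // null
```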
The array returned by match() always holds the entire match at index 0, followed by each capture in order. Remember that captures are defined by parentheses in the regular expression. Thus, when we match the opacity value, the value itself is contained in position [1] of the array, because the only capture in our regex is the one created by the parentheses that we placed after the opacity= portion. Since captured values are strings, we pass the value to parseFloat() before dividing it by 100.
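The layout of that array can be checked directly. This short sketch runs the listing’s regex against the sample IE rule from earlier in this section.

```javascript
var m = "filter:alpha(opacity=50);".match(/opacity=([^)]*)/);

console.log(m.length);    // 2: the entire match plus one capture
console.log(m[0]);        // opacity=50   (the entire match)
console.log(m[1]);        // 50           (the first, and only, capture)
console.log(typeof m[1]); // string       (hence the parseFloat() call)
```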
This example used a local regular expression and the match() method. Things change a
bit when we use global expressions. Let’s see how.
7.4.2
Matching using global expressions
As we saw in the previous section, using a local regular expression (one without the global flag) with the String object’s match() method returns an array containing the entire matched string, along with any captures from the match.
But when we supply a global regular expression (one with the g flag included), match() returns something rather different. It’s still an array of results, but because a global regular expression matches every occurrence in the candidate string rather than just the first, the array contains all of the matched strings; the captures within each match are not returned. We can see this in action in the code and tests of Listing 7.5.
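A minimal sketch of the global behavior on a smaller, made-up markup string, including the case where nothing matches at all:

```javascript
var html = "<b>Hello</b> <i>world!</i>";
var tagPattern = /<(\/?)(\w+)([^>]*?)>/g;

// With the g flag, match() returns every full match but no captures.
var allTags = html.match(tagPattern);
console.log(allTags.length);    // 4
console.log(allTags.join(" ")); // <b> </b> <i> </i>

// When nothing matches, match() returns null rather than an empty array.
console.log("no tags here".match(tagPattern)); // null
```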
Listing 7.5: Differences between a global and local search with match()
<script type="text/javascript">
  var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";

  var results = html.match(/<(\/?)(\w+)([^>]*?)>/);                //#1
  assert(results[0] == "<div class='test'>", "The entire match.");
  assert(results[1] == "", "The (missing) slash.");
  assert(results[2] == "div", "The tag name.");
  assert(results[3] == " class='test'", "The attributes.");

  var all = html.match(/<(\/?)(\w+)([^>]*?)>/g);                    //#2
  assert(all[0] == "<div class='test'>", "Opening div tag.");
  assert(all[1] == "<b>", "Opening b tag.");
  assert(all[2] == "</b>", "Closing b tag.");
  assert(all[3] == "<i>", "Opening i tag.");
  assert(all[4] == "</i>", "Closing i tag.");
  assert(all[5] == "</div>", "Closing div tag.");
</script>
#1 Matches using a local regex
#2 Matches using a global regex
We can see that when we do a local match (#1), a single instance is matched and the captures within that match are also returned, but when we use a global match (#2), what’s returned is the list of matches, without captures. If captures are important to us, we can regain this functionality, while still performing a global search, by using the regular expression’s exec() method. This method can be called repeatedly against a global regular expression, returning the next match, along with its captures, each time it’s called, and null once no matches remain. A typical pattern for how it can be used is shown in the code of Listing 7.6.
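The loop can also put the recovered captures to work. In this minimal sketch (the sample markup and the tags array are illustrative assumptions), the first capture tells us whether each tag opens or closes, and the second gives its name.

```javascript
var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";
var tag = /<(\/?)(\w+)([^>]*?)>/g;
var match;
var tags = [];

// Each exec() call resumes where the previous one stopped, and each
// result array carries the captures that a global match() would drop.
while ((match = tag.exec(html)) !== null) {
  // match[1] is "/" for closing tags (empty otherwise); match[2] is the tag name.
  tags.push((match[1] ? "close " : "open ") + match[2]);
}

console.log(tags.join(", "));
// open div, open b, close b, open i, close i, close div
```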
Listing 7.6: Using the exec method to do both capturing and a global search
<script type="text/javascript">
  var html = "<div class='test'><b>Hello</b> <i>world!</i></div>";
  var tag = /<(\/?)(\w+)([^>]*?)>/g, match;
  var num = 0;

  while ((match = tag.exec(html)) !== null) {                       //#1
    assert(match.length == 4,
      "Every match finds each tag and 3 captures.");
    num++;
  }

  assert(num == 6, "3 opening and 3 closing tags found.");
</script>
#1 Repeatedly calls exec()
In this example, we repeatedly call the exec() method (#1). The regular expression retains state from the previous invocation (the position at which to resume searching is kept in its lastIndex property), so each subsequent call progresses to the next global match and returns that match along with its captures. By using either match() or exec(), we can always find the exact matches (and captures) that we’re looking for. But we’ll need to dig further when we want to refer back to the captures themselves from within the regex.
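The state that exec() relies on can be observed directly. This sketch records lastIndex around each call on a small, made-up string; the positions array exists only for illustration.

```javascript
var tag = /<(\/?)(\w+)([^>]*?)>/g;
var html = "<b>Hello</b>";
var positions = [tag.lastIndex];   // 0 before any call

tag.exec(html);                    // matches "<b>"
positions.push(tag.lastIndex);     // 3: just past "<b>"
tag.exec(html);                    // matches "</b>"
positions.push(tag.lastIndex);     // 12: just past "</b>"
var last = tag.exec(html);         // no more matches
positions.push(tag.lastIndex);     // 0: a failed search resets lastIndex

console.log(last);                 // null
console.log(positions.join(" "));  // 0 3 12 0
```

Because this position lives on the regex object rather than on the string, reusing the same global regex on a different string partway through a search can skip matches unless lastIndex is first set back to 0.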
7.4.3
Referencing captures
There are two ways in which we can refer back to portions of a match that we’ve captured: one within the match itself, and one within a replacement string (where applicable). For example, let’s revisit the match in Listing 7.6 (in which we match an opening or closing HTML tag) and modify it to also match the inner contents of the tag itself, as shown in Listing 7.7.
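Both forms of reference can be seen in a few lines. In this sketch (the sample strings are illustrative), \1 inside a pattern requires the same text the first group captured, while $1 and $2 in a replacement string splice captures into the output of String replace().

```javascript
// Inside a pattern, \1 refers back to whatever the first group captured,
// so the closing quote must be the same character as the opening one.
var quoted = /(["'])(.*?)\1/;
var m = "say 'hi' now".match(quoted);
console.log(m[2]);                          // hi
console.log(quoted.test("say 'hi\" now"));  // false: the quotes differ

// In a replacement string, $1, $2, ... refer to the captures instead.
var swapped = "Hello world".replace(/(\w+) (\w+)/, "$2 $1");
console.log(swapped);                       // world Hello
```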
Listing 7.7: Using back-references to match the contents of an HTML tag.
<script type="text/javascript">
  var html = "<b class='hello'>Hello</b> <i>world!</i>";
  var pattern = /<(\w+)([^>]*)>(.*?)<\/\1>/g;                       //#1

  var match = pattern.exec(html);
  assert(match[0] == "<b class='hello'>Hello</b>",
    "The entire tag, start to finish.");
  assert(match[1] == "b", "The tag name.");
  assert(match[2] == " class='hello'", "The tag attributes.");
  assert(match[3] == "Hello", "The contents of the tag.");

  match = pattern.exec(html);
  assert(match[0] == "<i>world!</i>",
    "The entire tag, start to finish.");
  assert(match[1] == "i", "The tag name.");
  assert(match[2] == "", "The tag attributes.");
  assert(match[3] == "world!", "The contents of the tag.");