Alternation and Grouping

Alternation uses the | (pipe) character to offer a choice between two or more alternatives, much like OR in a Boolean expression. You can use it to extend the chapter heading Regular Expression so that it covers section headings as well. However, this is not as straightforward as it looks. You might expect the following expression to match 'Chapter' or 'Section' followed by a one- or two-digit number, anchored to the beginning and end of a line:

/^Chapter|Section [1-9][0-9]?$/

Unfortunately, alternation has the lowest precedence of all the operators, so the | splits the whole expression into two separate parts. The expression is therefore equivalent to matching either of these two expressions:

/^Chapter/

/Section [1-9][0-9]?$/

So it matches either the word 'Chapter' at the beginning of a line, or 'Section' followed by a one- or two-digit number at the end of a line.
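The split described above can be checked with a short sketch. It is written in plain JavaScript, which is an assumption about the scripting environment, and the sample lines are hypothetical.

```javascript
// Ungrouped alternation: behaves like /^Chapter/ OR /Section [1-9][0-9]?$/.
const ungrouped = /^Chapter|Section [1-9][0-9]?$/;

// Hypothetical sample lines, chosen to expose the problem.
const sampleLines = [
  "Chapter 1",        // the intended kind of match
  "Chapter overview", // no number, but starts with "Chapter"
  "See Section 12",   // "Section 12" is not at the start of the line
  "Section twelve"    // no digits after "Section"
];

const ungroupedResults = sampleLines.map((line) => ungrouped.test(line));
console.log(ungroupedResults); // [ true, true, true, false ]
```

The second and third lines are accepted even though neither is a proper heading, which is the behavior the grouping parentheses are meant to prevent.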

You can use parentheses to limit the scope of the alternation, that is, to make sure that the alternation applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create sub expressions. By adding parentheses in the appropriate places to the Regular Expression shown above, you can make it match either 'Chapter 1' or 'Section 3'.

/^(Chapter|Section) [1-9][0-9]?$/

This expression works properly, but it has a by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for future use, that is, a sub match is stored. In this example, the sub match is not needed.
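As a minimal sketch of both effects, the grouping and the captured sub match, the following plain JavaScript (an assumption about the environment) runs the grouped expression against hypothetical input strings.

```javascript
// Grouped alternation: the choice now applies only to the two words.
const grouped = /^(Chapter|Section) [1-9][0-9]?$/;

const groupedHit = grouped.exec("Section 3");
console.log(groupedHit[0]); // "Section 3" (the whole match)
console.log(groupedHit[1]); // "Section"   (the captured sub match)

// Lines that the ungrouped expression would accept are now rejected.
const rejectsNoNumber = !grouped.test("Chapter overview");
const rejectsMidLine = !grouped.test("See Section 12");
console.log(rejectsNoNumber, rejectsMidLine); // true true
```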

In the example shown above, the parentheses are only needed to group the choice between the words 'Chapter' and 'Section'; you do not need to refer to that match later. Unless you really need the sub matches, we recommend that you do not capture them. Your Regular Expressions will be more efficient since they will not spend time and memory storing those sub matches.

You can place ?: at the start of the pattern inside the parentheses to prevent the match from being saved for later use. The following modification of the Regular Expression shown above provides the same grouping without saving the sub match.

/^(?:Chapter|Section) [1-9][0-9]?$/
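To see the difference between the two forms, this sketch (plain JavaScript, assumed environment, hypothetical input) compares the result arrays returned by the capturing and non-capturing expressions.

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]?$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]?$/;

const withCapture = capturing.exec("Chapter 10");
const withoutCapture = nonCapturing.exec("Chapter 10");

// Both expressions accept the same text...
console.log(withCapture[0], "|", withoutCapture[0]); // Chapter 10 | Chapter 10

// ...but only the capturing form stores a sub match next to the whole match.
console.log(withCapture.length);    // 2 (whole match + one sub match)
console.log(withoutCapture.length); // 1 (whole match only)
```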

There are times you'd like to be able to test for a pattern without including that text in the match. For instance, you might want to match the protocol in a URL (like http or ftp), but only if that URL ends with .com. Or maybe you want to match the protocol only if the URL does not end with .edu. In cases like those, you'd like to "look ahead" and see how the URL ends. A lookahead assertion is handy here.

There are two lookahead constructs, neither of which captures a match:

A positive lookahead, specified using ?=, matches at any point in the search string where text matching the Regular Expression pattern in the parentheses begins.

A negative lookahead, specified using ?!, matches at any point in the search string where text that does not match the Regular Expression pattern in the parentheses begins.

For example, to find the protocol in a URL such as "http://www.confirmit.com", but only if the URL ends with .com, use:

/^[^:]+(?=.*\.com$)/

Because of the anchor ^, the match must start at the beginning of the string. Then the first part

[^:]+

matches one or more characters other than :. "One or more" comes from the quantifier +, and "other than :" comes from the bracket expression [^:].

Then there is a positive lookahead:

(?=.*\.com$)

which looks ahead through the rest of the string for a match for

.*\.com$

Because of the anchor $, this must be at the end of the string. The . matches any character except the newline character, and the quantifier * allows zero or more such characters before the last part, which is a literal . (escaped with a backslash as \.) followed by com, i.e. ".com".

The match itself, however, contains only the first part of the string, i.e. "http" in "http://www.confirmit.com", because the text examined by the lookahead is not included in the match.
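The walkthrough above can be tried out with a small sketch in plain JavaScript (an assumed environment). The helper name protocolIfCom and the example.com and example.edu addresses are hypothetical, introduced only for illustration.

```javascript
// Matches the leading characters up to the first ":", but only when the
// whole string ends with ".com".
const comProtocol = /^[^:]+(?=.*\.com$)/;

// Hypothetical helper: returns the matched protocol, or null if no match.
function protocolIfCom(url) {
  const hit = comProtocol.exec(url);
  return hit ? hit[0] : null;
}

console.log(protocolIfCom("http://www.confirmit.com")); // "http"
console.log(protocolIfCom("ftp://files.example.com"));  // "ftp"
console.log(protocolIfCom("http://www.example.edu"));   // null
```

The returned text never includes "://" or the host name, since the lookahead only tests what follows and consumes nothing.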

Similarly, the expression

/^[^:]*(?!.*\.edu$)/

matches the characters before the first : at the beginning of the string, but only if the string does not end with ".edu".
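A minimal sketch of the negative lookahead, again in plain JavaScript (assumed environment) with a hypothetical helper name and hypothetical example.edu addresses:

```javascript
// Matches the leading characters up to the first ":", unless the rest of
// the string ends with ".edu".
const notEduProtocol = /^[^:]*(?!.*\.edu$)/;

// Hypothetical helper: returns the matched text, or null if no match.
function protocolUnlessEdu(url) {
  const hit = notEduProtocol.exec(url);
  return hit ? hit[0] : null;
}

console.log(protocolUnlessEdu("http://www.confirmit.com")); // "http"
console.log(protocolUnlessEdu("http://www.example.edu"));   // null

// Caveat: with no ":" in the input, [^:]* consumes the whole string and the
// lookahead is only tested at the very end, so the ".edu" check is bypassed.
console.log(protocolUnlessEdu("www.example.edu")); // "www.example.edu"
```

The last call suggests the expression is only reliable for input that actually contains a protocol followed by a colon.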

Character	Description
(pattern)	Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the $0...$9 properties in JScript .NET.
(?:pattern)	Matches pattern but does not capture the match (it is not stored for possible later use).
(?=pattern)	Positive lookahead. Matches the search string at any point where a string matching pattern begins. The match is not captured for possible later use. After a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.
(?!pattern)	Negative lookahead. Matches the search string at any point where a string not matching pattern begins. The match is not captured for possible later use. After a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.
