
Set of characters

Another useful variation on basic Perl regular expressions is the use of sets of characters, also known as character classes.

If we can describe our pattern as a sequence of letters where some are fixed and others vary, we can put the allowed choices for a varying position in a set delimited by square brackets. The set matches exactly one character, which must be one of those listed.

Say that we want to check whether our string contains one of these three words: dark, dirk, dork. One way of doing it is to notice that they are almost the same: three of the four letters are exactly the same, and only the second one varies, being a choice among three different letters.

We can formalize it in this way:

my $pattern = "d[aio]rk";
$_ = "the dark side";   # example string: /.../ with no =~ matches against $_
if (/$pattern/) {
    print "found!\n";
}
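To see which words the set accepts, here is a small sketch that runs the same pattern over a few test words. The helper name matches_aio is hypothetical, chosen just for this example. The words durk and drk show what the set rejects: a letter that is not listed, and a missing letter (the set always consumes exactly one character).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word contains "d", one of a/i/o, then "rk".
sub matches_aio {
    my ($word) = @_;
    return $word =~ /d[aio]rk/ ? 1 : 0;
}

for my $word (qw(dark dirk dork durk drk)) {
    # The set [aio] stands for exactly one character: a, i or o.
    printf "%-5s %s\n", $word, matches_aio($word) ? "found!" : "no match";
}
```

Writing the same check as an alternation, /d(?:a|i|o)rk/, gives the same result; the set is simply more compact when every alternative is a single character.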

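Inside a set, the caret (^) metacharacter (already seen as the start-of-string anchor) takes on a different meaning: when it is the first character after the opening bracket, it negates the set, which then matches any single character that is not listed.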

Our search is now for a pattern starting with "d", ending with "rk", and with exactly one character in the middle that can be anything except "i" or "o". So we won't accept "dirk" or "dork", but we are fine with "dark" or even "durk":

my $pattern = "d[^io]rk";
$_ = "durk";   # example string; "dirk" or "dork" would not match

if (/$pattern/) {
    print "found!\n";
}
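A sketch of the same kind of test for the negated set, using a hypothetical helper named matches_not_io. Note that [^io] accepts any character except "i" and "o", not only letters, so "d7rk" matches too, while "drk" still fails because the set must consume one character.

```perl
use strict;
use warnings;

# Hypothetical helper: "d", then one character that is neither i nor o, then "rk".
sub matches_not_io {
    my ($word) = @_;
    return $word =~ /d[^io]rk/ ? 1 : 0;
}

for my $word (qw(dark durk d7rk dirk dork drk)) {
    printf "%-5s %s\n", $word, matches_not_io($word) ? "found!" : "no match";
}
```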

Within a set we can also specify a range. For instance, if we are again looking for our d.rk word, but now relax the requirements and accept any lowercase letter as the second character, we can write the check this way:

my $pattern = "d[a-z]rk";
$_ = "durk";   # example string
if (/$pattern/) {
    print "found!\n";
}

The second letter now could be anything ranging from a to z (lowercase).
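The range can be checked the same way. This sketch (the helper names lower_only and any_letter are hypothetical) also shows that a set may hold more than one range: [a-zA-Z] accepts an uppercase second letter as well, while [a-z] alone rejects it, and neither accepts a digit.

```perl
use strict;
use warnings;

# Hypothetical helpers: the second letter must fall in the given range(s).
sub lower_only { return $_[0] =~ /d[a-z]rk/    ? 1 : 0 }
sub any_letter { return $_[0] =~ /d[a-zA-Z]rk/ ? 1 : 0 }

for my $word (qw(dark dzrk dArk d1rk)) {
    printf "%-5s [a-z]: %-3s [a-zA-Z]: %s\n", $word,
        lower_only($word) ? "yes" : "no",
        any_letter($word) ? "yes" : "no";
}
```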

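For common choices Perl provides shortcuts: \d stands for a digit, [0-9]; \w for a word character, [0-9A-Za-z_]; \s for whitespace, essentially [ \t\n\r\f] (since Perl 5.18 the vertical tab is included too). These equivalences hold for plain ASCII text; when the string holds Unicode characters, \d and \w can also match digits and letters from other scripts, unless the /a modifier (Perl 5.14 and later) is used. The uppercase version is a negation. So, for instance, \D means anything but a digit, \W anything but a word character, and \S anything but whitespace.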
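A short sketch applying the shortcuts, and two of their uppercase negations, to a sample ASCII string, where the equivalences above hold exactly. The string and variable names are just for illustration; a match with the /g modifier in list context returns every matching piece.

```perl
use strict;
use warnings;

my $text = "Room 42, floor_3\tnorth";

my @digits     = $text =~ /\d/g;      # each digit, like [0-9]
my @non_digits = $text =~ /\D+/g;     # runs of anything but a digit
my @words      = $text =~ /\w+/g;     # runs of [0-9A-Za-z_]
my @tokens     = $text =~ /\S+/g;     # runs of anything but whitespace
my @fields     = split /\s+/, $text;  # the same pieces, splitting on whitespace

print "digits: @digits\n";
print "words:  @words\n";
print "tokens: @tokens\n";
printf "non-digit runs: %d\n", scalar @non_digits;
```

Here \S+ and split /\s+/ produce the same list, since "runs of non-whitespace" and "what lies between runs of whitespace" describe the same pieces when the string has no leading whitespace.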

Chapter 5 of Beginning Perl by Simon Cozens focuses on regular expressions.
