Set of characters

Another useful feature of basic Perl regular expressions is the set of characters, also known as a character class.

If we can describe our pattern as a sequence of letters where some are fixed and others vary, we can put the varying ones in a set delimited by square brackets. The set matches exactly one character, which must be one of those listed.

Say that we want to check if our string contains one of these three words: dark, dirk, dork. One way of doing it is to notice that they are almost the same. Actually, three letters are exactly the same, and one, the second, is a choice among three different ones.

We can formalize it in this way:


$pattern = "d[aio]rk";
if(/$pattern/) {
    print "found!\n";
}
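To see which strings this pattern accepts, here is a minimal sketch that runs it against a few sample words. The word list and the @found array are made up for this illustration; grep sets $_ for each element, so the bare /$pattern/ match works as in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for this illustration.
my @words = ("dark", "dirk", "dork", "durk", "drk");

my $pattern = "d[aio]rk";

# grep aliases $_ to each word, so /$pattern/ is tested against it.
my @found = grep { /$pattern/ } @words;

print "found: @found\n";    # found: dark dirk dork
```

Note that "drk" is rejected too: the set always stands for exactly one character, so it cannot match nothing.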

In a set of choices the caret (^) metacharacter (already seen as starting anchor) takes the meaning of a negation, but only when it is the first character inside the square brackets.

Our search is now for a pattern starting with "d", ending with "rk", and with exactly one character in the middle that can be anything but "i" or "o". So we won't accept "dirk" or "dork", but we are fine with "dark" or even "durk":


$pattern = "d[^io]rk";

if(/$pattern/) {
    print "found!\n";
}
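The same kind of check can be sketched for the negated set. Again, the sample words and the @accepted array are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for this illustration.
my @words = ("dark", "dirk", "dork", "durk", "d rk", "drk");

my $pattern = "d[^io]rk";

# Keep only the words where the second character is neither "i" nor "o".
my @accepted = grep { /$pattern/ } @words;

print join(", ", map { "'$_'" } @accepted), "\n";    # 'dark', 'durk', 'd rk'
```

A negated set still requires one character to be there, which is why "drk" fails, and that character can be anything not listed, even a blank, which is why "d rk" passes.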

In a set of characters we can specify a range. For instance, if we are looking again for our d.rk word, but now we relax the requirements, accepting any lowercase letter as second character, we could write the check in this way:


$pattern = "d[a-z]rk";
if(/$pattern/) {
    print "- found!\n";
}

The second letter now could be anything ranging from a to z (lowercase).
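As a small sketch of how ranges behave, the code below tries the range on a few made-up words, and then a second pattern that puts two ranges in the same set. The second pattern, d[a-z0-9]rk, is not part of the example above; it is added here only to show that ranges can be combined.

```perl
use strict;
use warnings;

# Hypothetical sample words, chosen only for this illustration.
my @words = ("dark", "durk", "dzrk", "dArk", "d9rk");

# One range: any lowercase letter as second character.
my $lower = "d[a-z]rk";
my @lower_hits = grep { /$lower/ } @words;
print "lowercase only: @lower_hits\n";        # dark durk dzrk

# Two ranges in the same set: a lowercase letter or a digit.
my $alnum = "d[a-z0-9]rk";
my @alnum_hits = grep { /$alnum/ } @words;
print "lowercase or digit: @alnum_hits\n";    # dark durk dzrk d9rk
```

The uppercase "A" in "dArk" is outside both ranges, so that word is rejected by both patterns.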

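For common choices Perl provides shortcuts: \d stands for [0-9]; \w for [0-9A-Za-z_]; \s for whitespace, that is [ \t\n\r\f]. These are the equivalents for plain ASCII text; on Unicode strings the shortcuts can match more characters. The uppercase version is the negation. So, for instance, \D means anything but a digit.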
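Here is a short sketch of the shortcuts at work on a plain ASCII string. The string and the variable names are invented for the example; the /g modifier in list context returns every captured match.

```perl
use strict;
use warnings;

# Hypothetical sample text, plain ASCII.
my $text = "room_42 is free";

# \d: every single digit in the text.
my @digits = $text =~ /(\d)/g;
print "digits: @digits\n";          # digits: 4 2

# \w+: runs of letters, digits and underscores.
my @tokens = $text =~ /(\w+)/g;
print "tokens: ", join("|", @tokens), "\n";    # tokens: room_42|is|free

# \s+: split the text on runs of whitespace.
my @parts = split /\s+/, $text;
print "parts: ", scalar(@parts), "\n";         # parts: 3

# \D+: the leading run of characters that are not digits.
my ($prefix) = $text =~ /^(\D+)/;
print "prefix: $prefix\n";          # prefix: room_
```

The underscore in "room_42" counts as a \w character, so the first token is not broken at it, while \D+ stops as soon as it meets the "4".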

Chapter 5 of Beginning Perl by Simon Cozens focuses on regular expressions.
