OWLRegexp

Synopsis

#include <owl/regexp.h>
OWLRegexp re(".*\\.doc");  // Matches any filename with the suffix ".doc"

Description

Class OWLRegexp represents a regular expression. The constructor "compiles" the expression into a form that can be matched more efficiently. The compiled expression can then be used to search strings of class string.

The regular expression (RE) is constructed as follows:

The following rules determine one-character REs that match a single character:
 
1.1  Any character that is not a special character (to be defined) matches itself.
1.2  A backslash (\) followed by any special character matches the literal character itself. I.e., this "escapes" the special character.
1.3  The "special characters" are:

          +     *     ?     .     [     ]     ^     $
1.4  The period (.) matches any character except the newline. E.g., ".umpty" matches either "Humpty" or "Dumpty".
1.5  A set of characters enclosed in brackets ([]) is a one-character RE that matches any of the characters in that set. E.g., "[akm]" matches either an "a", "k", or "m". A range of characters can be indicated with a dash. E.g., "[a-z]" matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. Example: [^akm] matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
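
The one-character rules above can be checked with a short sketch. It does not use OWLRegexp; it uses std::regex (default ECMAScript grammar) from the C++ standard library as a stand-in, on the assumption that the two engines agree for these simple patterns. The helpers whole_match and one_char_rules_hold are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of OWLRegexp): true if all of `text`
// matches `pattern`, with std::regex standing in for the OWLRegexp engine.
inline bool whole_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns true only if every one-character rule behaves as described.
inline bool one_char_rules_hold() {
    return whole_match("a", "a")             // 1.1: ordinary character matches itself
        && whole_match("\\.", ".")           // 1.2: escaped period is a literal period
        && !whole_match("\\.", "x")
        && whole_match(".umpty", "Humpty")   // 1.4: period matches any character...
        && whole_match(".umpty", "Dumpty")
        && !whole_match(".", "\n")           // ...except the newline
        && whole_match("[akm]", "k")         // 1.5: set of characters
        && !whole_match("[akm]", "b")
        && whole_match("[a-z]", "q")         // 1.5: range
        && whole_match("[^akm]", "b")        // 1.5: negated set
        && !whole_match("[^akm]", "a")
        && !whole_match("[^akm]", "")        // a negated set still needs one character
        && whole_match("[a^k]", "^");        // caret is literal when not first
}
```

Calling one_char_rules_hold() from a test or from main is enough to exercise every rule. Whether OWLRegexp agrees with std::regex in every corner case is not confirmed by this document.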

The following rules can be used to build a multicharacter RE.
 
2.1  A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lower-case characters.
2.2  A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lower-case characters.
2.3  A one-character RE followed by a question mark (?) is optional: the preceding RE can occur zero times or once in the string, but no more. E.g., xy?z matches either xyz or xz.
2.4  The concatenation of REs is a RE that matches the corresponding concatenation of strings. E.g., [A-Z][a-z]* matches any capitalized word.
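
As a rough illustration of rules 2.1 through 2.4, the sketch below again substitutes std::regex (ECMAScript grammar) for OWLRegexp, assuming both treat *, +, ? and concatenation the same way. The names full_match and multi_char_rules_hold are hypothetical and exist only in this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of OWLRegexp): true if all of `text`
// matches `pattern`, with std::regex standing in for the OWLRegexp engine.
inline bool full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns true only if every multicharacter rule behaves as described.
inline bool multi_char_rules_hold() {
    return full_match("[a-z]*", "")            // 2.1: zero occurrences
        && full_match("[a-z]*", "lark")        // 2.1: several occurrences
        && !full_match("[a-z]+", "")           // 2.2: at least one is required
        && full_match("[a-z]+", "lark")
        && full_match("xy?z", "xyz")           // 2.3: y present once
        && full_match("xy?z", "xz")            // 2.3: y absent
        && !full_match("xy?z", "xyyz")         // 2.3: never more than once
        && full_match("[A-Z][a-z]*", "Hark")   // 2.4: concatenation
        && !full_match("[A-Z][a-z]*", "hark"); // first RE needs a capital
}
```

The helper requires the whole string to match, which makes the difference between * and + visible on the empty string.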

Finally, the entire regular expression can be anchored to match only the beginning or end of a line:
 
3.1  If the caret (^) is at the beginning of the RE, then the matched string must be at the beginning of a line.
3.2  If the dollar sign ($) is at the end of the RE, then the matched string must be at the end of the line.
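
The effect of the two anchors can be sketched on a single-line string. As an assumption, std::regex_search from the standard library is used in place of OWLRegexp, and the helper names found_in and anchor_rules_hold are hypothetical. Only single-line input is used, because multi-line handling of ^ and $ differs between engines.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper (not part of OWLRegexp): true if `pattern`
// matches somewhere in `line`.
inline bool found_in(const std::string& pattern, const std::string& line) {
    return std::regex_search(line, std::regex(pattern));
}

// Returns true only if both anchors behave as described in 3.1 and 3.2.
inline bool anchor_rules_hold() {
    const std::string line = "Hark! Hark! the lark";
    return found_in("Hark", line)     // unanchored: matches anywhere
        && found_in("^Hark", line)    // 3.1: match must start the line
        && !found_in("^lark", line)   // "lark" is not at the beginning
        && found_in("lark$", line)    // 3.2: match must end the line
        && !found_in("Hark$", line);  // "Hark" is never at the end
}
```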

The following escape codes can be used to match control characters:


Example

#include <owl/regexp.h>
#include <iostream>
#include <string>

using namespace std;

int main(){
  string aString("Hark! Hark! the lark");

  // A regular expression matching any lower-case word
  // starting with "l":
  OWLRegexp reg("l[a-z]*");

  cout << reg.match(aString) << endl;  // Prints "1"
  return 0;
}

Public Constructors

OWLRegexp(const char * pat);
OWLRegexp(const string & pat);

Public Destructor

~OWLRegexp();

Assignment Operators

OWLRegexp&
operator=(const string &);
OWLRegexp&
operator=(const char * pat);

Public Member Functions

int
exact_match( const char * str ) const;

int
exact_match( const string& str ) const;

int
match( const char * str ) const;

int
match( const string& str ) const;

Status
status() const;

     Status                Meaning
     OWLRegexp::Ok         No errors
     OWLRegexp::Illegal    Pattern was illegal

string
substring( const char * str, unsigned short start=0 ) const;

     Returns the first substring of str starting after index start that matches the regular expression. If there is no such substring, then
     the empty string is returned.

string
substring( const string & str, unsigned short start=0 ) const;

     Returns the first substring of str starting after index start that matches the regular expression. If there is no such substring, then
     the empty string is returned.
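
The search that substring() describes can be approximated with the standard library. The sketch below is not the OWLRegexp implementation: first_match_from is a hypothetical function built on std::regex_search, and it assumes that the search begins at index start itself, which may differ from the exact meaning of "starting after index start".

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for substring(): returns the first match of
// `pattern` found at or after index `start` of `str`, or the empty
// string if there is none. Assumption: the search begins at `start` itself.
inline std::string first_match_from(const std::string& pattern,
                                    const std::string& str,
                                    std::size_t start = 0) {
    if (start > str.size()) return std::string();  // nothing left to search
    const std::regex re(pattern);
    std::smatch m;
    const auto first = str.begin() + static_cast<std::ptrdiff_t>(start);
    if (std::regex_search(first, str.end(), m, re)) return m.str();
    return std::string();
}
```

For instance, with the pattern "[A-Z][a-z]*" and the string "Hark! Hear the lark", a start of 0 yields "Hark" and a start of 1 yields "Hear", because the search then skips the first capital letter.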