OWLRegexp
Synopsis
#include <owl/regexp.h>
OWLRegexp re(".*\\.doc");  // Matches a filename with suffix ".doc"
Description
Class OWLRegexp represents a regular expression. The constructor
"compiles" the expression into a form that can be used more efficiently. The
results can then be used for string searches using class string.
The regular expression (RE) is constructed as follows:
The following rules determine one-character REs that match a single
character:
| 1.1 | Any character that is not a special character (see rule 1.3) matches itself. |
| 1.2 | A backslash (\) followed by any special character matches the literal character itself. I.e., this "escapes" the special character. |
| 1.3 | The "special characters" are: + * ? . [ ] ^ $ |
| 1.4 | The period (.) matches any character except the newline. E.g., ".umpty" matches either "Humpty" or "Dumpty". |
| 1.5 | A set of characters enclosed in brackets ([]) is a one-character RE that matches any of the characters in that set. E.g., "[akm]" matches either an "a", "k", or "m". A range of characters can be indicated with a dash. E.g., "[a-z]" matches any lower-case letter. However, if the first character of the set is the caret (^), then the RE matches any character except those in the set. It does not match the empty string. Example: "[^akm]" matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set. |
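Rules 1.2 and 1.3 suggest a simple way to turn an arbitrary literal string into a pattern: prefix each special character with a backslash. The following is a minimal sketch of that idea. The helper escape_specials is hypothetical and not part of OWLRegexp. It handles only the eight characters listed in rule 1.3; the text does not say how a literal backslash should be written, so backslashes are left untouched.

```cpp
#include <string>

// Hypothetical helper (not part of OWLRegexp): backslash-escape each of the
// special characters listed in rule 1.3 so the result matches `literal` itself.
std::string escape_specials(const std::string& literal) {
    static const std::string specials = "+*?.[]^$";
    std::string out;
    for (char c : literal) {
        if (specials.find(c) != std::string::npos) {
            out += '\\';  // rule 1.2: backslash + special char = literal char
        }
        out += c;
    }
    return out;
}
```

For example, escape_specials("a.doc") yields the pattern a\.doc, in which the period no longer matches an arbitrary character.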
The following rules can be used to build a multicharacter RE.
| 2.1 | A one-character RE followed by an asterisk (*) matches zero or more occurrences of the RE. Hence, [a-z]* matches zero or more lower-case characters. |
| 2.2 | A one-character RE followed by a plus (+) matches one or more occurrences of the RE. Hence, [a-z]+ matches one or more lower-case characters. |
| 2.3 | A question mark (?) marks the preceding RE as optional: it can occur zero times or once in the string, but no more. E.g., xy?z matches either xyz or xz. |
| 2.4 | The concatenation of REs is a RE that matches the corresponding concatenation of strings. E.g., [A-Z][a-z]* matches any capitalized word. |
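The repetition rules can be tried out without OWLRegexp at hand. The sketch below uses the standard std::regex class (default ECMAScript grammar) as a stand-in engine, on the assumption that it treats *, +, ? and concatenation the same way for these simple patterns. The helper name matches_whole is hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in using std::regex, not OWLRegexp: true only if the whole of `text`
// matches `pattern`.
bool matches_whole(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

Under these assumptions, matches_whole("xy?z", "xz") and matches_whole("xy?z", "xyz") are both true, matches_whole("xy?z", "xyyz") is false, and [a-z]* accepts the empty string while [a-z]+ does not.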
Finally, the entire regular expression can be anchored to match only the
beginning or end of a line:
| 3.1 | If the caret (^) is at the beginning of the RE, then the matched string must be at the beginning of a line. |
| 3.2 | If the dollar sign ($) is at the end of the RE, then the matched string must be at the end of the line. |
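The effect of anchoring can be illustrated on single-line input. This sketch again substitutes std::regex for OWLRegexp; note that std::regex, by default, anchors ^ and $ to the beginning and end of the whole input rather than to each line, so the comparison only holds for strings without embedded newlines. The helper name found is hypothetical.

```cpp
#include <regex>
#include <string>

// Stand-in using std::regex, not OWLRegexp: true if some part of `text`
// matches `pattern`, honoring ^ and $ anchors.
bool found(const std::string& pattern, const std::string& text) {
    return std::regex_search(text, std::regex(pattern));
}
```

With the text "Hark! the lark", the patterns ^Hark and lark$ are found, while ^lark and Hark$ are not, because the matched text is not at the required end of the string.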
The following escape codes can be used to match control characters:
| \b | backspace |
| \e | ESC (escape) |
| \f | formfeed |
| \n | newline |
| \r | carriage return |
| \t | tab |
| \xddd | the literal hex number 0xddd |
| \ddd | the literal octal number ddd |
| \^C | Control code. E.g. \^D is "control-D" |
Example
#include <owl/regexp.h>
#include <iostream>
#include <string>
using namespace std;

int main(){
  string aString("Hark! Hark! the lark");
  // A regular expression matching any lower-case word
  // starting with "l":
  OWLRegexp reg("l[a-z]*");
  cout << reg.match(aString) << endl; // Prints "1"
  return 0;
}
Public Constructors
OWLRegexp(const char * pat);
Construct a regular expression from the pattern given by pat. The
status of the results can be found by using member function status().
OWLRegexp(const string & pat);
Construct a regular expression from the pattern given by pat. The
status of the results can be found by using member function status().
Public Destructor
~OWLRegexp();
Destructor. Releases any allocated memory.
Assignment Operators
OWLRegexp&
operator=(const string & pat);
Recompiles self to the pattern given by pat. The status of the results
can be found by using member function status().
OWLRegexp&
operator=(const char * pat);
Recompiles self to the pattern given by pat. The status of the results
can be found by using member function status().
Public Member Functions
int
exact_match( const char * str ) const;
Returns 1 if the whole string str
matches the regular expression compiled in self, or 0
otherwise. If an invalid regular expression is used for the search, -1 will be returned.
int
exact_match( const string& str ) const;
Returns 1 if the whole string str
matches the regular expression compiled in self, or 0
otherwise. If an invalid regular expression is used for the search, -1 will be returned.
int
match( const char * str ) const;
Returns 1 if there is an instance in the string
str that matches the regular expression compiled in self,
or 0 if there is no such match. If an invalid
regular expression is used for the search, -1
will be returned.
int
match( const string& str ) const;
Returns 1 if there is an instance in the string
str that matches the regular expression compiled in self,
or 0 if there is no such match. If an invalid
regular expression is used for the search, -1
will be returned.
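The difference between exact_match() and match() is the difference between testing the whole string and searching inside it. The sketch below mirrors that contrast with std::regex_match and std::regex_search as stand-ins. The function names whole_match and partial_match are hypothetical, and the pattern is assumed to be valid, so the -1 return value is not modeled here.

```cpp
#include <regex>
#include <string>

// Stand-in for exact_match(): 1 if all of `str` matches `re`, otherwise 0.
int whole_match(const std::regex& re, const std::string& str) {
    return std::regex_match(str, re) ? 1 : 0;
}

// Stand-in for match(): 1 if any part of `str` matches `re`, otherwise 0.
int partial_match(const std::regex& re, const std::string& str) {
    return std::regex_search(str, re) ? 1 : 0;
}
```

For the pattern l[a-z]* and the string "Hark! Hark! the lark", partial_match returns 1 because "lark" occurs in the string, while whole_match returns 0 because the string as a whole is not a lower-case word starting with "l".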
Status
status() const;
Returns the status of the regular expression:
| Status | Meaning |
| OWLRegexp::Ok | No errors |
| OWLRegexp::Illegal | Pattern was illegal |
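Because the constructors and assignment operators report problems through status() rather than a return value, a pattern is normally checked right after it is compiled. The same check-after-compile idea can be sketched with std::regex, which signals an illegal pattern by throwing std::regex_error. The enum PatternStatus and the function compile_status are hypothetical names that only mirror the two status values above; they are not the OWLRegexp interface.

```cpp
#include <regex>
#include <string>

// Hypothetical mirror of the two status values in the table above.
enum class PatternStatus { Ok, Illegal };

// Stand-in using std::regex: try to compile `pattern` and report the outcome
// as a status value instead of letting the exception escape.
PatternStatus compile_status(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re;  // only the success or failure of compilation matters here
        return PatternStatus::Ok;
    } catch (const std::regex_error&) {
        return PatternStatus::Illegal;
    }
}
```

In this sketch, a well-formed pattern such as l[a-z]* reports Ok, while an unterminated set such as [a-z reports Illegal.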
string
substring( const char * str, unsigned short start=0 ) const;
Returns the first substring of str
starting after index start that matches the regular expression.
If there is no such substring, then
the empty string is returned.
string
substring( const string & str, unsigned short start=0 ) const;
Returns the first substring of str
starting after index start that matches the regular expression.
If there is no such substring, then
the empty string is returned.
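The behavior of substring() can be approximated with std::regex as a stand-in. This sketch assumes the search begins at index start itself; the exact boundary convention of OWLRegexp is not confirmed here. The function name first_match_from is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Stand-in for substring(): return the first part of `str`, searching from
// index `start`, that matches `re`; return the empty string if none is found.
std::string first_match_from(const std::regex& re, const std::string& str,
                             std::size_t start = 0) {
    if (start > str.size()) {
        return std::string();  // nothing left to search
    }
    std::smatch m;
    if (std::regex_search(str.cbegin() + start, str.cend(), m, re)) {
        return m.str();
    }
    return std::string();
}
```

With the pattern [0-9]+ and the string "a1 b22 c333", a search from index 0 returns "1", a search from index 2 returns "22", and a search from index 11 (the end of the string) returns the empty string.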