
CAtlRegExp Class 

This class represents a regular expression.


template <
   class CharTraits = CAtlRECharTraits
>
class CAtlRegExp

Parameters

CharTraits

A character traits class, which defines the character type and the abbreviations the regular expression understands. For an example, see CAtlRECharTraitsA.

To set the regular expression, call Parse:

CAtlRegExp<> re;
re.Parse( "{[0-9]?[0-9]}:{[0-9][0-9]}" ); // Time in h:mm or hh:mm format

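Parse converts the regular expression into a program for the internal pattern-matching automaton of CAtlRegExp. It returns an REParseError value, which is REPARSE_ERROR_OK if the expression was parsed successfully.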

To match the regular expression against a string, call Match:

CAtlREMatchContext<> mc;
re.Match( "1:57", &mc );  // Returns TRUE: successful match
re.Match( "01/03", &mc ); // Returns FALSE: no match

The arguments to Match are the string to match and a pointer to a CAtlREMatchContext object. The preceding regular expression contains two match groups, delimited by braces. If the regular expression matches an input string, the CAtlREMatchContext object can be used to extract the text matched by each group (in this case, the hour and the minute). For more information, see CAtlREMatchContext.

Match has an optional third parameter, a pointer to a character pointer. If it is supplied, Match sets it to point just beyond the last character matched in the string, so you can continue matching in the string from that point on.
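The resume-from-the-end-pointer pattern can be sketched in portable C++. This is not CAtlRegExp: it is a minimal sketch that uses the standard `std::regex` (ECMAScript grammar) as a stand-in, assuming the pattern `{[0-9]?[0-9]}:{[0-9][0-9]}` is rewritten with parentheses in place of braces, because ECMAScript uses `( )` for capture groups. After each match the search restarts just beyond the last matched character, which is the role the third argument of Match plays. The function name `FindAllTimes` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects every (hour, minute) pair in the input.
// std::regex stands in for CAtlRegExp; ( ) captures where CAtlRegExp uses { }.
std::vector<std::pair<std::string, std::string>>
FindAllTimes(const std::string& input)
{
    static const std::regex reTime("([0-9]?[0-9]):([0-9][0-9])");

    std::vector<std::pair<std::string, std::string>> times;
    std::string::const_iterator pos = input.begin();
    std::smatch m;

    // Each search resumes just beyond the last character of the previous
    // match, like passing the third argument to CAtlRegExp::Match.
    // The pattern cannot match an empty string, so pos always advances.
    while (std::regex_search(pos, input.end(), m, reTime))
    {
        times.emplace_back(m[1].str(), m[2].str()); // hour, minute
        pos = m[0].second;
    }
    return times;
}
```

For the input "at 1:57 and 12:05, not 01/03", this collects the pairs ("1", "57") and ("12", "05") and skips "01/03", which contains no colon.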

Regular Expression Syntax

This table lists the metacharacters understood by CAtlRegExp.

Metacharacter Meaning

.

Matches any single character.

[ ]

Indicates a character class. Matches any character inside the brackets (for example, [abc] matches "a", "b", or "c").

^

If this metacharacter occurs at the start of a character class, it negates the character class. A negated character class matches any character except those inside the brackets (for example, [^abc] matches all characters except "a", "b", and "c").

If ^ is at the beginning of the regular expression, it matches the beginning of the input (for example, ^[abc] will only match input that begins with "a", "b", or "c").

-

In a character class, indicates a range of characters (for example, [0-9] matches any of the digits "0" through "9").

?

Indicates that the preceding expression is optional: it matches once or not at all (for example, [0-9][0-9]? matches "2" and "12").

+

Indicates that the preceding expression matches one or more times (for example, [0-9]+ matches "1", "13", "456", and so on).

*

Indicates that the preceding expression matches zero or more times.

??, +?, *?

Non-greedy versions of ?, +, and *. These match as little as possible, unlike the greedy versions that match as much as possible (for example, given the input "<abc><def>", <.*?> matches "<abc>" while <.*> matches "<abc><def>").

( )

Grouping operator. Example: (\d+,)*\d+ matches a list of numbers separated by commas (for example, "1" or "1,23,456").

{ }

Indicates a match group. The actual text in the input that matches the expression inside the braces can be retrieved through the CAtlREMatchContext object.

\

Escape character: interpret the next character literally (for example, [0-9]+ matches one or more digits, but [0-9]\+ matches a digit followed by a plus character). Also used for abbreviations (such as \a for any alphanumeric character; see the following table).

If \ is followed by a number n, it matches the nth match group (starting from 0). Example: <{.*?}>.*?</\0> matches "<head>Contents</head>".

Note that, in C++ string literals, two backslashes must be used: "\\+", "\\a", "<{.*?}>.*?</\\0>".

$

At the end of a regular expression, this character matches the end of the input (for example, [0-9]$ matches a digit at the end of the input).

|

Alternation operator: separates two expressions, either of which can match (for example, T|the matches "The" or "the").

!

Negation operator: the expression following ! does not match the input (for example, a!b matches "a" not followed by "b").
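Several of these operators have close counterparts in the ECMAScript grammar of the standard `std::regex`, which makes their behavior easy to try outside ATL. The sketch below is not CAtlRegExp; it assumes C++11 and uses a raw string literal (`R"(...)"`), which avoids the doubled backslashes that ordinary C++ string literals require. Note the different numbering: ECMAScript back-references start at `\1`, while CAtlRegExp match groups are numbered from `\0`. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of `pattern` in `input`, or "" if none.
// std::regex (ECMAScript grammar) stands in for CAtlRegExp here.
std::string FirstMatch(const std::string& input, const std::string& pattern)
{
    std::smatch m;
    if (std::regex_search(input, m, std::regex(pattern)))
        return m[0].str();
    return "";
}

// Greedy vs. non-greedy: <.*> runs to the last '>', <.*?> stops at the first.
std::string GreedyTag(const std::string& input)    { return FirstMatch(input, "<.*>"); }
std::string NonGreedyTag(const std::string& input) { return FirstMatch(input, "<.*?>"); }

// Back-reference: ECMAScript \1 plays the role of CAtlRegExp's \0, so the
// closing tag must repeat the name captured in the opening tag.
bool IsMatchedElement(const std::string& input)
{
    static const std::regex reElement(R"(<(.*?)>.*?</\1>)");
    return std::regex_match(input, reElement);
}
```

With the input "<abc><def>", GreedyTag returns the whole string and NonGreedyTag returns "<abc>", mirroring the ??, +?, *? entry above.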

Abbreviations

CAtlRegExp can handle abbreviations, such as \d instead of [0-9]. The abbreviations are provided by the character traits class passed in the CharTraits parameter. The predefined character traits classes provide the following abbreviations.

Abbreviation Matches

\a

Any alphanumeric character: ([a-zA-Z0-9])

\b

White space (blank): ([ \t])

\c

Any alphabetic character: ([a-zA-Z])

\d

Any decimal digit: ([0-9])

\h

Any hexadecimal digit: ([0-9a-fA-F])

\n

Newline: (\r|(\r?\n))

\q

A quoted string: (\"[^\"]*\")|(\'[^\']*\')

\w

A simple word: ([a-zA-Z]+)

\z

An integer: ([0-9]+)
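To make the table concrete, the following sketch expands these abbreviations textually into the expressions they stand for. The helper `ExpandAbbreviations` is hypothetical: it is not part of ATL and does not reproduce how the traits classes work internally; it only applies the substitutions listed in the table and copies any other escape sequence (such as `\\` or `\+`) through unchanged. Keep in mind that other regular expression engines may assign different meanings to the same letters; in ECMAScript, for instance, `\b` is a word boundary and `\w` matches a single word character.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not part of ATL): replaces each abbreviation listed
// in the CAtlRegExp table with the expression it stands for.
std::string ExpandAbbreviations(const std::string& pattern)
{
    // Expansions copied from the abbreviation table.
    static const std::map<char, std::string> kAbbrev = {
        {'a', "([a-zA-Z0-9])"},
        {'b', R"re(([ \t]))re"},
        {'c', "([a-zA-Z])"},
        {'d', "([0-9])"},
        {'h', "([0-9a-fA-F])"},
        {'n', R"re((\r|(\r?\n)))re"},
        {'q', R"re((\"[^\"]*\")|(\'[^\']*\'))re"},
        {'w', "([a-zA-Z]+)"},
        {'z', "([0-9]+)"},
    };

    std::string out;
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        if (pattern[i] == '\\' && i + 1 < pattern.size())
        {
            auto it = kAbbrev.find(pattern[i + 1]);
            if (it != kAbbrev.end())
                out += it->second;            // known abbreviation: expand it
            else
                out += pattern.substr(i, 2);  // any other escape: keep as-is
            ++i;                              // skip the escaped character
        }
        else
        {
            out += pattern[i];
        }
    }
    return out;
}
```

For example, the pattern \d+:\d\d expands to ([0-9])+:([0-9])([0-9]).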

The following program uses a regular expression to extract parts of a URL.

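// catlregexp_class.cpp
#include <afx.h>
#include <atlrx.h>
#include <stdio.h>

int main()
{
    // Use the ANSI traits explicitly so the narrow string literals below
    // compile in both MBCS and Unicode builds.
    CAtlRegExp<CAtlRECharTraitsA> reUrl;
    // Five match groups: scheme, authority, path, query, fragment.
    // The '?' that introduces the query is escaped so it is matched literally.
    REParseError status = reUrl.Parse(
        "({[^:/?#]+}:)?(//{[^/?#]*})?{[^?#]*}(\\?{[^#]*})?(#{.*})?" );

    if (REPARSE_ERROR_OK != status)
    {
        // Unexpected error: the pattern failed to parse.
        return 1;
    }

    CAtlREMatchContext<CAtlRECharTraitsA> mcUrl;
    if (!reUrl.Match(
        "http://search.microsoft.com/us/Search.asp?qu=atl&boolean=ALL#results",
        &mcUrl))
    {
        // Unexpected error: the URL did not match.
        return 1;
    }

    // Print each match group; GetMatch returns pointers into the input string.
    for (UINT nGroupIndex = 0; nGroupIndex < mcUrl.m_uNumGroups;
         ++nGroupIndex)
    {
        const CAtlREMatchContext<CAtlRECharTraitsA>::RECHAR* szStart = 0;
        const CAtlREMatchContext<CAtlRECharTraitsA>::RECHAR* szEnd = 0;
        mcUrl.GetMatch(nGroupIndex, &szStart, &szEnd);

        ptrdiff_t nLength = szEnd - szStart;
        // %.*s expects an int precision; %u matches the UINT index.
        printf_s("%u: \"%.*s\"\n", nGroupIndex, (int)nLength, szStart);
    }

    return 0;
}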

Output

0: "http"
1: "search.microsoft.com"
2: "/us/Search.asp"
3: "qu=atl&boolean=ALL"
4: "results"
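For comparison, the same decomposition can be sketched with the standard `std::regex`. This is a port under stated assumptions, not CAtlRegExp: every pair of parentheses captures in ECMAScript, including those used only for grouping, so the five components land in groups 2, 4, 5, 7, and 9 rather than 0 through 4. The pattern is essentially the URI-splitting expression from RFC 3986, Appendix B. The `UrlParts` struct and the `SplitUrl` function are hypothetical names.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the five components printed by the example.
struct UrlParts
{
    std::string scheme, authority, path, query, fragment;
};

// Splits a URL with std::regex. Every ( ) pair captures, including those
// used only for grouping, so the components are groups 2, 4, 5, 7 and 9.
bool SplitUrl(const std::string& url, UrlParts& parts)
{
    static const std::regex reUrl(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");

    std::smatch m;
    if (!std::regex_match(url, m, reUrl))
        return false;

    parts.scheme    = m[2].str();  // unmatched optional groups yield ""
    parts.authority = m[4].str();
    parts.path      = m[5].str();
    parts.query     = m[7].str();
    parts.fragment  = m[9].str();
    return true;
}
```

Applied to the URL in the ATL example, SplitUrl fills in the same five strings shown in the output above.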

Class Required header Compatibility

CAtlRegExp

<atlrx.h>

Windows 95, Windows 98, Windows 98 Second Edition, Windows Millennium Edition, Windows NT 4.0, Windows 2000, Windows XP Home Edition, Windows XP Professional, Windows Server 2003

© 2014 Microsoft