Regular expressions for finding text

You can perform sophisticated find and replace operations in Microsoft Expression Web by using regular expressions. Regular expressions are useful when you do not know the exact text or code you are looking for, or when you are looking for all occurrences of strings of text or code with one or more similarities.

A regular expression is a pattern of text that describes one or more variations of text or code that you want to find. A regular expression consists of specific characters—for example, the letters "a" through "z"—and special characters that describe the pattern of text—for example, an asterisk (*). For example, to find all variations of "page" in your site, you can search for "page*." If you do so, Expression Web finds all instances of "page," "pages," "pager," and any other words that begin with "page" in your site.

When you use regular expressions in your searches, specific rules control which combinations of characters perform specific matches. Each regular expression or combination of regular expressions is referred to as syntax. You can use multiple regular expressions in one syntax to precisely target your search.

To use regular expressions, see Edit multiple pages with find and replace.

Regular expressions syntax

Syntax Expression Description

.

Any character   acts as a wild card to match any single printing or non-printing character with the exception of the newline (\n) character.

For example, the regular expression c.t matches the strings cat, c t, cot, but not cost. In this example, the period (.) is a wild card for a single character. It appears between the letters 'c' and 't', so any single character between the characters 'c' and 't' will match the expression—even if it is a space.
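The c.t example can be tried outside Expression Web. The following is a minimal sketch that uses Python's re module as a stand-in; Python is an assumption of this example and is not part of Expression Web, although its period (.) follows the same rule of matching any single character except a newline.

```python
import re

# Assumption: Python's "." stands in for the Expression Web period,
# matching any single character except a newline.
pattern = re.compile(r"c.t")

candidates = ["cat", "c t", "cot", "cost"]
# fullmatch requires the whole string to fit the pattern.
matched = [word for word in candidates if pattern.fullmatch(word)]
print(matched)
```

The word cost is left out because two characters sit between 'c' and 't', and the period stands for exactly one.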

*

Maximal (zero or more)   matches zero or more occurrences of the character or expression that precedes it, matching as many characters as possible.

The regular expression .* matches zero or more occurrences of any character.

For example, the regular expression b.*k matches book, back, black, blank, and buck. In this example, we combine the period (.) with the asterisk (*) to make one syntax. The period (.) appears immediately before the asterisk (*) expression. The asterisk (*) matches zero or more occurrences of any character between 'b' and 'k'. The period (.) acts as a wild card for the characters between 'b' and 'k'. In this example, it means that any character between 'b' and 'k' can be repeated.
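To see what "as many characters as possible" means in practice, here is a small sketch in Python's re module, used here only as a stand-in for Expression Web. The sample sentence is made up for illustration.

```python
import re

# Python's "*" is also maximal (greedy), like the asterisk described above.
greedy = re.compile(r"b.*k")

words = ["book", "back", "black", "blank", "buck"]
all_match = all(greedy.fullmatch(word) for word in words)

# On a longer line, the match runs from the first 'b' to the LAST 'k'.
longest = greedy.search("a book and a bank").group(0)
print(all_match, longest)
```

Because the match is maximal, the result for the sample sentence spans both words rather than stopping at the end of "book".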

+

Maximal (one or more)   matches one or more occurrences of the character or expression that precedes it, matching as many characters as possible.

The regular expression .+ matches one or more occurrences of any character.

For example, the regular expression bo+. matches bob, book, and boot. In this example, we combine the period (.) with the plus sign (+) to make one syntax. The period (.) appears immediately after the plus sign (+) expression. The plus sign (+) matches one or more occurrences of the letter 'o'. The period (.) acts as a wild card for the last character of each word, which in this example is 'b', 'k', or 't'.

@

Minimal (zero or more)   matches zero or more occurrences of the character or expression that precedes it, matching as few characters as possible.

The regular expression .@ matches zero or more occurrences of any character, using as few characters as possible.

For example, the regular expression a.@x matches 'abx' within 'abxbxb' and 'acx' within 'acxcxc'. In this example, we combine the period (.) with the at sign (@) to make one syntax. The period (.) appears immediately before the at sign (@) expression. The at sign (@) matches zero or more occurrences of any character between 'a' and 'x'. In this example, the period (.) acts as a wild card for the characters 'b' and 'c' between the characters 'a' and 'x'.
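Python's re module has no at sign (@) operator, but its *? quantifier is also a minimal "zero or more". The sketch below assumes *? is a fair stand-in for @ and compares it with the maximal asterisk on the same strings.

```python
import re

# Assumption: Python's "*?" plays the role of the Expression Web "@".
minimal = re.compile(r"a.*?x")
maximal = re.compile(r"a.*x")

samples = ["abxbxb", "acxcxc"]
minimal_hits = [minimal.search(s).group(0) for s in samples]
maximal_hit = maximal.search("abxbxb").group(0)
print(minimal_hits, maximal_hit)
```

The minimal form stops at the first 'x' it can reach, while the maximal form continues to the last one.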

#

Minimal (one or more)   matches one or more occurrences of the character or expression that precedes it, matching as few characters as possible.

For example, the regular expression si.#er matches 'sicker' or 'silkier'. In this example, we combine the period (.) with the sharp sign (#) to make one syntax. The period (.) appears immediately before the sharp sign (#) expression. The sharp sign (#) matches one or more occurrences of any character between 'si' and 'er'. The period (.) acts as a wild card for the characters 'c' and 'k' in the word sicker, and 'l', 'k', and 'i' in the word silkier.
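As a rough illustration, Python's +? quantifier behaves like a minimal "one or more" and is used below as an assumed equivalent of the sharp sign (#). The sample text is made up.

```python
import re

# Assumption: Python's "+?" plays the role of the Expression Web "#".
minimal_plus = re.compile(r"si.+?er")
maximal_plus = re.compile(r"si.+er")

text = "sicker and silkier"
minimal_hits = minimal_plus.findall(text)
maximal_hit = maximal_plus.search(text).group(0)

# "One or more" means at least one character must sit between "si" and "er".
no_gap = minimal_plus.search("sier") is None
print(minimal_hits, maximal_hit, no_gap)
```

The minimal form finds each word separately; the maximal form swallows everything from the first "si" to the last "er".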

[ ]

Set of characters   matches any one of the characters within the brackets ([ ]). You can specify ranges of characters by using a hyphen (-), as in [a-z].

Examples:

  • The regular expression c[aou]t matches cat, cot, and cut, but not cet or cit.

  • The regular expression [0-9] means match any digit. You can specify multiple ranges of letters as well.

  • The regular expression [A-Za-z] means match any uppercase or lowercase letter.
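The three set examples above can be checked with a short sketch. It uses Python's re module, which is an assumption of this example; bracketed sets and hyphenated ranges work the same way there.

```python
import re

set_pattern = re.compile(r"c[aou]t")
candidates = ["cat", "cot", "cut", "cet", "cit"]
set_hits = [word for word in candidates if set_pattern.fullmatch(word)]

sample = "Room 42B"  # made-up sample text
digits = re.findall(r"[0-9]", sample)      # one match per digit
letters = re.findall(r"[A-Za-z]", sample)  # one match per letter
print(set_hits, digits, letters)
```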

^

Beginning of line   anchors the match to the beginning of a line.

For example, the regular expression ^When in matches any string that begins with "When in" and that also appears at the beginning of a line, such as "When in the course of human events" or "When in town, call me". However, this regular expression does not match "What and when in the course of human events", even if that string appears at the beginning of a line.

$

End of line   anchors the match to the end of a line.

For example, the regular expression professional$ matches the end of the string "He is a professional" but not the string "They are a group of professionals".
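The beginning-of-line and end-of-line anchors can be sketched in Python's re module, used here as a stand-in. One assumption to note: Python anchors ^ and $ to each line only when the re.MULTILINE flag is set, so the sketch sets it.

```python
import re

# Made-up sample text with four lines.
text = (
    "When in town, call me\n"
    "What and when in the course of human events\n"
    "He is a professional\n"
    "They are a group of professionals"
)

# re.MULTILINE makes ^ and $ apply to every line, not just the whole string.
line_starts = re.findall(r"^When in.*", text, flags=re.MULTILINE)
line_ends = re.findall(r".*professional$", text, flags=re.MULTILINE)
print(line_starts)
print(line_ends)
```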

^^

Beginning of file   anchors the match to the beginning of a file. Works only when searching for text in source code or in text files.

For example, to match the first HTML tag at the beginning of a file, use the following regular expression: ^^

$$

End of file   anchors the match to the end of a file. Works only when searching for text in source code or in text files.

For example, to match the last HTML tag at the end of a file (with no spaces following the tag), use the following regular expression: $$

|

Or   indicates a choice between two items, thereby matching the expression before or after the OR symbol (|).

For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but it does not match the line "it belongs to them."

\

Escape special character   matches the character following the back slash ( \ ). This allows you to find characters that are used in the regular expression syntax, such as a left curly brace ({) or a caret (^) or some other special character.

For example, you can use \$ to match the dollar sign character ($) rather than anchoring the match to the end of a line. Similarly, you can use the expression \. to match the period (.) character rather than match any single character, which is what the period (.) regular expression does.
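Escaping works the same way in Python's re module, which is used in this sketch as an assumed stand-in for Expression Web. The price string is made up for illustration.

```python
import re

# "\$" is a literal dollar sign and "\." is a literal period.
price = re.search(r"\$[0-9]+\.[0-9]+", "Total: $19.99 today").group(0)

# Without the backslash, "." is a wild card; with it, only a period matches.
wildcard_dot = re.fullmatch(r"a.c", "abc") is not None
literal_dot = re.fullmatch(r"a\.c", "abc") is not None
print(price, wildcard_dot, literal_dot)
```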

{}

Tagged expression   tags the text matched by the enclosed expression. You can match another occurrence of the tagged text in a Find expression or insert the tagged text in a Replace expression using \N.

For example, suppose you are looking to find two duplicate, consecutive words. To search, use the following expression: {.#} \1

With the assumption that the consecutive words are separated by a single space, you'll want to add a space between the right curly brace (}) and the back slash ( \ ).

In this example, we combine the sharp sign (#) and the period (.) with the curly braces ({}) to make one syntax. In this expression, .# represents any consecutive characters. Since this portion of the expression is surrounded by curly braces ({}), the consecutive characters will be tagged and can be referred to as \1. This expression will find any consecutive characters followed by a space, followed by those exact same consecutive characters.

\N

Nth tagged expression    In a Find expression, \N matches the text matched by the Nth tagged expression, where N is a number from 1 to 9.

In a Replace expression, \N inserts the text matched by the Nth tagged expression where N is a number from 1 to 9. \0 inserts the text matched by the entire Find expression.

For example, suppose you want to find two duplicate, consecutive words and replace them with a single word. To search, use the following expression: {.#} \1

With the assumption that the consecutive words are separated by a single space, you'll want to add a space between the right curly brace (}) and the back slash ( \ ). In this example, we combined the sharp sign (#) and the period (.) with the curly braces ({}) to make one syntax.

To replace, use the following expression: \1

\1 represents what was found in the first pair of curly braces in the find string. By using \1 in the replace action, you essentially replace the duplicate, consecutive words with a single copy of the word.
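The duplicate-word find and replace can be sketched in Python's re module, where parentheses tag an expression and \1 refers back to it. This is an assumed equivalent, not Expression Web syntax, and the pattern below is deliberately stricter than {.#} \1: it uses \w+ and \b so that only whole words are treated as duplicates.

```python
import re

# (\w+) tags a whole word; \1 requires the same word again after one space.
# The \b word boundaries are an addition for this sketch.
find = re.compile(r"\b(\w+) \1\b")

text = "This is is a test of the the editor"  # made-up sample
# In the replacement, \1 inserts the tagged word once.
fixed = find.sub(r"\1", text)
print(fixed)
```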

( )

Group expression    marks the beginning and end of a sub expression.

A sub expression is a regular expression that you enclose in parentheses ( ), such as the following expression: (ha)+ In this example, we combine the plus sign (+) with the parentheses ( ) group expression to make one syntax. The sub expression is (ha) because it is enclosed within the parentheses ( ). When you add the plus sign (+), the expression enables you to find repeating pairs of letters. The plus sign (+) represents one or more occurrences of 'ha'.

This expression matches the following occurrences 'haha' and 'hahaha'.
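A short sketch of the (ha)+ example follows, using Python's re module as an assumed stand-in. It shows that the plus sign repeats the whole group rather than only the last letter.

```python
import re

laugh = re.compile(r"(ha)+")

whole = [s for s in ["ha", "haha", "hahaha", "hah"] if laugh.fullmatch(s)]
# Within longer text, the match covers every complete "ha" pair in a row.
inside = laugh.search("ahahahah").group(0)
print(whole, inside)
```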

~x

Prevent match    prevents a match when x appears at this point in the expression.

For example, the regular expression real~(ity) matches the "real" in "realty" and "really", but prevents the match to "real" in "reality".
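Python's re module has no tilde (~) operator. The closest construct is a negative lookahead, written (?!...), which this sketch assumes is a reasonable stand-in for the prevent match syntax.

```python
import re

# Assumption: (?!ity) plays the role of ~(ity), refusing a match
# when "ity" follows "real" at that point.
pattern = re.compile(r"real(?!ity)")

words = ["realty", "really", "reality"]
allowed = [word for word in words if pattern.search(word)]
print(allowed)
```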

\n

Line break    matches a new line in Code view, or a <br> in Design view.

The \n syntax is a shorthand that enables you to match all line breaks.

\t

Tab    matches a single tab character.

For example, if you want to find one or more tab characters at the beginning of a line, the regular expression would look like the following:

^\t+

In this example, we combine the caret (^) and the plus sign (+) with the tab (\t) to make one syntax. The caret (^) that precedes the single tab character expression, anchors the match to all tabbed characters at the beginning of the line. The plus sign (+) represents the matching of one or more tab characters.

[^]

Any one character not in the set    matches any character that is not in the set of characters that follows the caret (^).

For example, to match any character except those in a set, use the caret (^) as the first character after the opening bracket. The expression [^269A-Z] matches any character except 2, 6, 9, and any uppercase letter.
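The [^269A-Z] set can be tried in Python's re module, which uses the same bracket-and-caret notation. Python is an assumption of this sketch, and the sample string is made up.

```python
import re

sample = "Room 269B, floor 3"
# Collect every character that is NOT 2, 6, 9, or an uppercase letter.
kept = "".join(re.findall(r"[^269A-Z]", sample))
print(repr(kept))
```

Spaces, punctuation, lowercase letters, and the digit 3 are kept because none of them is in the excluded set.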

^n

Repeat expression    matches n occurrences of the expression that precedes the caret (^).

For example, with n equaling 4, the expression [0-9]^4 matches any 4-digit sequence. In this example, we combine the set of characters ([ ]) syntax with the repeat (^n) syntax to demonstrate a more realistic use of regular expressions.
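Python's re module writes a fixed repeat count as {n} rather than ^n. Assuming that equivalence, [0-9]^4 becomes [0-9]{4} in the sketch below. The sample sentence is made up.

```python
import re

# Assumption: Python's {4} plays the role of the Expression Web ^4.
four_digits = re.findall(r"[0-9]{4}", "Founded 1998, moved 2011, suite 12")
print(four_digits)
```

The two-digit number is skipped because it cannot supply four digits in a row.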

:a

Alphanumeric character    matches the expression [a-zA-Z0-9].

You can use the following expression: [a-zA-Z0-9] to match one occurrence of a letter (upper case or lower case) or number. Also known as alphanumeric occurrences. You can use the shorthand expression :a for all instances of [a-zA-Z0-9].

:b

White space    matches any white spaces in code or text.

For example, to match a single white space character at the beginning of a line, use the following regular expression: ^:b

:c

Alphabetic character    matches the expression [a-zA-Z]. When you use this expression, it enables you to match all upper or lower case letters.

You can use the shorthand expression :c for all instances of [a-zA-Z].

:d

Decimal digit    matches the expression [0-9]. This expression enables you to match any digit.

For example, suppose you want to find a social security number in a text file. The format for U.S. social security numbers is 999-99-9999. You can use the expression :d^3-:d^2-:d^4 or, by using [0-9], the equivalent expression [0-9]^3-[0-9]^2-[0-9]^4

You can use the shorthand expression :d for all instances of [0-9].
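The social security number pattern translates to Python's re module as shown in this sketch, assuming {n} for the ^n repeat count. The numbers in the sample text are made up and serve only to show the 3-2-4 grouping.

```python
import re

# Equivalent of [0-9]^3-[0-9]^2-[0-9]^4 in Python notation.
ssn_pattern = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")

text = "SSN 123-45-6789 on file; ref 12-345-6789 is not one."
ssn_hits = ssn_pattern.findall(text)
print(ssn_hits)
```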

:h

Hexadecimal digit    matches the expression [0-9a-fA-F]+

Use this expression when you want to match a hexadecimal combination of any upper or lower case letters between 'A' and 'F', and any numbers.

For example, suppose the pages in your site have multiple different background colors and you want to change the color of those pages to black, such as 000000. However, you do not know what the hexadecimal numbers are for the existing colors. Use the following regular expression to find all existing hexadecimal numbers:

\#:h

You could use [0-9a-fA-F] to search, but in this example we combine the back slash (\) and the sharp sign (#) with the hexadecimal digit (:h) syntax. \# matches a non-expression sharp sign (#) and :h matches any sequence of hexadecimal characters.

To replace the existing hexadecimal numbers, type the hexadecimal number of the background color that you want: 000000
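The color replacement can be sketched in Python's re module, with [0-9a-fA-F]+ written out in place of :h. Two assumptions apply: the style rule is a made-up sample, and the replacement text keeps the leading sharp sign so that the result is still a valid color value.

```python
import re

# "#" needs no escape in Python; [0-9a-fA-F]+ stands in for :h.
hex_color = re.compile(r"#[0-9a-fA-F]+")

css = "body { background-color: #FFCC00; } td { background: #9acd32; }"
recolored = hex_color.sub("#000000", css)
print(recolored)
```

As with the Expression Web expression, this pattern matches any sharp sign followed by hexadecimal characters, so review the matches before replacing across a whole site.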

:i

Identifier    matches the expression [a-zA-Z_$][a-zA-Z0-9_$]*

When working with code, if you want to match all program identifiers, you can use the shorthand expression :i to replace having to type the lengthy expression above.

:n

Rational number    matches the expression ([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)|([0-9]+)

If you want to match all numbers, with or without a decimal point, you can use the shorthand expression :n to replace having to type the lengthy expression above.
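The long expression behind :n happens to be valid in Python's re module as written, so it can be tried directly. Python is an assumption of this sketch, and the sample text is made up.

```python
import re

# The same three alternatives as the expression above:
# digits-dot-optional digits, optional digits-dot-digits, or digits only.
number = re.compile(r"([0-9]+\.[0-9]*)|([0-9]*\.[0-9]+)|([0-9]+)")

text = "3.14, .5, 42 and 7."
# finditer with group(0) returns the full text of each match.
numbers = [m.group(0) for m in number.finditer(text)]
print(numbers)
```

Note that the third alternative also matches plain integers, and the first matches a number with a trailing decimal point.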

:q

Quoted string    matches the expression ("[~"]*")|('[~']*')

If you want to match all strings enclosed in double or single quotation marks, you can use the shorthand expression :q to replace having to type the lengthy expression above.
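A rough Python equivalent of :q is sketched below. It assumes that [~"] in the expression above means "any character other than a double quotation mark", which Python's re module writes as [^"]. The HTML fragment is made up.

```python
import re

# Assumption: [^"] and [^'] stand in for [~"] and [~'].
quoted = re.compile("\"[^\"]*\"|'[^']*'")

html = "<a href=\"index.htm\" title='Home'>"
quoted_hits = quoted.findall(html)
print(quoted_hits)
```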

:w

Alphabetic string    matches the expression [a-zA-Z]+

This syntax is a shorthand approach to enable you to match one or more alphabetical characters, either lower case or upper case.

:z

Decimal integer   matches the expression [0-9]+

This syntax is a shorthand approach to enable you to match any sequence of one or more digits.

See also

Tasks

Edit multiple pages with find and replace
Find and replace tags
Set HTML rules for finding text
Use the thesaurus

© 2011 Microsoft Corporation. All rights reserved.