§20.9. Summary of regular expression notation
MATCHING
Positional restrictions
^       Matches (accepting no text) only at the start of the text
$       Matches (accepting no text) only at the end of the text
\b      Word boundary: matches at either end of the text or between a \w and a \W
\B      Matches anywhere that \b does not match
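Python's "re" module writes "^", "$" and "\b" the same way, so it can serve as a runnable illustration. There is one catch: Python's \w is narrower than Inform's, which by the table is every character that is neither \p nor \s. The sketch below emulates Inform's \b under that reading. It is not Inform's own code. The constants and the helper "inform_whole_word" are hypothetical names, and \b is taken literally from the table: either end of the text, or between a \w and a \W.

```python
import re

# Inform's \p, as listed in the table: . , ! ? - / " : ; ( ) [ ] { }
PUNCT = r'.,!?\-/":;()\[\]{}'
# Inform's \s: space, line break, tab
SPACING = r' \n\t'

# Inform's \W is "either \p or \s"; its \w is every other character.
NON_WORD = f'[{PUNCT}{SPACING}]'
WORD = f'[^{PUNCT}{SPACING}]'

# Inform's \b, read literally from the table: at either end of the text,
# or between a \w and a \W (in either order).
BOUNDARY = (rf'(?:\A|\Z|(?<={WORD})(?={NON_WORD})'
            rf'|(?<={NON_WORD})(?={WORD}))')


def inform_whole_word(word):
    """Hypothetical helper: Python pattern for Inform's \\bWORD\\b."""
    return BOUNDARY + re.escape(word) + BOUNDARY


# The apostrophe is not in Inform's \p list, so it counts as a word
# character and "can" is not a whole word inside "can't".
# Python's own \b treats the apostrophe as a boundary instead.
inform_hit = re.search(inform_whole_word("can"), "I can't") is not None
python_hit = re.search(r"\bcan\b", "I can't") is not None
print(inform_hit, python_hit)  # False True
```

Note that in Python "$" also matches just before a final line break. "\Z" is the strict end-of-text anchor.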
Backslashed character classes
\char   If char is other than a-z, A-Z, 0-9 or space, matches that literal char
\\      For example, this matches a literal backslash "\"
\n      Matches a literal line break character
\t      Matches a literal tab character (but use this only with external files)
\d      Matches any single digit
\l      Matches any lower case letter (by the Unicode 4.0.0 definition)
\p      Matches any single punctuation mark: . , ! ? - / " : ; ( ) [ ] { }
\s      Matches any single spacing character (space, line break, tab)
\u      Matches any upper case letter (by the Unicode 4.0.0 definition)
\w      Matches any single word character (neither \p nor \s)
\D      Matches any single non-digit
\L      Matches any single character that is not a lower case letter
\P      Matches any single character that is not a punctuation mark
\S      Matches any single character that is not a spacing character
\U      Matches any single character that is not an upper case letter
\W      Matches any single non-word character (i.e., matches either \p or \s)
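As a rough, runnable illustration in Python's "re" module (not Inform itself), \p and \w can be rebuilt directly from the table's own definitions. The constant names are hypothetical. The comparison shows where Inform's \w departs from Python's. The apostrophe is absent from the \p list, so Inform counts it as part of a word, while the hyphen and slash are punctuation.

```python
import re

# Inform's \p and \s, exactly as the table lists them.
PUNCT = r'.,!?\-/":;()\[\]{}'
SPACING = r' \n\t'

INFORM_P = f'[{PUNCT}]'             # \p: one punctuation mark
INFORM_W = f'[^{PUNCT}{SPACING}]'   # \w: neither \p nor \s

text = "Don't panic -- it's only 42 km/h!"

inform_words = re.findall(INFORM_W + '+', text)
python_words = re.findall(r'\w+', text)   # Python's own, narrower \w
marks = re.findall(INFORM_P, text)

print(inform_words)  # ["Don't", 'panic', "it's", 'only', '42', 'km', 'h']
print(python_words)  # ['Don', 't', 'panic', 'it', 's', 'only', '42', 'km', 'h']
print(marks)         # ['-', '-', '/', '!']
```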
Other character classes
.       Matches any single character
<...>   Character range: matches any single character listed inside the angle brackets
<^...>  Negated character range: matches any single character not listed inside
Inside a character range
e-h     Any character in the run "e" to "h" inclusive (and so on for other runs)
>...    Starting with ">" means that a literal close angle bracket is included
\       Backslash has the same meaning as for backslashed character classes: see above
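Python's "re" module writes a character range in square brackets rather than angle brackets, so "<aeiou>" corresponds roughly to "[aeiou]". The patterns below are hand-translated for illustration and are not output from Inform. Note that ">" has no special meaning inside Python's brackets. The leading ">" rule exists in Inform because ">" would otherwise close the range.

```python
import re

text = "fish->fowl"

# Hand-written Python equivalents of Inform character ranges
# (Inform's angle brackets become square brackets).
vowels = re.findall(r"[aeiou]", text)    # Inform "<aeiou>"
others = re.findall(r"[^aeiou]", text)   # Inform "<^aeiou>"
e_to_h = re.findall(r"[e-h]", text)      # Inform "<e-h>"
angle = re.findall(r"[>a]", text)        # Inform "<>a>": ">" first is literal

print(vowels)  # ['i', 'o']
print(others)  # ['f', 's', 'h', '-', '>', 'f', 'w', 'l']
print(e_to_h)  # ['f', 'h', 'f']
print(angle)   # ['>']
```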
Structural
|       Divides alternatives: "fish|fowl" matches either "fish" or "fowl"
(?i)    Always matches: switches to case-insensitive matching from here on
(?-i)   Always matches: switches to case-sensitive matching from here on
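Alternation carries over unchanged to Python's "re" module, but case switching does not quite. Inform lets "(?i)" and "(?-i)" appear mid-pattern, whereas Python 3.11 and later accept a global "(?i)" only at the very start of a pattern. The sketch below is an analogy rather than Inform code. It uses Python's scoped groups "(?i:...)" and "(?-i:...)" as the nearest equivalent.

```python
import re

# Alternation: the same notation in Inform and Python.
either = [bool(re.fullmatch(r"fish|fowl", w)) for w in ("fish", "fowl", "fishfowl")]

# Inform "fish(?i)fowl": case-insensitive from the switch onwards.
# Python equivalent: a scoped flag group covering the rest of the pattern.
tail_upper = bool(re.fullmatch(r"fish(?i:fowl)", "fishFOWL"))
head_upper = bool(re.fullmatch(r"fish(?i:fowl)", "FISHfowl"))

# Inform "(?i)fish(?-i)FOWL": insensitive at first, then sensitive again.
# Python: a leading global (?i), then a scoped group that turns it off.
strict_ok = bool(re.fullmatch(r"(?i)fish(?-i:FOWL)", "FiShFOWL"))
strict_bad = bool(re.fullmatch(r"(?i)fish(?-i:FOWL)", "FiShfowl"))

print(either, tail_upper, head_upper, strict_ok, strict_bad)
# [True, True, False] True False True False
```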
Repetitions
...?      Matches "..." either 0 or 1 times, i.e., makes "..." optional
...*      Matches "..." 0 or more times: e.g. "\s*" matches an optional run of spacing characters
...+      Matches "..." 1 or more times: e.g. "x+" matches any run of "x"s
...{6}    Matches "..." exactly 6 times (similarly for other numbers, of course)
...{2,5}  Matches "..." between 2 and 5 times
...{3,}   Matches "..." 3 or more times
...*?     "?" after any repetition (here "*") makes it "lazy", matching as few repeats as it can
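The repetition operators mean the same in Python's "re" module, so a short Python sketch can show each one at work. It illustrates the notation only and is not Inform code. The last pair contrasts a greedy "*" with its lazy form "*?".

```python
import re

optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour", "colouur")]
spaced = bool(re.fullmatch(r"go\s*north", "go   north"))   # optional spacing run
runs = re.findall(r"x+", "axxbxxxc")                        # ['xx', 'xxx']
exact = re.findall(r"\d{6}", "ref 123456, not 1234")        # ['123456']
between = re.findall(r"a{2,5}", "a aa aaaaaaa")             # ['aa', 'aaaaa', 'aa']
at_least = re.findall(r"z{3,}", "zz zzz zzzzzz")            # ['zzz', 'zzzzzz']

# Greedy versus lazy: ".*" grabs as much as it can, ".*?" as little.
greedy = re.search(r"\[.*\]", "[one] and [two]").group()    # '[one] and [two]'
lazy = re.search(r"\[.*?\]", "[one] and [two]").group()     # '[one]'

print(optional, spaced, runs, exact, between, at_least)
print(greedy, lazy)
```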
Numbered subexpressions
(...)   Groups part of the expression together: matches if the interior matches
\1      Matches the contents of the 1st subexpression, reading left to right
\2      Matches the contents of the 2nd, and so on up to "\9" (but no further)
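Numbered subexpressions and backreferences behave the same way in Python's "re" module, which makes it a convenient, if unofficial, testbed. One difference is that Python accepts references beyond \9, which Inform does not. Note that the \w and \b below are Python's, not Inform's.

```python
import re

# "(\w+) \1": a word, a space, then the very same word again.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(1)

# A quoted phrase must close with the same quote mark that opened it:
# group 1 captures the quote, \1 demands it again at the end.
quoted = re.findall(r'(["\'])(.*?)\1', "He said \"hi\" and 'bye'")

print(doubled)  # the
print(quoted)   # [('"', 'hi'), ("'", 'bye')]
```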
Unnumbered subexpressions
(# ...)             Comment: always matches, and the contents are ignored
(?= ...)            Lookahead: matches if the text ahead matches "...", but doesn't consume it
(?! ...)            Negated lookahead: matches if the lookahead fails
(?<= ...)           Lookbehind: matches if the text behind matches "...", but doesn't consume it
(?<! ...)           Negated lookbehind: matches if the lookbehind fails
(> ...)             Possessive: tries to match "..." and, if it succeeds, never backtracks into it
(?(1)...)           Conditional: if \1 has matched by now, require that "..." be matched
(?(1)...|...)       Conditional: as above, but if \1 has not matched, require the second part
(?(?=...)...|...)   Conditional using a lookahead as the condition deciding which part to match
(?(?<=...)...|...)  Conditional using a lookbehind as the condition deciding which part to match
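Most of these constructs also exist in Python's "re" module, so the sketch below illustrates them there. It is an analogy, not Inform's implementation. Several differences are worth noting:

- The comment form is "(?#...)" rather than "(# ...)".
- A Python lookbehind must have a fixed width.
- The possessive "(> ...)" corresponds to Python's atomic group "(?>...)", which is only available from Python 3.11 and is omitted here.
- Python conditionals may test only a group number or name, not a lookahead or lookbehind.

```python
import re

# Comment: Inform "(# ...)" is written "(?#...)" in Python.
commented = re.search(r"\d+(?#digits only)", "costs 42 coins").group()  # '42'

text = "3 gems, 42 coins"
ahead = re.search(r"\d+(?= coins)", text).group()   # '42': followed by " coins"
not_ahead = re.findall(r"\d+(?!\d| coins)", text)   # ['3']: not followed by it

prices = "$5 and 7 and $12"
behind = re.findall(r"(?<=\$)\d+", prices)          # ['5', '12']: after "$"
not_behind = re.findall(r"(?<![$\d])\d+", prices)   # ['7']: not after "$" or a digit

# Conditional: if group 1 (an opening quote) matched, require a closing
# quote; otherwise require "!" instead.
shout = r'(")?\w+(?(1)"|!)'
cond = [bool(re.fullmatch(shout, s)) for s in ('"hi"', "hi!", '"hi!', 'hi"')]
print(cond)  # [True, True, False, False]
```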
IN REPLACEMENT TEXT
\char   If char is other than a-z, A-Z, 0-9 or space, expands to that literal char
\\      In particular, "\\" expands to a literal backslash "\"
\n      Expands to a line break character
\t      Expands to a tab character (but use this only with external files)
\0      Expands to the full text matched
\1      Expands to whatever the 1st bracketed subexpression matched
\2      Expands to whatever the 2nd matched, and so on up to "\9" (but no further)
\l0     Expands to \0 converted to lower case (and so on for "\l1" to "\l9")
\u0     Expands to \0 converted to upper case (and so on for "\u1" to "\u9")
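Python's "re.sub" accepts a similar replacement syntax, shown here as an illustration rather than as Inform's behaviour. In Python the whole match is written "\g<0>" instead of "\0". Python also has no case-converting "\u0" or "\l1", so a replacement function stands in for them.

```python
import re

text = "the cat sat"

# Inform "\0" (the whole match) is spelled \g<0> in a Python replacement.
bracketed = re.sub(r"\w+", r"[\g<0>]", text)           # '[the] [cat] [sat]'

# Inform "\1" and "\2" are spelled the same way in Python.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")   # 'world hello'

# No "\u0"-style case conversion in Python templates; use a function.
shouted = re.sub(r"cat", lambda m: m.group(0).upper(), text)   # like "\u0"
capitalised = re.sub(r"\b(\w)(\w*)",
                     lambda m: m.group(1).upper() + m.group(2),
                     text)                              # like "\u1\2"

print(bracketed, swapped, shouted, capitalised, sep="\n")
```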