§20.9. Summary of regular expression notation
MATCHING
Positional restrictions
^ | Matches (accepting no text) only at the start of the text
$ | Matches (accepting no text) only at the end of the text
\b | Word boundary: matches at either end of the text or between a \w and a \W
\B | Matches anywhere that \b does not match
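As a minimal sketch of the two boundary markers, assuming the "number of times ... matches the regular expression ..." phrase described earlier in this chapter: the room, the sample text and the variable names below are invented purely for illustration.

```inform7
The Lab is a room.

When play begins:
	[Invented sample text: cat occurs once inside a longer word and once on its own.]
	let T be "concatenate the cat";
	[With \b on both sides, cat is accepted only where it stands as a whole word.]
	let N be the number of times T matches the regular expression "\bcat\b";
	[With \B on both sides, cat is accepted only where it is buried inside a word.]
	let M be the number of times T matches the regular expression "\Bcat\B";
	say "As a whole word: [N]. Inside a longer word: [M]."
```

Read against the table, each count should come out as 1, since the same three letters satisfy \b in one place and \B in the other.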
Backslashed character classes
\char | If char is other than a-z, A-Z, 0-9 or space, matches that literal char
\\ | For example, matches a literal backslash "\"
\n | Matches a literal line break character
\t | Matches a literal tab character (but use this only with external files)
\d | Matches any single digit
\l | Matches any single lower case letter (by Unicode 4.0.0 definition)
\p | Matches any single punctuation mark: . , ! ? - / " : ; ( ) [ ] { }
\s | Matches any single spacing character (space, line break, tab)
\u | Matches any single upper case letter (by Unicode 4.0.0 definition)
\w | Matches any single word character (neither \p nor \s)
\D | Matches any single non-digit
\L | Matches any single non-lower-case-letter
\P | Matches any single non-punctuation-mark
\S | Matches any single non-spacing-character
\U | Matches any single non-upper-case-letter
\W | Matches any single non-word-character (i.e., matches either \p or \s)
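The classes above each accept one character at a time, which is easiest to see by counting matches. The following is a rough sketch only; the room, the sample text and the variable names are assumptions made up for the example.

```inform7
The Lab is a room.

When play begins:
	[Invented sample text containing four digits and two capital letters.]
	let T be "Room 101, Floor 3";
	[\d accepts exactly one digit, so every digit is a separate match.]
	let D be the number of times T matches the regular expression "\d";
	[\d+ accepts a whole run of digits, so each number is a single match.]
	let R be the number of times T matches the regular expression "\d+";
	[\u accepts one upper case letter.]
	let U be the number of times T matches the regular expression "\u";
	say "Digits: [D]. Runs of digits: [R]. Upper case letters: [U]."
```

On this text the three counts should be 4, 2 and 2 respectively.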
Other character classes
. | Matches any single character
<...> | Character range: matches any single character inside
<^...> | Negated character range: matches any single character not inside
Inside a character range
e-h | Any character in the run "e" to "h" inclusive (and so on for other runs)
>... | Starting with ">" means that a literal close angle bracket is included
\ | Backslash has the same meaning as for backslashed character classes: see above
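Character ranges are written with angle brackets because square brackets inside Inform text already mark text substitutions. A small sketch of a plain range, a negated range and a run follows; the room, the sample word and the variable names are hypothetical.

```inform7
The Lab is a room.

When play begins:
	[Invented sample word: nine letters, five of them vowels.]
	let T be "exquisite";
	[Any one of the five characters listed between the angle brackets.]
	let V be the number of times T matches the regular expression "<aeiou>";
	[The leading ^ negates the range, so this accepts anything except those five.]
	let C be the number of times T matches the regular expression "<^aeiou>";
	[A run: any one character from q to t inclusive.]
	let R be the number of times T matches the regular expression "<q-t>";
	say "Vowels: [V]. Others: [C]. In the run q to t: [R]."
```

For this word the counts should be 5, 4 and 3.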
Structural
| | Divides alternatives: "fish|fowl" matches either
(?i) | Always matches: switches to case-insensitive matching from here on
(?-i) | Always matches: switches to case-sensitive matching from here on
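The sketch below tries alternation and the case switch on one text. It assumes the "text matching regular expression" phrase described earlier in this chapter, and the room and sample text are invented for the example.

```inform7
The Lab is a room.

When play begins:
	[Invented sample text with one candidate in capitals and one in lower case.]
	let T be "A FOWL in the larder, a fish in the pan";
	[Matching respects case by default, so only the lower case word can be found.]
	if T matches the regular expression "fish|fowl", say "Case respected: [text matching regular expression][line break]";
	[Everything after the switch ignores case, so the capitalised word is found.]
	if T matches the regular expression "(?i)fowl", say "Case ignored: [text matching regular expression][line break]".
```

The first test should report "fish" and the second "FOWL".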
Repetitions
...? | Matches "..." either 0 or 1 times, i.e., makes "..." optional
...* | Matches "..." 0 or more times: e.g. "\s*" matches an optional run of spacing characters
...+ | Matches "..." 1 or more times: e.g. "x+" matches any run of "x"s
...{6} | Matches "..." exactly 6 times (similarly for other numbers, of course)
...{2,5} | Matches "..." between 2 and 5 times
...{3,} | Matches "..." 3 or more times
....? | "?" after any repetition makes it "lazy", matching as few repeats as it can
Numbered subexpressions
(...) | Groups part of the expression together: matches if the interior matches
\1 | Matches the contents of the 1st subexpression reading left to right
\2 | Matches the contents of the 2nd, and so on up to "\9" (but no further)
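Repetitions and numbered subexpressions interact: what a bracket ends up holding depends on whether the repetition inside it is greedy or lazy. The following sketch assumes the "text matching subexpression ..." and "text matching regular expression" phrases described earlier in this chapter; the room, sample words and variable names are made up for illustration.

```inform7
The Lab is a room.

When play begins:
	let T be "taramasalata";
	[Greedy: the star takes as much as it can, so the bracket runs up to the final a.]
	if T matches the regular expression "a(.*)a", say "Greedy bracket: [text matching subexpression 1][line break]";
	[Lazy: with the extra question mark the star takes as little as it can.]
	if T matches the regular expression "a(.*?)a", say "Lazy bracket: [text matching subexpression 1][line break]";
	[A back-reference must repeat whatever the first bracket matched: a doubled letter.]
	let B be "bookkeeper";
	if B matches the regular expression "(\w)\1", say "Doubled letter: [text matching regular expression][line break]".
```

Here the greedy bracket should hold "ramasalat", the lazy one just "r", and the back-reference should find "oo", the first doubled letter reading left to right.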
Unnumbered subexpressions
(# ...) | Comment: always matches, and the contents are ignored
(?= ...) | Lookahead: matches if the text ahead matches "...", but doesn't consume it
(?! ...) | Negated lookahead: matches if lookahead fails
(?<= ...) | Lookbehind: matches if the text behind matches "...", but doesn't consume it
(?<! ...) | Negated lookbehind: matches if lookbehind fails
(> ...) | Possessive: tries to match "..." and if it succeeds, never backtracks on this
(?(1)...) | Conditional: if \1 has matched by now, require that "..." be matched
(?(1)...|...) | Conditional: ditto, but if \1 has not matched, require the second part
(?(?=...)...|...) | Conditional with lookahead as its condition for which to match
(?(?<=...)...|...) | Conditional with lookbehind as its condition for which to match
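That a lookahead "doesn't consume" text is easiest to see in a replacement, where only the consumed part is overwritten. This is a tentative sketch, assuming the "replace the regular expression ... in ... with ..." phrase described earlier in this chapter and writing the lookaheads with no space after the opening marker; the room and the sample text are invented.

```inform7
The Lab is a room.

When play begins:
	let T be "qi and quiz";
	[Lookahead inspects the next character but leaves it in place, so the u survives.]
	replace the regular expression "q(?=u)" in T with "k";
	say "[T][line break]";
	[Negated lookahead: only a q that is not followed by u is replaced.]
	replace the regular expression "q(?!u)" in T with "Q";
	say "[T][line break]".
```

If the lookaheads behave as the table describes, the text should read "qi and kuiz" after the first replacement and "Qi and kuiz" after the second.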
IN REPLACEMENT TEXT
\char | If char is other than a-z, A-Z, 0-9 or space, expands to that literal char
\\ | In particular, "\\" expands to a literal backslash "\"
\n | Expands to a line break character
\t | Expands to a tab character (but use this only with external files)
\0 | Expands to the full text matched
\1 | Expands to whatever the 1st bracketed subexpression matched
\2 | Expands to whatever the 2nd matched, and so on up to "\9" (but no further)
\l0 | Expands to \0 converted to lower case (and so on for "\l1" to "\l9")
\u0 | Expands to \0 converted to upper case (and so on for "\u1" to "\u9")
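The replacement codes above can be tried out with the "replace the regular expression ... in ... with ..." phrase described earlier in this chapter. The sketch below is illustrative only: the room and the sample texts are assumptions, not part of the notation.

```inform7
The Lab is a room.

When play begins:
	[Swap two bracketed subexpressions round by naming them in the opposite order.]
	let T be "Frankie Finbar";
	replace the regular expression "(\w+) (.*)" in T with "\2, \1";
	say "[T][line break]";
	[Put the first letter of each word in upper case and the rest in lower case.]
	let X be "the mystery of the yellow room";
	replace the regular expression "\b(\w)(\w*)" in X with "\u1\l2";
	say "[X][line break]".
```

The first replacement should turn the name into "Finbar, Frankie"; the second should give every word of X title casing.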