Open Parser supports the standard set of Java RegEx character class metacharacters in the %Tokenize and @RegEx commands. A metacharacter is a character that carries special meaning in pattern matching. The supported metacharacters are:
([{\^-$|]})?*+.
There are two ways to force a metacharacter to be treated as an ordinary character:
- Precede the metacharacter with a backslash (\).
- Enclose it between \Q (which starts the quote) and \E (which ends it).
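The two escaping techniques above can be illustrated with plain java.util.regex, whose syntax these rules are based on. This is a minimal sketch, not Open Parser code: the class name EscapeDemo and the helper matches are hypothetical, and it assumes the same escaping behavior applies inside %Tokenize and @RegEx expressions. Note that the doubled backslashes are only required by Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it exercises java.util.regex directly, not Open Parser.
class EscapeDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped '.' is a metacharacter that matches any single character.
        System.out.println(matches("a.c", "abc"));            // true

        // A backslash turns '.' into an ordinary period.
        System.out.println(matches("a\\.c", "abc"));          // false
        System.out.println(matches("a\\.c", "a.c"));          // true

        // Without quoting, '+' is a quantifier, so the literal text does not match.
        System.out.println(matches("1+1=2", "1+1=2"));        // false

        // \Q ... \E treats everything in between as ordinary characters.
        System.out.println(matches("\\Q1+1=2\\E", "1+1=2"));  // true
    }
}
```

The backslash is convenient for a single character, while \Q and \E are easier to read when a longer run of text contains several metacharacters.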
%Tokenize follows the rules for Java regular expression character classes, not the rules for Java regular expressions as a whole.
In general, the reserved characters for a character set are:
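- '[' and ']' open and close a set, including a set nested inside another set.
- '-' is a metacharacter (it defines a range) if it appears between two other characters.
- '^' is a metacharacter (it negates the set) if it is the first character in a set.
- '&&' is a metacharacter (it defines an intersection) if it appears between two other characters.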
- '\' means that the next character is a literal.
If you are unsure whether a character will be treated as a metacharacter and you want it to be treated as a literal, escape that character with a backslash.
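The character-set rules listed above can be checked against java.util.regex, which defines the character class syntax that %Tokenize is described as following. The sketch below is illustrative only: CharClassDemo and matches are hypothetical names, the sample sets are invented for the example, and it assumes that %Tokenize interprets a set the same way the Java Pattern class does.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it exercises java.util.regex directly, not Open Parser.
class CharClassDemo {
    // Returns true when the entire input matches the regular expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // '-' between two characters defines a range.
        System.out.println(matches("[a-c]", "b"));            // true

        // An escaped '-' is a literal hyphen, so the set is 'a', '-', 'c'.
        System.out.println(matches("[a\\-c]", "b"));          // false
        System.out.println(matches("[a\\-c]", "-"));          // true

        // '^' as the first character negates the set.
        System.out.println(matches("[^abc]", "d"));           // true
        System.out.println(matches("[^abc]", "a"));           // false

        // '^' anywhere else in the set is an ordinary character.
        System.out.println(matches("[a^bc]", "^"));           // true

        // '[' and ']' inside a set introduce a nested set (a union).
        System.out.println(matches("[a-c[x-z]]", "y"));       // true

        // '&&' intersects two sets: lowercase letters that are not vowels.
        System.out.println(matches("[a-z&&[^aeiou]]", "b"));  // true
        System.out.println(matches("[a-z&&[^aeiou]]", "e"));  // false
    }
}
```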