A token is a string made up of any character, word, or phrase, enclosed in single quotes on the left side of the equation in the definition. For example:
'MARY' INS NAME BEG ATT=GVN-NM1,GEN=F -- the token is 'MARY'
'HOLD MAIL' STREET ATT=HOLD -- the token is 'HOLD MAIL'
Guidelines
- The maximum number of characters for a token is 100.
- A token cannot wrap to a second line.
- A token may include one or more sub-tokens or masks.
- A token or sub-token cannot start with a space. A leading space is ignored. For example, ' sons' is read as 'sons'.
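The guidelines above can be sketched as a small check. This is only an illustration, not the product's own validation: the function name normalize_token and the constant MAX_TOKEN_LENGTH are hypothetical, and the sketch assumes the 100-character limit is applied after the leading space is dropped, which the guidelines do not state.

```python
MAX_TOKEN_LENGTH = 100  # limit taken from the guidelines


def normalize_token(raw: str) -> str:
    """Return the token text as the guidelines describe it, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    # A token cannot wrap to a second line.
    if "\n" in raw or "\r" in raw:
        raise ValueError("a token cannot wrap to a second line")
    # A leading space is ignored, so ' sons' becomes 'sons'.
    token = raw.lstrip(" ")
    # Assumption: the length limit applies to the text after stripping.
    if len(token) > MAX_TOKEN_LENGTH:
        raise ValueError("a token cannot exceed 100 characters")
    return token


print(normalize_token(" sons"))      # leading space dropped
print(normalize_token("HOLD MAIL"))  # unchanged
```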
Sub-tokens
A sub-token is a string within a token. A sub-token may appear at the beginning or the end of the token. For example, 'STRASSE' can be a sub-token of 'BERGENSTRASSE'. Use the following keywords in the definition to specify sub-tokens.
- Beginning-Token (BEG-TKN). Keyword that indicates that the sub-token position is at the beginning of a token.
- Ending-Token (END-TKN). Keyword that indicates that the sub-token position is at the end of a token.
If a pattern entry contains a sub-token, you must specify whether the sub-token should be separated from the word or attached to it.
For example, assume your data contains 'BERGENSTRASSE 12'. A definition entry might exist in this format:
'STRASSE' STREET ENDING-TOKEN ATT=STR-TYPE-S
The following pattern is required in order to separate the sub-token from the word:
'ALPHA STR-TYPE-S NUMERIC' PATTERN STREET REC='STR-NM STR-TYPE HSNO'
The following pattern is required in order to keep the sub-token attached:
'ALPHA NUMERIC' PATTERN STREET REC='STR-NM HSNO'
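The difference between the two patterns can be illustrated with a short sketch. It is not the parser's implementation: the helper split_sub_token and its "BEG"/"END" position labels are hypothetical, and the dictionaries only mimic how the REC fields from the two patterns would be filled for 'BERGENSTRASSE 12'.

```python
def split_sub_token(word, sub_token, position):
    """Split word around sub_token if it sits at the given position.

    position is "BEG" or "END" (labels invented for this sketch).
    Returns a (first, second) tuple in reading order, or None when the
    sub-token is absent or would leave nothing for the rest of the word.
    """
    if not sub_token or len(sub_token) >= len(word):
        return None
    if position == "END" and word.endswith(sub_token):
        return word[: -len(sub_token)], sub_token
    if position == "BEG" and word.startswith(sub_token):
        return sub_token, word[len(sub_token):]
    return None


word, house_number = "BERGENSTRASSE 12".split()

# Separated: corresponds to REC='STR-NM STR-TYPE HSNO'
parts = split_sub_token(word, "STRASSE", "END")
separated = {"STR-NM": parts[0], "STR-TYPE": parts[1], "HSNO": house_number}

# Attached: corresponds to REC='STR-NM HSNO'
attached = {"STR-NM": word, "HSNO": house_number}

print(separated)
print(attached)
```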
Masks
A mask is a description of a word or phrase. A mask defines the characters of a data element using:
- n to represent a single digit (0-9).
- a-z to represent lowercase alphabetic letters.
- Every character that is not a letter or number is represented by the character itself.
For example, a mask can define any series of five numerals as a postal code, instead of entering each of the 100,000 possible combinations (00000 through 99999) in the table. This mask token looks like:
'nnnnn' MASK GEOG DEF ATT=POSTCODE
Masks may include special characters if they are part of the word representation. For example, a mask for the nine-digit postal code is:
'nnnnn-nnnn' MASK GEOG DEF ATT=POSTCODE
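One way to picture how a mask stands in for many literal tokens is to translate it into a regular expression. The sketch below is an assumption-laden illustration, not the product's matching logic: mask_to_regex and matches_mask are hypothetical names, and only two rules are modeled, n as a digit and every other character as itself. The alphabetic mask rule is deliberately left out.

```python
import re


def mask_to_regex(mask):
    """Translate a mask into a compiled regular expression.

    Hypothetical helper: 'n' stands for one digit, and any other
    character is matched literally. Alphabetic mask characters are
    not modeled in this sketch.
    """
    parts = []
    for ch in mask:
        if ch == "n":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches_mask(mask, value):
    """Return True when the whole value fits the mask."""
    return mask_to_regex(mask).fullmatch(value) is not None


print(matches_mask("nnnnn", "12345"))
print(matches_mask("nnnnn-nnnn", "12345-6789"))
print(matches_mask("nnnnn-nnnn", "123456789"))
```

The hyphen in 'nnnnn-nnnn' is treated as a literal character, so a nine-digit value without the hyphen does not fit that mask.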