You can specify the following advanced settings to configure the CDP to meet your business needs. Changing one or more of these settings may have a significant impact on the CDP output. We recommend that you validate that your changes have the impact you want by running the CDP process and analyzing the results whenever you change one of these settings.
To specify Advanced Parser settings
- From the Navigation or Project View, right-click the Customer Data Parser process and select Edit Process.
- Select the Options menu option.
- Click Advanced. The Advanced Parser Settings window opens.
- Make the changes you want, using the following table as a guide.
- When you have made the changes, click Back and then click Finish.
Option Description
Assign gender option
From the drop-down list, select one of the following options:
- Base gender on title and all given names. (Default)
- Base gender on title and first given name.
- Base gender on title and first given name that is not an initial.
This setting addresses problems that arise in cultures where a person may have both a male and female given name, which, with the default setting, would return a review code of 019.
Attribute to identify rows in exceptions file
From the drop-down list, select the attribute to write out with each bad pattern or city problem entry in the parsing exceptions file. The attribute must contain a unique value for each record (such as a sequence number, social security number, or account number). By default, this attribute is not defined.
Pre-processing options for house numbers
From the drop-down list, select the type of house number processing. By default, the CDP applies minimal pre-processing of house numbers before street pattern processing.
- No pre-processing - disable pre-processing
- Minimal pre-processing - a fractional number like "1 1/2" becomes "11/2". Note that "1 1/2" becomes a HSNO token. The fraction portion must be 3 characters in length and include the '/'.
- Full pre-processing - This option handles fractions just as the Minimal Pre-Processing option does. In addition, it tells the CDP to convert a number like "2420-36" to "2420 36". This conversion does not apply to New York, New Jersey and Hawaii.
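The three pre-processing modes can be illustrated with a short Python sketch. It is not the CDP implementation: the function name `preprocess_house_number`, the mode strings, and the `state` parameter are hypothetical, and the regular expressions encode only the behavior described above.

```python
import re

# States where the hyphen conversion is described as not applying.
HYPHEN_EXEMPT_STATES = {"NY", "NJ", "HI"}

# A whole number, a space, then a 3-character fraction such as "1/2".
FRACTION = re.compile(r"\b(\d+) (\d/\d)\b")
# Two digit groups joined by a hyphen, such as "2420-36".
HYPHEN_RANGE = re.compile(r"\b(\d+)-(\d+)\b")


def preprocess_house_number(line, mode="minimal", state=None):
    """Illustrative only. mode is 'none', 'minimal', or 'full' (hypothetical names)."""
    if mode == "none":
        return line
    # Minimal and full both join the fraction to the number: "1 1/2" -> "11/2".
    line = FRACTION.sub(r"\1\2", line)
    if mode == "full" and state not in HYPHEN_EXEMPT_STATES:
        # Full also replaces the hyphen with a space: "2420-36" -> "2420 36".
        line = HYPHEN_RANGE.sub(r"\1 \2", line)
    return line
```

For example, `preprocess_house_number("2420-36 MAIN ST", mode="full")` yields `"2420 36 MAIN ST"`, while the same call with `state="NY"` leaves the input unchanged.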
Split hyphenated house numbers into two attributes
If you have house number entries that contain a hyphen (for example 12-25) and you want to split the entry into house number and apartment number, select one of the following options:
- House no. + Apartment no. - the first number will be considered house number and the second will be considered apartment number.
- Apartment no. + House no. - the first number will be considered apartment number and the second will be considered house number.
The house number value will be written to the pr_house_number attribute and the apartment number will be written to the pr_dwelling1_name attribute.
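As a rough illustration of the two split orders, the following Python sketch maps a hyphenated entry to the two output attributes named above. The function name and the `order` values are hypothetical, and the handling of entries without two parts is an assumption.

```python
def split_hyphenated_house_number(value, order="house_first"):
    """Return the two output attributes for a hyphenated house number entry."""
    first, sep, second = value.partition("-")
    if not sep or not first or not second:
        # Assumption: an entry without two parts stays in the house number.
        return {"pr_house_number": value, "pr_dwelling1_name": ""}
    if order == "house_first":      # House no. + Apartment no.
        return {"pr_house_number": first, "pr_dwelling1_name": second}
    if order == "apartment_first":  # Apartment no. + House no.
        return {"pr_house_number": second, "pr_dwelling1_name": first}
    raise ValueError(f"unknown order: {order!r}")
```

With the entry `12-25`, the first order gives house number 12 and apartment 25, and the second order gives house number 25 and apartment 12.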
Note: This option is primarily used for Netherlands (NL). It should not be used for Australia (AU), Canada (CA), United Kingdom (GB), or United States (US).
Street ordinal options
From the drop-down list, select the street ordinal option.
- Add street ordinals to street lines - add street ordinal. This is the default for US. For example, "10 22 Street" becomes "10 22nd Street". However, ordinals are not added if the street is made up of multiple words.
- Example:
Input= 10 22 ST
pr_street_name_original 22
pr_street_name_recoded 22nd
For multi-word street names, ordinals are not added even when this option is selected.
Input= 1 2 ST RD
pr_street_name_original 2 ST
pr_street_name_recoded 2 ST
- Remove street ordinals from street lines - remove street ordinal if it is there.
- Example:
Input= 10 22nd ST
pr_street_name_original 22
pr_street_name_recoded 22
- Do nothing (keep inputs) - do not add or remove street ordinal. This is the default for all countries except US.
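The three street ordinal options can be sketched in Python as follows. This is an approximation based only on the examples above, not the CDP's own logic: the function names and option strings are hypothetical, and leaving multi-word names untouched under the remove option is an assumption (the text states this only for the add option).

```python
import re

# A number followed by an English ordinal suffix, such as "22nd".
ORDINAL = re.compile(r"^(\d+)(ST|ND|RD|TH)$", re.IGNORECASE)


def ordinal_suffix(n):
    """Return the English ordinal suffix for n (11, 12, 13 take 'th')."""
    if 10 <= n % 100 <= 20:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


def recode_street_name(name, option="keep"):
    """option is 'add', 'remove', or 'keep' (hypothetical names)."""
    words = name.split()
    if len(words) != 1:
        # Multi-word street names such as "2 ST" are left unchanged.
        return name
    word = words[0]
    if option == "add" and word.isdecimal():
        return word + ordinal_suffix(int(word))
    if option == "remove":
        match = ORDINAL.match(word)
        if match:
            return match.group(1)
    return name
```

Here `recode_street_name("22", "add")` returns `"22nd"`, and `recode_street_name("2 ST", "add")` returns `"2 ST"` unchanged, mirroring the two add examples.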
Spelling algorithm for matching city names
From the drop-down list, select the spelling algorithm option to use when the input city name does not match the city table.
- No spelling algorithm - the spelling algorithm is not used.
- Basic spelling algorithm - the spelling algorithm is applied when the first four (4) characters of the two names being compared are identical.
- Enhanced spelling algorithm - the enhanced spelling algorithm is applied. The CDP handles spaces, special characters and words correctly. For example, it will drop spaces, swap hyphens and spaces, change accented characters to non-accented characters, and swap short words such as "de" for "da."
- Example:
When this option is selected, the Parser will try to match San Clemente vs. S Clemente, Mount Hood vs. Mt Hood, L'Ardenne vs. Ardenne, BRIGGS-CORNER-QNS vs. Briggs Corner Qns, etc.
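Part of what the enhanced algorithm is described as doing (dropping spaces, treating hyphens and spaces alike, and removing accents) can be approximated by comparing normalized keys. The Python sketch below is only an illustration under that assumption; `city_key` is a hypothetical helper, and it does not cover abbreviations such as "Mt" for "Mount" or swaps of short words.

```python
import unicodedata


def city_key(name):
    """Build a comparison key: no accents, no spaces or punctuation, upper case."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Keep letters and digits only, so spaces, hyphens, and apostrophes vanish.
    return "".join(c for c in unaccented.upper() if c.isalnum())
```

Two city names are treated as equal in this sketch when their keys match, for example `city_key("BRIGGS-CORNER-QNS") == city_key("Briggs Corner Qns")`.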
Use city changes file
Check this box to use the secondary city lookup table (city changes file). The city changes file specifies characters and words used for enhanced table lookup. The default file is provided and accessed from the Advanced Rules tab. You can customize this file. For more information, see City Changes File.
Note: The Enhanced spelling algorithm for the Spelling algorithm option must be selected to use this option.
Maximum length for user definitions
By default, the user definitions file, which includes all user-defined parsing customization definitions, allows entries up to 1000 characters. Specify a larger value (up to 4000 characters) if your definitions require it.
Copy original data to Parser original output attributes
Check this box to always retain the original input data in the parser output file. If this option is not selected, the CDP retains the original data unless the word/phrase is a synonym of another word or phrase. In that case, the synonym value is stored in the output attribute identified as Original. By default, this setting is disabled.
Interpret concatenee and next word as a surname
Check this box to define a combined token (also called a concatenated token) as a SURNAME attribute. Otherwise, the CDP defines the combined token as an ALPHA attribute. (By default, this setting is enabled.)
- Example:
When enabled, a name like "La Keysia Smith" parses as:
GIVEN-NAME1 = Smith
SURNAME = La Keysia
When this setting is not enabled, the CDP parses a name like "La Keysia Smith" as:
ALPHA = La Keysia
ALPHA = Smith
recoded to:
GIVEN-NAME1 = La Keysia
SURNAME = Smith
Activate business recognition functions
Check this box to enable the setting you selected for the business recognition function. See Identifying Business Names for more information. By default, this setting is enabled.
Disable automatic business line type identification
Check this box to disable automatic business line type identification and force the CDP to use the patterns for one pass. See Identifying Business Names for more information. By default, this setting is disabled.
Remove special characters in table lookups
Uncheck this box to include hyphens and slashes for a table lookup. When this setting is enabled (which is the default), the CDP separates the special characters before performing a lookup.
- Example:
When this setting is checked, if the Parser encounters the word INC-, it separates the hyphen from the word and looks up INC in the table. If you uncheck the box, the Parser looks up INC-.
Reverse names separated by commas before pattern lookup
Check this box to reverse names that are separated by commas before pattern lookup. By default, this setting is enabled.
- Example:
If this setting is enabled, SMITH, MARY M (ALPHA ALPHA 1ALPHA) would be changed to MARY M SMITH (ALPHA 1ALPHA ALPHA) before lookup. This could find an identifying pattern: GIVEN-NAME1 GIVEN-NAME2 SURNAME
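The reordering in this example can be sketched in a few lines of Python. The function name is hypothetical, and restricting the swap to lines with exactly one comma is an assumption made for the sketch; the text does not say how lines with several commas are handled.

```python
def reverse_comma_name(line):
    """Move the part before a single comma to the end: 'SMITH, MARY M' -> 'MARY M SMITH'."""
    if line.count(",") != 1:
        # Assumption: only a single "surname, given names" split is reversed.
        return line
    surname, _, rest = line.partition(",")
    surname, rest = surname.strip(), rest.strip()
    if not surname or not rest:
        return line
    return f"{rest} {surname}"
```

The reordered string is what would then be tokenized for the pattern lookup (ALPHA 1ALPHA ALPHA in the example above).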
Split concatenated names before parsing
The CDP normally does some pre-processing prior to name processing; this includes name concatenation and name splitting. Check this box to split concatenated names before name processing. By default, this setting is enabled.
- Example:
"JOHN&MARY", "JOHN+MARY" or "JOHN/MARY"
If this setting is enabled, each of these entries is split into three tokens, provided that both names have a valid GIVEN-NAME1 attribute.
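A minimal Python sketch of this splitting rule follows. The `GIVEN_NAMES` set is a hypothetical stand-in for the parser's table of GIVEN-NAME1 entries, the function name is invented for illustration, and returning the connector as the middle token is an assumption about what the three tokens are.

```python
import re

# Hypothetical stand-in for names that carry a GIVEN-NAME1 attribute.
GIVEN_NAMES = {"JOHN", "MARY"}

# Two name parts joined directly by "&", "+", or "/".
JOINED = re.compile(r"^([^&+/\s]+)([&+/])([^&+/\s]+)$")


def split_concatenated(token, given_names=GIVEN_NAMES):
    """Split 'JOHN&MARY' into three tokens when both parts are given names."""
    match = JOINED.match(token)
    if match:
        left, connector, right = match.groups()
        if left.upper() in given_names and right.upper() in given_names:
            return [left, connector, right]
    # Anything else, such as a business name like "AT&T", is left intact.
    return [token]
```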
Split three-character names into initials
Check this box to split three-character names into initials. For example, for the entry "BPL," "B" is parsed to first name attribute, "P" to middle name attribute and "L" to last name attribute. If this option is not selected, "BPL" would remain as "BPL." By default, this setting is disabled.
Start new logical line after CARE-OF attribute
Controls whether a CARE-OF attribute sets logical beginning and ending positions (such as GIVEN-NAME1 or TITLE). By default, this setting is disabled.
- Example:
If this setting is enabled, the line
ALPHA, BUSINESS, CARE-OF, TITLE, 1ALPHA, 1ALPHA, ALPHA
THORN, SECURITIES, C/O, MRS, F, B, SMITH, REC=1
is processed as follows:
A business is "care-of" a personal name, and that name's title is flagged as a TITLE. The logical line position was changed after CARE-OF. "Mrs." is now a TITLE.
If this setting is disabled, the line
ALPHA, BUSINESS, CARE-OF, ALPHA, 1ALPHA, 1ALPHA, ALPHA
THORN, SECURITIES, C/O, MRS, F, B, SMITH, REC=1
is processed as follows:
A business is "care-of" a personal name, and that name is not seen. Line position was not changed after CARE-OF. "Mrs." is not recognized, but seen as an ALPHA.
Try to match a single word preceding a street line
Check this box to attempt to match lines that have a single token above an identified street line to the table. This setting ensures that lines with a single token above an identified street line are given a single attempt to match a pattern to the table. The token must be identified as a single intrinsic attribute (ALPHA, ALPHA-1SPECIAL, and so forth). By default, this setting is disabled.
- Example:
JOHN SMITH
BRIDLEWORKS1
10 MAIN ST
BILLERICA MA
"BRIDLEWORKS1" is identified as an ALPHA-1NUMERIC token and can be identified as a business if the pattern for ALPHA-1NUMERIC was assigned to a business.
Write street patterns to log file
Check this box to print to the log file the street pattern as it exists after the first six Street Rules are run, but before any other Street Rules are run. By default, this setting is enabled.
Move city/province from end of geography line to a new line
Check this box to split geography lines and move city/province into a new line. By default, this setting is disabled, which means the geography line is not split.
Hyphenate concatenee and surname
Check this box to recode a concatenee and surname by inserting a hyphen between them. For example, when this box is checked, the CDP recodes VAN DAMME as VAN-DAMME. Otherwise, the CDP recodes a concatenee and surname by removing the space between them (for example, VANDAMME).
Note: Because the Window Key process has a rule that builds the window key with characters starting after a hyphen instead of at the beginning, a better window key can be generated. It would limit the window size and prevent over-matching.
Eliminate duplicate dwellings
Check this box to eliminate duplicate dwelling information. For example, if the pr_dwel1 and pr_dwel2 attributes have identical information, the US Postal Matcher builds the pr_gout_deliver_addr field with data from both pr_dwel1 and pr_dwel2, resulting in duplicate information.
Write Z lines to misc address lines
Check this box to write Z lines to the miscellaneous address attributes.
Note: A Z line is one the CDP has identified as a miscellaneous line type.
Do not change APT token at end of line to ALPHA
Check this box if you do not want to change an APT token at the end of a line to ALPHA. By default, if an APT token is the last token on a line (that is, it is not followed by an apartment number), the CDP changes its attribute to ALPHA. However, in some countries (such as Portugal) an APT token at the end of the line does represent an APT and the attribute should not be changed.
Return first word if multi-word city is not matched
Check this box to use the first word in a multi-word city name when a match is not found for the full city name. Otherwise, the CDP returns the last word in the city name. This option enables you to determine on a country by country basis how you want the CDP to process city names with multiple words.
Assign street attribute to a building
Check this box to parse buildings as street elements instead of name elements.
- Example:
JOE DOE
BLDG KENTUCK
25 LINNELL CIR
BILLERICA MA 01821
When parsing this address, the CDP assigns ALPHA to BLDG even though BLDG is defined with the attribute of SITE in the Parser rules table. The CDP will assign street weights to tokens with the attribute of UNIT and FLOOR only. If this option is enabled, BLDG will be assigned the attribute of SITE. If this option is not checked, then BLDG will be assigned to ALPHA as before.
Turn off conversion of plus signs to ampersands
Check this box if you do not want to convert all plus signs (+) to ampersands (&). By default, the CDP converts all plus signs to ampersands.
Check street lines preceding complex line
Check this box if you want the CPLX01 function to look for previous street lines in addition to succeeding street lines. Otherwise, the CPLX01 function does not check previous lines for street lines.
Flag name line changes
Check this box to flag changes to the original data that appear on a name line. If there is a change, a value of '1' will be stored in the change flag attributes.
Note: You must add the change flag attributes to the output schema by selecting Add Parser Outputs in the Schema Editor.
Flag street line changes
Check this box to flag changes to the original data that appear on a street line. If there is a change, a value of '1' will be stored in the change flag attributes.
Note: You must add the change flag attributes to the output schema by selecting Add Parser Outputs in the Schema Editor.
Flag geography line changes
Check this box to flag changes to the original data that appear on a geography line. If there is a change, a value of '1' will be stored in the change flag attributes.
Note: You must add the change flag attributes to the output schema by selecting Add Parser Outputs in the Schema Editor.
Write misc data to neighborhood
Check this box to obtain neighborhoods when there is miscellaneous data identified between a street line and a geography line. The output is stored in the pr_neigh1 and pr_neigh2 attributes.
Note: This option should be used when you know that a country uses neighborhoods. It is selected by default for India (IN) and Hong Kong (HK). If used with the Write Z lines to misc address lines option, the Write Z lines option takes precedence over this option.
Output all name data
Check this box to write out any name data that does not match the found name pattern. The unmatched name data is stored in the pr_name_relation attributes. Check the box if you want to write any unmatched token, even though it is not part of the pattern.
- Example:
When enabled, a name like "ITF MISS HSIAO MING WANG" parses as:
Input: <ITF> <MISS> <HSIAO> <MING> <WANG>
Looking up name pattern: <RELATIONSHIP> <TITLE> <ALPHA> <ALPHA> <ALPHA>
Pattern not found.
Looking up name pattern: <MISS> <HSIAO> <MING> <WANG>
Pattern found. <TITLE> <ALPHA> <ALPHA> <ALPHA>
Recoded pattern: <TITLE> <FIRST> <MIDDLE> <LAST>
"ITF" will be written to the pr_name_relation attribute.
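The retry shown in this example can be sketched in Python as a loop that drops leading tokens until the remaining attributes match a known pattern. Everything here is hypothetical: `KNOWN_PATTERNS` stands in for the parser's name pattern table, and dropping more than one leading token is an assumption, since the example shows only one being set aside.

```python
# Hypothetical stand-in for the name pattern table.
KNOWN_PATTERNS = {("TITLE", "ALPHA", "ALPHA", "ALPHA")}


def match_with_leftovers(tokens, attributes, patterns=KNOWN_PATTERNS):
    """Return (matched_tokens, unmatched_tokens) for the first pattern hit."""
    for start in range(len(tokens)):
        if tuple(attributes[start:]) in patterns:
            # Tokens before the match are the ones written out as unmatched data.
            return tokens[start:], tokens[:start]
    return [], tokens


matched, unmatched = match_with_leftovers(
    ["ITF", "MISS", "HSIAO", "MING", "WANG"],
    ["RELATIONSHIP", "TITLE", "ALPHA", "ALPHA", "ALPHA"],
)
```

In this sketch `unmatched` holds `["ITF"]`, the token that the example writes to the pr_name_relation attribute.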
Use TSS city table
Check this box to use the city table provided by Precisely. This option is used when you create a user-defined template using ww_proj (with dummy city table), and then subsequently purchase the country-specific city table.
Note: This option is enabled only when the country-specific city table exists.
Additional geography lookup
Check this box to validate province/city/postcode combination against the auxiliary city table and update or append information in the output. This option will return the flag (Y/N) in the pr_verified_geography attribute. For Hong Kong, this option also returns island for some cities in the pr_sub_city attribute.
Note: This option is not available for the following countries: Basic Countries (ZZ), China (CN), United Kingdom (GB), Japan (JP), Korea (KR), Singapore (SG), and Taiwan (TW).
Note: For Canada (CA), Netherlands (NL), and Portugal (PT), the flag in the pr_verified_geography attribute is not available.
Lookup all city combinations
Check this box to look up all word combinations for city when the city lookup process is performed for a multi-word input. By default, the lookup is not performed for all combinations when the initial full string is not a city.
- Example:
Default city lookup combinations:
1st lookup: <West Virginia Center Junction> not found in table.
2nd lookup: <Virginia Center Junction> not found in table.
3rd lookup: <Center Junction> not found in table.
4th lookup: <Junction> not found in table.
All city lookup combinations:
1st lookup: <West Virginia Center Junction> not found in table.
2nd lookup: <West Virginia Center> not found in table.
3rd lookup: <West Virginia> not found in table.
4th lookup: <West> not found in table.
5th lookup: <Virginia Center Junction> not found in table.
6th lookup: <Virginia Center> not found in table.
7th lookup: <Center Junction> not found in table.
8th lookup: <Junction> not found in table.
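The two lookup sequences in this example can be reproduced with the Python sketch below. The function names are hypothetical, and the rule in `all_lookups` is inferred from the example alone: a single word is tried only when it is the first or last word of the input, which is why "Virginia" and "Center" do not appear on their own. The actual CDP rule may differ.

```python
def default_lookups(words):
    """Default behavior in the example: drop one leading word at a time."""
    return [" ".join(words[i:]) for i in range(len(words))]


def all_lookups(words):
    """All-combinations behavior as inferred from the example above."""
    n = len(words)
    candidates = []
    for start in range(n):
        # From each starting word, try the longest candidate first.
        for end in range(n, start, -1):
            is_single_word = end - start == 1
            # Inferred rule: skip single interior words.
            if is_single_word and start not in (0, n - 1):
                continue
            candidates.append(" ".join(words[start:end]))
    return candidates


words = "West Virginia Center Junction".split()
```

Calling `default_lookups(words)` gives the four default candidates in order, and `all_lookups(words)` gives the eight candidates listed under all city lookup combinations.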