Parameters for Global Rules Table - trillium_discovery - trillium_quality - 17.1

Trillium Control Center

First publish date
2008

The Global Rules Table (rtrules1.win/unx) defines the rules that apply to the data from all countries. Using this table, the Router matches data from input records to the system tables (Global Geography Tables: GLOBRTR.len/ben and APGLBRTR.tbl) containing data specific to a country. When a match is found, a weight is assigned to the identified word or phrase. The assigned weight indicates the level of probability that the word or phrase belongs to the country from which the table match was made.

Note: Parameter names are NOT case-sensitive. To add a comment line, place a forward slash (/) followed by an asterisk (*) at the leftmost position of the line. For example: /* This is a comment line.
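
The note above can be illustrated with a minimal Python sketch of how one line of a rules table might be read. This is not the Router's actual parser: the function name parse_rules_line is hypothetical, and the sketch assumes that the parameter name is separated from its value by whitespace, as in the examples in this topic. Continuation lines (such as additional STREET_TYPE entries) are not handled.

```python
def parse_rules_line(line):
    """Hypothetical helper: split one rules-table line into (NAME, value).

    Returns None for comment lines and blank lines.
    """
    # A comment starts with "/*" at the leftmost position of the line.
    if line.startswith("/*"):
        return None
    parts = line.split(None, 1)
    if not parts:
        return None
    # Parameter names are not case-sensitive, so normalize to upper case.
    name = parts[0].upper()
    value = parts[1].strip() if len(parts) > 1 else ""
    return name, value
```

With this sketch, "drop_period Y" and "DROP_PERIOD Y" yield the same result, and a parameter with no value, such as FIX_UTF8, yields an empty value string.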

Parameter

Description

DROP_PERIOD

When set to Y, it enables the following:

  • If a period is followed by a space, the period is removed before processing.
  • If a period is in the middle of a word, the period is replaced with a space.

Example: DROP_PERIOD Y
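
The two rules above can be sketched in Python as follows. This is an illustration only, not the Router's implementation. The function name drop_period is hypothetical, and "in the middle of a word" is assumed to mean a period with a letter or digit on both sides.

```python
import re


def drop_period(text):
    """Hypothetical illustration of the two DROP_PERIOD rules."""
    # Rule 1: a period followed by a space is removed.
    text = re.sub(r"\.(?= )", "", text)
    # Rule 2: a period in the middle of a word is replaced with a space.
    # Assumption: "middle of a word" means a word character on both sides.
    text = re.sub(r"(?<=\w)\.(?=\w)", " ", text)
    return text
```

Under these assumptions, "St. Louis" becomes "St Louis" and "S.Diego" becomes "S Diego". A period at the very end of the input matches neither rule and is left unchanged in this sketch.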

DUPLICATE_CITY_NAME

Overrides the order in COUNTRY_LIST for specific cities that appear in multiple countries. City names that appear in more than one city table are extracted to this table.

Entry format: City name, country1, country2

Countries are listed in the most likely order of where the city would be found. The Router only returns a weight for the country that is highest ranked in the DUPLICATE_CITY_NAME entry.

Example:

DUPLICATE_CITY_NAME      Toronto,CA,US,AU

COUNTRY_LIST          us,ca

If the input record contained just "Toronto", with no state or province identifier, and no postcode mask, then the Router would only give weight to Canada (even though US is first in the country list).
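
The selection described above might be sketched like this in Python. The function name and the dictionary layout are hypothetical. The sketch also assumes that countries missing from COUNTRY_LIST (AU in the example) are skipped, which the description does not state explicitly.

```python
def weighted_country(city, duplicate_city_names, country_list):
    """Hypothetical sketch: pick the only country that receives weight
    for a city listed in DUPLICATE_CITY_NAME.

    duplicate_city_names maps a lower-case city name to its countries,
    listed in most-likely order.
    """
    ranked = duplicate_city_names.get(city.lower())
    if ranked is None:
        # Not a duplicate city: the normal COUNTRY_LIST order applies.
        return None
    active = [country.lower() for country in country_list]
    for country in ranked:
        # Assumption: only countries present in COUNTRY_LIST are considered.
        if country.lower() in active:
            return country.lower()
    return None
```

For the example above, an entry of Toronto with CA, US, AU and a country list of us, ca gives "ca", even though "us" comes first in the country list.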

FIX_UTF8

Specifies that the Global Data Router will attempt to fix incorrect UTF8 encoding. Use this parameter when you suspect that UTF8 data is really in a different code page.

Example: FIX_UTF8

GEOG_RECODE

Allows for recodes from the Global Geography Table to be applied to the input string. The program uses recode and synonym phrases that are found in the table.

Example:  GEOG_RECODE Y

If the Global Geography Table contains ‘ONT’ RECODE ‘ON’, then if ‘ont’ is found in the input, it will be replaced with ‘on’.
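
As a rough illustration of the recode step, the Python sketch below replaces whole words by a lookup in a recode dictionary. The function name apply_recodes and the dictionary are hypothetical. The sketch assumes lower-case, whole-word matching and does not handle multi-word phrases or synonyms.

```python
def apply_recodes(text, recodes):
    """Hypothetical sketch: replace each word that has a recode entry.

    recodes maps a lower-case word to its replacement.
    """
    words = text.lower().split()
    return " ".join(recodes.get(word, word) for word in words)
```

With a recode entry of "ont" to "on", the input "Toronto ont" becomes "toronto on". Note that "toronto" itself is untouched because only whole words are compared.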

LOAD_FILES

When set to Y, it indicates that the Global Geography Table should be loaded to memory at program startup. This improves performance for real-time applications.

Note: Typically, this parameter is NOT used in batch mode. LOAD_FILES must be set to Y when used with Trillium Director.

Example: LOAD_FILES Y

MAX_ADDITIONAL_POSTCODES

Contains the maximum number of postal codes to add to the weight. Limits the amount of weight added when a city name has been identified. This works in conjunction with WEIGHT_ADDITIONAL_POSTCODES. See the description for WEIGHT_ADDITIONAL_POSTCODES for an explanation of the interaction of these two parameters.

Example: MAX_ADDITIONAL_POSTCODES 50

MIN_LENGTH

Specifies the minimum length that an Asian Level1, Level2, or Level3 entry must have to count as a match.

Example:    MIN_LENGTH 6

Asian data tends to not have spaces to separate the text. The same combination of characters can be used to represent different words, so sometimes very short combinations might represent a city name, or be part of a longer string.

MIN_WEIGHT

Sets the minimum weight value that must be accumulated for the record to be assigned to a country.

Example:   MIN_WEIGHT 70

If the total weight for a country is less than this value, a match is not made.

RECODE_ALL

Uses user-defined values. Performs recodes for every country.

Example: RECODE_ALL co antrim

This example would ensure that a value like "CO ANTRIM" does not get sent to the US (since the program might think that "co" means Colorado).

SPACE_AFTER_PERIOD

When set to Y, inserts a space after any period that is not already followed by a space, before the string is matched against the lookup tables. For example, "S.Diego" becomes "S. Diego". If "S." is recoded to "San", the program then gets a match in the Global Geography Table.

Example: SPACE_AFTER_PERIOD <Y or N>
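
The space insertion can be sketched in Python with a single regular expression. The function name space_after_period is hypothetical, and the sketch treats any whitespace character as "a space".

```python
import re


def space_after_period(text):
    """Hypothetical sketch: add a space after a period when the next
    character exists and is not whitespace."""
    return re.sub(r"\.(?=\S)", ". ", text)
```

Here "S.Diego" becomes "S. Diego", while "S. Diego" and a trailing period (as in "Main St.") are left unchanged.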

STREET_TYPE

A list of names that are street identifiers. If the word preceding the street type looks like a city name, then the resulting city match is ignored. When working with multiple countries, it is best to put all street types for the various languages in this list, so the program knows to which country each type belongs.

Example:

  STREET_TYPE    R,road
                 R,st
                 L,rue de

The identifiers (L = left, R = right) indicate the identifier position, relative to street name. "Rue" always comes to the left of the street name, and "road" to the right.
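
To make the L and R positions concrete, here is a minimal Python sketch that checks whether a candidate city word sits next to a street type and should therefore be ignored as a city match. The function name, the list layout, and the lower-case word list are all assumptions made for illustration.

```python
# Hypothetical in-memory form of the STREET_TYPE entries shown above.
STREET_TYPES = [("R", "road"), ("R", "st"), ("L", "rue de")]


def is_street_name(words, index, street_types):
    """Return True if words[index] is adjacent to a street type on the
    side given by that type's L/R identifier."""
    for side, street_type in street_types:
        type_words = street_type.split()
        count = len(type_words)
        if side == "R":
            # Street type comes to the right of the street name.
            if words[index + 1:index + 1 + count] == type_words:
                return True
        else:
            # Street type comes to the left of the street name.
            start = index - count
            if start >= 0 and words[start:index] == type_words:
                return True
    return False
```

For the words of "12 paris road", the word "paris" is followed by "road", so it is treated as a street name rather than a city. For "5 rue de lyon", the word "lyon" is preceded by "rue de", with the same result.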

TRANSLATE_CHAR

This parameter converts characters from one form to another.

Format:  TRANSLATE_CHAR   ab, cd, ef

‘a’ is translated to ‘b’, ‘c’ translated to ‘d’ and ‘e’ translated to ‘f’.

Example:  TRANSLATE_CHAR ØO,AE

This would translate ‘Ø’ to ‘O’ and ‘A’ to ‘E’.
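
The pair format can be illustrated with Python's built-in str.maketrans and str.translate. The function name build_translation is hypothetical, and the sketch assumes every comma-separated entry is exactly two characters long.

```python
def build_translation(spec):
    """Hypothetical sketch: turn a TRANSLATE_CHAR value such as "ØO,AE"
    into a translation table usable with str.translate."""
    pairs = [pair.strip() for pair in spec.split(",") if pair.strip()]
    # First character of each pair is the source, second is the target.
    return str.maketrans({pair[0]: pair[1] for pair in pairs})
```

Applying the table built from the example value to the text "ØRSTA" gives "ORSTE": ‘Ø’ becomes ‘O’ and ‘A’ becomes ‘E’.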

TRANSLATE_TABLE

Contains a translation table to convert accented characters into non-accented ones. Many internal city-state tables are entered without accents. This table allows the text to be converted for each country so that accented characters will match up. This table should be entered before the country sections of the rules table.

Within the country section, use USE_TRANSLATE_TABLE to reference a particular table.

Example:  TRANSLATE_TABLE  cp1250

UNICODE_RANGE

The range of characters and weight for each character in that range. This enables you to give weight to characters that should only occur in the designated country. The low and high range values are entered in hexadecimal.

Example:

UNICODE_RANGE  12f 12e 20
               140 142 10

UTF8_ENCODING

Enter an alternate code page to try if the data is supposed to be in UTF8 but is not properly formatted. You can have one of these for each country in the rules table. The default is ASCII.

Example:  UTF8_ENCODING cp1250

WEIGHT_ADDITIONAL_CITY_MATCH

For city name matches, this value is added if another city match is made for a particular entry in a different country. The result is added to the total weight, giving more weight for more than one city match.

Example:  WEIGHT_ADDITIONAL_CITY_MATCH  10

In this case, if the city of ‘Portsmouth’ got a match in the US as well as the UK, a value of 10 would be applied twice.

WEIGHT_ADDITIONAL_POSTCODES

This value is added for any city name that has more than one postcode. It is multiplied by the lesser of two numbers: the value of MAX_ADDITIONAL_POSTCODES, and the number of postal codes for this entry in the Global Geography Table minus one.

For example, if "Boston" had 15 postal codes and MAX_ADDITIONAL_POSTCODES was set to 10, the program would subtract 1 from 15, giving 14. Since 14 is greater than 10, the weight value would be multiplied by 10.

If the weight is zero, then this is not used in the calculation.

Example:    WEIGHT_ADDITIONAL_POSTCODES   10
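
The interaction between WEIGHT_ADDITIONAL_POSTCODES and MAX_ADDITIONAL_POSTCODES can be written out as a short Python sketch. The function name is hypothetical; the arithmetic follows the description above.

```python
def additional_postcode_weight(postcode_count, max_additional, weight):
    """Hypothetical sketch of the additional-postcode weight.

    postcode_count: postal codes for the city entry
    max_additional: MAX_ADDITIONAL_POSTCODES
    weight:         WEIGHT_ADDITIONAL_POSTCODES
    """
    # A zero weight, or a city with a single postcode, adds nothing.
    if weight == 0 or postcode_count <= 1:
        return 0
    return min(max_additional, postcode_count - 1) * weight
```

With 15 postal codes, a maximum of 10, and a weight of 10, the multiplier is capped at 10 and the added weight is 100. With only 3 postal codes, the multiplier is 2 and the added weight is 20.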

WEIGHT_ADDITIONAL_WORDS_IN_CITY

Add this weight if the city name contains two words. Default = 15

Example:    WEIGHT_ADDITIONAL_WORDS_IN_CITY  20

WEIGHT_ATT_CITY

Add this weight for all entries in the Global Geography Table that have attribute assignments of "att=city".

This is useful for foreign countries that may have data that is not represented in the native-language spelling. For instance, in Italy, the capital is "Roma". However, the name could be entered with the English spelling of "Rome." This value is in the Global Geography Table, so it would be categorized as being in Italy.

Example:   WEIGHT_ATT_CITY 70

WEIGHT_ATT_STATE

Add this weight for all entries in the Global Geography Table that have attribute assignments of "att=state". This weight is similar to WEIGHT_ATT_CITY, except that it uses state names instead of city names.

Example:    WEIGHT_ATT_STATE 50

WEIGHT_COUNTRY_CODE

Add this weight if the country code for this record is in the appropriate column. This is typically a large value that exceeds the threshold, since the country code, if present, is likely correct.

Example:  WEIGHT_COUNTRY_CODE 100

WEIGHT_COUNTRY_NAME

Add this weight if the country name appears anywhere in the record. This should not exceed the threshold.

Example:  WEIGHT_COUNTRY_NAME 25

WEIGHT_COUNTRY_NAME_LAST

Add this weight if the country name is the last item in the input data. If the country name is in a particular attribute, put that attribute LAST in the list of attributes to pick up.

Example:   WEIGHT_COUNTRY_NAME_LAST 25

WEIGHT_EMAIL_EXTENSION

Add this weight if an e-mail attribute exists in the input and the E-mail Attribute setting is specified in the Advanced settings window.

Example:  WEIGHT_EMAIL_EXTENSION 100

WEIGHT_FOUND_ENDINGS

Add this weight if an ending from the Global Geography Table or from the ADD_ENDING parameter for a particular country in your Router Rules Table is found.

Example:   If the ending –weg is in the table and the word "Arborweg" is in the record, the specified weight is added.

Example:   WEIGHT_FOUND_ENDINGS 60
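
The ending check can be sketched in Python as a suffix test on each word of the record. The function name is hypothetical. The sketch assumes lower-case comparison and that the weight is added once per record, regardless of how many endings are found.

```python
def ending_weight(record, endings, weight):
    """Hypothetical sketch: return the weight if any word in the record
    ends with one of the listed endings, otherwise 0."""
    words = record.lower().split()
    found = any(
        word.endswith(ending) and len(word) > len(ending)
        for word in words
        for ending in endings
    )
    return weight if found else 0
```

With the ending "weg" and a weight of 60, the record "12 Arborweg" contributes 60, and "12 Main St" contributes nothing.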

WEIGHT_LEVEL1

Specifies the value to add if there is a match at the state, province, and county level entry.

WEIGHT_LEVEL2

Specifies the value to add if there is a match at the city level entry.

WEIGHT_LEVEL3

Specifies the value to add if there is a match at the locality and neighborhood level entry.

WEIGHT_LEVEL4

Specifies the value to add if a match exists at the dependent locality and secondary neighborhood level entry.

WEIGHT_NO_POSTCODE_MATCH

Specifies the value to SUBTRACT if there is a match on the city, but not on the postal code.

WEIGHT_NO_STATE_MATCH

Specifies the value to SUBTRACT if there is a match on the city, but not on the state/province.

WEIGHT_POBOX

Add this weight if there is a word in the record that refers to a post office box for that country. These items are in the Global Geography Table, with an attribute of "ATT=PBOX".

Example:  WEIGHT_POBOX 30

WEIGHT_POSTCODE_MASK

Add this weight when there is a match on the position and pattern described in POSTCODE_MASK.

Example:  WEIGHT_POSTCODE_MASK 50

WEIGHT_POSTCODE_MASK_ANYWHERE

Add this weight if the POSTCODE_MASK is matched anywhere in the record. The entire record is then searched for the postal code pattern.

This is useful in countries like Canada or the UK, where postal codes can be very distinctive.

Example:    POSTCODE_MASK lc,nnnnn,nnnnn-nnnn
            WEIGHT_POSTCODE_MASK_ANYWHERE 75

WEIGHT_SECONDARY_GEOG_MATCH

Add this weight if there is a secondary match at a particular level: for example, if the city is matched and then the state or postal codes associated with this city are also matched. Consider this example:

Joe Customer 184 Main St Billerica Ma 01821

And

WEIGHT_SECONDARY_GEOG_MATCH 1000

If the program looked this up in the table, it would find "Billerica". When checked further, it would find a match on "MA", and also on postal code "01821". This would add the WEIGHT_SECONDARY_GEOG_MATCH value to the weight twice: once for the state match and once for the postal code match.

WEIGHT_STATE_POSTCODE_RANGE

The program will add this weight if the following condition is met: Router can't find an exact postcode match, but the postcode is in the correct range for the state from the STATE_POSTCODE_RANGE table (created in the rules file).

Example:   WEIGHT_STATE_POSTCODE_RANGE 10

Billorica ma 01821 (Typically, this is Billerica MA 01821)

This will not get a match on "Billorica", but since the range for Massachusetts contains the sectional center 018, it will add this weight.
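
The range check in this example can be sketched in Python. The layout of the range table below is hypothetical and is not the actual STATE_POSTCODE_RANGE syntax. The sketch assumes that ranges are expressed as three-digit sectional-center prefixes and uses 010 to 027 for Massachusetts purely for illustration.

```python
# Hypothetical in-memory form of a state postcode range table.
STATE_POSTCODE_RANGES = {"ma": [("010", "027")]}


def in_state_range(state, postcode, ranges):
    """Return True if the postcode's three-digit prefix falls inside one
    of the ranges listed for the state."""
    prefix = postcode[:3]
    # Equal-length digit strings compare correctly as plain strings.
    return any(low <= prefix <= high for low, high in ranges.get(state.lower(), []))
```

For "ma" and "01821", the prefix 018 falls inside the assumed range, so the weight would be added even though "Billorica" itself is not matched.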

WEIGHT_SYNONYM

If a synonym is found in the Global Geography Table, this weight is added to the overall total.

WEIGHT_TEMP_TABCIT

Specifies the value to subtract from the WEIGHT_LEVEL2 value if there is a match in the city changes file.

WEIGHT_THREE_CHAR_LEVEL2

Add this weight if the city name consists of three characters. This is for countries that have very short city names that might occur in other countries as fillers. When the three-character city is matched, instead of WEIGHT_LEVEL2, this will add a much smaller weight so that if there is other information pointing to a different country, this does not override it.

Example: WEIGHT_THREE_CHAR_LEVEL2 20

WEIGHT_THREE_WORDS_IN_CITY

Add this weight if the city name contains three or more words. Default = 500

Example:  WEIGHT_THREE_WORDS_IN_CITY 2000

WEIGHT_THRESHOLD

This is a user-defined value that the total computed weight is compared against. When the total weight is greater than or equal to this value, no more data comparison is performed and the country of origin is determined.

Example:   WEIGHT_THRESHOLD 1000

With this setting, processing stops and the country of origin is determined as soon as the computed weight is 1000 or greater.
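
The early-stop behavior can be sketched in Python as a running total that is compared against the threshold after each weight is added. The function name is hypothetical, and the order in which the Router evaluates individual weights is not specified here, so the input order is an assumption.

```python
def accumulate_until_threshold(weights, threshold):
    """Hypothetical sketch: add weights in order and stop as soon as the
    running total is greater than or equal to the threshold.

    Returns (total, reached), where reached is True if the threshold
    was met.
    """
    total = 0
    for weight in weights:
        total += weight
        if total >= threshold:
            # No further comparisons are performed once the threshold is met.
            return total, True
    return total, False
```

For example, with a threshold of 1000, the weights 600, 500, and 300 stop after the second value with a total of 1100, and the third value is never added.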

WEIGHT_TWO_CHAR_LEVEL2

Add this weight if the city name consists of two characters. This is for countries that have very short city names that might occur in other countries as fillers. When the two-character city is matched, instead of WEIGHT_LEVEL2, this will add a much smaller weight so that if there is other information pointing to a different country, this does not override it.

Example: WEIGHT_TWO_CHAR_LEVEL2 20