Parsing Spanish and German Names - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

This template demonstrates how to parse mixed-culture names, such as Spanish and German names, into their component parts. The parsing rule separates the Name field into tokens and copies each token to the fields defined in the Personal and Business Names parsing grammar. For more information about this parsing grammar, select Tools > Open Parser Domain Editor, then select the Personal and Business Names domain and either the German (de) or Spanish (es) culture.

This template also applies gender codes to personal names using table data contained in Table Management. For more information about Table Management, select Tools > Table Management.

Business Scenario

You work for a pharmaceuticals company based in Brussels that has consolidated its German and Spanish operations. Your company wants to implement a mixed-culture database containing name data, and it is your job to analyze the variations in names between the two cultures.

The following dataflow provides a solution to the business scenario:

Figure: Solution to business scenario dataflow

This dataflow template is available in Enterprise Designer. Go to File > New > Dataflow > From template and select ParseSpanish&GermanNames. This dataflow requires Data Normalization.

In this dataflow, data is read from a file and processed through the Open Parser stage. For each data row in the input file, the dataflow does the following:

Read from File

This stage identifies the file name, location, and layout of the file that contains the names you want to parse. The file contains both male and female names and includes CultureCode information for each name. The CultureCode information designates the input names as either German (de) or Spanish (es).
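
As a rough illustration of the kind of input this stage expects, the following Python sketch reads rows that carry a Name and a CultureCode value. The comma-delimited layout, the header row, and the sample rows are assumptions made for the example; the template's actual input file may be laid out differently.

```python
import csv
import io

# Illustrative rows only; the template's real input file and layout may differ.
SAMPLE = "Name,CultureCode\nHans Schmidt,de\nAna García López,es\n"

def read_names(handle, allowed=("de", "es")):
    """Yield rows whose CultureCode is one of the allowed cultures."""
    for row in csv.DictReader(handle):
        code = (row.get("CultureCode") or "").strip().lower()
        if code not in allowed:
            # Reject anything that is not German (de) or Spanish (es).
            raise ValueError(f"unexpected CultureCode: {code!r}")
        yield {"Name": row["Name"].strip(), "CultureCode": code}

rows = list(read_names(io.StringIO(SAMPLE)))
```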

Open Name Parser

Open Name Parser examines name fields and compares them to name data stored in the Spectrum Technology Platform name database files. Based on the comparison, it parses the name data into First, Middle, and Last name fields.
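
To make the idea of splitting a name into fields concrete, here is a deliberately naive Python sketch. It is not the product's parsing logic: the stage compares tokens against name data, whereas this example only uses token position. The function name `split_name`, the rule that a Spanish name with three or more tokens ends in a paternal and a maternal surname, and the choice to store both surnames in LastName are all assumptions made for illustration.

```python
def split_name(name, culture_code):
    """Naive positional split; real parsing relies on name data and grammars."""
    tokens = name.split()
    fields = {"FirstName": "", "MiddleName": "", "LastName": "",
              "PaternalLastName": "", "MaternalLastName": ""}
    if not tokens:
        return fields
    fields["FirstName"] = tokens[0]
    if culture_code == "es" and len(tokens) >= 3:
        # Assumption: the last two tokens are the paternal and maternal surnames.
        fields["PaternalLastName"] = tokens[-2]
        fields["MaternalLastName"] = tokens[-1]
        fields["LastName"] = " ".join(tokens[-2:])
        fields["MiddleName"] = " ".join(tokens[1:-2])
    elif len(tokens) >= 2:
        # Otherwise treat the final token as the single last name.
        fields["LastName"] = tokens[-1]
        fields["MiddleName"] = " ".join(tokens[1:-1])
    return fields
```

Calling `split_name("Hans Peter Schmidt", "de")` under these assumptions yields Hans, Peter, and Schmidt as first, middle, and last name, while a Spanish name such as "Juan Carlos García López" also fills the two surname fields.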

Conditional Router

This stage routes personal names to the Gender Codes stage and business names to the Business Names stage.
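
The routing step can be pictured with the short Python sketch below. The condition the template's router actually tests is not described here, so the rule used in the example (a populated FirmName marks a business name) is purely an assumption, as is the function name `route`.

```python
def route(records):
    """Split records into (personal, business) lists."""
    personal, business = [], []
    for rec in records:
        # Assumption: a non-empty FirmName identifies a business name.
        if (rec.get("FirmName") or "").strip():
            business.append(rec)
        else:
            personal.append(rec)
    return personal, business
```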

Gender Code

Double-click this stage on the canvas and then click Modify to display the table lookup rule options.

The Categorize option uses the Source value as a key and copies the corresponding value from the table entry into the field selected in the Destination list. In this template, Complete field is selected and Source is set to use the FirstName field. Table Lookup treats the entire field as one string and flags the record if the string as a whole can be categorized.

The Destination is set to the GenderCode field and uses the lookup terms contained in the Gender Codes table to perform the categorization of male and female names. If a term in the input data is not found, Table Lookup assigns a value of U, which means unknown. To better understand how this works, select Tools > Table Management and select the Gender Codes table.
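
The Categorize behavior described above can be sketched in Python as a whole-field dictionary lookup with a fallback of U. The table contents, the M and F codes, the case-insensitive matching, and the function name `categorize_gender` are assumptions for illustration; the real lookup terms are in the Gender Codes table in Table Management.

```python
# Hypothetical stand-in for the Gender Codes table; real entries live in Table Management.
GENDER_CODES = {"MARIA": "F", "HANS": "M"}

def categorize_gender(record, table=GENDER_CODES, default="U"):
    """Look up the entire FirstName string and copy the result to GenderCode."""
    # Complete field: the whole value is the key, not its individual tokens.
    key = (record.get("FirstName") or "").strip().upper()
    out = dict(record)
    out["GenderCode"] = table.get(key, default)
    return out
```

With this sample table, a FirstName of "Maria" is categorized as F, while "Maria Jose" receives U because the string as a whole is not a table entry.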

Write to File

The template contains two Write to File stages, one for personal names and one for business names. In addition to the input field, the personal names output file contains the Name, TitleOfRespect, FirstName, MiddleName, LastName, PaternalLastName, MaternalLastName, MaturitySuffix, GenderCode, CultureUsed, and ParserScore fields.

The business names output file contains the Name, FirmName, FirmSuffix, CultureUsed, and ParserScore fields.
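
The two output layouts can be summarized in a small Python sketch that writes each record set with its own column list. The field names come from the lists above (the additional input field in the personal names file is omitted because it is not named); the comma-delimited format, the sample record, and the function name `write_records` are assumptions.

```python
import csv
import io

PERSONAL_FIELDS = ["Name", "TitleOfRespect", "FirstName", "MiddleName", "LastName",
                   "PaternalLastName", "MaternalLastName", "MaturitySuffix",
                   "GenderCode", "CultureUsed", "ParserScore"]
BUSINESS_FIELDS = ["Name", "FirmName", "FirmSuffix", "CultureUsed", "ParserScore"]

def write_records(handle, fieldnames, records):
    """Write a header row plus one row per record; missing fields become empty."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)

# Illustrative record only; the values are not taken from the template's data.
buf = io.StringIO()
write_records(buf, BUSINESS_FIELDS, [{"Name": "Muster GmbH", "FirmName": "Muster",
                                      "FirmSuffix": "GmbH", "CultureUsed": "de",
                                      "ParserScore": "100"}])
```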