Split - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last edition: 2024-07-09
Last publication: 2024-07-09T15:09:58.774265

The Split node allows you to split a String field into multiple fields, using Regular Expressions.

Doing so requires four input parameters:

  • Field to Split - The String field you would like to split into multiple fields.
  • Number of Fields - The number of fields you would like to split the Field to Split into. This is also the number of new fields that will be added to your data set.
  • Field Names - The names of the new fields created by the split. Names default to Field1, Field2, Field3, etc., but can be changed by double-clicking them.
  • Regular Expression Pattern - The RegExp you want to use to split your field. This is the most crucial component of the Split node, as it is what actually does the work of splitting the field. As such, it is covered in more detail below.
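
As a rough illustration of how the four parameters relate to one another, the following plain Java sketch splits a value and assigns the pieces to named fields. It is not the node's implementation: the class and method names are hypothetical, and the handling of missing or extra pieces is an assumption made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitNodeSketch {
    // value      -> the content of the Field to Split
    // pattern    -> the Regular Expression Pattern (Java regex syntax)
    // fieldNames -> the Field Names; its length plays the role of Number of Fields
    static Map<String, String> split(String value, String pattern, String[] fieldNames) {
        String[] pieces = value.split(pattern);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            // Assumption for this sketch only: a missing piece becomes an empty
            // string, and pieces beyond the last field name are ignored.
            row.put(fieldNames[i], i < pieces.length ? pieces[i] : "");
        }
        return row;
    }

    public static void main(String[] args) {
        String[] names = {"Field1", "Field2"}; // the default field names
        System.out.println(split("Ernest Hemingway", " ", names));
        // prints {Field1=Ernest, Field2=Hemingway}
    }
}
```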

Split node Regular Expression Patterns

The Split node follows the rules of Regular Expressions in Java. This means that splitting is not always simply a matter of specifying the characters with which you want to split.

For example, if you had a name field that you wanted to split into a firstName field and a lastName field, you could split on the Space character, by pressing the Space bar once in the Regular Expression Pattern, or you could use the more formal RegExp whitespace class, \\s, which also matches tabs and other whitespace characters.

Either case would result in the following:

| name | firstName | lastName |
|---|---|---|
| Ernest Hemingway | Ernest | Hemingway |

Split Node Example 1 - Splitting on a Space
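
The same split can be tried in plain Java, whose regular expression rules the node follows. This is a minimal sketch with hypothetical class and method names; it assumes only that the pattern behaves as it does in Java's String.split. Note that in Java source code the string literal "\\s" denotes the regular expression \s.

```java
import java.util.Arrays;

public class SplitOnSpace {
    // Splits a value on a Java regular expression.
    static String[] splitField(String value, String pattern) {
        return value.split(pattern);
    }

    public static void main(String[] args) {
        String name = "Ernest Hemingway";
        // A literal space character as the pattern
        System.out.println(Arrays.toString(splitField(name, " ")));
        // prints [Ernest, Hemingway]
        // The whitespace character class as the pattern
        System.out.println(Arrays.toString(splitField(name, "\\s")));
        // prints [Ernest, Hemingway]
    }
}
```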

Some cases require special treatment, however. For example, if you wanted to split a URL on the period character, and you simply entered the period character . as your Regular Expression Pattern, you would find that this does not work: the new fields are left empty.

| url | protocolPrefix | companyName | worldWideSuffix |
|---|---|---|---|
| http://www.infogix.com | | | |

Split Node Example 2 - A RegExp That Doesn't Work
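
The failure can be reproduced outside the product. The following is a minimal sketch in plain Java with hypothetical class and method names; it assumes only that the pattern is interpreted as in Java's String.split.

```java
public class UnescapedPeriod {
    // Splits a value on a Java regular expression.
    static String[] splitField(String value, String pattern) {
        return value.split(pattern);
    }

    public static void main(String[] args) {
        // An unescaped period matches any single character.
        String[] parts = splitField("http://www.infogix.com", ".");
        System.out.println(parts.length);
        // prints 0
    }
}
```

In plain Java the unescaped pattern yields no fields at all, because every character of the URL is treated as a delimiter and nothing is left between the delimiters.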

This is because in Regular Expressions the period . is a special character that matches any single character, so it needs to be escaped to match a literal period. In Data360 DQ+, special RegExp characters should be escaped using a double backslash \\.

Escaping the period in this way gives the pattern \\. which properly splits our URL field into the three fields shown below.

| url | protocolPrefix | companyName | worldWideSuffix |
|---|---|---|---|
| http://www.infogix.com | http://www | infogix | com |

Split Node Example 3 - Splitting on \\.
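
For comparison, here is the escaped pattern in plain Java. This is a sketch with hypothetical class and method names, assuming the pattern is interpreted as in Java's String.split. In Java source code the string literal "\\." denotes the regular expression \. (a literal period).

```java
import java.util.Arrays;

public class EscapedPeriod {
    // Splits a value on a Java regular expression.
    static String[] splitField(String value, String pattern) {
        return value.split(pattern);
    }

    public static void main(String[] args) {
        // The escaped period matches only a literal period.
        String[] parts = splitField("http://www.infogix.com", "\\.");
        System.out.println(Arrays.toString(parts));
        // prints [http://www, infogix, com]
    }
}
```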

Data360 DQ+ uses the Java Regular Expression syntax. The full reference can be found here:

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Note: In both working examples (Example 1 and Example 3), the characters used to split on are removed from the newly created fields.