Array handling in the transformation builder - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 DQ+
Version
Latest
Language
English
Product name
Data360 DQ+
Title
Data360 DQ+ Help
Copyright
2024
First publish date
2016
Last updated
2024-10-09
Published on
2024-10-09T14:37:51.625264

The transformation builder can handle arrays in a number of ways. You can copy an array, or explode the contents, choosing to include or exclude null values.

When you add an array to a schema, choose from the following array handling options.

Copy

The Copy option generates a single row in the output for each row in the input.

Explode

The "Explode" option results in multiple rows in the output data set for each row in the input. Each non-null value in an array generates a separate row in the output. For example, using the default behavior on the following input generates two unique output rows with the same name values and different phone details.

{  
    "givenName": {
        "first": "Michael",
        "last": "Smith"
    },
    "preferredName": {
        "first": "Mike",
        "last": "Smith"
    },
    "phone": [    
        {      
            "type": "home",
            "number": "212 555-1234"    
        },
        {   "type": "work",      
            "number": "646 555-4567"
        }
   }
}

Explode Outer

By default, the built schema uses the "Explode Outer" option for the Output Transform Action on any arrays in your JSON object. Explode Outer behaves like Explode, but it also generates rows in the even of Null values. Using the above JSON, for example, if the value of the work phone number was Null - that is, if an entry existed for "type": "work", but the entry for number contained no value or was missing - an output row would still be created, containing a Null value.

Pos Explode

Pos Explode is simlar to Explode. Additionally, you can specify a related field. The related field must have the same Source Data Type and Output Transform Action. The data is considered to be related by its position in each array. For example, you could start with the following simple JSON:

{"field": ["first","second"], "relatedField": ["one","two"]

If you handle each array by using Explode or Explode Outer, the generated output contains four rows, one for each combination of the two values in each array.

field relatedField
first one
first two
second two
second one

If you select the Pos Explode option for each array, and specify "relatedField" as the Related Field of "field", the output contains two rows:

field relatedField
first one
second two

Pos Explode Outer

Pos Explode Outer behaves the same as Pos Explode, but also includes Null values in its output.