Use the JSON Parser node to parse and transform a single JSON field from an input, to generate one or multiple output rows from each JSON object. You can manually generate the schema that is used to parse the JSON, or it can be generated automatically from a sample JSON object that you provide.
Properties
Display Name
The name of the node in the analysis.
Edit Transformations
Click Edit Transformations to configure the schema that is used to parse the JSON. For more information, see Using the Transformation Builder.
JSON Field
The input node field that contains the JSON you want to parse.
Additional Output Fields
Click Add Field to add any other fields from your input node that you want to include in the output. Use the Field Selector to select fields and click the arrows to move them between the Available and Selected columns.
Using the Transformation Builder
You can parse JSON objects into output rows by using the transformation builder.
- Select the JSON Parser node.
- Click Edit Transformations. The Edit Transformations window is displayed.
- Click Add.
- Choose an option from the drop down list:
- Field - Select this option to start creating the JSON schema manually, or to add a field to or edit an existing schema. For details, see Building manually.
- From Sample JSON - Select this option if you have an example of the JSON object that you will be parsing with the JSON Parser node. For details, see Building by using sample JSON.
- When you have built the schema, click OK to close the Edit Transformations window.
- In the properties panel, select the JSON field from the input node that contains the JSON you are parsing.
- Use the Additional Output Fields panel to select and add any other input fields that you want to include in your output along with the parsed JSON. Use the Field Selector to select fields and click the arrows to move them between the Available and Selected columns.
Building schema by using sample JSON
- Click in the text area
- Enter sample JSON that represents the JSON you want to parse using the JSON Parser node, for example
{"region":"name", "city": "name"}
. - Click Build Schema. The transformation is automatically generated using the default parsing options.
- Click OK to close the Edit Transformations Window
- Click Accept Changes.
If you use the generator to build your schema, the first field in the schema will always be a Struct with no Source Name given.
Configuring a schema
Depending on the complexity and structure of your JSON object, you might want to change some of the options in the built schema, rather than accept all the defaults.
Unique field names
If you have multiple fields with the same name as part of separate arrays in your JSON object, when you use the schema builder to automatically generated the schema, by default these fields will be given the names "field", "field2", "field3" etc in your output. You might need to then edit these fields to give them meaningful titles relating to the arrays that they have been extracted from.
Array handling
When parsing JSON in Data360 DQ+, you can explode the contents of an array in different ways. You can choose from the following Explode options.
Explode
The "Explode" option results in multiple rows in the output data set for each row in the input. Each non-null value in an array generates a separate row in the output. For example, using the default behavior on the following input generates two unique output rows with the same name values and different phone details.
{
"givenName": {
"first":"Michael",
"last":"Smith"
},
"preferredName": {
"first":"Mike",
"last":"Smith"
},
"phone": [
{
"type": "home",
"number": "212 555-1234"
},
{ "type": "work",
"number": "646 555-4567"
}
}
}
When you are parsing a JSON object, consider how you want to present and display the information.
- If you want to create a separate record for each phone number, in this case you should use "Explode Outer" or another "Explode" option.
- If you want a single row for each person, you should use the "Copy" option for the "phone" array.
Explode Outer
By default, the built schema uses the "Explode Outer" option for the Output Transform Action on any arrays in your JSON object. Explode Outer behaves like Explode, but it also generates rows in the even of Null values. Using the above JSON, for example, if the value of the work phone number was Null - that is, if an entry existed for "type":"work"
, but the entry for number contained no value or was missing - an output row would still be created, containing a Null value.
Pos Explode
Pos Explode is simlar to Explode. Additionally, you can specify a related field. The related field must have the same Source Data Type and Output Transform Action. The data is considered to be related by its position in each array. For example, you could start with the following simple JSON:
{"field": ["first","second"], "relatedField": ["one","two"]
If you handle each array by using Explode or Explode Outer, the generated output contains four rows, one for each combination of the two values in each array.
field | relatedField |
---|---|
first | one |
first | two |
second | two |
second | one |
If you select the Pos Explode option for each array, and specify "relatedField" as the Related Field of "field", the output contains two rows:
field | relatedField |
---|---|
first | one |
second | two |
Pos Explode Outer
Pos Explode Outer behaves the same as Pos Explode, but also includes Null values in its output.
Building JSON schema manually
You can use the Transformation Builder to manually generate a schema that the JSON parse node can use to parse a JSON object.
Adding the initial struct
Starting with a blank schema, you should first add a Struct field. This field will contain all the fields of your JSON object.
- Click Add.
- Select Field from the drop down list. The Add Transformation dialog is displayed.
- Leave Source Name blank.
- For Source Data Type, choose "Struct".
- Click Save.
Adding fields
Next you need to add fields and arrays, depending on the structure of your JSON object.
Use the following steps to add the required fields for the following simple JSON object: {"region":"<region_name>", "city": "<city_name>"}
. For more information about the available options when adding and editing fields, see Transformation options.
- Select the Struct you created above.
- Click Add.
- Select Field from the drop down list. The Add Transformation dialog is displayed.
- For Source Name, enter "region".
- For Source Data Type, choose "String".
- Check that the default Output Transform Action of "Copy" is automatically selected.
- When you selected the Source Data Type, the Output Field Name value was added. Check that it is the same as the value you entered in Source Name, "region".
- Click Save. The Add Transformation dialog closes, and the Edit Transformations window is updated, with the "region" field added as part of the Struct.
- Click Add.
- Select Field.
- For Source Name, enter "city".
- For Source Data Type, choose "String".
- Check that Output Transform Action is Copy, and Output Field Name is "city".
- Click Save.
- Click OK to close the Edit Transformations window.
- Click Accept Changes.
Transformation options
Source Name
The name of the JSON field.
Source Data Type
The data type of the JSON field. Choose from the available options:
- Array (Boolean)
- Array (Number)
- Array (String)
- Array (Struct)
- Boolean
- Number
- String
- Struct
Notes:
- Arrays of mixed type are not supported.
- If you add a non-array field to a schema, you can not later change the field data type to an array type by editing the field. You will need to add a new field and delete the existing field.
-
For hortonworks (which is on an older version of Spark), when the transform action for a field of type "Array (Struct)" is set to "Copy", no child at any level in the field can be set to "Ignore".
Output Transform Action
Determines how the input field should be handled in the output. The available actions varies based on JSON data type. The following options are available for all data types.
- Copy - Includes the JSON field's value as a field in the output.
- Ignore - The field is excluded from the output. If the field is an array, any elements inside the array are excluded from the output.
If the Source Data Type is an Array, the following additional "Explode" options are available.
- Explode - Includes the field value in the output. Each non-null value generates a separate row in the output.
- Explode Outer - Includes the field value in the output. Each array value, including Null values, generates a separate row in the output.
- Pos Explode - Includes the field value in the output. Each non-null value generates a separate row in the output. Additionally, you can select a Related Field that contains data that is related by its position in the array. The related field must have the same Source Data Type and Output Transform Action.
- Pos Explode Outer - Includes the field value in the output. Each array value, including Null values, generates a separate row in the output. Additionally, you can select a Related Field that contains data that is related by its position in the array. The related field must have the same Source Data Type and Output Transform Action. For more information about array options, see Array handling.
Output Field Type
The data type of the output field.
You can only select an Output Field Type when the Source Data Type is "Number" or "Array (Number)".
Choose one of the JSON numeric types:Big Integer, Decimal, Floating Point, Integer.
The default value is Decimal.
Output Field Name
The name for the field in the output.
If an array contains a hierarchy of structs (that are not arrays themselves), then the output field name must be unique across that hierarchy of structs because they will be flattened.
The default value is the value of the Source Name option.