Facilitates dimensional analysis by subsequent nodes, identifying the occurrences of unique value combinations within an input data set, given a designated set of attribute fields.
An output record is created for each of these occurrences, containing aggregated totals of designated calculation fields. A histogram is also produced in the second output, for each of the specified dimension fields in the input data set.
The dimensional slices identified are presented as a set of intersections of the defined dimensions. If we have defined the dimensions to be HOUR_OF_DAY and ORIGINATING_STATE, for instance, we'll get these statistics for each combination of these two dimensions (e.g. calls between 1-2pm from MA, call between 1-2pm from CA, calls between 4-5am from CA, etc.). These intersections give us a more granular approach to data analysis, so we can pinpoint the errors to specific anomalies: there may be a spike in dropped calls between the hours of 1-2pm for calls which originate in California, whereas other states show no such behavior for this hour, and this state shows no such behavior for other hours of the day.
This node performs dimensional analysis on an input data source. Dimensions are defined by the columns listed in the DimensionColumnList property, with positions along each dimension for each value of the dimension field. When executed, the node traverses the data sets and creates a set of tallies using the TallyFieldList fields for the input data. The result data stream contains one line per dimension slice, with the following output fields:
- INTERSECTION_ID - unique key for the slice, constructed of the following four fields.
- DIM1_NAME - field name for the first dimension of the slice.
- DIM1_VALUE - value used as a position along the first dimension.
- DIM2_NAME - field name for the second dimension of the slice.
- DIM2_VALUE - value used as a position along the second dimension.
- OCCURRENCE_COUNT - the count of records from the input data found along a dimensional slice.
Properties
DimensionFieldList
Specify a comma-separated list of fields from the data sources, which will be used as dimensions for the analysis. Each position along each dimension is compared with each position along every other dimension.
A value is required for this property.
TallyFieldList
Specify a comma-separated list of fields from the data sources, which will be used as tallies for the analysis. For each listed input field, an output tally field is generated: <FIELD>_TOTAL: the sum of the values for the <FIELD> from the input data source calculated for the given dimensional intersection.
OutputAllPermutations
Optionally specify whether to outputs all of the permutations for the analysis. This means that if there is an entry with: DIM1_NAME:A, DIM1_VALUE:a & DIM2_NAME:B, DIM2_VALUE:b, there will be a corresponding entry with : DIM1_NAME:B, DIM1_VALUE:b & DIM2_NAME:A, DIM2_VALUE:a.
The default value is False.
DimensionTypeConversion
Optionally specify how to deal with data conversion. Since all of the dimensions are placed in the same output field, they need to be converted to a common output field type. Choose from:
- None - No conversion will take place. The node will fail in any case when there are dimension fields that are not of the same type.
- To String - Values will be converted to strings. The node will fail if Unicode values are encountered which cannot be converted.
- To Unicode - MS Access and SQL Server.
The default value is None.
NullDimensionBehavior
Optionally specify the node behavior when a value in a dimension field is null:
- Ignore - The intersection will be ignored and the node will continue.
- Error - The node will fail and provide error information.
- Emit - A dimension slice, or intersection, will be created when a null value is encountered. This means that null values are treated like all other values.
The default value is Emit.
Inputs and outputs
Inputs: data, multiple optional.
Outputs: Intersections, Dimension Histogram, multiple optional.