Interval Inspection - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Identifies gaps in sequenced data sets by verifying that the values in successive rows increase according to a pattern defined by a sequence interval.

The interval must be positive, and the pattern is chosen from the options in the SequencePattern property. Whenever a gap is found, the value that broke the sequence is used to start the next interval. Because of this rebasing, a single gap is reported once instead of causing later values to be flagged as well.

For example, if the interval is 2, the pattern is Exact, and the sequence is "1, 3, 5, 6, 8, 10", then the only gap reported is between 5 and 6. After that gap is found, the algorithm starts the next sequence with 6 and looks for 8, 10, and so on. The first row of the input, and the first row of each group of input data, is considered to be in sequence by definition. Therefore, an input with a single record, or a group with a single record, produces no output.

For each gap found, the node outputs the previous sequence value, the current sequence value, the group (if applicable), and the interval. Outputting the interval makes it possible to recreate the missing values later in the data flow, so a single data flow can handle multiple sequence intervals without any code changes.
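
The following Python sketch illustrates this behavior for the Exact pattern only: the first row of each group is in sequence by definition, and each out-of-sequence value becomes the start of the next comparison. It is not the node's implementation; the function name `find_gaps` and the (group, value) row layout are hypothetical, and the input is assumed to be already sorted by group and then by sequence value.

```python
from typing import Any, Dict, Hashable, Iterable, List, Optional, Tuple


def find_gaps(
    rows: Iterable[Tuple[Optional[Hashable], float]],
    interval: float,
) -> List[Dict[str, Any]]:
    """Report Exact-pattern gaps in (group, value) rows sorted by group, then value.

    Use None as the group when no grouping is applied.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    gaps: List[Dict[str, Any]] = []
    no_group_yet = object()  # sentinel that never equals a real group value
    current_group: Any = no_group_yet
    previous = 0.0
    for group, value in rows:
        if group != current_group:
            # The first row of the input and of each group is in sequence by definition.
            current_group = group
            previous = value
            continue
        if value - previous != interval:
            gaps.append({"Previous": previous, "Current": value,
                         "Group": group, "Interval": interval})
        # Rebase: this value starts the next comparison even if it broke the sequence.
        previous = value
    return gaps


print(find_gaps([(None, v) for v in [1, 3, 5, 6, 8, 10]], 2))
# [{'Previous': 5, 'Current': 6, 'Group': None, 'Interval': 2}]
```

Comparing each value with the one immediately before it is what makes the rebasing automatic: after the 5 to 6 gap, 8 is checked against 6, so it is not reported.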

Properties

SortInput

Optionally specify whether the node sorts the input data on SequenceExpr, or on (GroupBy, SequenceExpr) if grouping is used. If False, the node does not sort the input data.

Please note that this node is designed to work only with ascending data sequences.

The default value is False.
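
As a rough model of what SortInput = True asks for, the Python sketch below orders rows on the sequence value, or on (group, sequence value) when a grouping field is given. The names `sort_for_inspection`, `sequence_expr`, and `group_by` are hypothetical stand-ins for SequenceExpr and GroupBy; the sort is ascending because the node handles only ascending sequences.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def sort_for_inspection(
    rows: List[Row],
    sequence_expr: Callable[[Row], Any],
    group_by: Optional[str] = None,
) -> List[Row]:
    """Sort rows ascending on the sequence value, or on (group, sequence value)."""
    if group_by is None:
        return sorted(rows, key=sequence_expr)
    return sorted(rows, key=lambda row: (row[group_by], sequence_expr(row)))


rows = [{"site": "b", "n": 2}, {"site": "a", "n": 5}, {"site": "a", "n": 1}]
print(sort_for_inspection(rows, lambda row: row["n"], group_by="site"))
# [{'site': 'a', 'n': 1}, {'site': 'a', 'n': 5}, {'site': 'b', 'n': 2}]
```

Sorting on the group first keeps each group's rows contiguous, which is what lets a single pass treat the first row of each group as the start of a new sequence.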

SequenceExpr

Specify a Script expression that is evaluated against each row to determine the sequence value for that row. The sequence value of this row is then compared to that of the next row to determine whether the next row is in or out of sequence.

The expression can be a simple field name or a full Script expression, but it must evaluate to a single value.

A value is required for this property.
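
SequenceExpr itself is written in Script, whose syntax is not shown here. As an analogy only, this Python sketch treats a sequence expression as a function of one row that returns one value; the field names `batch_no` and `run_date` and the helper functions are hypothetical. With an interval of 1, the day-based expression exposes a missing day (1 February) that the batch numbers do not.

```python
from datetime import date
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def batch_number(row: Row) -> int:
    """A plain field reference used as the sequence value."""
    return row["batch_no"]


def day_number(row: Row) -> int:
    """A derived sequence value: the date as a day count (consecutive days differ by 1)."""
    return row["run_date"].toordinal()


def sequence_values(rows: List[Row], expr: Callable[[Row], Any]) -> List[Any]:
    """Evaluate the expression once per row; each call yields a single value."""
    return [expr(row) for row in rows]


rows = [
    {"batch_no": 7, "run_date": date(2024, 1, 30)},
    {"batch_no": 8, "run_date": date(2024, 1, 31)},
    {"batch_no": 9, "run_date": date(2024, 2, 2)},
]
print(sequence_values(rows, batch_number))  # [7, 8, 9]
days = sequence_values(rows, day_number)
print([b - a for a, b in zip(days, days[1:])])  # [1, 2]
```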

SequenceInterval

Specify the expected interval of the input data. This value must be positive. It can be an integer or a floating point number.

A value is required for this property.

SequencePattern

Optionally specify the acceptable sequence for the input data. Choose from:

  • Exact - The next value must be exactly SequenceInterval away from the previous value.
  • Exact with Gaps - The next value must be exactly SequenceInterval away from the previous value, or a multiple of SequenceInterval away from it. In other words, the pattern follows the same style as "Exact" but allows gaps in the sequence.
  • Greater Than - The next value must be more than SequenceInterval away from the previous value.
  • Greater Than or Equals - The next value must be at least SequenceInterval away from the previous value.
  • Less Than - The next value must be less than SequenceInterval away from the previous value.
  • Less Than or Equals - The next value must be at most SequenceInterval away from the previous value.
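
The six patterns can be expressed as checks on the step between consecutive values, as in this Python sketch. The names `PATTERNS` and `in_sequence` are hypothetical, and treating a zero or negative step as out of sequence is an assumption based on the node handling only ascending data. The Exact with Gaps check uses the remainder operator, so it is reliable only for exactly representable values such as integers; floating point data would need a tolerance like the one Epsilon provides.

```python
from typing import Callable, Dict

# Each check receives step = current - previous and the SequenceInterval value.
PATTERNS: Dict[str, Callable[[float, float], bool]] = {
    "Exact": lambda step, interval: step == interval,
    "Exact with Gaps": lambda step, interval: step % interval == 0,
    "Greater Than": lambda step, interval: step > interval,
    "Greater Than or Equals": lambda step, interval: step >= interval,
    "Less Than": lambda step, interval: step < interval,
    "Less Than or Equals": lambda step, interval: step <= interval,
}


def in_sequence(previous: float, current: float, interval: float,
                pattern: str = "Exact") -> bool:
    """Return True if current follows previous under the named pattern."""
    step = current - previous
    if step <= 0:
        # Assumption: only ascending steps can be in sequence; duplicates and
        # decreases are treated as out of sequence here.
        return False
    return PATTERNS[pattern](step, interval)


print(in_sequence(1, 7, 2, "Exact with Gaps"))  # True: 6 is 3 x 2
print(in_sequence(1, 6, 2, "Exact with Gaps"))  # False: 5 is not a multiple of 2
```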

The default value is Exact.

If a discrepancy is found, the value that caused it is used to start the next sequence. For example, if SequenceInterval is 2, the pattern is Exact, and the sequence is "1, 3, 5, 6, 8, 10", this node reports a gap from 5 to 6 and then uses 6 to start the next sequence, looking for 8, 10, and so on. Because of this "rebasing", only one discrepancy is reported.

GroupBy

Optionally specify a grouping field that divides the input table into groups. A sequence cannot span two groups; each group is checked as a separate sequence.

By default, the fields specified in GroupBy are output along with the default output fields outlined in the node description. If your grouping fields are named "Previous", "Current", or "Interval", then they will conflict with the default output fields, and this node will be unable to run.
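
A quick check of your field names before configuring GroupBy can catch this conflict early. The Python sketch below is hypothetical (the function and constant names are not part of the product) and assumes the name comparison is case-sensitive, which this page does not state.

```python
from typing import Iterable, List

# Default output field names that grouping fields must not reuse.
DEFAULT_OUTPUT_FIELDS = ("Previous", "Current", "Interval")


def conflicting_group_fields(group_fields: Iterable[str]) -> List[str]:
    """Return grouping field names that collide with a default output field."""
    return [name for name in group_fields if name in DEFAULT_OUTPUT_FIELDS]


print(conflicting_group_fields(["Region", "Interval"]))  # ['Interval']
```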

IgnoreDuplicates

Optionally specify whether to ignore duplicates when reporting values out of sequence.

If False, this node treats duplicate values as out of sequence. If True, the node does not report duplicate values as out of sequence.

The default value is False.
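
The effect of IgnoreDuplicates on an Exact check can be sketched in Python as below. `exact_gaps` is a hypothetical helper that compares each value with the one before it; it omits grouping, NULL handling, and the other patterns.

```python
from typing import List, Tuple


def exact_gaps(values: List[float], interval: float,
               ignore_duplicates: bool = False) -> List[Tuple[float, float]]:
    """Return (previous, current) pairs that break an Exact sequence."""
    gaps: List[Tuple[float, float]] = []
    for previous, current in zip(values, values[1:]):
        if current == previous and ignore_duplicates:
            continue  # a repeated value is not reported when duplicates are ignored
        if current - previous != interval:
            gaps.append((previous, current))
    return gaps


print(exact_gaps([1, 2, 2, 3], 1))                          # [(2, 2)]
print(exact_gaps([1, 2, 2, 3], 1, ignore_duplicates=True))  # []
```

Skipping a duplicate leaves the comparison base unchanged in value, so the following 3 is still checked against 2 and passes.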

OutputExpr

Optionally specify a Script expression that customizes the output by adding to the default output fields, overriding them, removing them, or renaming them.

By default, each record output by this node contains three fields: Previous (the previous sequence value found), Current (the current sequence value), and Interval (the value of SequenceInterval). If a field is specified in the GroupBy property, the grouping field is also output.
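
Because each output record carries Previous, Current, and Interval, a downstream step can regenerate the values that should have appeared in a gap. This Python sketch is hypothetical and assumes both values lie on the same grid, as with the Exact with Gaps pattern; for floating point data, a tolerance would also be needed at the upper bound.

```python
from typing import Dict, List


def missing_values(record: Dict[str, float]) -> List[float]:
    """List the values expected strictly between Previous and Current."""
    previous = record["Previous"]
    current = record["Current"]
    interval = record["Interval"]
    if interval <= 0:
        raise ValueError("Interval must be positive")
    values: List[float] = []
    k = 1
    # Multiply rather than repeatedly add, so rounding error does not accumulate.
    while previous + k * interval < current:
        values.append(previous + k * interval)
        k += 1
    return values


print(missing_values({"Previous": 3, "Current": 9, "Interval": 2}))  # [5, 7]
print(missing_values({"Previous": 5, "Current": 6, "Interval": 2}))  # []
```

The second record comes from the Exact example earlier on this page: 6 is off the grid started at 5, so there is no whole step to fill in.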

In this property, you can use standard Script to exclude, rename, or override these fields. You can also add to these fields and output any field from the node's input. Note that to override any of the default output fields, you must use the "override emit" statement. For more information about the override keyword, see the Script help.

Epsilon

Optionally specify a tolerance for floating point rounding errors. It is used only if SequenceExpr is of type double. If this value is blank and SequenceExpr evaluates to floating point values, direct comparisons are used, so floating point rounding errors are not taken into account.

As an example of this type of error, suppose SequencePattern is Exact, SequenceInterval is 2.0, the previous sequence value is 0, and the next sequence value is 2.00001. The observed step (2.00001) differs from SequenceInterval by 0.00001, so the two values are considered in sequence if that difference is less than Epsilon. Otherwise, the two values are considered out of sequence.
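
Assuming Epsilon bounds how far the observed step (current minus previous) may deviate from SequenceInterval, the Exact check can be sketched in Python as follows. The function name `exact_step_ok` is hypothetical, and the strict "less than" comparison at exactly Epsilon is an assumption taken from the wording of this property.

```python
from typing import Optional


def exact_step_ok(previous: float, current: float, interval: float,
                  epsilon: Optional[float] = None) -> bool:
    """Exact-pattern check with an optional rounding tolerance."""
    step = current - previous
    if epsilon is None:
        return step == interval  # blank Epsilon: direct comparison
    return abs(step - interval) < epsilon


print(exact_step_ok(0, 2.00001, 2.0))        # False
print(exact_step_ok(0, 2.00001, 2.0, 1e-4))  # True
print(exact_step_ok(0.1, 0.3, 0.2))          # False: 0.3 - 0.1 is 0.19999999999999998
print(exact_step_ok(0.1, 0.3, 0.2, 1e-9))    # True
```

The last two lines show why a tolerance matters even for values that look exact in decimal: binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.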

NullValueBehavior

Optionally specify how this node will behave if the SequenceExpr evaluates to NULL. Choose from:

  • Error - The node will raise an error and stop processing once SequenceExpr evaluates to NULL.
  • Log - The node will log the NULL value and continue processing. The NULL value will be skipped in determining sequences.
  • Ignore - The node will ignore the NULL value completely and continue processing as if it were not encountered at all.

The default value is Error.
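
The three options can be modelled in Python roughly as below, with None standing in for NULL. The function `drop_nulls` and the logger name are hypothetical; the sketch only filters values before gap detection and does not reproduce the node's actual log messages.

```python
import logging
from typing import Any, Iterable, List, Optional

logger = logging.getLogger("interval_inspection_sketch")


def drop_nulls(values: Iterable[Optional[Any]], behavior: str = "Error") -> List[Any]:
    """Apply a NullValueBehavior-style policy to a stream of sequence values."""
    if behavior not in ("Error", "Log", "Ignore"):
        raise ValueError(f"unknown behavior: {behavior}")
    kept: List[Any] = []
    for index, value in enumerate(values):  # index is 0-based
        if value is not None:
            kept.append(value)
        elif behavior == "Error":
            # Stop processing at the first NULL.
            raise ValueError(f"sequence value is NULL at row {index}")
        elif behavior == "Log":
            logger.warning("NULL sequence value at row %d skipped", index)
        # "Ignore": skip the value silently.
    return kept


print(drop_nulls([1, None, 3], "Ignore"))  # [1, 3]
```

In this model, Log and Ignore produce the same remaining values; they differ only in whether the skipped NULL leaves a trace in the log.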

Inputs and outputs

Inputs: in1.

Outputs: OutOfSequence.