Rules - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-12-12
Published on: 2024-12-12T10:34:46.959692

When you have defined a rule group, you can start to add rules to it.

You can create new rules to add to a rule group, or copy existing rules from one rule library to another rule library within the same pipeline. For more information, see Copying rules.

You can add the following types of rules:

Completeness rule

A Completeness rule is used to determine whether a selected field has completely passed a set of checks. If all the checks pass, True is returned. If at least one check fails, False is returned.

Placeholder Field

This is a field from the Data Quality Rule Group that you would like to use in the Completeness rule.

Note: When using the Completeness rule in an analysis, the data type of the field being evaluated in the analysis must match that of the Completeness rule's Placeholder Field.

Not Null

Returns True if a record value found in the Placeholder Field is not null.

Returns False if a record value found in the Placeholder Field is null.

Not Empty

Returns True if a record value found in the Placeholder Field is not null and has a String length greater than 0.

Returns False if a record value found in the Placeholder Field is null or has a String length of 0.

Not Blank

Returns True if a record value found in the Placeholder Field is not blank.

Returns False if a record value found in the Placeholder Field is blank. Values that are considered blank are those that are empty (as defined above in Not Empty), or those that contain only spaces.
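The three checks above build on one another: a blank value is a broader category than an empty value, which in turn is broader than a null value. The following JavaScript sketch illustrates the documented semantics only. The function names are hypothetical and are not part of Data360 DQ+, and treating undefined the same as null is an assumption.

```javascript
// Hypothetical helpers that mirror the documented semantics.
// They are not a Data360 DQ+ API.

function notNull(value) {
  // Assumption: undefined is treated the same as null.
  return value !== null && value !== undefined;
}

function notEmpty(value) {
  // Not null and a String length greater than 0.
  return notNull(value) && String(value).length > 0;
}

function notBlank(value) {
  // Not empty and containing at least one character other than a space.
  return notEmpty(value) && String(value).replace(/ /g, "").length > 0;
}

const samples = [null, "", "   ", "abc"];
const results = samples.map((value) => ({
  notNull: notNull(value),
  notEmpty: notEmpty(value),
  notBlank: notBlank(value),
}));

console.log(results);
```

In this sketch, an empty string passes Not Null but fails Not Empty, and a string of spaces passes Not Empty but fails Not Blank.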

Static Text

Used to compare values in the Placeholder Field to a static String. If a record value found in the Placeholder Field matches the Static Text, this check returns True. If a record value found in the Placeholder Field does not match the Static Text, this check returns False.

Check Expression

This property allows you to create an expression using functions and the Completeness rule's Placeholder Field. If a record value found in the Placeholder Field causes the expression to evaluate to True, this check returns True. Otherwise, this check returns False.

Consistency rule

A Consistency rule evaluates whether a record's values satisfy an expression. If a record satisfies the conditions in the expression, the Consistency rule returns True. Conversely, if a record does not satisfy the expression, the rule returns False.

Note: There is no Placeholder Field parameter for Consistency rules because all Placeholder Fields from the Data Quality Rule Group may be used.

Check Expression

This parameter allows you to create an expression using functions and any of the Placeholder Fields from the Data Quality Rule Group. If a record's values cause the expression to evaluate to True, this check returns True. Otherwise, this check returns False.

Value Conformity rule

A Value Conformity rule checks whether data set values match specific values or fall within defined value ranges. If a data set value matches any of the values in a Value Conformity Check or falls within any of the specified ranges, a True value is returned. Conversely, if a data set value does not match any of the values in a Value Conformity Check and falls outside all of the specified ranges, a False value is returned.

For example, if you have configured a value conformity rule that checks against the following ranges, 1-15, 35-45 and 65-75, a data set value of 20 would return a False value, as 20 does not fall within any of the specified ranges.

To reverse the functionality, you can select Negate results of checks to create an "exclusion list". In this case, the rule checks that specific values, or specified ranges of values, are excluded from a data set. With this option selected, if a data set value matches any of the values in a Value Conformity Check, or falls within any of the specified ranges, a value of False is returned. If there are no matching values, or if the data set values fall outside of the specified ranges, a value of True is returned.
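The range example and the effect of Negate results of checks can be sketched in JavaScript as follows. This is an illustration of the behavior described above, not the product's implementation. The function names are hypothetical, and the sketch assumes that range bounds are inclusive, which the text does not state.

```javascript
// Hypothetical sketch of the documented range example.
const ranges = [
  { from: 1, to: 15 },
  { from: 35, to: 45 },
  { from: 65, to: 75 },
];

// Assumption: both ends of each range are inclusive.
function inAnyRange(value, rangeList) {
  return rangeList.some((range) => value >= range.from && value <= range.to);
}

// negate = true models the "Negate results of checks" option.
function valueConformity(value, rangeList, negate = false) {
  const matched = inAnyRange(value, rangeList);
  return negate ? !matched : matched;
}

console.log(valueConformity(20, ranges));       // false
console.log(valueConformity(40, ranges));       // true
console.log(valueConformity(20, ranges, true)); // true
console.log(valueConformity(40, ranges, true)); // false
```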

Placeholder Field

This is a field from the Data Quality Rule Group that you would like to use in the Value Conformity rule.

Note: When using the Value Conformity rule in an analysis, the data type of the field being evaluated in the analysis must match that of the Value Conformity rule's Placeholder Field.

Checks

  • Value List - Checks to see if a data set contains any values that match any of the values in a specified set. Values in the set may be added one at a time using the New Value option or as a comma separated list with the Multiple Values option.
  • External Value List - Also checks to see if a value matches any values from a set of values, however here the set of values should come from a Data Store field within your Pipelines. The specific Data Store to be used is selected via the Source Data Store parameter. The specific field within the Data Store to be used is selected via the Values Field parameter. Once these parameters are specified, unique values from the Data Store's field will be used for rule evaluation.
  • Range - Checks to see if a data set contains any values within a given range. This check is available only for numeric, date, time and datetime fields. When Range is selected, you can enter values by using the from and to parameters.

Type Conformity rule

A Type Conformity rule is used to check whether data set values conform to patterns. If a data set value matches the pattern in a Type Conformity Check, a True value is returned. Conversely, if a data set value does not match the pattern in a Type Conformity Check, a False value is returned.

Placeholder Field

This is a field from the Data Quality Rule Group that you would like to use in the Type Conformity rule.

Note: When using the Type Conformity rule in an analysis, the data type of the field being evaluated in the analysis will need to match that of the Type Conformity rule's Placeholder Field.

Checks

  • Regular Expression - Checks to see if a value conforms with the specified Regular Expression Pattern.
  • Number - Checks to see if a value conforms to a specific number formatting pattern. Either predefined formatting patterns or a custom pattern can be used.
  • Date - Checks to see if a value conforms to a specific date formatting pattern. Either predefined formatting patterns or a custom pattern can be used.

Regular Expression rule

The Regular Expression rule is used to check whether values in a Referenced Placeholder Field conform to a specific regular expression.

Referenced Placeholder Field

This is a field from the Data Quality Rule Group that you would like to use in the Regular Expression rule.

Note: When using the Regular Expression rule in an analysis, the data type of the field being evaluated in the analysis must match that of the Regular Expression rule's Referenced Placeholder Field.

Regular Expression Pattern

Checks to see if a value conforms with the specified Regular Expression Pattern.
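As a rough illustration of a pattern check, the following JavaScript sketch tests values against a hypothetical five-digit pattern. The pattern, the function name, and the handling of null values are assumptions made for this example. The regular expression dialect used by the product, and whether it requires the whole value to match, may differ, so the anchors ^ and $ are written out explicitly here.

```javascript
// Hypothetical pattern: exactly five digits. Not taken from the product.
const pattern = /^\d{5}$/;

function conformsToPattern(value) {
  // Assumption: null values are treated as non-conforming.
  if (value === null || value === undefined) {
    return false;
  }
  return pattern.test(String(value));
}

const checked = ["12345", "1234", "12a45", null].map(conformsToPattern);
console.log(checked); // [ true, false, false, false ]
```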

Timeliness rule

The Timeliness rule checks whether the amount of time between two fields exceeds an acceptable interval. If the difference between the two chosen fields is less than or equal to the acceptable interval, the record is considered timely and a True value is returned. Conversely, if the difference between the two chosen fields is greater than the acceptable interval, the record is not considered timely and a False value is returned.

Start Time Placeholder

This is a Date, Date and Time, or Time field from the Data Quality Rule Group that you would like to compare to another, end time field.

End Time Placeholder

This is a Date, Date and Time, or Time field from the Data Quality Rule Group that you would like to compare to another, start time field.

Allowed Time Difference

If the difference between the two chosen fields is less than or equal to the Allowed Time Difference, the record is considered timely and a True value is returned. Conversely, if the difference between the two chosen fields is greater than the Allowed Time Difference, the record is not considered timely and a False value is returned.
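The comparison described above can be sketched in JavaScript as follows. This is a minimal illustration, not the product's implementation. The function name, the sample dates, and the choice of milliseconds as the unit are assumptions, and the sketch assumes that the end time is not earlier than the start time.

```javascript
// Hypothetical sketch of the Timeliness comparison.
const MS_PER_HOUR = 60 * 60 * 1000;

function isTimely(startTime, endTime, allowedDifferenceMs) {
  const difference = endTime.getTime() - startTime.getTime();
  // Timely when the difference is less than or equal to the allowed value.
  return difference <= allowedDifferenceMs;
}

const start = new Date("2024-01-01T08:00:00Z");
const endWithinLimit = new Date("2024-01-02T08:00:00Z"); // 24 hours later
const endAtLimit = new Date("2024-01-03T08:00:00Z");     // 48 hours later
const endOverLimit = new Date("2024-01-03T09:00:00Z");   // 49 hours later
const allowed = 48 * MS_PER_HOUR;

console.log(isTimely(start, endWithinLimit, allowed)); // true
console.log(isTimely(start, endAtLimit, allowed));     // true
console.log(isTimely(start, endOverLimit, allowed));   // false
```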

Uniqueness rule

The Uniqueness rule checks for unique records in a data set. If a unique record is identified, the check returns a True value. Otherwise, if the record is not unique, a value of False is returned.

Checks

Click Add then select a placeholder field that you have defined in the Data Quality Rule Group. The data type of the placeholder field should match the data type of the field that you want to analyze.

Select a match type. Choose from:

  • Exact Match - The rule checks for identical values in the selected field.
  • Match by Expression - The rule uses the specified expression to identify matches. You can write an expression that allows tolerance based matching for numeric fields, or one that is based on fuzzy matching for string fields.
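The difference between the two match types can be sketched in JavaScript as follows. This is an illustration only. The function names are hypothetical, the tolerance comparison stands in for whatever expression you would write in the product, and the sketch assumes that every record in a group of matching records is flagged as not unique, which the text does not state.

```javascript
// Hypothetical sketch: returns true for unique values, false otherwise.

// Exact Match: a value is unique when it occurs exactly once.
function uniquenessExact(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return values.map((value) => counts.get(value) === 1);
}

// Tolerance-based matching for numeric values: a value is unique when
// no other value lies within the given tolerance of it.
function uniquenessByTolerance(values, tolerance) {
  return values.map((value, i) =>
    !values.some((other, j) => i !== j && Math.abs(value - other) <= tolerance)
  );
}

console.log(uniquenessExact(["a", "b", "a", "c"]));
// [ false, true, false, true ]
console.log(uniquenessByTolerance([10.0, 10.05, 12.0], 0.1));
// [ false, false, true ]
```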

Anomaly rule

The Anomaly rule is used for anomaly detection. If an anomalous record is identified, the check returns a False value. Otherwise, if the record fits within the normal range, a value of True is returned.

Use all fields in incoming record for anomaly detection

This option is selected by default and means that all input fields will be used to detect anomalies.

Fields for anomaly detection

If Use all fields in incoming record for anomaly detection is not selected, you can instead select one or more Placeholder Fields that have been defined in the Data Quality Rule Group.

Max number of Sample Records

Specify the maximum number of sample records to evaluate. Choose from Records or Percent. If Percent is selected, the value is divided by 100 to create the percent value. For example, a value of 25 is treated as 0.25.

The default value is 250 Records.

Number Of Trees

Optionally specify the number of isolation trees that will be used by the anomaly detection algorithm.

The default value is 100.

Contamination

Optionally specify a contamination value between 0 (inclusive) and 0.5 (exclusive). This is an estimate of the proportion of anomalous records in your data set and is used to calculate a threshold score.

A value of 0.1 would compute a threshold score that labels the top 10% scored records as anomalies.

If set to 0, the threshold score is not computed and an anomaly label is not assigned. In this case, the node will execute more quickly and you can use the anomaly scores to decide how to handle the data.

The default value is 0.1.
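To illustrate how a contamination value can translate into a threshold score, the following JavaScript sketch labels the highest-scoring records as anomalies. This is a simplified illustration and not the algorithm used by the product. The function name and sample scores are hypothetical, and the sketch assumes that a higher score means a more anomalous record.

```javascript
// Hypothetical sketch: derive a threshold from precomputed anomaly scores.
// Assumption: a higher score means a more anomalous record.
function labelAnomalies(scores, contamination) {
  if (contamination < 0 || contamination >= 0.5) {
    throw new RangeError("contamination must be in the range [0, 0.5)");
  }
  if (contamination === 0) {
    // No threshold is computed and no labels are assigned.
    return null;
  }
  const count = Math.floor(scores.length * contamination);
  if (count === 0) {
    return scores.map(() => false);
  }
  // The threshold is the lowest score among the top-scoring records.
  const sorted = [...scores].sort((a, b) => b - a);
  const threshold = sorted[count - 1];
  return scores.map((score) => score >= threshold);
}

const scores = [0.31, 0.42, 0.35, 0.91, 0.38, 0.33, 0.40, 0.36, 0.34, 0.39];
const labels = labelAnomalies(scores, 0.1);
console.log(labels); // only the record with score 0.91 is labeled true
console.log(labelAnomalies(scores, 0)); // null
```

With ten scores and a contamination of 0.1, one record is labeled as an anomaly. Tied scores at the threshold would all be labeled in this sketch.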

Contamination Error Percent

Optionally specify a Contamination Error Percent value. The threshold score computation can be time consuming, so to speed it up, approximation can be applied. The Contamination Error Percent is the error allowed in approximation.

A value of 1 would allow the computation to be within plus or minus 1%.

A value of 0 means that an exact calculation will be used.

The specified value is converted to a percent value.

The default value is 1.

General Expression rule

A General Expression rule is used to determine whether values in a Referenced Placeholder Field cause a Check Expression to evaluate to True or False.

Referenced Placeholder Field

This is a field from the Data Quality Rule Group that you would like to evaluate in the General Expression rule.

Note: When using the General Expression rule in an analysis, the data type of the field being evaluated in the analysis must match that of the General Expression rule's Referenced Placeholder Field.

Check Expression

This is the expression that is used to evaluate the rule's Referenced Placeholder Field. When it is used in an analysis, it will evaluate to either True or False, depending on the expression and the values it encounters.

Script rule

A Script rule creates a reusable script in JavaScript. This rule can be reused wherever scripting can be performed, such as in a data store's Transform tab or within an analysis Script node.

Note: A Script rule must be created in a Script Rule Group.

Creating the Script and Function Signatures

When building a Script rule, the Edit Script button can be used to write functions in JavaScript. For each function you create, you should also create a Function Signature that defines a Name, Description, Return Data Type, and Parameters for the function.

Basic Script Rule Example

Suppose that within a Script rule, you create the following Script:

function addOne(var1) {
    return var1 + 1;
}

With the following Function Signature:

Name     Description       Return Data Type   Parameters
addOne   Adds 1 to value   STRING             var1

You could then use this Script rule within an analysis Script node. To do so, you would just need to add the Rule Library, Script Group, and Script Rule that contained the addOne function to the Script node using the node's Script Rules tab. Once you have done this, you should be able to call the addOne function in the Script node's script by name.

For example:

output.field1 = addOne(input.field1);

Assuming input.field1 contained the following data:

field1
1
2
3

output.field1 would contain the following after the script had run:

field1
2
3
4
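One point to keep in mind with this example is that the JavaScript + operator concatenates when either operand is a string, so the result depends on the type of the value that is passed in. The following sketch contrasts the addOne function with a hypothetical variant, addOneNumeric, that converts its argument to a number first. The variant is an illustration only and is not part of the product documentation.

```javascript
// The function from the example above.
function addOne(var1) {
  return var1 + 1;
}

// Hypothetical variant that converts the argument to a number first.
function addOneNumeric(var1) {
  return Number(var1) + 1;
}

console.log(addOne(1));          // 2
console.log(addOne("1"));        // 11 (string concatenation)
console.log(addOneNumeric("1")); // 2
```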

Copying rules

You can reuse rules by copying them from one Rule Group to another Rule Group in the same Rule Library, or by copying them from one Rule Library to another Rule Library within the same pipeline.

Complete the following steps to copy a rule.

  1. Navigate to the Rule Library that you want to copy a rule into.
  2. Click Edit > Edit Stage.
  3. Select the Rule Group that you want to copy a rule into.
  4. Click Copy From, and select whether to copy the rule from This Rule Library or Other Rule Library.
  5. If you selected Other Rule Library, find and select the Rule Library in the Copy From Other Rule Library dialog.
  6. Select the Rule Group that contains the rule you want to copy.
  7. Select the Rule that you want to copy.
  8. Click Copy.
  9. If the rule contains any placeholders that are not present in the current Rule Group, you are asked to confirm that you want to add them to the current Rule Group. If you click No, the rule is not copied. Click Yes to continue and complete the copy.
  10. The rule is copied into the selected rule group.

When a rule is copied into a group, its name is preserved. If a rule group already contains a rule that has the same name, the copied rule has _Copy1 appended to its name and display name.

Note: You can only copy rules to the appropriate rule group type. For example, you cannot copy a script rule into a data quality rule group.

Using a rule library in an analysis

Once you have created a Rule Library containing a Data Quality Rule Group and some reusable rules, you can use the Rule Library in an Analysis by using the Execute Rule Library node. With this node, usage is simply a matter of selecting which Rule Library and rules to execute and mapping the node's incoming fields to the placeholder fields defined in the Rule Library.

For more information about the Execute Rule Library node, see Execute Rule Library.