When you have defined a rule group, you can start to add rules to it.
You can create new rules to add to a rule group, or copy existing rules from one rule library to another rule library within the same pipeline. For more information, see Copying rules.
You can add the following types of rules:
Completeness rule
A Completeness rule is used to determine whether a selected field has completely passed a set of checks. If all the checks pass, True is returned. If at least one check fails, False is returned.
Placeholder Field
This is a field from the Data Quality Rule Group that you would like to use in the Completeness rule.
Not Null
Returns True if a record value found in the Placeholder Field is not null.
Returns False if a record value found in the Placeholder Field is null.
Not Empty
Returns True if a record value found in the Placeholder Field is not null and has a String length greater than 0.
Returns False if a record value found in the Placeholder Field is null or has a String length of 0.
Not Blank
Returns True if a record value found in the Placeholder Field is not blank.
Returns False if a record value found in the Placeholder Field is blank. Values that are considered blank are those that are empty (as defined above in Not Empty), or those that contain only spaces.
Static Text
Used to compare values in the Placeholder Field to a static String. If a record value found in the Placeholder Field matches the Static Text, this check returns True. If a record value found in the Placeholder Field does not match the Static Text, this check returns False.
Check Expression
This property allows you to create an expression using functions and the Completeness rule's Placeholder Field. If a record value found in the Placeholder Field causes the expression to evaluate to True, this check returns True; otherwise, it returns False.
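The way the Completeness checks build on one another can be sketched in JavaScript. This is an illustration of the semantics described above, not the product's implementation: the function names are hypothetical, undefined is treated the same as null, and Static Text matching is assumed to be exact and case sensitive.

```javascript
// Illustrative helpers only; none of these names are part of the product.

// Not Null: true unless the value is null (undefined is treated as null, by assumption).
function notNull(value) {
  return value !== null && value !== undefined;
}

// Not Empty: true when the value is not null and its string length is greater than 0.
function notEmpty(value) {
  return notNull(value) && String(value).length > 0;
}

// Not Blank: true when the value is not empty and contains something other than spaces.
function notBlank(value) {
  return notEmpty(value) && String(value).replace(/ /g, "").length > 0;
}

// Static Text: true when the value equals the static string (exact match is an assumption).
function matchesStaticText(value, staticText) {
  return notNull(value) && String(value) === staticText;
}

// A Completeness rule returns true only if every selected check passes.
function completeness(value, checks) {
  return checks.every((check) => check(value));
}

console.log(completeness("abc", [notNull, notEmpty, notBlank])); // true
console.log(completeness("   ", [notNull, notEmpty, notBlank])); // false (fails Not Blank)
console.log(completeness("", [notNull]));                        // true (empty, but not null)
```

Each check is stricter than the one before it: a value of three spaces passes Not Null and Not Empty but fails Not Blank.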
Consistency rule
A Consistency rule evaluates whether a record's values satisfy an expression. If a record satisfies the conditions in the expression, the Consistency rule returns True. Conversely, if a record does not satisfy the expression, the rule returns False.
Check Expression
This parameter allows you to create an expression using functions and the Consistency rule's Placeholder Field. If a record value found in the Placeholder Field causes the expression to evaluate to True, this check returns True; otherwise, it returns False.
Value Conformity rule
A Value Conformity rule checks whether data set values match specific values or fall within defined value ranges. If a data set value matches any of the values in a Value Conformity Check or falls within any of the specified ranges, a True value is returned. Conversely, if a data set value does not match any of the values in a Value Conformity Check or falls outside of the specified ranges, a False value is returned.
For example, if you have configured a value conformity rule that checks against the following ranges, 1-15, 35-45 and 65-75, a data set value of 20 would return a False value, as 20 does not fall within any of the specified ranges.
To reverse the functionality, you can select Negate results of checks to create an "exclusion list". In this case, the rule checks that specific values, or specified ranges of values, are excluded from a data set. With this option selected, if a data set value matches any of the values in a Value Conformity Check, or falls within any of the specified ranges, a value of False is returned. If there are no matching values, or if the data set values fall outside of the specified ranges, a value of True is returned.
Placeholder Field
This is a field from the Data Quality Rule Group that you would like to use in the Value Conformity Rule. When using the Value Conformity Rule in an Analysis, the data type of the field being evaluated in the Analysis will need to match that of the Value Conformity rule's Placeholder Field.
Checks
- Value List - Checks to see if a data set contains any values that match any of the values in a specified set. Values in the set may be added one at a time using the New Value option or as a comma-separated list with the Multiple Values option.
- External Value List - Also checks to see if a value matches any value in a set of values, but here the set of values comes from a Data Store field within your pipelines. The Data Store to be used is selected via the Source Data Store parameter, and the field within that Data Store is selected via the Values Field parameter. Once these parameters are specified, the unique values from the Data Store's field are used for rule evaluation.
- Range - Checks to see if a data set contains any values within a given range. This check is available only for numeric, date, time and datetime fields. When Range is selected, you can enter values by using the from and to parameters.
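The Value List and Range checks, together with the Negate results of checks option, can be sketched as follows. This is a minimal illustration rather than the product's implementation: valueConformity and its option names are hypothetical, and range bounds are assumed to be inclusive, which the text above does not state.

```javascript
// Hypothetical helper: true when the value matches any listed value
// or falls within any range (bounds treated as inclusive, by assumption).
function valueConformity(value, { values = [], ranges = [], negate = false } = {}) {
  const inList = values.includes(value);
  const inRange = ranges.some(([from, to]) => value >= from && value <= to);
  const matched = inList || inRange;
  // "Negate results of checks" turns the rule into an exclusion list.
  return negate ? !matched : matched;
}

// The ranges from the example above: 1-15, 35-45 and 65-75.
const ranges = [[1, 15], [35, 45], [65, 75]];
console.log(valueConformity(20, { ranges }));               // false
console.log(valueConformity(40, { ranges }));               // true
console.log(valueConformity(20, { ranges, negate: true })); // true
```

With negation enabled, the same configuration accepts 20 because it falls outside every excluded range.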
Type Conformity rule
A Type Conformity rule is used to check whether data set values conform to patterns. If a data set value matches the pattern in a Type Conformity Check, a True value is returned. Conversely, if a data set value does not match the pattern in a Type Conformity Check, a False value is returned.
Placeholder Field
This is a field from the Data Quality Rule Group that you would like to use in the Type Conformity rule.
Checks
- Regular Expression - Checks to see if a value conforms with the specified Regular Expression Pattern.
- Number - Checks to see if a value conforms to a specific number formatting pattern. Either predefined formatting patterns or a custom pattern can be used.
- Date - Checks to see if a value conforms to a specific date formatting pattern. Either predefined formatting patterns or a custom pattern can be used.
Regular Expression rule
The Regular Expression rule is used to check whether values in a Referenced Placeholder Field conform to a specific regular expression.
Referenced Placeholder Field
This is a field from the Data Quality Rule Group that you would like to use in the Regular Expression rule.
Regular Expression Pattern
Checks to see if a value conforms with the specified Regular Expression Pattern.
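As an illustration of a pattern check, the following sketch tests values against a regular expression in JavaScript. It rests on several assumptions: conformsToPattern is a hypothetical helper, the whole value is required to match, null values are treated as non-conforming, and the product's regular expression dialect may differ from JavaScript's.

```javascript
// Hypothetical helper: true when the entire value matches the pattern.
function conformsToPattern(value, pattern) {
  // Treating null as non-conforming is an assumption.
  if (value === null || value === undefined) return false;
  // Anchoring with ^(?:...)$ forces a whole-value match; this is also an assumption.
  return new RegExp(`^(?:${pattern})$`).test(String(value));
}

// Example pattern (illustrative): five digits, optionally followed by a hyphen and four digits.
const zipPattern = "\\d{5}(?:-\\d{4})?";
console.log(conformsToPattern("12345", zipPattern));      // true
console.log(conformsToPattern("12345-6789", zipPattern)); // true
console.log(conformsToPattern("1234", zipPattern));       // false
```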
Timeliness rule
The Timeliness rule checks whether the amount of time between two fields surpasses an acceptable level. If the difference between the two chosen fields is less than or equal to the acceptable interval, the record is considered timely and a True value is returned. Conversely, if the difference between the two chosen fields is greater than the acceptable interval, the record is not considered timely and a False value is returned.
Start Time Placeholder
This is a Date, Date and Time, or Time field from the Data Quality Rule Group that you would like to compare to another, end time field.
End Time Placeholder
This is a Date, Date and Time, or Time field from the Data Quality Rule Group that you would like to compare to another, start time field.
Allowed Time Difference
If the difference between the two chosen fields is less than or equal to the Allowed Time Difference, the record is considered timely and a True value is returned. Conversely, if the difference between the two chosen fields is greater than the Allowed Time Difference, the record is not considered timely and a False value is returned.
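The comparison performed by the Timeliness rule can be sketched as follows. The helper name isTimely and the use of milliseconds for the allowed difference are assumptions made for illustration. The text above does not say how an end time earlier than the start time is handled, so the sketch does not address that case.

```javascript
// Hypothetical helper: the allowed difference is expressed in milliseconds here.
function isTimely(startTime, endTime, allowedDifferenceMs) {
  const differenceMs = endTime.getTime() - startTime.getTime();
  // A difference equal to the allowed interval still counts as timely.
  return differenceMs <= allowedDifferenceMs;
}

const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const start = new Date("2024-01-01T00:00:00Z");
console.log(isTimely(start, new Date("2024-01-01T12:00:00Z"), ONE_DAY_MS)); // true
console.log(isTimely(start, new Date("2024-01-02T00:00:00Z"), ONE_DAY_MS)); // true
console.log(isTimely(start, new Date("2024-01-03T00:00:00Z"), ONE_DAY_MS)); // false
```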
Uniqueness rule
The Uniqueness rule checks for unique records in a data set. If a unique record is identified, the check returns a True value. Otherwise, if the record is not unique, a value of False is returned.
Checks
Click Add then select a placeholder field that you have defined in the Data Quality Rule Group. The data type of the placeholder field should match the data type of the field that you want to analyze.
Select one of the following match types:
- Exact Match - The rule checks for identical values in the selected field.
- Match by Expression - The rule uses the specified expression to identify matches. You can write an expression that allows tolerance based matching for numeric fields, or one that is based on fuzzy matching for string fields.
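The difference between the two match types can be sketched as follows. This is only an illustration: uniquenessFlags is a hypothetical helper, the tolerance of 0.5 is an arbitrary example value, and the sketch assumes that every record in a group of matching records is flagged as not unique, which the text above does not confirm.

```javascript
// Hypothetical helper: returns true for a value when no other value in the data set matches it.
// The default comparison corresponds to Exact Match.
function uniquenessFlags(values, matches = (a, b) => a === b) {
  return values.map((value, i) =>
    !values.some((other, j) => i !== j && matches(value, other))
  );
}

// Exact Match: the two 10s match each other, so only 20 is unique.
console.log(uniquenessFlags([10, 20, 10])); // [ false, true, false ]

// Match by Expression: a tolerance-based comparison for numeric values.
const withinTolerance = (a, b) => Math.abs(a - b) <= 0.5;
console.log(uniquenessFlags([10, 10.4, 20], withinTolerance)); // [ false, false, true ]
```

The pairwise comparison keeps the sketch short but scales quadratically with the number of records.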
Anomaly rule
The Anomaly rule is used for anomaly detection. If an anomalous record is identified, the check returns a False value. Otherwise, if the record fits within the normal range, a value of True is returned.
Use all fields in incoming record for anomaly detection
This option is selected by default and means that all input fields will be used to detect anomalies.
Fields for anomaly detection
If Use all fields in incoming record for anomaly detection is not selected, you can instead select one or more Placeholder Fields that have been defined in the Data Quality Rule Group.
Max number of Sample Records
Specify the maximum number of sample records to evaluate. Choose from Records or Percent. If Percent is selected, the value is divided by 100 to create the percent value.
The default value is 250 Records.
Number Of Trees
Optionally specify the number of isolation trees that will be used by the anomaly detection algorithm.
The default value is 100.
Contamination
Optionally specify a contamination value between 0 (inclusive) and 0.5 (exclusive). This is an estimate of the proportion of anomalous records in your data set and is used to calculate a threshold score.
A value of 0.1 would compute a threshold score that labels the top 10% scored records as anomalies.
If set to 0, the threshold score is not computed and an anomaly label is not assigned. In this case, the node will execute more quickly and you can use the anomaly scores to decide how to handle the data.
The default value is 0.1.
Contamination Error Percent
Optionally specify a Contamination Error Percent value. The threshold score computation can be time-consuming, so an approximation can be applied to speed it up. The Contamination Error Percent is the error allowed in that approximation.
A value of 1 would allow the computation to be within plus or minus 1%.
A value of 0 means that an exact calculation will be used.
The specified value is converted to a percent value.
The default value is 1.
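The role of the Contamination value can be sketched as follows. This illustration assumes that anomaly scores have already been computed and that a higher score means a more anomalous record. The helper anomalyRuleResults is hypothetical, it uses an exact threshold (equivalent to a Contamination Error Percent of 0), and it does not implement isolation trees or sampling.

```javascript
// Hypothetical helper: returns the rule result per record
// (true = normal, false = anomalous), or null when no labels are assigned.
function anomalyRuleResults(scores, contamination) {
  if (contamination < 0 || contamination >= 0.5) {
    throw new RangeError("contamination must be between 0 (inclusive) and 0.5 (exclusive)");
  }
  // With a contamination of 0, no threshold is computed and no labels are assigned.
  if (contamination === 0 || scores.length === 0) return null;

  // The threshold is the score of the last record inside the top fraction.
  const sorted = [...scores].sort((a, b) => b - a);
  const count = Math.max(1, Math.round(contamination * scores.length));
  const threshold = sorted[count - 1];

  // Records scoring at or above the threshold are anomalous, so the rule returns false for them.
  return scores.map((score) => score < threshold);
}

const scores = [0.41, 0.38, 0.72, 0.40, 0.39, 0.44, 0.37, 0.42, 0.36, 0.43];
console.log(anomalyRuleResults(scores, 0.1)); // only the record scoring 0.72 is false
console.log(anomalyRuleResults(scores, 0));   // null
```

If several records share the threshold score, this sketch labels all of them as anomalous, so slightly more than the requested fraction can be flagged.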
General Expression rule
A General Expression rule is used to determine whether values in a Referenced Placeholder Field cause a Check Expression to evaluate to True or False.
Referenced Placeholder Field
This is a field from the Data Quality Rule Group that you would like to evaluate in the General Expression rule.
Check Expression
This is the expression that is used to evaluate the rule's Referenced Placeholder Field. When it is used in an analysis, it will evaluate to either True or False, depending on the expression and the values it encounters.
Script rule
A Script rule creates a reusable script in JavaScript. This rule can be reused wherever scripting can be performed, such as in a data store's Transform tab or within an analysis Script node.
Creating the Script and Function Signatures
When building a Script rule, the Edit Script button can be used to write functions in JavaScript. For each function you create, you should also create a Function Signature that defines a Name, Description, Return Data Type, and Parameters for the function.
Basic Script Rule Example
Suppose that within a Script rule, you create the following Script:
function addOne(var1) {
  return var1 + 1;
}
With the following Function Signature:
Name | Description | Return Data Type | Parameters |
---|---|---|---|
addOne | Adds 1 to value | STRING | var1 |
You could then use this Script rule within an analysis Script node. To do so, you would just need to add the Rule Library, Script Group, and Script Rule that contained the addOne function to the Script node using the node's Script Rules tab. Once you have done this, you should be able to call the addOne function in the Script node's script by name.
For example:
output.field1 = addOne(input.field1);
Assuming input.field1 contained the following data:
field1 |
---|
1 |
2 |
3 |
output.field1 would contain the following after the script had run:
field1 |
---|
2 |
3 |
4 |
Copying rules
You can reuse rules by copying them from one Rule Group to another Rule Group in the same Rule Library, or by copying them from one Rule Library to another Rule Library within the same pipeline.
Complete the following steps to copy a rule.
- Navigate to the Rule Library that you want to copy a rule into.
- Click Edit > Edit Stage.
- Select the Rule Group that you want to copy a rule into.
- Click Copy From, and select whether to copy the rule from This Rule Library or Other Rule Library.
- If you selected Other Rule Library, find and select the Rule Library in the Copy From Other Rule Library dialog.
- Select the Rule Group that contains the rule you want to copy.
- Select the Rule that you want to copy.
- Click Copy.
- If the rule contains any placeholders that are not present in the current Rule Group, you will be asked to confirm that you want to add them to the current Rule Group. If you click No, the rule is not copied. Click Yes to continue and complete the copy.
- The rule is copied into the selected rule group.
When a rule is copied into a group, its name is preserved. If a rule group already contains a rule that has the same name, the copied rule has _Copy1 appended to its name and display name.
Using a rule library in an analysis
Once you have created a Rule Library containing a Data Quality Rule Group and some reusable rules, you can use the Rule Library in an Analysis by using the Execute Rule Library node. With this node, usage is simply a matter of selecting which Rule Library and rules to execute and mapping the node's incoming fields to the placeholder fields defined in the Rule Library.
For more information about the Execute Rule Library node, see Execute Rule Library.