Anomaly - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Locale: en-US
Copyright: 2025
First publish date: 2016
Last edition: 2025-02-20
Last publication: 2025-02-20T08:06:02.625000

Note: Before using the Analytics nodes, you must first create an "Analytic Model". See Creating analytic models.

The Anomaly node uses an isolation forest algorithm to detect anomalies in a data set.

This node checks for anomalies using the same algorithms that are used by the Analytics Anomaly node.

The node outputs a field named "IsAnomaly". Anomalous records are identified by a True value; all other "normal" records return a False value in this field.
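To illustrate the general idea behind an isolation forest, the following is a minimal pure-Python sketch. It is not the node's implementation: the function names (c_factor, build_tree, path_length, anomaly_scores), the seed parameter, and the sample data are all hypothetical, and the scoring formula is the standard one from the isolation forest literature. Records that can be separated from the rest with few random splits receive scores closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random field at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    field = rng.randrange(len(rows[0]))
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    if lo == hi:
        # Simplification: stop when the chosen field cannot separate the rows.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[field] < split]
    right = [r for r in rows if r[field] >= split]
    return {
        "field": field,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(row, node, depth=0):
    """Depth at which row reaches a leaf, adjusted for the leaf's unsplit size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["field"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=250, seed=0):
    """Return one score per row; higher means easier to isolate."""
    size = min(sample_size, len(rows))
    if size < 2:
        raise ValueError("need at least two records to build a tree")
    rng = random.Random(seed)  # hypothetical parameter, for repeatable runs
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(size)
    return [
        2.0 ** (-(sum(path_length(r, t) for t in trees) / n_trees) / norm)
        for r in rows
    ]


# Fifty evenly spaced values in [0, 1] plus one far-away value at index 50.
records = [(i / 49.0,) for i in range(50)] + [(50.0,)]
scores = anomaly_scores(records)
outlier_index = max(range(len(records)), key=lambda i: scores[i])
print(outlier_index)
```

In this sketch, n_trees and sample_size play the roles described under Number of Trees and Max number of Sample Records below; their defaults are set to match the documented defaults.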

Properties

Display Name

Specify a name for the node.

The default value is Anomaly.

Input Fields

Click Add Field to select input fields to analyze.

Use the left and right arrows to add or remove fields.

Max number of Sample Records

Optionally specify the maximum number of sample records. Choose from Records or Percent. If Percent is selected, the value in the numeric field is divided by 100 to create the percent value.

The default value is 250 Records.
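As a rough illustration of how the two units could translate into a record count, consider the sketch below. The function name resolve_sample_size is hypothetical, and the truncation to a whole number and the cap at the size of the data set are assumptions, not documented behavior.

```python
def resolve_sample_size(value, unit, total_records):
    """Return the number of records to sample for a given setting."""
    if unit == "Records":
        requested = int(value)
    elif unit == "Percent":
        # Per the property description, the numeric value is divided by 100.
        requested = int(total_records * (value / 100.0))
    else:
        raise ValueError("unit must be 'Records' or 'Percent'")
    # Assumption: the sample can never exceed the available records.
    return min(requested, total_records)


print(resolve_sample_size(250, "Records", 10000))
print(resolve_sample_size(25, "Percent", 1000))
```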

Number of Trees

Optionally specify the number of isolation trees that will be used by the anomaly detection algorithm.

The default value is 100.

Contamination

Optionally specify a contamination value between 0 (inclusive) and 0.5 (exclusive). This is an estimate of the proportion of anomalous records in your data set and is used to calculate a threshold score.

A value of 0.1 would compute a threshold score that labels the 10% of records with the highest scores as anomalies.

If set to 0, the threshold score is not computed and an anomaly label is not assigned. In this case, the node will execute more quickly and you can use the anomaly scores to decide how to handle the data.

The default value is 0.1.
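The relationship between the contamination value and the threshold score can be sketched as follows. This is an illustrative example only: the function name label_by_contamination, the rounding rule, and the sample scores are assumptions, and the node's actual threshold computation may differ.

```python
def label_by_contamination(scores, contamination=0.1):
    """Return (threshold, indices of records at or above the threshold)."""
    if not 0 <= contamination < 0.5:
        raise ValueError("contamination must be in the range [0, 0.5)")
    if contamination == 0:
        # No threshold is computed and no records are labeled.
        return None, []
    # Assumption: round to the nearest whole number of records.
    k = int(round(contamination * len(scores)))
    if k == 0:
        return None, []
    ranked = sorted(scores, reverse=True)
    threshold = ranked[k - 1]
    flagged = [i for i, s in enumerate(scores) if s >= threshold]
    return threshold, flagged


example_scores = [0.41, 0.45, 0.39, 0.82, 0.44, 0.47, 0.40, 0.43, 0.46, 0.42]
threshold, flagged = label_by_contamination(example_scores, 0.1)
print(threshold, flagged)
```

With ten scores and a contamination of 0.1, one record (the highest-scoring one) falls at or above the threshold. With a contamination of 0, the sketch returns the scores unlabeled, which mirrors the option of deciding how to handle the data from the scores alone.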

Contamination Error Percent

Optionally specify a Contamination Error Percent value. Computing the threshold score can be time-consuming, so an approximation can be applied to speed it up. The Contamination Error Percent is the error allowed in that approximation.

A value of 1 would allow the computation to be within plus or minus 1%.

A value of 0 means that an exact calculation will be used.

The specified value is converted to a percent value.

The default value is 1.
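One possible reading of this setting is sketched below. It assumes that the error is applied as absolute percentage points around the contamination value, so that the fraction of records actually labeled may fall anywhere within the returned bounds. That interpretation, and the function name labeled_fraction_bounds, are assumptions rather than documented behavior.

```python
def labeled_fraction_bounds(contamination, error_percent=1.0):
    """Return the (lowest, highest) fraction of records that may be labeled."""
    # Per the property description, the value is converted to a percent value.
    error = error_percent / 100.0
    if error == 0:
        # Exact calculation: the labeled fraction matches the contamination.
        return contamination, contamination
    # Assumption: the error is absolute, in percentage points.
    return max(0.0, contamination - error), contamination + error


low, high = labeled_fraction_bounds(0.1, 1)
print(low, high)
```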