The Anomaly node uses an isolation forest algorithm to detect anomalies in a data set.
This node checks for anomalies using the same algorithms that are used by the Analytics Anomaly node.
The node outputs a field named "IsAnomaly". Anomalous records are identified by a True value; all other "normal" records return a False value in this field.
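For orientation, the isolation forest idea can be sketched in plain Python. This is a simplified illustration of the general technique, not the node's implementation: the function names, the dictionary-based tree layout, and the sample data are all assumptions made for this example. Records that are isolated after only a few random splits receive scores closer to 1.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a randomly chosen field at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    field = rng.randrange(len(rows[0]))
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    if lo == hi:
        # Simplification: stop if the chosen field cannot separate the rows.
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[field] < split]
    right = [r for r in rows if r[field] >= split]
    return {
        "field": field,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Number of splits needed to reach a leaf, adjusted for unsplit leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["field"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def anomaly_scores(rows, num_trees, max_samples, seed=0):
    """Score every row; values closer to 1 indicate easier isolation."""
    if len(rows) < 2:
        raise ValueError("at least two records are required")
    rng = random.Random(seed)
    sample_size = min(max_samples, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
        for _ in range(num_trees)
    ]
    norm = c_factor(sample_size)
    return [
        2.0 ** (-sum(path_length(t, r) for t in trees) / (num_trees * norm))
        for r in rows
    ]


# Illustrative data: 200 clustered records plus one obvious outlier at the end.
data_rng = random.Random(42)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
rows.append((10.0, 10.0))
scores = anomaly_scores(rows, num_trees=50, max_samples=256)
```

In this sketch the last record, which lies far from the cluster, should receive the highest score because random splits separate it from the rest after very few steps.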
Properties
Display Name
Specify a name for the node.
The default value is Anomaly.
Input Fields
Click Add Field to select input fields to analyze.
Use the left and right arrows to add or remove fields.
Max number of Sample Records
Optionally specify the maximum number of records to sample when building each isolation tree. Choose from Records or Percent. If Percent is selected, the value in the numeric field is divided by 100 to create the percent value; for example, 5 is treated as 0.05.
The default value is 250 Records.
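The Records and Percent options can be illustrated with a small helper. This is a sketch only: the function name resolve_max_samples is hypothetical, and it assumes that Percent is applied to the total number of input records and that the result is capped at the number of records available.

```python
def resolve_max_samples(value, unit, total_records):
    """Return the number of records to sample (hypothetical helper).

    Assumptions: Percent is taken relative to total_records, and the
    sample can never exceed the number of records available.
    """
    if unit == "Records":
        return min(int(value), total_records)
    if unit == "Percent":
        # The entered value is divided by 100 to obtain the fraction.
        return max(1, round(total_records * value / 100))
    raise ValueError("unit must be 'Records' or 'Percent'")


print(resolve_max_samples(250, "Records", 10000))
print(resolve_max_samples(5, "Percent", 10000))
```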
Number of Trees
Optionally specify the number of isolation trees that will be used by the anomaly detection algorithm.
The default value is 100.
Contamination
Optionally specify a contamination value between 0 (inclusive) and 0.5 (exclusive). This is an estimate of the proportion of anomalous records in your data set and is used to calculate a threshold score.
A value of 0.1 would compute a threshold score that labels the 10% of records with the highest anomaly scores as anomalies.
If set to 0, the threshold score is not computed and an anomaly label is not assigned. In this case, the node will execute more quickly and you can use the anomaly scores to decide how to handle the data.
The default value is 0.1.
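The relationship between the contamination value and the threshold score can be sketched as follows. This is an illustration under stated assumptions, not the node's code: the function name threshold_from_contamination is hypothetical, higher scores are assumed to mean "more anomalous", and the threshold is taken as the score of the k-th highest record, where k is the contamination multiplied by the record count.

```python
def threshold_from_contamination(scores, contamination):
    """Return (threshold, indices of records at or above it).

    Hypothetical helper. A contamination of 0 skips the threshold
    calculation entirely, so no records are labeled.
    """
    if not 0 <= contamination < 0.5:
        raise ValueError("contamination must be in [0, 0.5)")
    if contamination == 0 or not scores:
        return None, []
    # Number of records expected to be anomalous (at least one).
    k = max(1, round(contamination * len(scores)))
    threshold = sorted(scores, reverse=True)[k - 1]
    # Tied scores at the threshold are all included, so slightly more
    # than k records can be returned.
    flagged = [i for i, s in enumerate(scores) if s >= threshold]
    return threshold, flagged


example_scores = [0.42, 0.38, 0.91, 0.45, 0.40, 0.47, 0.39, 0.44, 0.41, 0.43]
threshold, flagged = threshold_from_contamination(example_scores, 0.1)
```

With ten records and a contamination of 0.1, only the single highest-scoring record (index 2 in the example data) meets the threshold.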
Contamination Error Percent
Optionally specify a Contamination Error Percent value. Computing the threshold score can be time-consuming, so an approximation can be used to speed it up. The Contamination Error Percent is the error allowed in that approximation.
A value of 1 would allow the computation to be within plus or minus 1%.
A value of 0 means that an exact calculation will be used.
The specified value is converted to a percent value.
The default value is 1.
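One possible reading of the error allowance is sketched below. It assumes that "plus or minus 1%" refers to percentage points of the total record count, so that an approximate threshold is acceptable when the share of records it flags differs from the contamination value by no more than the allowed error. Both this interpretation and the function name threshold_within_tolerance are assumptions made for illustration; the node's actual approximation method is not described here.

```python
def threshold_within_tolerance(scores, threshold, contamination, error_percent):
    """Check whether a candidate threshold flags an acceptable number of records.

    Hypothetical helper. Assumes the error is measured in percentage
    points of the total record count.
    """
    n = len(scores)
    flagged = sum(1 for s in scores if s >= threshold)
    target = contamination * n
    # The entered value is converted to a percent, then to a record count.
    allowed = error_percent * n / 100
    return abs(flagged - target) <= allowed


# Illustrative scores: 1000 records with distinct integer scores 0..999.
demo_scores = list(range(1000))

# A threshold of 896 flags 104 records; the target for 0.1 is 100.
approx_ok = threshold_within_tolerance(demo_scores, 896, 0.1, 1)
exact_ok = threshold_within_tolerance(demo_scores, 896, 0.1, 0)
```

Under this reading, the threshold of 896 is accepted with an error of 1 (104 flagged records is within 100 plus or minus 10) but rejected with an error of 0, which requires exactly 100 flagged records.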