Extracts association rules from an association rule model.
This node uses the embedded R engine to extract rules that match specified criteria from a serialized association rule model file. The association rule model is typically created using the Market Basket Analysis node. The extracted rules can be sorted and output, and optionally saved as a new serialized model file.
The following criteria can be specified when extracting rules from the model:
- The minimum "Support" level.
- The maximum "Support" level.
- The minimum "Confidence" level.
- The maximum "Confidence" level.
- The minimum "Lift" level.
- The maximum "Lift" level.
"Support" is the proportion of transactions that contain the items in a particular itemset.
"Confidence" is the conditional probability of one itemset being in a transaction given the presence of another (antecedent) itemset, i.e. the probability of finding the RHS itemset of the rule in the transactions under the condition that these transactions also contain the LHS itemset.
"Lift" is the ratio of the observed support to that expected if the LHS and RHS were independent and is a measure of how likely the rule is to be not a coincidence (i.e. a Lift value of 1 would imply the association was purely random chance).
The extracted rules can be sorted (ascending/descending) by Support level, by Confidence level or by Lift.
The association rule model that contains the extracted rules can optionally be saved to a new file.
When run, the node provides a summary of the rules in the input association rule model, a summary of the extracted rules and details of each extracted rule.
The Summary pin contains summary statistics for both the rules in the input association rule model and the rules in the model that contains the extracted rules. Each set of statistics includes information on:
- The number of rules.
- The distribution of the length of the rules in terms of the total number of items in the rule (in both the LHS and RHS of the rule).
- Minimum, maximum and quartile statistics.
- Quality statistics for Support, Confidence and Lift.
- The number of transactions analyzed in the original model.
- The Support and Confidence values used when originally deriving the rules.
The Summary pin also contains the path to the file containing the input association rule model. If the model containing the extracted rules is saved to a file the Summary pin also includes the path to the file containing the extracted rules model.
The Results pin contains details of the extracted association rules and includes information on:
- A list of the items in the left hand side (antecedent) itemset.
- A list of the items in the right hand side (consequent) itemset.
- Support value for the rule.
- Confidence value for the rule.
- Lift value for the rule.
Powered by TIBCO®
Properties
ModelName
Optionally specify the name of a model which is displayed on the output data.
A model name must start with a letter and may contain any of the following:
- letters
- numbers
- period character (".")
- underscore ("_")
If not specified, a default model name "MBMiner" is displayed on the output data. Where the node is configured to write the serialized model to a file, the model name is used as the output filename.
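The naming rule above can be expressed as a regular expression. The following Python sketch is illustrative only. It assumes that "letters" means ASCII letters, which the description does not state, and the function name is hypothetical.

```python
import re

# Assumption: "letters" is taken to mean ASCII letters A-Z and a-z.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")

def is_valid_model_name(name):
    """Return True if name starts with a letter and otherwise contains
    only letters, numbers, periods and underscores."""
    return MODEL_NAME_PATTERN.fullmatch(name) is not None

print(is_valid_model_name("MBMiner"))      # default model name
print(is_valid_model_name("rules_2024.v1"))
print(is_valid_model_name("1st_model"))    # starts with a number
```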
File
Specify the absolute filepath to the file containing the association rules to be mined.
Choose the (from Field) variant of this property to look up the value from an input field with the name specified.
A value is required for this property.
MinimumSupport
Optionally specify the minimum level of support for an output association rule. Must be a positive number.
If not specified, no minimum will be used.
MaximumSupport
Optionally specify the maximum level of support for an output association rule. Must be a positive number.
If not specified, no maximum will be used.
MinimumConfidence
Optionally specify the minimum level of confidence for an output association rule. Must be a positive number.
If not specified, no minimum will be used.
MaximumConfidence
Optionally specify the maximum level of confidence for an output association rule. Must be a positive number.
If not specified, no maximum will be used.
MinimumLift
Optionally specify the minimum level of lift for an output association rule. Must be a positive number.
If not specified, a value of 1 will be used.
MaximumLift
Optionally specify the maximum level of lift for an output association rule. Must be a positive number.
If not specified, no maximum will be used.
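The way the six range properties combine can be sketched as a simple filter. This Python example is an illustration rather than the node's R implementation. The rule records and the function name are hypothetical, and treating the bounds as inclusive is an assumption. The default of 1 for the minimum Lift follows the MinimumLift description above.

```python
def extract_rules(rules, min_support=None, max_support=None,
                  min_confidence=None, max_confidence=None,
                  min_lift=1.0, max_lift=None):
    """Keep the rules whose measures fall inside every configured range."""
    bounds = {
        "support": (min_support, max_support),
        "confidence": (min_confidence, max_confidence),
        "lift": (min_lift, max_lift),
    }

    def in_range(value, low, high):
        # A bound of None means no limit on that side.
        # Assumption: bounds are inclusive.
        return (low is None or value >= low) and (high is None or value <= high)

    return [
        rule for rule in rules
        if all(in_range(rule[name], low, high)
               for name, (low, high) in bounds.items())
    ]

# Hypothetical rules, for illustration only.
rules = [
    {"lhs": ["bread"], "rhs": ["butter"], "support": 0.50, "confidence": 0.67, "lift": 1.33},
    {"lhs": ["milk"], "rhs": ["bread"], "support": 0.10, "confidence": 0.20, "lift": 0.80},
    {"lhs": ["jam"], "rhs": ["butter"], "support": 0.25, "confidence": 1.00, "lift": 2.00},
]

default_selection = extract_rules(rules)            # only the Lift default applies
high_support = extract_rules(rules, min_support=0.4)
```

With no properties set, only the default minimum Lift of 1 removes a rule in this sketch, so the {milk} => {bread} rule is dropped.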
ModelOutputMode
Optionally specify whether the serialized model is written to a file on disk.
This property also determines how ModelOutputField and ModelOutputDirectory behave.
The default value is None.
ModelOutputField
Optionally specify a name for the output field that contains the full path of the file where the association rule results have been written.
The default value is "Mined_ResultsOutput".
ModelOutputDirectory
Specify the directory where the serialized model is written when ModelOutputMode is set to File.
When ModelOutputDirectory is blank, files are written to the Data360 Analyze temporary directory. Otherwise, the files are written to the specified directory, which must exist and be writable.
By default, this node does not overwrite existing files. This behavior can be changed in the ExceptionBehavior tab (see the ModelOutputFileExistsBehaviour property).
SortByField
Optionally specify the field by which the output rules will be sorted. Choose from:
- support - the rules are sorted by the 'support' field.
- confidence - the rules are sorted by the 'confidence' field.
- lift - the rules are sorted by the 'lift' field.
- none - the rules are not sorted.
The default value is support.
The direction of the sort is controlled by the SortDirection property.
SortDirection
Optionally specify the direction in which the output rules will be sorted when the SortByField property is set to a value other than none. Choose from:
- decreasing - The rules are sorted in decreasing value by the field specified in the SortByField property.
- increasing - The rules are sorted in increasing value by the field specified in the SortByField property.
The default value is decreasing.
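The interaction between SortByField and SortDirection can be sketched as follows. This is a hedged Python illustration, not the node's implementation. The rule records and the function name are hypothetical, and the handling of ties is not something the description specifies.

```python
def sort_rules(rules, sort_by_field="support", sort_direction="decreasing"):
    """Return the rules ordered by the chosen measure.

    sort_by_field: "support", "confidence", "lift" or "none".
    sort_direction: "decreasing" or "increasing".
    """
    if sort_by_field == "none":
        # No sorting requested: keep the original order.
        return list(rules)
    return sorted(
        rules,
        key=lambda rule: rule[sort_by_field],
        reverse=(sort_direction == "decreasing"),
    )

# Hypothetical rules, for illustration only.
rules = [
    {"id": "A", "support": 0.25, "confidence": 1.00, "lift": 2.00},
    {"id": "B", "support": 0.50, "confidence": 0.67, "lift": 1.33},
    {"id": "C", "support": 0.10, "confidence": 0.20, "lift": 0.80},
]

by_default = sort_rules(rules)                                   # support, decreasing
by_lift_up = sort_rules(rules, "lift", "increasing")
unsorted_rules = sort_rules(rules, "none")
```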
ModelOutputFileExistsBehaviour
Optionally specify whether an existing serialized model file will be overwritten. Choose from:
- Error - Generate an error and do not overwrite the file.
- Log - Log a warning message and do not overwrite the file.
- Ignore - Do not overwrite the file.
- Overwrite - Overwrite the file.
The default value is Error.
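The four options above can be read as a decision on whether the model file is written. The following Python sketch illustrates that reading only. The function name, the log message and the use of an exception for the Error option are assumptions, not the node's actual behavior.

```python
import logging

def should_write(file_exists, behaviour="Error"):
    """Decide whether the serialized model file is written.

    file_exists: True if a file with the target name already exists.
    behaviour: "Error", "Log", "Ignore" or "Overwrite".
    """
    if not file_exists:
        return True
    if behaviour == "Overwrite":
        return True
    if behaviour == "Ignore":
        return False
    if behaviour == "Log":
        # Hypothetical message text.
        logging.warning("Model file already exists and was not overwritten.")
        return False
    if behaviour == "Error":
        # Assumption: the error is modeled here as a Python exception.
        raise FileExistsError("Model file already exists.")
    raise ValueError(f"Unknown behaviour: {behaviour}")
```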
Inputs and outputs
Inputs: 1 optional.
Outputs: Summary, Results.