Classifies data into a specified number of clusters.
The data are partitioned into a hierarchy of sub-groups. The hierarchy is constructed from the bottom up: each observation starts in its own group, and at each step the two closest groups are merged.
This node uses the embedded R engine to classify the input data with an unsupervised agglomerative clustering algorithm. The algorithm assigns each observation to a cluster and merges sub-groups into a hierarchy until only one group remains. You specify the number of sub-clusters to be formed at the lowest level of the hierarchy. You can also specify the field in the input data that contains the label for each record; if this is not specified, a record identifier field is added to the output data.
The node performs the hierarchical cluster analysis on the input variables. The variables must have a numeric data type (with the exception of the row names field). The node applies the Euclidean distance measure and the complete-linkage ("Complete") clustering method, in which the distance between two groups is the largest distance between any pair of their members. The data are partitioned into the specified number of clusters at the lowest level of the hierarchy. The permitted number of clusters ranges from 1 to N, where N is the number of observations (records) in the input data.
The node outputs the input data together with details of the cluster assignment for each record at the lowest level in the hierarchy.
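The clustering described above can be illustrated with a minimal sketch. The node itself runs on the embedded R engine; the Python below is not the node's implementation, only an illustration of agglomerative clustering with Euclidean distance and complete linkage. The function name complete_linkage_clusters and the sample data are assumptions made for this example, as is the numbering of clusters by first appearance.

```python
import math


def complete_linkage_clusters(rows, number_clusters):
    """Assign each row (a list of numbers) to one of number_clusters clusters.

    Illustrative only: starts with one cluster per row and repeatedly merges
    the two clusters with the smallest complete-linkage distance until
    number_clusters clusters remain. Returns 1-based cluster ids, one per row.
    """
    n = len(rows)
    if not 1 <= number_clusters <= n:
        raise ValueError("number_clusters must be between 1 and the number of rows")

    # Each cluster is a list of row indices; start with one cluster per row.
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Complete linkage: the largest Euclidean distance between members.
        return max(math.dist(rows[i], rows[j]) for i in a for j in b)

    while len(clusters) > number_clusters:
        # Pick the pair of clusters that are closest under complete linkage.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    # Number the clusters in order of their first row (an assumption of this
    # sketch, not documented behavior of the node).
    assignment = [0] * n
    for cluster_id, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            assignment[i] = cluster_id
    return assignment


# Hypothetical sample data: five records with two numeric variables each.
sample = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]]
print(complete_linkage_clusters(sample, 3))
```

Pairing each returned cluster id with its input record gives a result comparable in shape to the node output: the input data plus a cluster assignment per record.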
Powered by TIBCO®
Properties
NumberClusters
Specify the number of clusters into which the observations are grouped.
The value must be between 1 and N inclusive, where N is the number of input data records. A value is required for this property.
RowNames
Optionally specify the name of the field in the input data that contains the row names. If not specified, a row name field is added to the output data.
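The rules for the two properties can be summarized in a short sketch. It is an illustration under stated assumptions, not the node's code: the function prepare_records, its parameters, and the generated field name "row_id" are hypothetical, and the name the node actually gives to the added field is not stated here.

```python
from numbers import Real


def prepare_records(records, number_clusters, row_names=None,
                    generated_field="row_id"):
    """Check the property rules against a list of dict records.

    Returns (labels, matrix): one label per record and the numeric values
    to cluster. generated_field is a made-up name used for illustration.
    """
    n = len(records)

    # NumberClusters is required and must lie between 1 and N.
    if number_clusters is None:
        raise ValueError("NumberClusters is required")
    if not 1 <= number_clusters <= n:
        raise ValueError(f"NumberClusters must be between 1 and {n}")

    if row_names is None:
        # RowNames not specified: add an identifier field to each record.
        records = [{generated_field: str(i + 1), **r}
                   for i, r in enumerate(records)]
        row_names = generated_field
    elif any(row_names not in r for r in records):
        raise KeyError(f"field {row_names!r} is missing from the input data")

    labels = [r[row_names] for r in records]
    matrix = []
    for r in records:
        values = [v for k, v in r.items() if k != row_names]
        # Every field other than the row names field must be numeric.
        if not all(isinstance(v, Real) and not isinstance(v, bool)
                   for v in values):
            raise TypeError("all fields except the row names field must be numeric")
        matrix.append(values)
    return labels, matrix


# Hypothetical input: the "name" field holds the row names.
named = [{"name": "a", "x": 1.0, "y": 2.0}, {"name": "b", "x": 3.0, "y": 4.0}]
print(prepare_records(named, 2, row_names="name"))
```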
Inputs and outputs
Inputs: data.
Outputs: assignedClusters.