Hierarchical Clustering - Data360 Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Classifies data into a specified number of clusters.

The data are partitioned into a hierarchy of sub-groups. The hierarchy is constructed from the bottom up: each observation starts in its own group, and at each step the two groups that are closest to each other are merged into one.
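The bottom-up procedure can be sketched in Python. This is a minimal, illustrative sketch and not the node's own implementation, which runs in the embedded R engine. It assumes each observation is a tuple of numbers, uses the Euclidean distance and complete linkage that the node applies, and stops merging once `k` groups remain. The names `euclidean`, `complete_linkage`, and `agglomerate` are hypothetical. Because it recomputes every pairwise cluster distance at each step, it is only suitable for small inputs.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(c1, c2, points):
    """Cluster distance: the largest pairwise distance between members of c1 and c2."""
    return max(euclidean(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points, k):
    """Merge clusters bottom-up until k remain; return a 1-based label per point."""
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of observations")
    # Start with every observation in its own cluster.
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        # Find the pair of clusters that is closest under complete linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: complete_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        # Merge cluster b into cluster a (a < b, so index a is not shifted by the pop).
        clusters[a].extend(clusters.pop(b))
    # Translate cluster membership into one label per observation.
    labels = [0] * n
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels
```

For example, `agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (12, 0)], 3)` returns `[1, 1, 2, 2, 3]`: the two pairs of nearby points form two clusters and the outlying point forms a third. Calling it with `k=1` continues merging until a single group remains, which is the top of the hierarchy.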

Tip: Before working with this node, complete the prerequisite steps described in Working with the Statistical and Predictive Analytics nodes.
Note: An additional Statistical and Predictive Analytics node pack license is required to run this node. See Applying a node pack license. This node processes data in memory, so additional RAM is required when processing large data sets.

This node uses the embedded R engine to classify the input data with an unsupervised agglomerative clustering algorithm. The algorithm assigns each observation to a cluster and successively merges sub-groups into a hierarchy until only one group remains. You specify the number of sub-clusters to be formed at the lowest level of the hierarchy. You can optionally specify the field that contains the label for each record; if you do not, a record identifier field is added to the output data.

The node uses the input variables (fields) in the hierarchical cluster analysis. All input fields other than the row names field must have a numeric data type. The node applies the "Euclidean" distance measure and the "Complete" (complete linkage) clustering method. The data are partitioned into the specified number of clusters at the lowest level of the hierarchy. The permitted number of clusters ranges from 1 to N, where N is the number of observations (records) in the input data.
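For reference, these are the standard definitions of the two methods named above. The first line is the Euclidean distance between two observations x and y, each with p numeric variables; the second is the complete-linkage distance between two clusters A and B. At each step of the bottom-up procedure, the two clusters with the smallest D(A, B) are merged.

```latex
d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{j=1}^{p} (x_j - y_j)^2}

D(A, B) = \max_{\mathbf{a} \in A,\; \mathbf{b} \in B} d(\mathbf{a}, \mathbf{b})
```

Because D(A, B) is the largest pairwise distance between the two clusters, complete linkage tends to favor compact clusters.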

The node outputs the input data together with details of the cluster assignment for each record at the lowest level in the hierarchy.

Powered by TIBCO®

Properties

NumberClusters

Specify the number of clusters into which the data observations are grouped.

The value must be between 1 and N inclusive, where N is the number of input data records. A value is required for this property.
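The range rule for NumberClusters can be expressed as a simple check. The sketch below is hypothetical: the function name `validate_number_clusters` and its error messages are not part of the product, and it only encodes the documented constraints that a value is required and must lie between 1 and N. At the two extremes, 1 places every record in a single cluster and N places each record in a cluster of its own.

```python
def validate_number_clusters(value, record_count):
    """Return value if it satisfies the documented 1..N rule, otherwise raise."""
    # A value is required for this property.
    if value is None:
        raise ValueError("NumberClusters is required")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("NumberClusters must be an integer")
    # record_count plays the role of N, the number of input data records.
    if not 1 <= value <= record_count:
        raise ValueError(
            f"NumberClusters must be between 1 and {record_count}, got {value}"
        )
    return value
```

For example, `validate_number_clusters(3, 5)` returns `3`, while `validate_number_clusters(0, 5)` and `validate_number_clusters(6, 5)` both raise `ValueError`.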

RowNames

Optionally specify the name of the field in the input data that contains the row names. If not specified, a row name field is added to the output data.
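One way the output rows could be assembled is sketched below, assuming each record is a Python dict and one cluster label per record has already been computed. The function `build_output` and the output field names `RowName` and `Cluster` are hypothetical placeholders, not the node's actual field names. When no row names field is given, the sketch adds a 1-based record identifier, mirroring the documented behavior of adding a row name field.

```python
def build_output(records, labels, row_names_field=None,
                 id_field="RowName", cluster_field="Cluster"):
    """Return copies of the input records with a cluster assignment appended.

    id_field and cluster_field are hypothetical placeholder field names.
    """
    if len(records) != len(labels):
        raise ValueError("one cluster label is required per record")
    output = []
    for index, (record, label) in enumerate(zip(records, labels), start=1):
        row = dict(record)  # copy so the input records are left unchanged
        if row_names_field is None:
            # No row names field specified: add a 1-based record identifier.
            row[id_field] = index
        elif row_names_field not in row:
            raise KeyError(f"row names field {row_names_field!r} not found")
        row[cluster_field] = label
        output.append(row)
    return output
```

For example, `build_output([{"x": 0.0}, {"x": 9.0}], [1, 2])` returns `[{"x": 0.0, "RowName": 1, "Cluster": 1}, {"x": 9.0, "RowName": 2, "Cluster": 2}]`. Passing `row_names_field="name"` keeps the existing `name` values and adds only the cluster field.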

Inputs and outputs

Inputs: data.

Outputs: assignedClusters.