The Segmentation node uses clustering algorithms to perform "unsupervised" learning. Unsupervised learning takes raw, unlabeled data and clusters it into segments. Data points that have been clustered into the same segment are likely to be related in some way.
The Segmentation node can be useful if you have a data set to which you would like to apply discrete categories. For example, given a data set containing customer demographics and purchase records, you could use the Segmentation node to derive market segments.
Segmentation training
To perform segmentation training, you will need an unlabeled data set, that is, one where records have not been previously categorized. This data set will need to contain a set of fields, or parameters, that the analytic model can compare to create segments.
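Clustering compares records by their numeric field values, so fields on very different scales (an income in the tens of thousands next to an interest rate below 20) can dominate the comparison. The sketch below shows one common way to prepare such fields. It is an illustration only: the Segmentation node may handle encoding and scaling internally, and the helper names `to_vector` and `min_max_scale` and the ordinal `GRADE_ORDER` mapping are assumptions, not part of the product. The three records are borrowed from the loan example that follows.

```python
# Hypothetical preprocessing sketch; the Segmentation node may do this differently.
GRADE_ORDER = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}  # assumed ordinal encoding


def to_vector(record, fields):
    """Turn one record (a dict) into a list of floats, encoding grade as a number."""
    return [
        float(GRADE_ORDER[record[f]]) if f == "grade" else float(record[f])
        for f in fields
    ]


def min_max_scale(vectors):
    """Rescale every column to the range [0, 1] so that no single field dominates."""
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in vectors
    ]


records = [
    {"id": "001", "amt": 5000, "inc": 80000, "grade": "B"},
    {"id": "002", "amt": 5000, "inc": 28800, "grade": "C"},
    {"id": "009", "amt": 25000, "inc": 110000, "grade": "A"},
]
fields = ["amt", "inc", "grade"]  # id is not an input; it only travels with the record
scaled = min_max_scale([to_vector(r, fields) for r in records])
```

After scaling, every input field contributes on the same 0 to 1 range, and the `id` field is left out of the comparison entirely.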
Loan example: Training
This example uses a sample from a loan data set. The object of training is to group loan applicants into customer segments based on parameterized information about the applicant.
Field names: amt = Loan amount, yEm = Years employed, dti = Debt to income ratio, inc = Annual income, intR = Interest rate, term = Loan term, inst = Monthly installment, grade = Loan grade.
id | amt | yEm | dti | inc | intR | term | inst | grade |
---|---|---|---|---|---|---|---|---|
001 | 5000 | 10 | 20.97 | 80000 | 10.99 | 36 | 163.67 | B |
002 | 5000 | 3 | 24 | 28800 | 13.33 | 36 | 169.27 | C |
003 | 15000 | 10 | 10.81 | 75000 | 12.29 | 36 | 500.3 | C |
004 | 24000 | 9 | 23.76 | 87818 | 19.99 | 60 | 635.72 | E |
005 | 13200 | 10 | 24.05 | 70000 | 9.17 | 60 | 275.11 | B |
006 | 19000 | 10 | 13.12 | 56900 | 17.86 | 60 | 481.03 | D |
007 | 14500 | 5 | 24.25 | 63500 | 9.17 | 60 | 302.2 | B |
008 | 35000 | 2 | 30.61 | 84000 | 18.25 | 60 | 893.54 | E |
009 | 25000 | 10 | 8.62 | 110000 | 7.89 | 60 | 505.6 | A |
010 | 18000 | 1 | 21.46 | 70000 | 13.99 | 36 | 615.11 | C |
- Select Train in the Operation property.
- For training, eight parameters are used as Input Fields: amt, yEm, dti, inc, intR, term, inst, and grade.
  Note: Fields that are not specified as inputs to the analytic model (id in this example) are still passed through the node, attached to records.
- Select an Analytic Model. Note that you can only select from Segmentation type analytic models.
- Specify a Prediction Field, for example CustomerSegment.
- Select the Segmentation tab and specify 4 in the Number Of Desired Segments property. In this example, this actually yields 5 segments (0, 1, 2, 3, and 4).
- Run the analysis.
The Segmentation node outputs a data set containing all specified input fields, the Prediction Field containing the segment number (abbreviated as CS in the table below), and any other fields attached to records.
id | amt | yEm | dti | inc | intR | term | inst | grade | CS |
---|---|---|---|---|---|---|---|---|---|
001 | 5000 | 10 | 20.97 | 80000 | 10.99 | 36 | 163.67 | B | 3 |
002 | 5000 | 3 | 24 | 28800 | 13.33 | 36 | 169.27 | C | 0 |
003 | 15000 | 10 | 10.81 | 75000 | 12.29 | 36 | 500.3 | C | 0 |
004 | 24000 | 9 | 23.76 | 87818 | 19.99 | 60 | 635.72 | E | 4 |
005 | 13200 | 10 | 24.05 | 70000 | 9.17 | 60 | 275.11 | B | 1 |
006 | 19000 | 10 | 13.12 | 56900 | 17.86 | 60 | 481.03 | D | 4 |
007 | 14500 | 5 | 24.25 | 63500 | 9.17 | 60 | 302.2 | B | 4 |
008 | 35000 | 2 | 30.61 | 84000 | 18.25 | 60 | 893.54 | E | 4 |
009 | 25000 | 10 | 8.62 | 110000 | 7.89 | 60 | 505.6 | A | 1 |
010 | 18000 | 1 | 21.46 | 70000 | 13.99 | 36 | 615.11 | C | 2 |
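This documentation does not name the clustering algorithm behind the Segmentation node. As a rough intuition for what training does, the sketch below implements plain k-means, a common clustering method, on two fields (dti and intR) from the loan sample. The function `kmeans` and its parameters `k` and `max_iterations` are hypothetical; they loosely mirror the Number Of Desired Segments and Max Number Of Iterations To Run properties, but the sketch returns at most `k` distinct labels and does not attempt to reproduce the node's segment numbering or the assignments shown in the table above.

```python
import math


def kmeans(points, k, max_iterations):
    """Plain k-means sketch: returns (centroids, labels)."""
    centroids = [list(p) for p in points[:k]]  # deterministic start: first k points
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a segment is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # no assignment changed, so training has converged
            break
        labels = new_labels
    return centroids, labels


# (dti, intR) for loan records 001 to 010 from the sample data set.
points = [
    (20.97, 10.99), (24.0, 13.33), (10.81, 12.29), (23.76, 19.99), (24.05, 9.17),
    (13.12, 17.86), (24.25, 9.17), (30.61, 18.25), (8.62, 7.89), (21.46, 13.99),
]
centroids, labels = kmeans(points, k=3, max_iterations=100)
```

Each entry in `labels` plays the role of the Prediction Field value for the corresponding record, and `centroids` is the part of the result that a later scoring run would reuse.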
Segmentation scoring
Prerequisite: You have selected a child model within your analytic model to use for scoring. See Creating analytic models.
Once you have created a child model to use for scoring, you can create another analysis that uses a Segmentation node to score a new data set. The new data set must contain the same fields that were used as parameters when the scoring model was trained. During scoring, the Segmentation node will compare the values in these fields to values in the scoring model in order to assign segments.
Loan example: Scoring
This example continues the "Loan example: Training" from above.
- Another data set with the same parameters is used as an input to a Segmentation node in another analysis.
- Select Score in the Operation property.
- Select the same Analytic Model that was used in training.
- Specify a Prediction Field, for example CustomerSegment.
- Run the analysis.
The Segmentation node outputs a data set containing the input fields, the Prediction Field, and any other field attached to records. You could use dashboards or the visualizer to further explore this data set.
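Conceptually, scoring compares each new record with what the trained model learned and assigns the closest segment. The sketch below illustrates this with a nearest-centroid lookup, assuming a centroid-based model; the function `score`, the centroid values, and the two new records are all hypothetical and are not taken from the product.

```python
import math


def score(record, centroids, fields):
    """Return the index of the centroid closest to the record's input fields."""
    # A missing input field raises KeyError: the new data set must contain
    # the same fields that were used in training.
    vector = [float(record[f]) for f in fields]
    return min(range(len(centroids)), key=lambda c: math.dist(vector, centroids[c]))


fields = ["dti", "intR"]
centroids = [[10.0, 9.0], [24.0, 10.0], [25.0, 19.0]]  # hypothetical trained values
new_records = [
    {"id": "101", "dti": 9.5, "intR": 8.2},
    {"id": "102", "dti": 26.0, "intR": 18.5},
]
for r in new_records:
    r["CustomerSegment"] = score(r, centroids, fields)  # Prediction Field
```

Fields that are not inputs, such as `id`, stay attached to each record, which matches how the node passes them through.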
Segmentation evaluation
The Segmentation node does not feature an Evaluate operation. However, the results of training or scoring can still be evaluated by using the Data Store Output from a training or scoring run to build a data view.
Loan example: Evaluation
In the case of the loan example outlined above, placing the Data Store Output of training or scoring into a data view would allow you to analyze loan applicant segments. Such a data view would enable dashboards that visualize the parameters describing applicants, showing which applicants have similar parameter values. These applicants would be grouped into segments, or "clusters".
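Outside of a data view, the same kind of per-segment analysis can be sketched in a few lines. The example below takes the id, amt, and segment values from the training output of the loan example and computes the record count and average loan amount for each segment. The function `summarize` is a hypothetical helper, not a product feature.

```python
from collections import defaultdict
from statistics import mean

# (id, amt, segment) taken from the training output of the loan example.
rows = [
    ("001", 5000, 3), ("002", 5000, 0), ("003", 15000, 0), ("004", 24000, 4),
    ("005", 13200, 1), ("006", 19000, 4), ("007", 14500, 4), ("008", 35000, 4),
    ("009", 25000, 1), ("010", 18000, 2),
]


def summarize(rows):
    """Group loan amounts by segment and report count and mean per segment."""
    by_segment = defaultdict(list)
    for _id, amt, segment in rows:
        by_segment[segment].append(amt)
    return {
        seg: {"count": len(amts), "mean_amt": mean(amts)}
        for seg, amts in sorted(by_segment.items())
    }


summary = summarize(rows)
for seg, stats in summary.items():
    print(seg, stats["count"], stats["mean_amt"])
```

A summary like this makes it easier to see what distinguishes one segment from another, for example that segment 4 holds the most records and the largest average loan amount in this sample.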
Properties
Display Name
Specify a name for the node.
The default value is Segmentation.
Model tab
Operation
Select an operation type. Choose from:
- Train
- Score
Input Fields
Click Add Field to select input fields to analyze.
Analytic Model
Select an analytic model. You can only choose from Segmentation type models.
Label Field
Enter a name for a label field which will be included in the output of the node.
Prediction Field
Enter a name for a prediction field which will be included in the output of the node.
Segmentation tab
Number Of Desired Segments
Specify the number of segments that you want to create.
Max Number Of Iterations To Run
Specify the maximum number of iterations to run.