Segmentation - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Product name: Data360 DQ+
Title: Data360 DQ+ Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-10-09
Published on: 2024-10-09T14:37:51.625264
Note: Before using the Analytics nodes, you must first create an "Analytic Model". See Creating analytic models.

The Segmentation node uses clustering algorithms to perform "unsupervised" learning. Unsupervised learning takes raw, unlabeled data and clusters it into segments. Data points that have been clustered into the same segment are likely to be related in some way.
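The documentation does not name the clustering algorithm the node uses. Because the node exposes a segment count and a Max Number Of Iterations To Run property, the sketch below assumes a k-means style algorithm (Lloyd's algorithm) to illustrate how unlabeled records can be grouped into segments. The function names, the deterministic farthest-first initialization, and the `k` and `max_iterations` parameters are illustrative assumptions, not the DQ+ API. In this sketch `k` is the actual number of clusters, labeled 0 to k-1.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def farthest_first(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic initialization (an assumption): start at the first record,
    then repeatedly add the record farthest from its nearest chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Vector], k: int,
           max_iterations: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Group unlabeled points into k segments (labels 0..k-1) with Lloyd's algorithm.

    Stops when no record changes segment, or after max_iterations passes.
    Returns (segment label per point, final centroids).
    """
    centroids = farthest_first(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments are stable
        labels = new_labels
        # Move each centroid to the mean of the records now assigned to it.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty segment keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
    labels, centroids = kmeans(points, k=2)
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two tight groups of points end up in separate segments without any labels being supplied, which is the essence of unsupervised segmentation. The `max_iterations` cap plays the same role as a limit on how long the algorithm may keep reassigning records.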


The Segmentation node can be useful if you have a data set to which you would like to apply discrete categories. For example, given a data set containing customer demographics and purchase records, you could use the Segmentation node to derive market segments.

Segmentation training

To perform segmentation training, you will need an unlabeled data set, that is, one where records have not been previously categorized. This data set will need to contain a set of fields, or parameters, that the analytic model can compare to create segments.

Loan example: Training

This example uses a sample from a loan data set. The objective of training is to group loan applicants into customer segments based on parameterized information about each applicant.

Field names: amt = Loan amount, yEm = Years employed, dti = Debt to income ratio, inc = Annual income, intR = Interest rate, term = Loan term, inst = Monthly installment, grade = Loan grade.

id   amt    yEm  dti    inc     intR   term  inst    grade
001  5000   10   20.97  80000   10.99  36    163.67  B
002  5000   3    24     28800   13.33  36    169.27  C
003  15000  10   10.81  75000   12.29  36    500.3   C
004  24000  9    23.76  87818   19.99  60    635.72  E
005  13200  10   24.05  70000   9.17   60    275.11  B
006  19000  10   13.12  56900   17.86  60    481.03  D
007  14500  5    24.25  63500   9.17   60    302.2   B
008  35000  2    30.61  84000   18.25  60    893.54  E
009  25000  10   8.62   110000  7.89   60    505.6   A
010  18000  1    21.46  70000   13.99  36    615.11  C

  1. Select Train in the Operation property.
  2. For training, eight parameters are used as Input Fields: amt, yEm, dti, inc, intR, term, inst, and grade.
    Note: Fields that are not specified as inputs to the analytic model (id in this example) are still passed through the node, attached to their records.
  3. Select an Analytic Model. Note that you can only select from Segmentation type analytic models.
  4. Specify a Prediction Field, for example CustomerSegment.
  5. Select the Segmentation tab and enter 4 in the Number Of Desired Segments property. This actually produces 5 segments, numbered 0 to 4.
  6. Run the analysis.
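Before records can be compared, each one has to become a numeric vector. The documentation does not say how the node scales the input fields or treats the letter grade, so the sketch below assumes two common choices: grade is mapped to a number with a hypothetical ordinal encoding (A=1 to E=5), and every field is standardized to zero mean and unit variance so that inc, which is in the tens of thousands, does not dominate distances. id is kept aside, mirroring the pass-through behavior described in the note under step 2. `LOANS`, `GRADE_TO_NUMBER`, and `to_feature_vectors` are hypothetical names.

```python
from statistics import mean, pstdev

# The ten training records from the table above:
# (id, amt, yEm, dti, inc, intR, term, inst, grade)
LOANS = [
    ("001", 5000, 10, 20.97, 80000, 10.99, 36, 163.67, "B"),
    ("002", 5000, 3, 24, 28800, 13.33, 36, 169.27, "C"),
    ("003", 15000, 10, 10.81, 75000, 12.29, 36, 500.3, "C"),
    ("004", 24000, 9, 23.76, 87818, 19.99, 60, 635.72, "E"),
    ("005", 13200, 10, 24.05, 70000, 9.17, 60, 275.11, "B"),
    ("006", 19000, 10, 13.12, 56900, 17.86, 60, 481.03, "D"),
    ("007", 14500, 5, 24.25, 63500, 9.17, 60, 302.2, "B"),
    ("008", 35000, 2, 30.61, 84000, 18.25, 60, 893.54, "E"),
    ("009", 25000, 10, 8.62, 110000, 7.89, 60, 505.6, "A"),
    ("010", 18000, 1, 21.46, 70000, 13.99, 36, 615.11, "C"),
]

# Hypothetical ordinal encoding of the loan grade (an assumption, not DQ+ behavior).
GRADE_TO_NUMBER = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}


def to_feature_vectors(rows):
    """Split each row into its pass-through id and a standardized 8-field vector."""
    ids = [row[0] for row in rows]
    raw = [[float(v) for v in row[1:8]] + [float(GRADE_TO_NUMBER[row[8]])] for row in rows]
    columns = list(zip(*raw))
    means = [mean(c) for c in columns]
    # pstdev of a constant column is 0; fall back to 1 to avoid dividing by zero.
    scales = [pstdev(c) or 1.0 for c in columns]
    vectors = [[(x - m) / s for x, m, s in zip(vec, means, scales)] for vec in raw]
    return ids, vectors


ids, vectors = to_feature_vectors(LOANS)
print(len(vectors), len(vectors[0]))  # 10 8
```

Vectors like these are what a distance-based clustering algorithm compares. Because DQ+'s actual algorithm and preprocessing are not documented here, segment numbers computed this way would not necessarily match the CS values in the output table below.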

The Segmentation node outputs a data set containing all specified input fields, the Prediction Field holding the segment number (CS, short for CustomerSegment, in the table below), and any other fields attached to the records, such as id.

id   amt    yEm  dti    inc     intR   term  inst    grade  CS
001  5000   10   20.97  80000   10.99  36    163.67  B      3
002  5000   3    24     28800   13.33  36    169.27  C      0
003  15000  10   10.81  75000   12.29  36    500.3   C      0
004  24000  9    23.76  87818   19.99  60    635.72  E      4
005  13200  10   24.05  70000   9.17   60    275.11  B      1
006  19000  10   13.12  56900   17.86  60    481.03  D      4
007  14500  5    24.25  63500   9.17   60    302.2   B      4
008  35000  2    30.61  84000   18.25  60    893.54  E      4
009  25000  10   8.62   110000  7.89   60    505.6   A      1
010  18000  1    21.46  70000   13.99  36    615.11  C      2

Note: Training also creates a child model within the selected analytic model. You can later use this child model for scoring; see Creating analytic models.

Segmentation scoring

Prerequisite: You have selected a child model within your analytic model to use for scoring; see Creating analytic models.

Once you have created a child model to use for scoring, you can create another analysis that uses a Segmentation node to score a new data set. The new data set must contain the same fields that were used as parameters when the scoring model was trained. During scoring, the Segmentation node compares the values in these fields with the scoring model to assign each record to a segment.
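Assuming a k-means style model (the documentation does not name the algorithm), scoring amounts to assigning each new record to the segment whose trained centroid is nearest. The sketch below also enforces the requirement that a new data set contain every field used in training. The `score` function, the two-field model, and the centroid values are hypothetical, chosen only to illustrate the idea; they are not DQ+ output, and real inputs would normally be scaled the same way as during training.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def score(records: List[Dict[str, object]], input_fields: Sequence[str],
          centroids: Sequence[Sequence[float]]) -> List[int]:
    """Return the nearest-centroid segment for each record.

    Fields not listed in input_fields (such as id) are ignored here and can be
    carried through alongside the result.
    """
    segments = []
    for record in records:
        missing = [f for f in input_fields if f not in record]
        if missing:
            raise KeyError(f"record {record.get('id')} lacks trained fields: {missing}")
        vector = [float(record[f]) for f in input_fields]
        distances = [squared_distance(vector, c) for c in centroids]
        segments.append(distances.index(min(distances)))
    return segments


# Hypothetical two-field model: one centroid per segment, in (dti, intR) order.
FIELDS = ("dti", "intR")
CENTROIDS = [(10.0, 8.0), (25.0, 18.0)]

new_records = [
    {"id": "011", "dti": 12.0, "intR": 9.5},
    {"id": "012", "dti": 27.0, "intR": 19.0},
]
for record, segment in zip(new_records, score(new_records, FIELDS, CENTROIDS)):
    print(record["id"], segment)  # 011 0, then 012 1
```

A record missing one of the trained fields raises an error instead of being scored, which reflects why the new data set must carry the same parameters as the training data.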

Loan example: Scoring

This example continues the "Loan example: Training" from above.

  1. Use another data set containing the same parameter fields as the input to a Segmentation node in a separate analysis.
  2. Select Score in the Operation property.
  3. Select the same Analytic Model that was used in training.
  4. Specify a Prediction Field, for example CustomerSegment.
  5. Run the analysis.

The Segmentation node outputs a data set containing the input fields, the Prediction Field, and any other fields attached to the records. You can use dashboards or the visualizer to explore this data set further.

Segmentation evaluation

The Segmentation node does not provide an Evaluate operation. However, you can still evaluate the results of training or scoring by using the Data Store Output from a training or scoring run to build a data view.

Loan example: Evaluation

In the case of the loan example outlined above, placing the Data Store Output of training or scoring into a data view would allow you to analyze loan applicant segments. Such a data view would enable dashboards that visualize the parameters describing applicants, showing which applicants have similar parameter values. These applicants would be grouped into segments, or "clusters".
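A data view lets dashboards compare segments side by side. As a rough offline illustration (plain Python, not the DQ+ data view feature), the sketch below profiles the training output shown earlier by segment: how many applicants each segment contains, their mean loan amount and income, and which grades occur. Only the id, amt, inc, grade, and CS columns are copied from that table; `OUTPUT_ROWS` and `profile_segments` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# (id, amt, inc, grade, CS) copied from the training output table above.
OUTPUT_ROWS = [
    ("001", 5000, 80000, "B", 3),
    ("002", 5000, 28800, "C", 0),
    ("003", 15000, 75000, "C", 0),
    ("004", 24000, 87818, "E", 4),
    ("005", 13200, 70000, "B", 1),
    ("006", 19000, 56900, "D", 4),
    ("007", 14500, 63500, "B", 4),
    ("008", 35000, 84000, "E", 4),
    ("009", 25000, 110000, "A", 1),
    ("010", 18000, 70000, "C", 2),
]


def profile_segments(rows):
    """Summarize each segment: record count, mean amt, mean inc, and grades present."""
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[row[4]].append(row)
    return {
        seg: {
            "count": len(members),
            "mean_amt": mean(m[1] for m in members),
            "mean_inc": mean(m[2] for m in members),
            "grades": sorted(m[3] for m in members),
        }
        for seg, members in sorted(by_segment.items())
    }


for seg, summary in profile_segments(OUTPUT_ROWS).items():
    print(seg, summary)
```

The five segment keys, 0 to 4, reflect step 5 of the training example, where entering 4 produced five segments. Segment 4, for instance, holds four applicants with a mean loan amount of 23125 and grades B, D, E, and E, the kind of pattern a dashboard on the data view would surface.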

Properties

Display Name

Specify a name for the node.

The default value is Segmentation.

Model tab

Operation

Select an operation type. Choose from:

  • Train
  • Score

Input Fields

Click Add Field to select input fields to analyze.

Analytic Model

Select an analytic model. You can only choose from Segmentation type models.

Label Field

Enter a name for a label field which will be included in the output of the node.

Prediction Field

Enter a name for a prediction field which will be included in the output of the node.

Segmentation tab

Number Of Desired Segments

Specify the number of segments that you want to create.

Note: Segment numbering begins at 0, so the node creates one more segment than the number you specify. For example, if you want 4 segments (numbered 0 to 3), specify 3.

Max Number Of Iterations To Run

Specify the maximum number of iterations that the clustering algorithm runs when creating segments.