Recommend - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Product name: Data360 DQ+
Title: Data360 DQ+ Help
Copyright: 2024
First publish date: 2016
Last edition: 2024-07-09
Last publication: 2024-07-09T15:09:58.774265
Note: Before using the Analytics nodes, you must first create an analytic model. See Creating analytic models.

You can use the Recommend node to find new items that a customer might be interested in.

The Recommend node requires a data set containing at least three fields:

  • A field that uniquely identifies users
  • A field that uniquely identifies products
  • A numeric ranking field that represents how a given user has rated a product
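Before passing data to the node, it can help to confirm that every record carries these three fields. The following is a minimal sketch in plain Python, outside the product. The field names user, product, and ranking and the helper find_invalid_records are assumptions made for illustration; they are not part of Data360 DQ+.

```python
from numbers import Real

# Hypothetical field names; in the node these are chosen in the
# User Field, Product Field, and Rating Field properties.
REQUIRED_FIELDS = ("user", "product", "ranking")


def find_invalid_records(records):
    """Return (index, reason) pairs for records that lack a required
    field or whose ranking is not numeric."""
    problems = []
    for index, record in enumerate(records):
        missing = [name for name in REQUIRED_FIELDS if record.get(name) is None]
        if missing:
            problems.append((index, "missing " + ", ".join(missing)))
            continue
        rating = record["ranking"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(rating, bool) or not isinstance(rating, Real):
            problems.append((index, "ranking is not numeric"))
    return problems


records = [
    {"user": "001", "product": "a", "ranking": 99},
    {"user": "001", "product": "b", "ranking": "high"},
    {"user": "002", "product": "a"},
]
print(find_invalid_records(records))
```

In this sketch the second record is reported because its ranking is text, and the third because it has no ranking at all.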

With these three fields, the Recommend node can compare the rankings that different users have given to the same products to predict how users might rank products that they have not yet experienced. For example:

User     Product   User's ranking of product
User 1   A         7
         B         6
         C         9
User 2   A         6
         B         7
         C         ?

In this example, the Recommend node could predict how User 2 might rank Product C. This prediction would be based on how User 1 ranked Product C, given that User 1 and User 2 ranked Products A and B quite similarly. In cases where predicted rankings are high, the new product could then be recommended to the user.
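The idea can be illustrated with a small sketch. This topic does not state which algorithm the Recommend node uses, so the code below is an assumption: a simple user-based approach that weights other users' rankings by cosine similarity. The function names cosine_similarity and predict are hypothetical.

```python
import math

# Rankings from the table above; User 2 has not ranked product C.
rankings = {
    "User 1": {"A": 7, "B": 6, "C": 9},
    "User 2": {"A": 6, "B": 7},
}


def cosine_similarity(a, b):
    """Cosine similarity over the products that both users have ranked."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = math.sqrt(sum(a[p] ** 2 for p in shared))
    norm_b = math.sqrt(sum(b[p] ** 2 for p in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(rankings, user, product):
    """Similarity-weighted average of other users' rankings of the product."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_rankings in rankings.items():
        if other == user or product not in other_rankings:
            continue
        weight = cosine_similarity(rankings[user], other_rankings)
        weighted_sum += weight * other_rankings[product]
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None


similarity = cosine_similarity(rankings["User 1"], rankings["User 2"])
prediction = predict(rankings, "User 2", "C")
print(round(similarity, 3), prediction)
```

With only one other user who has ranked Product C, the prediction for User 2 is effectively User 1's ranking of 9. With more users, each ranking would contribute in proportion to how similar that user is to User 2.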

Recommendation training

To perform recommendation training, you will need a labeled data set, that is, one where users have ranked products.

Product recommendation example: Training

This example uses a sample from a product rankings data set. The object of training is to predict how users might rank products that they have not used.

user   product   ranking
001    a         99
001    b         42
001    c         73
002    a         33
002    b         63

  1. For training, specify ranking in the Rating Field property.
  2. Specify user in the User Field property.
  3. Specify product in the Product Field property.
  4. Specify a Prediction Field with an appropriate name, for example predictedRank.
  5. Run the analysis.

The Recommend node outputs a data set containing the user and product input fields, along with the Prediction Field in place of the ranking field:

user   product   predictedRank
001    a         99
001    b         35
001    c         68
002    a         40
002    b         51

Note: Training also creates a child model within the selected analytic model. You can use this child model later for scoring. See Creating analytic models.

Recommendation evaluation

To evaluate the accuracy of an analytic model, and by extension the accuracy of scoring that is performed using that model, you can use the Recommend node's Evaluate operation.

To evaluate a child training model, use a validation data set as the input to a Recommend node. See Generating training and validation data sets.
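The principle behind a validation data set is that some labeled records are held back from training so that the model can be tested on data it has not seen. The sketch below shows that principle in plain Python; it is not how the product generates these data sets. The function split_records, the 40% validation fraction, and the seed are assumptions for illustration.

```python
import random


def split_records(records, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split it into (training, validation)."""
    shuffled = list(records)
    # A fixed seed makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    validation_size = int(len(shuffled) * validation_fraction)
    return shuffled[validation_size:], shuffled[:validation_size]


# The labeled records from the training example: (user, product, ranking).
records = [
    ("001", "a", 99), ("001", "b", 42), ("001", "c", 73),
    ("002", "a", 33), ("002", "b", 63),
]
training, validation = split_records(records, validation_fraction=0.4)
print(len(training), len(validation))
```

Every record ends up in exactly one of the two sets, so the validation records play no part in training.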

Product recommendation example: Evaluating the child training model

To evaluate the child model created during the product ranking data set training:

  1. Provide a validation data set as input to the Recommend node.
  2. Select Evaluate in the Operation property.

The evaluation produces an RMSE:

ModelDisplayName       ChildModelDisplayName   Rank   RMSE
Recommendation Model   Child Model 1           1      15.82

Recommendation re-training and re-evaluating

After training and evaluating your first child model, you can choose to train another one in order to obtain a better RMSE and more accurate scoring results.

To retrain for recommendation, you will need new data. Each time you re-train using the same analytic model, another child model is produced. Once a new child model is produced, you can then evaluate it using the data store output and child model that was produced by your new training attempt. If a new child model is found to have a lower RMSE, you could then use it for scoring.
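Choosing between child models comes down to comparing their RMSE values. As a minimal sketch, assuming the evaluation results have been collected into a list of records: Child Model 1 and its RMSE come from the evaluation example in this topic, while Child Model 2 and its RMSE are hypothetical.

```python
# Evaluation results. Child Model 1 is from the evaluation example;
# Child Model 2 and its RMSE are hypothetical.
evaluations = [
    {"ChildModelDisplayName": "Child Model 1", "RMSE": 15.82},
    {"ChildModelDisplayName": "Child Model 2", "RMSE": 12.40},
]

# The child model with the lowest RMSE is the candidate for scoring.
best = min(evaluations, key=lambda row: row["RMSE"])
print(best["ChildModelDisplayName"])
```

The comparison is only meaningful when the child models are evaluated against comparable validation data.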

Recommendation scoring

Prerequisite: You have selected a child model within your analytic model to use for scoring. See Creating analytic models.

Once you have selected a child model to use for scoring, you can create another analysis that uses a Recommend node to score an unlabeled data set, that is, to predict values for each record. There are three types of scoring with the Recommend node. In the Score Type property you can choose from:

  • Ratings - Given a user field and a product field, predict a rating field that represents how that user might rate that product.
  • Users - Given a product field, find users to recommend the product to.
  • Products - Given a user field, find products to recommend to the user.

Product recommendation example: Scoring

This example completes the product recommendation data set examples in this topic. You have an unlabeled data set containing a user field and a product field. The values in these fields are the same as the set of values used in training:

user   product
1      A
1      B
1      C
2      A
2      B
2      C

  1. Select a child model to use for scoring, for example Child Model 1.
  2. Provide the unlabeled data set as input to a Recommend node.
  3. Select Score in the Operation property.
  4. Select Ratings in the Score Type property.
  5. Specify user in the User Field property.
  6. Specify product in the Product Field property.
  7. Specify a name for the Prediction Field to hold the rating values, for example predictedRank.
  8. Select a data type for the prediction field in the Prediction Field Type property, for example Integer.

The following results are produced:

user   product   predictedRank
1      A         91
1      B         30
1      C         76
2      A         46
2      B         50
2      C         60

You can then output this data set to a new data store and use it in other data stages, such as a dashboard. Note that in this example, the scoring model was able to generate a prediction about how User 2 would rank product C, based on how User 1 ranked products A, B, and C.
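A downstream step might then turn the predicted rankings into recommendations by keeping only the products with a high predicted rank. The sketch below is one way to do that in plain Python. The threshold of 60 and the function recommendations are assumptions for illustration; the scored values are the results from the scoring example.

```python
from collections import defaultdict

# Scoring output from the example: (user, product, predictedRank).
scored = [
    ("1", "A", 91), ("1", "B", 30), ("1", "C", 76),
    ("2", "A", 46), ("2", "B", 50), ("2", "C", 60),
]

THRESHOLD = 60  # hypothetical cut-off for a "high" predicted rank


def recommendations(scored, threshold):
    """Map each user to the products at or above the threshold, best first."""
    by_user = defaultdict(list)
    for user, product, predicted_rank in scored:
        if predicted_rank >= threshold:
            by_user[user].append((predicted_rank, product))
    return {
        user: [product for _, product in sorted(items, reverse=True)]
        for user, items in by_user.items()
    }


result = recommendations(scored, THRESHOLD)
print(result)
```

With this threshold, User 1 would be offered products A and C, and User 2 would be offered product C. In practice you would also exclude products that a user has already ranked.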

Regression or Recommendation: Root Mean Square Error (RMSE)

The Root Mean Square Error (RMSE) is a measure used to evaluate Regression or Recommendation models. RMSE is the square root of the mean of the squared errors, where each error is the difference between a predicted value and the corresponding labeled value.
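Written as a formula, using the standard definition of RMSE, where n is the number of records, \hat{y}_i is the predicted value for record i, and y_i is its labeled value (these symbols are introduced here for illustration):

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```

Each error is squared before the errors are averaged, and the square root is taken last, so the result is in the same units as the label field.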

In general, the lower the RMSE, the better the performance of a model. What typifies a "low" RMSE depends on the range of values in the model's label field.

If there are large errors between predicted values and labeled values, they magnify the RMSE because each error is squared before the mean is taken.
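The calculation can be sketched in a few lines of Python. The rmse function below follows the standard definition and is not the product's implementation. The first call uses the labeled and predicted rankings from the training example in this topic purely as sample numbers; the second pair of calls uses hypothetical values to show the effect of one large error.

```python
import math


def rmse(predicted, labeled):
    """Square root of the mean of the squared errors."""
    if len(predicted) != len(labeled) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted, labeled)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Labeled rankings and predictions from the training example in this topic.
labeled = [99, 42, 73, 33, 63]
predicted = [99, 35, 68, 40, 51]
example_rmse = rmse(predicted, labeled)

# Hypothetical values: both cases have the same total absolute error (8),
# but concentrating it in one record doubles the RMSE.
even_rmse = rmse([12, 12, 12, 12], [10, 10, 10, 10])
uneven_rmse = rmse([10, 10, 10, 18], [10, 10, 10, 10])
print(round(example_rmse, 2), even_rmse, uneven_rmse)
```

Four errors of 2 give an RMSE of 2, while a single error of 8 among three exact predictions gives an RMSE of 4, even though the total absolute error is the same.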

Properties

Display Name

Specify a name for the node.

The default value is Recommend.

Operation

Select an operation type. Choose from:

  • Train
  • Score
  • Evaluate

Analytic Model

Select an analytic model. You can only choose from Recommendation type models.

Rating Field

Select an input field to use for ranking. This must be a numeric field where users have rated items.

User Field

Select an input field that contains the user information.

Product Field

Select an input field that contains the product information.

Prediction Field

Enter a name for a prediction field which will be included in the output of the node.