The Regression node applies supervised learning algorithms to find a best-fit relationship between independent and dependent variables in a data set. Once a relationship is found, new data can be run against the Regression model to make predictions.
Regression works best when attempting to output continuous values. For example, you could apply a Regression node to a data set containing information about product pricing and order quantities to determine how pricing affects demand. Once the Regression node has determined the relationship between the input variables, you can use that relationship to make predictions.
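To make "best-fit relationship" concrete, the following minimal sketch fits a straight line to price and quantity pairs using ordinary least squares, then uses the line to predict demand at a new price. The numbers are made up for illustration, and the function names are hypothetical. A straight-line fit is a simplification chosen for readability; the Regression node itself offers tree-based algorithms (see Properties).

```python
from statistics import mean

# Hypothetical, made-up observations: unit price and units ordered.
prices = [10, 12, 14, 16, 18]
quantities = [200, 180, 160, 140, 120]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# "Training": learn the relationship from the known observations.
slope, intercept = fit_line(prices, quantities)

def predict_quantity(price):
    """'Scoring': apply the learned relationship to a new price."""
    return intercept + slope * price

print(predict_quantity(15))
```

The output is a continuous value (a quantity), which is the kind of prediction that regression is suited to.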
Regression training
To perform Regression training, you will need a labeled data set, that is, one where the desired output value is already known. This data set must also contain a set of fields, or parameters, that the analytic model can associate with values in the Label Field.
Loan example: Training
This example uses a sample from a loan data set. The object of training is to suggest interest rates for applicants based on parameterized information about the applicant.
Field names: amt = Loan amount, yEm = Years employed, dti = Debt-to-income ratio, inc = Annual income, intR = Interest rate.
id   amt    yEm  dti    inc     intR
001  14000  10   27     68000   12.29
002  25000  1    20.09  85000   14.65
003  6000   10   27.8   85000   12.69
004  15600  1    13.37  95000   9.17
005  9250   1    21.76  70000   9.99
006  2500   1    14.8   45000   9.99
007  10000  5    14.97  93000   9.17
008  20000  1    11.81  135000  14.65
009  3600   1    29.69  29999   15.61
010  20150  10   27.55  48000   17.86
 For training, add four parameters in the Input Fields property: amt, yEm, dti, and inc.
 Specify intR in the Label Field property.
 Specify a Prediction Field that has the same data type as the field specified in the Label Field property, with an appropriate name, such as Suggested_Interest_Rate (sIR) of the Decimal data type.
 Run the analysis.
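Conceptually, the Input Fields and Label Field properties divide each record into the values the model learns from and the value it learns to predict. The sketch below illustrates that division with the first three loan records. The dictionary layout and the function name are assumptions made for illustration, not the product's internal representation.

```python
# Field names come from the loan example; everything else is hypothetical.
INPUT_FIELDS = ["amt", "yEm", "dti", "inc"]
LABEL_FIELD = "intR"

# First three records of the sample loan data set.
records = [
    {"id": "001", "amt": 14000, "yEm": 10, "dti": 27.0, "inc": 68000, "intR": 12.29},
    {"id": "002", "amt": 25000, "yEm": 1, "dti": 20.09, "inc": 85000, "intR": 14.65},
    {"id": "003", "amt": 6000, "yEm": 10, "dti": 27.8, "inc": 85000, "intR": 12.69},
]

def split_inputs_and_labels(records, input_fields, label_field):
    """Return one list of input values per record, plus the list of labels."""
    inputs = [[rec[name] for name in input_fields] for rec in records]
    labels = [rec[label_field] for rec in records]
    return inputs, labels

inputs, labels = split_inputs_and_labels(records, INPUT_FIELDS, LABEL_FIELD)
print(inputs[0], labels[0])
```

Fields that are neither inputs nor the label, such as id, play no part in training but still travel with the record.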
The Regression node outputs a data set containing all specified Input Fields, the Label Field, the Prediction Field, and any other fields associated with the records:
id   amt    yEm  dti    inc     intR   sIR
001  14000  10   27     68000   12.29  12.68
002  25000  1    20.09  85000   14.65  12.23
003  6000   10   27.8   85000   12.69  12.22
004  15600  1    13.37  95000   9.17   10.06
005  9250   1    21.76  70000   9.99   12.17
006  2500   1    14.8   45000   9.99   10.65
007  10000  5    14.97  93000   9.17   10.08
008  20000  1    11.81  135000  14.65  12.50
009  3600   1    29.69  29999   15.61  13.54
010  20150  10   27.55  48000   17.86  15.49
Regression evaluation
To evaluate the accuracy of an analytic model, and by extension the accuracy of scoring that is performed using that model, you can use the Regression node's Evaluate operation.
To evaluate a child training model, provide a validation data set as input to a Regression node. See Generating training and validation data sets.
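A validation data set is a labeled portion of the data that is held back from training, so that the model is measured on records it has not seen. The following generic sketch shows one common way to make such a split: shuffle the records, then hold out a fraction. The function name, the 30% fraction, and the seed are assumptions for illustration; the product's own procedure is described in Generating training and validation data sets.

```python
import random

def split_train_validation(records, validation_fraction=0.3, seed=42):
    """Shuffle a copy of the records and hold out a fraction for validation."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Record ids "001" to "010", as in the loan example.
record_ids = [f"{i:03d}" for i in range(1, 11)]
training, validation = split_train_validation(record_ids)
print(len(training), len(validation))
```

No record appears in both sets, which keeps the evaluation independent of the training data.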
Loan example: Evaluating the child training model
To evaluate the child model created during the loan data set training:
 Provide a validation data set as input to the Regression node.
 Select Evaluate in the Operation property.
The evaluation produces an RMSE:
ModelDisplayName  ChildModelDisplayName  Rank  RMSE
Regression Model  Child Model 1          1     3.82
Regression retraining and reevaluating
After training and evaluating your first child model, you can choose to train another one in order to obtain a better RMSE and more accurate scoring results. To retrain, you can use new data and/or different parameters as input fields. Each time you retrain using the same analytic model, another child model is produced.
Loan example: Retraining and reevaluating
In this example, the loan data set's analytic model is retrained with two additional parameters for each record: open_acc and msld, where open_acc = Number of credit lines open in the borrower's file and msld = Months since last delinquency.
 To retrain, edit the original analysis by adding these parameters as Input Fields in the Regression node.
 Rebuild the analysis.
Rebuilding the analysis creates a new child model within the analytic model.
The analysis outputs a new data set, containing all six Input Fields, the Label Field, and the Prediction Field.
 To determine the effect of adding the two additional parameters to training, use the Regression node's Evaluate operation. This time, select the new child model.
The evaluation produces an RMSE, in this case a slightly improved value:
ModelDisplayName  ChildModelDisplayName  Rank  RMSE
Regression Model  Child Model 2          1     3.79
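When several child models have been evaluated, the one with the lowest RMSE is generally the better candidate for scoring. The sketch below compares the two RMSE values reported in this topic. The function name is hypothetical, and the sketch does not attempt to reproduce how the product computes the Rank column.

```python
# (child model name, RMSE) pairs reported by the two evaluations in this topic.
evaluations = [("Child Model 1", 3.82), ("Child Model 2", 3.79)]

def rank_by_rmse(evaluations):
    """Return the evaluations sorted from lowest (best) to highest RMSE."""
    return sorted(evaluations, key=lambda pair: pair[1])

best_name, best_rmse = rank_by_rmse(evaluations)[0]
print(best_name, best_rmse)
```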
Regression scoring
 Prerequisite: You have selected a child model within your analytic model to use for scoring. See Creating analytic models.
Once you have selected a child model to use for scoring, you can create another analysis that uses a Regression node to score an unlabeled data set, that is, to predict values for each record. The new data set must contain the same fields that were used as parameters when the scoring model was trained. During scoring, the Regression node will compare the values in these fields to values in the scoring model.
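Because the scoring data must contain the same fields that the model was trained on, it can help to check a data set before scoring it. The following sketch, using the six input fields of the retrained loan model, reports fields that are absent and fields that are present but empty. The function names and the use of None for an empty value are assumptions for illustration. This topic does not state how the Regression node treats empty values, so the sketch only reports them.

```python
# The six input fields of the retrained loan model.
REQUIRED_FIELDS = ["amt", "yEm", "dti", "inc", "open_acc", "msld"]

def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are absent from the record."""
    return [name for name in required if name not in record]

def empty_fields(record, required=REQUIRED_FIELDS):
    """Return the required field names that are present but hold no value."""
    return [name for name in required
            if name in record and record[name] is None]

# Records 003 and 001 of the loan scoring data; None marks an empty msld.
complete = {"id": "003", "amt": 10000, "yEm": 1, "dti": 24.44,
            "inc": 60000, "open_acc": 10, "msld": 59}
blank_msld = {"id": "001", "amt": 6000, "yEm": 2, "dti": 2.98,
              "inc": 50000, "open_acc": 11, "msld": None}
# A hypothetical record in which the msld field does not exist at all.
no_msld = {key: value for key, value in complete.items() if key != "msld"}

print(missing_fields(complete), missing_fields(no_msld), empty_fields(blank_msld))
```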
Loan example: Scoring
This example continues from the previous examples in this topic. In evaluation, "Child Model 2" performed slightly better, so this is the model that will be used for scoring.
You have an unlabeled data set containing the model's six Input Fields:
id   amt    yEm  dti    inc     open_acc  msld
001  6000   2    2.98   50000   11
002  35000  10   14.39  86000   13
003  10000  1    24.44  60000   10        59
004  25675  10   18.8   95000   21
005  20000  2    17.18  200000  31
006  9900   1    21.96  45000   10        56
007  10000  10   10.22  150000  11        23
008  14000  1    12.39  110000  11        80
009  18000  7    36.91  85000   15        48
010  28000  6    18.09  165000  17
 Provide the unlabeled data set as an input to the Regression node.
 Select Score in the Operation property.
The following results are produced:
id   amt    yEm  dti    inc     open_acc  msld  sIR
001  6000   2    2.98   50000   11              11.21
002  35000  10   14.39  86000   13              13.54
003  10000  1    24.44  60000   10        59    11.82
004  25675  10   18.8   95000   21              11.14
005  20000  2    17.18  200000  31              9.44
006  9900   1    21.96  45000   10        56    11.71
007  10000  10   10.22  150000  11        23    11.26
008  14000  1    12.39  110000  11        80    11.38
009  18000  7    36.91  85000   15        48    14.33
010  28000  6    18.09  165000  17              9.83
You can then output this data to a new data store for use in other data stages, such as a dashboard.
Regression or Recommendation: Root Mean Square Error (RMSE)
The Root Mean Square Error (RMSE) is a measure used to evaluate Regression or Recommendation models. RMSE is the square root of the mean of the squared errors, where each error is the difference between a predicted value and the corresponding labeled value.
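Written as a formula, the standard definition of RMSE is shown below. The symbols are chosen here for illustration: n is the number of records, y_i is the labeled value of record i, and \hat{y}_i is the predicted value of record i.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```

RMSE is expressed in the same units as the label field, so an RMSE of 3.82 in the loan example is measured in interest rate points.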
In general, the lower the RMSE, the better the performance of a model. What typifies a "low" RMSE depends on the range of values in the model's label field.
Large errors between predicted values and labeled values inflate the RMSE disproportionately, because each error is squared before the mean is taken.
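The following sketch computes RMSE in plain Python, first for the ten rows of the training output table in this topic (intR as the labeled value, sIR as the prediction), then for two small made-up error patterns that show the effect of one large error. The function name is hypothetical. The first result is measured on the ten training rows shown, so it is not expected to match the RMSE values reported by the Evaluate operation, which uses a validation data set.

```python
import math

def rmse(predicted, actual):
    """Square root of the mean of the squared prediction errors."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# intR (label) and sIR (prediction) columns from the training output table.
int_r = [12.29, 14.65, 12.69, 9.17, 9.99, 9.99, 9.17, 14.65, 15.61, 17.86]
s_ir = [12.68, 12.23, 12.22, 10.06, 12.17, 10.65, 10.08, 12.50, 13.54, 15.49]
loan_rmse = rmse(s_ir, int_r)
print(round(loan_rmse, 2))  # 1.66

# Two made-up patterns with the same total absolute error (4):
even_errors = rmse([1, 1, 1, 1], [0, 0, 0, 0])  # four errors of 1
one_large = rmse([0, 0, 0, 4], [0, 0, 0, 0])    # a single error of 4
print(even_errors, one_large)  # 1.0 2.0
```

Both made-up patterns have the same average absolute error, but concentrating it in a single record doubles the RMSE.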
Properties
Display Name
Specify a name for the node.
The default value is Regression.
Model tab
Operation
Select an operation type. Choose from:
 Train
 Score
 Evaluate
Input Fields
Click Add Field to select input fields to analyze.
Analytic Model
Select an analytic model. You can only choose from Regression type models.
Label Field
Enter a name for a label field which will be included in the output of the node.
Prediction Field
Enter a name for a prediction field which will be included in the output of the node.
Prediction Field Type
Select a data type for the field specified in the Prediction Field. Choose from:
 Boolean
 Date
 String
 DateTime
 Time
 Integer
 Floating Point
 Big Integer
 Decimal
 Currency
Regression tab
Algorithm
Select an algorithm. Choose from:
 Random Forest
 Gradient Boosted Tree
Automatically Tune Parameters
Select this option if you want to automatically tune parameters.
Number Of Trees
Specify the number of trees.
This property is not available if you have selected Automatically Tune Parameters.
Specify no: of classes
Select this option if you want to specify the number of classes, then enter a numeric value.
Max tree depth
Select this option if you want to specify a maximum tree depth, then enter a numeric value.
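To illustrate what Number Of Trees and Max tree depth control, here is a deliberately simplified sketch of a tree ensemble for regression: each tree is trained on a bootstrap sample of the loan training data, grows to at most a fixed depth, and the ensemble prediction is the average of the tree predictions. This is not the product's implementation. All function and parameter names are hypothetical, and the sketch omits features of a full Random Forest, such as random selection of fields at each split.

```python
import random
from statistics import mean

def sum_squared_error(values):
    """Sum of squared deviations from the mean of the values."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)

def build_tree(rows, labels, max_depth, depth=0):
    """Grow a regression tree. A leaf is a number; a split is a 4-tuple."""
    if depth >= max_depth or len(set(labels)) == 1:
        return mean(labels)
    best = None  # (error, field index, threshold)
    for field in range(len(rows[0])):
        # Every distinct value except the largest is a candidate threshold,
        # so both sides of a split are always non-empty.
        for threshold in sorted({row[field] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[field] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[field] > threshold]
            error = sum_squared_error(left) + sum_squared_error(right)
            if best is None or error < best[0]:
                best = (error, field, threshold)
    if best is None:  # rows are identical in every field, so no split exists
        return mean(labels)
    _, field, threshold = best
    left = [(row, y) for row, y in zip(rows, labels) if row[field] <= threshold]
    right = [(row, y) for row, y in zip(rows, labels) if row[field] > threshold]
    return (field, threshold,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth, depth + 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth, depth + 1))

def predict_tree(node, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        field, threshold, left, right = node
        node = left if row[field] <= threshold else right
    return node

def train_forest(rows, labels, number_of_trees, max_tree_depth, seed=0):
    """Train each tree on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(number_of_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in picks],
                                 [labels[i] for i in picks], max_tree_depth))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean([predict_tree(tree, row) for tree in forest])

# Loan training data from this topic: amt, yEm, dti, inc as inputs; intR as label.
rows = [[14000, 10, 27, 68000], [25000, 1, 20.09, 85000], [6000, 10, 27.8, 85000],
        [15600, 1, 13.37, 95000], [9250, 1, 21.76, 70000], [2500, 1, 14.8, 45000],
        [10000, 5, 14.97, 93000], [20000, 1, 11.81, 135000], [3600, 1, 29.69, 29999],
        [20150, 10, 27.55, 48000]]
labels = [12.29, 14.65, 12.69, 9.17, 9.99, 9.99, 9.17, 14.65, 15.61, 17.86]

forest = train_forest(rows, labels, number_of_trees=5, max_tree_depth=2)
prediction = predict_forest(forest, [14000, 10, 27, 68000])
print(round(prediction, 2))
```

In this sketch, more trees average out the variation between bootstrap samples, while a greater depth lets each tree fit finer detail in the training data at the risk of fitting noise.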