Uses a linear regression model to predict the value of a dependent variable based on the specified values of the independent variables.
The node accepts the file path to a file that contains an R serialized linear regression model object. The file path can be specified as a Literal value or be obtained from a specified field on the node's optional second input pin.
An optional property specifies whether confidence limit (confidence bound) values are output. If the property is not set, the confidence limit values are output by default. When confidence limits are output, the confidence level can also be specified. If the confidence level is not specified, the 95% level is used by default.
The node can be configured to include or exclude the original observations from the serialized model in the output data. If not specified, the observation data are excluded by default.
When run, the node uses the embedded R engine to predict the value of the model's dependent (response) variable based on the values of the independent variables that are present on the node's data input pin. The names of the fields in the input data must correspond to the names of the independent variables that were used to construct the linear regression model.
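The following minimal sketch illustrates the prediction step and the field-name requirement described above. It is not the node's implementation, which runs inside the embedded R engine. The `MODEL` dictionary, its variable names and its coefficient values are hypothetical stand-ins for a fitted model.

```python
# Hypothetical stand-in for a fitted model: an intercept plus one coefficient
# per independent variable (names and numbers are invented for illustration).
MODEL = {
    "intercept": 2.0,
    "coefficients": {"age": 0.5, "income": 0.01},
}


def predict(model, record):
    """Return intercept + sum(coefficient * value) for one input record."""
    # Every independent variable of the model must be present as a field.
    missing = [name for name in model["coefficients"] if name not in record]
    if missing:
        raise KeyError(f"input is missing model variables: {missing}")
    return model["intercept"] + sum(
        coef * record[name] for name, coef in model["coefficients"].items()
    )


# In this sketch, fields that the model does not use are simply ignored.
row = {"age": 40, "income": 3000, "region": "north"}
prediction = predict(MODEL, row)
```

A record that lacks one of the model's variables, such as `{"age": 40}`, raises a `KeyError` in this sketch.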
The Summary pin contains a summary of the input serialized model and the file path to the model. The model summary includes information on:
- The call used to generate the model.
- Range and quartile values for the residual errors.
- For the intercept and each independent variable used in the model: the coefficient estimate, the standard error, the t-statistic value and the p-value.
- Significance code indicators for the independent variables and the intercept.
- The residual standard error.
- The Multiple R-squared (coefficient of determination) value and Adjusted R-squared value.
- The F-statistic with the corresponding p-value for that test.
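As a rough illustration of how several of the quantities listed above relate to one another, the sketch below computes them by hand for a simple regression with one independent variable. The data values are invented, and the formulas are the standard least-squares definitions. The sketch does not reproduce the node's output format.

```python
import math

# Invented data: one independent variable x and a dependent variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n, p = len(x), 1  # number of observations, number of independent variables
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                    # coefficient estimate
intercept = mean_y - slope * mean_x  # intercept estimate

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rss = sum(r ** 2 for r in residuals)       # residual sum of squares
tss = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

df = n - p - 1  # residual degrees of freedom
residual_std_error = math.sqrt(rss / df)
r_squared = 1 - rss / tss
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df
f_statistic = ((tss - rss) / p) / (rss / df)

# Standard error and t-statistic for the slope coefficient.
slope_std_error = residual_std_error / math.sqrt(sxx)
slope_t_value = slope / slope_std_error
```

With a single independent variable, the F-statistic equals the square of the slope's t-statistic, which is a convenient consistency check.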
Predicted values for the dependent variable, together with the corresponding (predictor) values for the independent variables, are output on the Results pin. If the node has been configured to generate the confidence limits, these values are also output on the Results pin. The type field indicates the type of value in each record (corresponding to the lower confidence limit, the upper confidence limit and the original model observations).
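The sketch below shows one way that a prediction and its confidence limits could be laid out as separate records distinguished by a type field. It makes several assumptions that the text above does not confirm: the limits are taken to be a confidence interval for the mean response, the type labels `prediction`, `lower` and `upper` are hypothetical, and the data are invented. The critical value of Student's t is hard-coded for 3 degrees of freedom rather than computed.

```python
import math

# Invented training data (simple regression with one independent variable).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT_95_DF3 = 3.182446  # two-sided 95% critical value of Student's t, 3 d.f.

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x
rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sigma = math.sqrt(rss / (n - 2))  # residual standard error


def result_records(x0):
    """Return prediction, lower-limit and upper-limit records for one x value."""
    fit = intercept + slope * x0
    # Half-width of the confidence interval for the mean response at x0.
    half_width = T_CRIT_95_DF3 * sigma * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)
    return [
        {"type": "prediction", "x": x0, "y": fit},  # hypothetical type labels
        {"type": "lower", "x": x0, "y": fit - half_width},
        {"type": "upper", "x": x0, "y": fit + half_width},
    ]


records = result_records(3.0)
```

The interval is narrowest at the mean of the training values of x and widens as x0 moves away from it.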
Powered by TIBCO®
Properties
ModelFilePath
Specify the file path of the Linear Regression model to be used when predicting dependent values.
Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.
ShowConfidenceLimits
Optionally specify whether values for the confidence limits are to be included in the output data. The default value is True.
ConfidenceLevel
Optionally specify the confidence level to be used when calculating the confidence limits. If not specified, the 95% level is used.
OutputModelObservations
Optionally specify whether the values of the dependent variable and independent variables in the Linear Regression model are to be output. Choose from:
- Exclude model observations.
- Include model observations.
The default value is Exclude model observations.
Inputs and outputs
Inputs: data, 1 optional.
Outputs: Summary, Results.