Predict Linear Regression - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Uses a linear regression model to predict the value of a dependent variable based on the specified values of the independent variables.

Tip: Before working with this node, complete the prerequisite steps described in Working with the Statistical and Predictive Analytics nodes.
Note: An additional Statistical and Predictive Analytics node pack license is required to run this node. See Applying a node pack license.
Note: This node processes data in-memory. Additional RAM is required when processing large data sets.

The node accepts the file path to a file that contains an R serialized linear regression model object. The file path can be specified as a Literal value or be obtained from a specified field on the node's optional second input pin.

An optional property specifies whether confidence limit (confidence bound) values are output. If the property is not specified, the confidence limit values are output by default. When confidence limits are output, the confidence limit percentile can also be specified. If the percentile is not specified, the 95th percentile is used by default.
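As a point of reference, the textbook confidence limits for the mean response of a linear regression model can be written as below. This is a sketch under assumptions: the source does not state which interval the node reports (for example, a confidence interval for the mean response or a prediction interval for a new observation), and the symbols $x_0$, $X$, $\hat{\beta}$, $s$, $n$, $p$ and $\alpha$ are introduced here for illustration only.

```latex
% Assumed notation (not from the product documentation):
%   x_0      : vector of independent-variable values for one input record (with a leading 1 for the intercept)
%   X        : design matrix of the original model observations
%   \hat\beta: estimated coefficients, s: residual standard error
%   n        : number of observations, p: number of estimated coefficients
%   1-\alpha : confidence level (\alpha = 0.05 for the default 95)
\hat{y}_0 = x_0^{\top}\hat{\beta},
\qquad
\hat{y}_0 \;\pm\; t_{1-\alpha/2,\;n-p}\; s\,\sqrt{x_0^{\top}\,(X^{\top}X)^{-1}\,x_0}
```

Under this reading, the lower and upper confidence limits are the two ends of the interval, and a larger confidence level widens it.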

The node can be configured to include or exclude the original observations from the serialized model in the output data. If not specified, the observation data are excluded by default.

When run, the node uses the embedded R engine to predict the value of the model's dependent (response) variable based on the values of the independent variables that are present on the node's data input pin. The names of the fields in the input data must correspond to the names of the independent variables that were used to construct the linear regression model.
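The name-matching requirement can be illustrated with a minimal Python sketch. It is not the node's implementation (the node uses the embedded R engine); the function `predict`, the coefficient values and the field names `temperature` and `humidity` are hypothetical and chosen only to show how a linear model combines input fields by name.

```python
def predict(coefficients, intercept, record):
    """Return intercept + sum(coefficient * value) for one input record.

    `coefficients` maps independent-variable names to fitted coefficients.
    `record` maps input field names to values. Every model variable must be
    present in the record under the same name.
    """
    missing = [name for name in coefficients if name not in record]
    if missing:
        raise KeyError(f"input is missing model variables: {missing}")
    return intercept + sum(
        coef * float(record[name]) for name, coef in coefficients.items()
    )


# Hypothetical model: y = 10.0 + 2.0 * temperature - 0.5 * humidity
model_coefficients = {"temperature": 2.0, "humidity": -0.5}
model_intercept = 10.0

rows = [
    {"temperature": 20, "humidity": 40},
    {"temperature": 30, "humidity": 60},
]
predictions = [predict(model_coefficients, model_intercept, row) for row in rows]
print(predictions)
```

A record that lacks one of the model's variables raises an error in this sketch, which mirrors the requirement that input field names correspond to the model's independent variables.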

The Summary pin contains a summary of the input serialized model and the file path to the model. The model summary includes information on:

  • The call used to generate the model.
  • Range and quartile values for the residual errors.
  • The estimated coefficients of the independent variables and the estimated intercept, each with its standard error, t-statistic value and p-value.
  • Significance code indicators for the independent variables and the intercept.
  • The residual standard error.
  • The Multiple R-squared (coefficient of determination) value and Adjusted R-squared value.
  • The F-statistic with the corresponding p-value for that test.
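To make the quantities in this list concrete, the following Python sketch computes several of them for a one-variable least-squares fit. It is an illustration under assumptions, not the node's code: the function name `simple_regression_summary` and the sample data are hypothetical, only a single independent variable is handled, and p-values and significance codes are omitted because they need t and F distribution functions that the Python standard library does not provide.

```python
import math


def simple_regression_summary(x, y):
    """Fit y = intercept + slope * x by least squares and return summary statistics."""
    n = len(x)
    p = 1  # number of independent variables in this sketch
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)        # residual sum of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    df_resid = n - p - 1                       # residual degrees of freedom

    rse = math.sqrt(sse / df_resid)            # residual standard error
    slope_se = rse / math.sqrt(sxx)            # standard error of the slope
    r2 = 1 - sse / sst
    return {
        "intercept": intercept,
        "slope": slope,
        "slope_std_error": slope_se,
        "slope_t_statistic": slope / slope_se,
        "residual_range": (min(residuals), max(residuals)),
        "residual_std_error": rse,
        "r_squared": r2,
        "adjusted_r_squared": 1 - (1 - r2) * (n - 1) / df_resid,
        "f_statistic": ((sst - sse) / p) / (sse / df_resid),
    }


# Hypothetical sample data
summary = simple_regression_summary([1, 2, 3, 4], [1, 3, 2, 4])
for name, value in summary.items():
    print(name, value)
```

With one independent variable, the F-statistic equals the square of the slope's t-statistic, which is a convenient consistency check on the output.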

Predicted values for the dependent variable, together with the corresponding (predictor) values for the independent variables, are output on the Results pin. If the node has been configured to generate the confidence limits, these values are also output on the Results pin. The type field indicates the type of value in each record (corresponding to the lower confidence limit, upper confidence limit and original model observations).

Powered by TIBCO®

Properties

ModelFilePath

Specify the file path of the Linear Regression model to be used when predicting values of the dependent variable.

Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.

ShowConfidenceLimits

Optionally specify whether values for the confidence limits are to be included in the output data. The default value is True.

ConfidenceLevel

Optionally specify the percentile to be used when calculating the confidence limits. If not specified, the 95th percentile is used.

OutputModelObservations

Optionally specify whether the values of the dependent variable and independent variables in the Linear Regression model are to be output. Choose from:

  • Exclude model observations.
  • Include model observations.

The default value is Exclude model observations.
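The defaults described for these properties can be summarized in a short Python sketch. It only models the documented behavior: a required ModelFilePath and default values for the three optional properties. The helper `resolve_properties`, the file path shown, and the representation of ConfidenceLevel as the number 95 are assumptions made for illustration; they are not part of the product.

```python
# Documented defaults for the optional properties.
DEFAULTS = {
    "ShowConfidenceLimits": True,
    "ConfidenceLevel": 95,  # assumed numeric form of "the 95th percentile"
    "OutputModelObservations": "Exclude model observations",
}


def resolve_properties(configured):
    """Return the effective property values, filling in documented defaults."""
    if not configured.get("ModelFilePath"):
        raise ValueError("ModelFilePath is required")
    # Values left unset (None) fall back to the defaults.
    explicit = {key: value for key, value in configured.items() if value is not None}
    return {**DEFAULTS, **explicit}


# Hypothetical model file path, for illustration only.
settings = resolve_properties({"ModelFilePath": "models/linear_model_file"})
custom = resolve_properties(
    {"ModelFilePath": "models/linear_model_file", "ConfidenceLevel": 90}
)
print(settings)
print(custom)
```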

Inputs and outputs

Inputs: data, 1 optional.

Outputs: Summary, Results.