Predict Logistic Regression - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Product name: Data360 Analyze
Title: Data360 Analyze Server Help
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Uses a logistic regression model to predict the probability of a successful outcome of a dependent variable based on the specified values of the independent variables.

Tip: Before working with this node, complete the prerequisite steps described in Working with the Statistical and Predictive Analytics nodes.
Note: An additional Statistical and Predictive Analytics node pack license is required to run this node. See Applying a node pack license. This node processes data in memory. Additional RAM is required when processing data sets with a large volume of data.

The node accepts the file path to a file that contains an R serialized logistic regression model object. The file path can be specified as a Literal value or be obtained from a specified field on the node's optional second input pin.

The node can output strings with a data type of string or unicode. By default, strings are output as unicode.

When run, the node uses the embedded R engine to predict the probability of a successful outcome for the model's dependent (response) variable based on the values of the independent variables that are present on the node's data input pin. The names of the fields in the input data must correspond to the names of the independent variables that were used to construct the logistic regression model.

The Summary pin contains a summary of the input serialized model and the file path to the model. The model summary includes information on:

  • The call used to generate the model.
  • Range and quartile values for the deviance residuals.
  • The estimates for the coefficients of the independent variables used in the model and the estimate of the intercept.
  • Significance code indicators for the independent variables and the intercept.
  • The null deviance and corresponding degrees of freedom.
  • The residual deviance for the model and the corresponding degrees of freedom.
  • The Akaike information criterion (AIC) measure of relative quality for the model.
  • The number of Fisher scoring iterations performed in generating the model.
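
To make the deviance and AIC entries in the summary more concrete, the following Python sketch computes them for a binary (0/1) response. It is an illustration only, not the node's implementation: the function names, the outcomes, and the fitted probabilities are hypothetical, and it assumes ungrouped binary data, for which the deviance equals -2 times the log-likelihood.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def deviance_summary(y, fitted, n_coefficients):
    """Return the deviances, their degrees of freedom, and the AIC.

    n_coefficients counts the intercept plus one coefficient per
    independent variable. Assumes y contains both 0s and 1s.
    """
    n = len(y)
    mean_y = sum(y) / n  # an intercept-only model predicts the overall success rate
    null_dev = -2.0 * log_likelihood(y, [mean_y] * n)
    resid_dev = -2.0 * log_likelihood(y, fitted)
    return {
        "null_deviance": null_dev,
        "null_df": n - 1,
        "residual_deviance": resid_dev,
        "residual_df": n - n_coefficients,
        "aic": resid_dev + 2 * n_coefficients,
    }

# Hypothetical outcomes and fitted probabilities for a model with an
# intercept and one independent variable (2 coefficients). The fitted
# values are made up for illustration, not the result of a real fit.
y = [0, 0, 1, 1]
fitted = [0.2, 0.4, 0.6, 0.8]
summary = deviance_summary(y, fitted, n_coefficients=2)
```

A residual deviance well below the null deviance suggests that the independent variables explain more of the response than an intercept alone. When comparing models fitted to the same data, a lower AIC is preferred.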

The predicted probabilities for the dependent variable, together with the corresponding (predictor) values for the independent variables, are output on the Results pin. The probabilities are in the range 0 to 1.
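
As a rough illustration of what a logistic regression prediction involves, the Python sketch below applies the logistic function to a linear combination of the input values. It is not the node's implementation (the node uses the embedded R engine). The function name, the coefficient values, and the field names age and income are all hypothetical. The check for missing fields mirrors the requirement that input field names correspond to the model's independent variables.

```python
import math

def predict_probability(intercept, coefficients, row):
    """Return the predicted probability of success for one input row.

    coefficients maps each independent variable name to its estimate;
    row maps input field names to numeric values.
    """
    missing = [name for name in coefficients if name not in row]
    if missing:
        raise KeyError(f"input is missing model variables: {missing}")
    # Linear predictor: intercept plus the weighted sum of the inputs.
    linear = intercept + sum(coefficients[name] * row[name]
                             for name in coefficients)
    # The logistic function maps the linear predictor into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical model estimates and a hypothetical input record.
intercept = -1.5
coefficients = {"age": 0.04, "income": 0.00001}
row = {"age": 40, "income": 50000}
probability = predict_probability(intercept, coefficients, row)
```

Here the linear predictor is -1.5 + 1.6 + 0.5 = 0.6, so the predicted probability is a little under 0.65.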

Powered by TIBCO®

Properties

ModelFilePath

Specify the file path of the logistic regression model to be used when predicting dependent values. Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.

ExportStringCoercion

Optionally specify how character vectors are exported from the embedded R engine to Data360 Analyze.

R represents all string values in data frames as character vectors or factors, both of which are implemented as Unicode strings. By contrast, Data360 Analyze has two field types for this class: string and unicode. A unicode field can contain all characters, while a string field can only hold a subset (technically, only those found in the Data360 Analyze server's code page).

Therefore, if the exported data contains characters that are not in the Data360 Analyze server's code page (typically special characters or symbols), set this property to To Unicode to avoid errors when outputting the data. Selecting To String causes the node to fail when such characters are present, so choose To String only if you are certain that all characters in the output data frames are in the Data360 Analyze server's code page.

The default value is To Unicode.
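
One way to judge whether To String is safe is to test sample values against the server's code page before running the node. The Python sketch below shows the idea. It is an assumption-laden illustration, not part of Data360 Analyze: the function name is hypothetical, and cp1252 is only a stand-in for whatever code page your server actually uses.

```python
def fits_code_page(value, code_page="cp1252"):
    """Return True if every character in value can be encoded in code_page.

    cp1252 is an assumed example; substitute the server's real code page.
    """
    try:
        value.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample values from the data to be exported.
values = ["plain text", "café", "价格"]
outside = [v for v in values if not fits_code_page(v)]
```

If any value falls outside the code page, as the last sample does here, leave the property set to To Unicode.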

Inputs and outputs

Inputs: data, 1 optional.

Outputs: Summary, Results.