Logistic Regression - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Models data using logistic regression, allowing data trends to be identified.

Tip: Before working with this node, complete the prerequisite steps. See Working with the Statistical and Predictive Analytics nodes.
Note: An additional Statistical and Predictive Analytics node pack license is required to run this node. See Applying a node pack license.
Note: This node processes data in-memory. Additional RAM will be required when processing data sets with a large volume of data.

This node uses the embedded R engine to model the relationship between the log odds of a binomial dependent (response) variable and one or more independent (explanatory) variables. The node fits the data using a logistic regression model.

The node provides two summaries of the model created for the input data together with details of the regression coefficients and residual errors. The glmSummary pin contains a summary of the model and includes information on:

  • The call used to generate the model.
  • Range and quartile values for the deviance residual errors.
  • The estimates for the coefficients of the independent variables used in the model and the estimate of the intercept.
  • Significance code indicators for the independent variables and the intercept.
  • The null deviance and corresponding degrees of freedom.
  • The residual deviance for the model and the corresponding degrees of freedom.
  • The Akaike information criterion (AIC) measure of relative quality for the model.
  • The number of Fisher scoring iterations performed in generating the model.
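
As an illustration of how the deviance and AIC figures relate to each other, the following minimal sketch computes them for a binary (0/1) response. It is not the node's implementation. The function names, the sample data, and the fitted probabilities are hypothetical, and the null degrees of freedom assume that the null model contains an intercept.

```python
import math

def binomial_deviance(y, p):
    """Deviance for 0/1 responses, which equals -2 * log-likelihood."""
    eps = 1e-12  # guard against log(0)
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * total

def summarize(y, fitted, n_coefficients):
    """Collect the deviance-based figures for a fitted binary model."""
    n = len(y)
    null_p = sum(y) / n  # the intercept-only model predicts the mean response
    residual_deviance = binomial_deviance(y, fitted)
    return {
        "null_deviance": binomial_deviance(y, [null_p] * n),
        "null_df": n - 1,  # assumes an intercept in the null model
        "residual_deviance": residual_deviance,
        "residual_df": n - n_coefficients,
        # For 0/1 data, AIC = -2 * log-likelihood + 2 * (number of coefficients)
        "aic": residual_deviance + 2 * n_coefficients,
    }

y = [0, 0, 1, 1]
fitted = [0.2, 0.4, 0.6, 0.8]  # hypothetical fitted probabilities
stats = summarize(y, fitted, n_coefficients=2)
```

A residual deviance well below the null deviance suggests that the independent variables explain part of the variation in the response.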

If the node is configured to output the serialized model to a file, the glmSummary pin also includes the file path to the file that contains the serialized model.

The glmResiduals pin contains the value of the working residuals for the final iteration of the model fitting process.

The glmCoefficients pin contains the estimated values of the coefficients of the independent variables and the intercept for the maximum likelihood estimation determined by the model.
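
To show what coefficients, iteration counts, and working residuals mean in practice, here is a minimal sketch of Fisher scoring for a single predictor with a logit link. It is written from the standard algorithm, not from the node's source. The function name and data are hypothetical, and the convergence test (step size) is a simplification, so iteration counts may differ from those reported by the R engine.

```python
import math

def fit_logistic(x, y, max_iter=25, tol=1e-8):
    """Fit y ~ x with a logit link by Fisher scoring (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    iteration = 0
    for iteration in range(1, max_iter + 1):
        mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [m * (1.0 - m) for m in mu]  # variance weights for the logit link
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # Fisher information matrix entries
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        # Solve the 2x2 system (information * step = score)
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Working residuals at the final estimate: (y - mu) / (d mu / d eta)
    mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
    working = [(yi - m) / (m * (1.0 - m)) for yi, m in zip(y, mu)]
    return (b0, b1), working, iteration

# Hypothetical, non-separable sample data
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
(intercept, slope), working_residuals, n_iterations = fit_logistic(x, y)
```

At convergence the score equations hold, so the raw residuals (y minus the fitted probability) sum to approximately zero.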

The anovaSummary pin contains a summary table for an Analysis Of Variance performed on the regression model using the Chi-square test and includes information on:

  • The model family and link function used.
  • The response variable being modeled.
  • Details of the Degrees of Freedom, Deviance, Residual Degrees of Freedom and p-value for each of the independent variables in the regression model.
  • Significance code indicators for the independent variables.

Powered by TIBCO®

Properties

ModelName

Optionally specify the name of a model which is displayed on the output data. When the node is configured to write the serialized model to a file, the model name is also used as the output filename.

A model name must start with a letter and may contain any of the following:

  • letters
  • numbers
  • period character (".")
  • underscore ("_")

If not specified, the default model name "Logistic" is used.
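
The naming rules above can be checked before running the node. The following minimal sketch does this with a regular expression. It assumes that "letters" means ASCII letters, which the rules do not state explicitly, and the function name is hypothetical.

```python
import re

# Starts with a letter, then any mix of letters, digits, "." and "_".
# Assumption: "letters" means ASCII letters only.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")

def resolve_model_name(name, default="Logistic"):
    """Return the model name to use, or raise ValueError if it breaks the rules."""
    if not name:
        return default  # the documented default when nothing is specified
    if not MODEL_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"Invalid model name: {name!r}")
    return name

valid_name = resolve_model_name("churn_model.v2")
default_name = resolve_model_name("")
```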

ModelFormula

Optionally specify the formula for the logistic regression model. For example:

dependent ~ predictor1 + predictor2 + predictor3

Do not specify this property if any of the DependentVariable, IndependentVariables or OmitModelConstant properties are set.

This property is case sensitive.

DependentVariable

Specify the dependent variable which is to be modeled on the independent variable(s).

Only one dependent variable can be input. A value is required for this property if the ModelFormula is not specified. If the ModelFormula property is specified this property should not be used. This property is case sensitive.

IndependentVariables

Specify the independent variables (the predictors) that are to be used to model the dependent variable. Enter a comma-separated list of the fields that contain the independent variables.

A value is required for this property if the ModelFormula is not specified. If the ModelFormula property is specified this property should not be used.

This property is case sensitive.

WeightVariable

Optionally specify the variable used as the weight when using a weighted least squares model.

Only one weight variable can be input. If not specified, weights are not used in the model. This property is case sensitive.

OmitModelConstant

Optionally specify whether a model constant is to be excluded from the model. If the ModelFormula property is specified, this property should not be set.

The default value is False.
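
The DependentVariable, IndependentVariables and OmitModelConstant properties together describe the same information as a ModelFormula. The following minimal sketch shows one way the equivalent formula could be assembled. The function name is hypothetical, and the use of "- 1" to drop the constant follows the usual R formula convention. Whether the node builds its formula this way internally is an assumption.

```python
def build_formula(dependent, independents, omit_constant=False):
    """Build an R-style formula string from the individual property values."""
    predictors = [name.strip() for name in independents.split(",") if name.strip()]
    if not predictors:
        raise ValueError("At least one independent variable is required")
    rhs = " + ".join(predictors)
    if omit_constant:
        rhs += " - 1"  # R convention for removing the intercept (assumed here)
    return f"{dependent} ~ {rhs}"

formula = build_formula("dependent", "predictor1, predictor2,predictor3")
formula_no_constant = build_formula("dependent", "predictor1", omit_constant=True)
```

Because field names are case sensitive, the names passed in must match the input field names exactly.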

ModelType

Optionally specify the link function (Model Type) to be used with the binomial family. Choose from:

  • logit
  • probit
  • complementary log-log

The default value is logit.
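
Each model type corresponds to a different link function between the probability of the response and the linear predictor. The following minimal sketch gives the standard mathematical definition of each link and its inverse. The dictionary keys and variable names are hypothetical and do not represent the node's internal identifiers.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link

# Each entry maps a name to (link, inverse link).
# link: probability -> linear predictor; inverse link: linear predictor -> probability
LINKS = {
    "logit": (
        lambda p: math.log(p / (1.0 - p)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (
        _STD_NORMAL.inv_cdf,
        _STD_NORMAL.cdf,
    ),
    "cloglog": (
        lambda p: math.log(-math.log(1.0 - p)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
}

# Round trip: applying a link and then its inverse recovers the probability
round_trips = {}
for name, (link, inverse) in LINKS.items():
    round_trips[name] = inverse(link(0.3))
```

The logit and probit links are symmetric around a probability of 0.5, whereas the complementary log-log link is asymmetric.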

ModelOutputMode

Specify whether the serialized model is written to a file on disk.

This property also determines how ModelOutputField and ModelOutputDirectory behave. The default value is None.

ModelOutputField

Optionally specify a name for the output field that contains the full path of the file where the serialized model has been written. The default value is "glm_ModelOutput".

ModelOutputDirectory

Specify the directory where the serialized model is written when ModelOutputMode is set to File. When ModelOutputDirectory is blank, files are written to the Data360 Analyze temporary directory. Otherwise, the files are written to the specified directory, which must exist and be writeable. By default, this node will not overwrite existing files. This behavior can be changed in the ExceptionBehavior tab.

This property should only be filled in when ModelOutputMode is set to File.
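
The directory rules above can be summarized as a small decision function. The following minimal sketch is illustrative only. The function name is hypothetical, and the system temporary directory is used as a stand-in for the Data360 Analyze temporary directory, whose actual location is not stated here.

```python
import os
import tempfile

def resolve_output_directory(directory, mode):
    """Return the directory for the serialized model, or None if no file is written."""
    if mode != "File":
        return None  # nothing is written to disk in other modes
    if not directory:
        # Stand-in for the Data360 Analyze temporary directory (assumption)
        return tempfile.gettempdir()
    if not os.path.isdir(directory):
        raise NotADirectoryError(f"Directory does not exist: {directory}")
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"Directory is not writeable: {directory}")
    return directory

with tempfile.TemporaryDirectory() as existing_dir:
    resolved_explicit = resolve_output_directory(existing_dir, "File") == existing_dir
resolved_blank = resolve_output_directory("", "File")
resolved_none = resolve_output_directory("", "None")
```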

OutputAdditionalAttributes

Optionally specify whether an extended set of coefficient attributes is to be provided on the output.

If set to True:

1. The following values are output on the Summary output pin:

  • Null Deviance
  • Total Degrees of Freedom (for the Null Model)
  • Residual Deviance
  • Residual Degrees of Freedom
  • AIC value
  • Number of Fisher Iterations

2. The following values are output on the Coefficients output pin:

  • Standard Error
  • t value
  • p value
  • The exponent of the (existing) coefficient values

The default value is False.
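
The exponent of a coefficient is often easier to interpret than the coefficient itself. With the logit model type, it is the odds ratio associated with a one-unit increase in that variable. The following minimal sketch shows the calculation. The coefficient names and values are hypothetical and are not output from the node.

```python
import math

# Hypothetical coefficient estimates, keyed by variable name
coefficients = {
    "(Intercept)": -1.2,
    "predictor1": 0.7,
    "predictor2": 0.0,
}

# Exponentiate each coefficient; for a logit model these are odds ratios
exponentiated = {name: math.exp(value) for name, value in coefficients.items()}
```

A value above 1 indicates that the odds of the response increase with the variable, a value below 1 indicates that they decrease, and a value of exactly 1 indicates no effect.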

FileExistsBehavior

Optionally specify whether an existing serialized model file will be overwritten. Choose from:

  • Error - Generate an error and do not overwrite the file.
  • Log - Log a warning message and do not overwrite the file.
  • Ignore - Do not overwrite the file.
  • Overwrite - Overwrite the file.

The default value is Error.
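
The four options can be read as a small decision table that applies only when the target file already exists. The following minimal sketch mirrors that table. It is not the node's implementation, and the function name, file name, and payload are hypothetical.

```python
import logging
import os
import tempfile

BEHAVIORS = ("Error", "Log", "Ignore", "Overwrite")

def write_model_file(path, payload, behavior="Error"):
    """Write payload to path, honoring the chosen behavior. Returns True if written."""
    if behavior not in BEHAVIORS:
        raise ValueError(f"Unknown behavior: {behavior}")
    if os.path.exists(path) and behavior != "Overwrite":
        if behavior == "Error":
            raise FileExistsError(f"File already exists: {path}")
        if behavior == "Log":
            logging.warning("File already exists and was not overwritten: %s", path)
        return False  # Log and Ignore both leave the existing file untouched
    with open(path, "wb") as handle:
        handle.write(payload)
    return True

with tempfile.TemporaryDirectory() as work_dir:
    target = os.path.join(work_dir, "Logistic.model")  # hypothetical file name
    first_write = write_model_file(target, b"v1")              # file is new
    ignored = write_model_file(target, b"v2", "Ignore")        # existing file kept
    logged = write_model_file(target, b"v2", "Log")            # warning, file kept
    overwritten = write_model_file(target, b"v3", "Overwrite") # file replaced
    with open(target, "rb") as handle:
        final_content = handle.read()
    try:
        write_model_file(target, b"v4", "Error")
        error_raised = False
    except FileExistsError:
        error_raised = True
```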

Inputs and outputs

Inputs: data.

Outputs: glmSummary, glmResiduals, glmCoefficients, anovaSummary.