Models data using logistic regression, allowing identification of data trends.
This node uses the embedded R engine to fit a logistic regression model that relates a binomial dependent (response) variable to one or more independent (explanatory) variables.
The node provides two summaries of the model created for the input data together with details of the regression coefficients and residual errors. The glmSummary pin contains a summary of the model and includes information on:
- The call used to generate the model.
- Range and quartile values for the deviance residual errors.
- The estimates for the coefficients of the independent variables used in the model and the estimate of the intercept.
- Significance code indicators for the independent variables and the intercept.
- The null deviance and corresponding degrees of freedom.
- The residual deviance for the model and the corresponding degrees of freedom.
- The Akaike information criterion (AIC) measure of relative quality for the model.
- The number of Fisher iterations performed in generating the model.
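For unweighted 0/1 response data, the deviance and AIC figures listed above can be reproduced from the fitted probabilities. The following Python sketch is illustrative only and is not the node's implementation. The function names are hypothetical, and it assumes that the null model contains only an intercept and that both outcome classes are present in the data.

```python
import math


def binomial_deviance(y, p):
    """Deviance of 0/1 responses y against probabilities p (each strictly in (0, 1))."""
    log_likelihood = 0.0
    for yi, pi in zip(y, p):
        log_likelihood += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    # For 0/1 data the saturated log-likelihood is zero, so the
    # deviance is simply -2 times the model log-likelihood.
    return -2.0 * log_likelihood


def summary_statistics(y, fitted, n_parameters):
    """Null/residual deviance, degrees of freedom, and AIC for a fitted model."""
    n = len(y)
    mean_response = sum(y) / n  # fitted probability of the intercept-only model
    residual_deviance = binomial_deviance(y, fitted)
    return {
        "null_deviance": binomial_deviance(y, [mean_response] * n),
        "null_df": n - 1,
        "residual_deviance": residual_deviance,
        "residual_df": n - n_parameters,
        # AIC = -2 * log-likelihood + 2 * (number of estimated parameters)
        "aic": residual_deviance + 2 * n_parameters,
    }
```

A large drop from the null deviance to the residual deviance, relative to the degrees of freedom used, suggests that the predictors explain a meaningful share of the variation in the response.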
If the node is configured to output the serialized model to a file, the glmSummary pin also includes the file path to the file that contains the serialized model.
The glmResiduals pin contains the value of the working residuals for the final iteration of the model fitting process.
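As a rough illustration of what a working residual represents, the sketch below computes it for a single observation under the logit link, following the usual R convention of dividing the response residual by the derivative of the mean with respect to the linear predictor. The function name is hypothetical, and the sketch assumes the node reports residuals using this same convention.

```python
def working_residual_logit(y, mu):
    """Working residual for one observation under the logit link.

    y is the observed 0/1 response and mu is the fitted probability,
    which must lie strictly between 0 and 1.
    """
    # For the logit link, d(mu)/d(eta) = mu * (1 - mu).
    return (y - mu) / (mu * (1.0 - mu))
```

For example, an observed response of 1 with a fitted probability of 0.5 gives a working residual of 2.0.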
The glmCoefficients pin contains the estimated values of the coefficients of the independent variables and the intercept for the maximum likelihood estimation determined by the model.
The anovaSummary pin contains a summary table for an Analysis Of Variance performed on the regression model using the Chi-square test and includes information on:
- The model family and link function used.
- The response variable being modeled.
- Details of the Degrees of Freedom, Deviance, Residual Degrees of Freedom and p-value for each of the independent variables in the regression model.
- Significance code indicators for the independent variables.
Powered by TIBCO®
Properties
ModelName
Optionally specify the name of a model which is displayed on the output data. When the node is configured to write the serialized model to a file, the model name is also used as the output filename.
A model name must start with a letter and may contain any of the following:
- letters
- numbers
- period character (".")
- underscore ("_")
If not specified, the default model name "Logistic" is used.
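The naming rule can be checked before configuring the node. The following Python sketch is one possible check, not part of the product: the pattern and function name are hypothetical, and it assumes "letters" means the ASCII letters A-Z and a-z.

```python
import re

# Hypothetical pattern derived from the rule above: a leading letter,
# then any mix of letters, digits, periods, and underscores.
MODEL_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._]*")

DEFAULT_MODEL_NAME = "Logistic"


def resolve_model_name(name=None):
    """Return a valid model name, falling back to the documented default."""
    if name is None or name == "":
        return DEFAULT_MODEL_NAME
    if MODEL_NAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"Invalid model name: {name!r}")
    return name
```

With this check, a name such as "churn_v1.2" is accepted, while "1model" or "my-model" is rejected.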
ModelFormula
Optionally specify the formula for the logistic regression model. For example:
dependent ~ predictor1 + predictor2 + predictor3
Do not specify this property if any of the DependentVariable, IndependentVariables, or OmitModelConstant properties are set.
This property is case sensitive.
DependentVariable
Specify the dependent variable which is to be modeled on the independent variable(s).
Only one dependent variable can be input. A value is required for this property if the ModelFormula property is not specified. If the ModelFormula property is specified, this property should not be used. This property is case sensitive.
IndependentVariables
Specify the independent variables, i.e. the predictors used to model the dependent variable, as a comma-separated list of field names.
A value is required for this property if the ModelFormula property is not specified. If the ModelFormula property is specified, this property should not be used.
This property is case sensitive.
WeightVariable
Optionally specify the variable used to weight the observations when fitting the model.
Only one weight variable can be input. If not specified, weights are not used in the model. This property is case sensitive.
OmitModelConstant
Optionally specify whether a model constant is to be excluded from the model. If the ModelFormula property is specified, this property should not be set.
The default value is False.
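The DependentVariable, IndependentVariables, and OmitModelConstant properties together describe the same information as a ModelFormula. The Python sketch below shows one way the equivalent R-style formula could be assembled. How the node builds its model call internally is not documented here, so treat the function (a hypothetical helper) as an illustration of the relationship only. It relies on the standard R formula convention that appending "- 1" removes the intercept.

```python
def build_formula(dependent, independents, omit_constant=False):
    """Assemble an R-style model formula from the individual properties.

    `independents` is a comma-separated string, mirroring the
    IndependentVariables property.
    """
    predictors = [name.strip() for name in independents.split(",") if name.strip()]
    if not dependent or not predictors:
        raise ValueError("A dependent variable and at least one predictor are required")
    terms = " + ".join(predictors)
    # In R formula syntax, "- 1" drops the model constant (intercept).
    if omit_constant:
        terms += " - 1"
    return f"{dependent} ~ {terms}"
```

For example, `build_formula("dependent", "predictor1, predictor2, predictor3")` yields the formula shown under ModelFormula.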
ModelType
Optionally specify the binomial family link Model Type to be used. Choose from:
- logit
- probit
- complementary log-log
The default value is logit.
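Each link function maps the linear predictor (the weighted sum of the independent variables plus the intercept) to a probability between 0 and 1. The Python sketch below gives the standard inverse of each link so their shapes can be compared. The function and dictionary names are hypothetical, and the code is not taken from the node itself.

```python
import math


def inverse_logit(eta):
    """Logistic function: probability from log-odds."""
    return 1.0 / (1.0 + math.exp(-eta))


def inverse_probit(eta):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inverse_cloglog(eta):
    """Inverse of the complementary log-log link, log(-log(1 - p))."""
    return 1.0 - math.exp(-math.exp(eta))


# Keys mirror the ModelType choices listed above.
INVERSE_LINKS = {
    "logit": inverse_logit,
    "probit": inverse_probit,
    "complementary log-log": inverse_cloglog,
}
```

The logit and probit links are symmetric around a probability of 0.5, whereas the complementary log-log link is asymmetric, which can suit responses where one outcome is rare.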
ModelOutputMode
Specify whether the serialized model is written to a file on disk.
This property also determines how ModelOutputField and ModelOutputDirectory behave. The default value is None.
ModelOutputField
Optionally specify a name for the output field that contains the full path of the file where the serialized model has been written. The default value is "glm_ModelOutput".
ModelOutputDirectory
Specify the directory where the serialized model is written when ModelOutputMode is set to File. When ModelOutputDirectory is blank, files are written to the Data360 Analyze temporary directory. Otherwise, the files are written to the specified directory, which must exist and be writable. By default, this node does not overwrite existing files. This behavior can be set in the ExceptionBehavior tab (see FileExistsBehavior).
This property should only be filled in when ModelOutputMode is set to File.
OutputAdditionalAttributes
Optionally specify whether an extended set of coefficient attributes is to be provided on the output.
If set to True:
1. The following values are output on the Summary output pin:
- Null Deviance
- Total Degrees of Freedom (for the Null Model)
- Residual Deviance
- Residual Degrees of Freedom
- AIC value
- Number of Fisher Iterations
2. The following values are output on the Coefficients output pin:
- Standard Error
- t value
- p value
- The exponent of the (existing) coefficient values
The default value is False.
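The additional coefficient attributes are related to each other in a simple way. The Python sketch below derives a test statistic, a two-sided p-value, and the exponentiated coefficient from an estimate and its standard error. It is an illustration under stated assumptions rather than the node's own calculation: the function name is hypothetical, and the p-value uses the standard normal reference distribution that is common for binomial models, which may differ from how the node computes the value it labels "t value".

```python
import math


def coefficient_attributes(estimate, std_error):
    """Derive Wald-style attributes for a single coefficient."""
    statistic = estimate / std_error
    # Two-sided tail probability of the standard normal distribution:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return {
        "estimate": estimate,
        "std_error": std_error,
        "statistic": statistic,
        "p_value": p_value,
        # Under the logit link, exp(coefficient) is the odds ratio
        # associated with a one-unit increase in the predictor.
        "exp_estimate": math.exp(estimate),
    }
```

The odds-ratio reading of the exponentiated coefficient applies to the logit link only. Under the probit and complementary log-log links the exponentiated value does not have that interpretation.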
FileExistsBehavior
Optionally specify whether an existing serialized model file will be overwritten. Choose from:
- Error - Generate an error and do not overwrite the file.
- Log - Log a warning message and do not overwrite the file.
- Ignore - Do not overwrite the file.
- Overwrite - Overwrite the file.
The default value is Error.
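The four options can be read as the following decision logic. This Python sketch only mirrors the behavior described above. The function name, logger name, and return value are hypothetical and do not describe the node's internals.

```python
import logging
import os

logger = logging.getLogger("model_output")  # hypothetical logger name

VALID_BEHAVIORS = ("Error", "Log", "Ignore", "Overwrite")


def write_model_file(path, payload, file_exists_behavior="Error"):
    """Write payload bytes to path; return True if the file was written."""
    if file_exists_behavior not in VALID_BEHAVIORS:
        raise ValueError(f"Unknown behavior: {file_exists_behavior!r}")
    if os.path.exists(path) and file_exists_behavior != "Overwrite":
        if file_exists_behavior == "Error":
            # Error: fail and leave the existing file untouched.
            raise FileExistsError(f"Model file already exists: {path}")
        if file_exists_behavior == "Log":
            # Log: warn, then skip the write.
            logger.warning("Model file already exists, not overwriting: %s", path)
        # Log and Ignore both leave the existing file in place.
        return False
    with open(path, "wb") as handle:
        handle.write(payload)
    return True
```

Only Overwrite replaces an existing file. Error, Log, and Ignore all preserve it and differ only in how the condition is reported.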
Inputs and outputs
Inputs: data.
Outputs: glmSummary, glmResiduals, glmCoefficients, anovaSummary.