Predict Decision Forest - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28
Published on: 2024-11-28T15:26:57.181000

Predicts the value or class of a dependent variable using a random forest model, based on the values of the independent variables.

Tip: Complete the prerequisite steps before working with this node. See Working with the Statistical and Predictive Analytics nodes.
Note: An additional Statistical and Predictive Analytics node pack license is required to run this node. See Applying a node pack license.
Note: This node processes data in-memory. Additional RAM is required when processing data sets with a large volume of data.

The node accepts the new data to be used when making the prediction on its data input pin. It accepts the file path to the file that contains an R serialized decision forest model object. The file path can be specified as a literal value or be obtained from a specified field on the node's optional path input pin.

An optional property enables the node to be configured to utilize input string/unicode data as characters or convert them to factors. If not specified, string/unicode data are treated as characters by default. The setting of this property must correspond with the setting that was used when the decision forest model was created.

Input integer data can be treated as integer values or converted to numeric values. By default, integer values are not converted to numeric.

The node can output strings with a data type of string or unicode. By default, strings are output as unicode.

Similarly, doubles can be output with a data type of double, or with a data type of long provided the values are whole numbers to within the tolerance set by the specified Epsilon value. By default, double values are output with a data type of double.
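
The Epsilon check for double-to-long output can be illustrated with a short sketch. This is not the node's implementation: it assumes the check is applied to a whole field, and that a field qualifies only when every value lies within Epsilon of a whole number. The function name `doubles_to_long` is hypothetical.

```python
import math

def doubles_to_long(values, epsilon):
    """Return whole-number ints if every value is within epsilon of an
    integer; otherwise return the values unchanged as doubles.
    (Assumed rule, for illustration only.)"""
    converted = []
    for v in values:
        if not math.isfinite(v):
            return list(values)      # NaN/Inf cannot be stored as long
        nearest = round(v)           # nearest whole number, as an int
        if abs(v - nearest) > epsilon:
            return list(values)      # outside tolerance: keep doubles
        converted.append(nearest)
    return converted

print(doubles_to_long([1.0, 2.0000001], 1e-6))
print(doubles_to_long([1.0, 2.5], 1e-6))
```

In this sketch a single value outside the tolerance keeps the whole field as double, so no precision is lost silently.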

When run, the action taken by the node depends on the type of decision forest model that is supplied to the node:

  1. If the model type is regression, the node uses the embedded R engine to predict the value for the model's dependent (response) variable based on the values of the independent variables that are present on the node's data input pin.
  2. If the model type is classification, the node uses the embedded R engine to predict the class of the model's dependent (response) variable based on the values of the independent variables that are present on the node's data input pin.

In both of the above situations, the names of the fields in the input data must correspond to the names of the independent variables that were used to construct the decision forest model.

The Summary pin contains a summary of the input serialized model and the file path to the model. The model summary includes information on:

  • The call used to generate the model.
  • The model type.
  • The number of trees in the model.
  • The number of variables tried at each split.
  • The mean square value of the residuals (regression only).
  • The percentage of variance explained by the model (regression only).
  • The percentage 'out-of-bag' estimate of error rate (classification only).
  • The confusion matrix (classification only).
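
As a rough guide to reading the summary figures listed above, the sketch below computes them from lists of actual and predicted values. It assumes the conventional random forest definitions (mean squared residual, variance explained as 1 - MSE / variance of the response, error rate as the share of misclassified rows). In a real model these figures come from out-of-bag predictions computed by the R engine. The function names are hypothetical.

```python
from collections import Counter

def regression_summary(actual, predicted):
    """Mean of squared residuals and percentage of variance explained."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mean = sum(actual) / n
    variance = sum((a - mean) ** 2 for a in actual) / n  # population variance
    return mse, 100.0 * (1.0 - mse / variance)

def classification_summary(actual, predicted):
    """Confusion matrix keyed by (actual, predicted) and error rate in percent."""
    matrix = Counter(zip(actual, predicted))
    errors = sum(count for (a, p), count in matrix.items() if a != p)
    return matrix, 100.0 * errors / len(actual)

mse, pct = regression_summary([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
matrix, err = classification_summary(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(mse, pct)
print(dict(matrix), err)
```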

The predicted value or class for the dependent variable, together with the corresponding (predictor) values for the independent variables, are output on the Results pin.

Powered by TIBCO®
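
Because the input field names must match the model's independent variables, it can help to compare the two lists before running the node. The following is a minimal sketch of such a check. The function name `missing_predictors` and the example field names are hypothetical, and the list of model variables is assumed to be known from when the model was built.

```python
def missing_predictors(input_fields, model_variables):
    """Return the model's independent variables that have no matching
    input field, in the order the model lists them."""
    present = set(input_fields)
    return [name for name in model_variables if name not in present]

# Extra input fields (such as "id") are ignored by this check;
# only variables the model needs are reported.
print(missing_predictors(["age", "income", "id"], ["age", "income", "region"]))
```

The comparison is case-sensitive and exact, so a field named `Region` would not satisfy a model variable named `region` in this sketch.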

Properties

ModelFilePath

Specify the filepath of the Decision Forest (random forest) model to be used when predicting dependent values.

Choose the (from Field) variant of this property to look up the value from an input field with the name specified. A value is required for this property.

InputStringCoercion

Optionally specify how the embedded R engine converts string and Unicode input fields when moving data into R. By default, the embedded R engine converts Data360 Analyze string and Unicode values into characters when creating data frames. Factors take a limited number of distinct values and are stored as integer vectors, which map back to characters when displayed. Factors can be used in a variety of modeling functions, but sometimes it is more convenient for strings to remain strings and not be converted. The options for this property, To Character and To Factor, determine whether string values are left as character vectors or converted to factors in the data frames.

The default value is To Character.
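
The difference between characters and factors can be pictured with a small sketch. It mimics, in Python, how a factor stores a limited set of levels plus a vector of 1-based integer codes. It is an illustration only, not the R engine's code; the function names are hypothetical, and the levels are assumed to be the sorted distinct values.

```python
def to_factor(values):
    """Encode strings as (codes, levels): levels are the sorted distinct
    values, codes are 1-based positions into levels."""
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    return [index[v] for v in values], levels

def factor_labels(codes, levels):
    """Map integer codes back to their character labels for display."""
    return [levels[c - 1] for c in codes]

codes, levels = to_factor(["red", "blue", "red"])
print(codes, levels)
print(factor_labels(codes, levels))
```

This also suggests why the setting must match the one used when the model was created: a model built on factors expects a fixed set of levels rather than free-form strings.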

ExportStringCoercion

Optionally specify how character vectors are exported from the embedded R engine to Data360 Analyze.

The embedded R engine represents all string values in data frames as character vectors or factors, both of which are implemented by Unicode strings. By contrast, Data360 Analyze has two field types for this class: string and Unicode. Unicode can contain all characters, while string can only hold a subset (technically, only those found in the Data360 Analyze server's code page).

Therefore, if the exported data has characters that are not in the Data360 Analyze server's code page (usually special characters or notations), it is important to set this property to To Unicode to avoid errors when outputting the data. Selecting To String, by contrast, results in the node failing when these special characters are present. To String should only be chosen if you are certain that all characters in the output data frames are in the Data360 Analyze server's code page.

The default value is To Unicode.
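
To judge whether To String is safe for a given data set, one option is to test sample values against the server's code page in advance. The sketch below assumes, purely for illustration, that the code page is Windows-1252 (`cp1252`); the actual code page depends on the Data360 Analyze server. The function name `first_unencodable` is hypothetical.

```python
def first_unencodable(text, code_page="cp1252"):
    """Return the first character of text that the code page cannot
    represent, or None if the whole string can be encoded."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError as exc:
        return text[exc.start]   # exc.start indexes the offending character
    return None

print(first_unencodable("café"))             # representable in cp1252
print(first_unencodable("price: 5 \u20b9"))  # rupee sign is not in cp1252
```

If any value returns a character rather than None, leaving the property at To Unicode avoids the failure described above.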

Inputs and outputs

Inputs: data, 1 optional (path).

Outputs: Summary, Results.