Trainer (Regression)

Overview

The Trainer (Regression) Snap builds a model for a regression dataset.

In the Snap's settings, you can select the target field in the dataset, choose the algorithm, and configure the parameters for the selected algorithm.

If you want to build a model for a classification dataset, use the Trainer (Classification) Snap instead.


Trainer (Regression) Snap dialog

Prerequisites

  • The data from the upstream Snap must be in tabular format (no nested structure).
  • This Snap automatically derives the schema (field names and types) from the first document. Therefore, the first document must not have any missing values.
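
The two prerequisites above can be checked before the data reaches the Snap. The following is a minimal sketch in Python, not part of the Snap itself. It assumes each document is a dictionary and that a missing value appears as None or an empty string; the function names are hypothetical.

```python
def is_flat(document):
    """Return True if no field value is a nested dict or list."""
    return not any(isinstance(v, (dict, list)) for v in document.values())

def first_document_is_complete(documents):
    """Return True if the first document exists and has no missing values.

    Assumption: a missing value is represented as None or an empty string.
    """
    if not documents:
        return False
    return all(v is not None and v != "" for v in documents[0].values())

def check_prerequisites(documents):
    """Collect human-readable problems; an empty list means both checks pass."""
    problems = []
    for i, doc in enumerate(documents):
        if not is_flat(doc):
            problems.append(f"document {i} contains a nested field")
    if not first_document_is_complete(documents):
        problems.append("first document is missing or has missing values")
    return problems

# Invented sample data: the missing value is in the second document,
# so the first document can still be used to derive the schema.
docs = [
    {"rooms": 3, "area": 72.5, "price": 210000},
    {"rooms": 2, "area": None, "price": 150000},
]
print(check_prerequisites(docs))
```

In this sketch, a missing value in a later document is not reported, because the prerequisite only concerns the first document.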

Limitations and known issues

None.

Snap views

View: Input
  • Description: The regression dataset.
  • Examples of upstream and downstream Snaps: Any Snap that generates a regression dataset document.

View: Output
  • Description: A serialized model and its metadata, which are not human-readable. If the Readable checkbox is selected, the output also includes a human-readable representation of the model.
  • Examples of upstream and downstream Snaps: Snaps that require a model as input, or Snaps that store the model for use in another pipeline.

View: Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.
  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
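
The difference between the three error-handling options can be illustrated outside the platform. The following is a conceptual sketch in Python, not SnapLogic code. The process function, its policy names ("stop", "discard", "route"), and the sample records are all hypothetical.

```python
def process(records, handler, on_error="stop"):
    """Apply handler to each record using one of three error policies.

    "stop" re-raises the first error, "discard" drops the failing record,
    and "route" collects the failing record in a separate error list.
    """
    if on_error not in ("stop", "discard", "route"):
        raise ValueError(f"unknown policy: {on_error}")
    output, errors = [], []
    for record in records:
        try:
            output.append(handler(record))
        except Exception as exc:
            if on_error == "stop":
                raise
            if on_error == "route":
                errors.append({"record": record, "error": str(exc)})
            # "discard": nothing is kept; continue with the next record
    return output, errors

def to_price(record):
    """Example handler: fails on records whose price is not numeric."""
    return float(record["price"])

# Invented sample data with one bad record in the middle.
records = [{"price": "10.5"}, {"price": "n/a"}, {"price": "7"}]
out_discard, err_discard = process(records, to_price, on_error="discard")
out_route, err_route = process(records, to_price, on_error="route")
print(out_discard, err_discard)
print(out_route, err_route)
```

With "discard" and "route", the two valid records still reach the output; only "route" keeps the failing record for later inspection.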

Tip: In some cases, numeric fields may be represented as strings. You can use the Type Converter Snap to convert the data into the appropriate types before passing it to this Snap.
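
To show what this kind of conversion involves, here is a minimal sketch in Python. It does not reproduce the behavior of the Type Converter Snap; the to_number and convert_document helpers and the sample document are hypothetical.

```python
def to_number(value):
    """Convert a numeric string to int or float; return other values unchanged."""
    if not isinstance(value, str):
        return value
    text = value.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return value  # not numeric: keep the original string

def convert_document(document):
    """Apply to_number to every field of a flat document."""
    return {field: to_number(value) for field, value in document.items()}

# Invented sample document: two numeric fields arrive as strings.
doc = {"rooms": "3", "area": " 72.5 ", "city": "Oslo", "price": 210000}
converted = convert_document(doc)
print(converted)
```

Non-numeric strings such as the city name are left untouched, so only fields that really hold numbers change type.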

Snap settings

Note:
  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
  • Add icon: Indicates that you can add fields in the field set.
  • Remove icon: Indicates that you can remove fields from the field set.
Field / Field set Type Description
Label String Required. Specify a unique name for the Snap. Use a more descriptive name if the pipeline contains more than one instance of the same Snap.
Label field Required. The label or output field in the dataset that the model will be trained to predict. This value must be numeric.

Example: $price

Algorithm Required. The regression algorithm that builds the model.
Valid values:
  • Decision Stump
  • K-Nearest Neighbors
  • Linear Regression
  • Random Forests
The implementations come from WEKA, an open-source machine learning library written in Java.

Default value: K-Nearest Neighbors

Options
The parameters to apply to the selected algorithm. Each algorithm has its own set of parameters that can be configured in this property.

If you specify multiple parameters, separate them with commas (,).

If blank, the default values are applied for all the parameters.

Valid values: Refer to Options for Algorithms.

Default value: None

Examples:
  • batch_size = 120
  • batch_size = 120, collapse_tree = true
Readable checkbox If selected, the model is displayed in a human-readable format. A $readable field is added to the output.

Default status: Not selected

Snap execution Dropdown list Select one of the three modes in which the Snap executes.
Available options are:
  • Validate & Execute. Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.
  • Execute only. Performs full execution of the Snap during pipeline execution without generating preview data.
  • Disabled. Disables the Snap and all Snaps that are downstream from it.
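
The Options field described above takes comma-separated name = value pairs, such as batch_size = 120, collapse_tree = true. The following Python sketch shows one way to read a string in that format. It is an illustration only; how the Snap itself parses the field is not documented here, and parse_options is a hypothetical helper.

```python
def parse_options(text):
    """Parse 'name = value, name = value' into a dict of strings.

    A blank string yields an empty dict, which corresponds to using the
    default value for every parameter.
    """
    options = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" not in part:
            raise ValueError(f"expected 'name = value', got: {part!r}")
        name, value = part.split("=", 1)
        options[name.strip()] = value.strip()
    return options

parsed = parse_options("batch_size = 120, collapse_tree = true")
print(parsed)
```

The values are kept as strings in this sketch because the expected type of each parameter depends on the selected algorithm.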
Tip: To choose the most suitable algorithm for your dataset, use the Cross Validator (Regression) Snap to perform k-fold cross validation on the dataset. The algorithm that produces the best cross-validation results is likely to be the most suitable one. Select that algorithm in the Trainer (Regression) Snap to build the model.
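
The idea behind k-fold cross validation can be sketched in plain Python. This is not the Cross Validator (Regression) Snap and does not use WEKA. It assumes a single numeric feature, uses mean absolute error as the comparison metric, and compares a simple k-nearest neighbors regressor against a mean baseline. All function names and the toy data are invented for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def mean_model(train_x, train_y):
    """Baseline: always predict the mean of the training targets."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean

def knn_model(train_x, train_y, k=3):
    """K-nearest neighbors on one feature: average the k closest targets."""
    def predict(x):
        nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def cross_validate(build, xs, ys, k=5):
    """Return the mean absolute error averaged over k held-out folds."""
    fold_errors = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = build([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(model(xs[i]) - ys[i]) for i in fold]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)

# Toy data, invented for illustration: y is roughly 2x + 1.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

scores = {
    "mean baseline": cross_validate(mean_model, xs, ys),
    "k-nearest neighbors": cross_validate(knn_model, xs, ys),
}
best = min(scores, key=scores.get)  # lower error is better
print(scores, best)
```

Each model is trained on k - 1 folds and scored on the remaining fold, so every document is used for evaluation exactly once. The candidate with the lowest average error is the one to carry over to training.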

Troubleshooting

None.