RandomForestRegression

Model

DashAI.back.models.scikit_learn.RandomForestRegression

Random forest regressor that averages predictions from multiple decision trees.

Random Forest is a bagging ensemble that fits n_estimators decision trees, each on a bootstrap sample of the training data. At each split only a random subset of features is considered, decorrelating the trees and reducing variance relative to a single tree. The final prediction is the mean of all individual tree predictions.
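As a rough illustration of the bagging idea described above, the following sketch builds a tiny ensemble using only the Python standard library. It is not how scikit-learn or DashAI implement the model: the helper names (fit_stump, fit_bagged_stumps, predict_mean) are hypothetical, each "tree" is a single-split stump, and the per-split feature subsampling is omitted because the toy data has only one feature.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D data by minimising squared error."""
    best = None
    # Every distinct x except the largest is a candidate threshold,
    # so both sides of the split are always non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to a constant predictor.
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_mean(stumps, x):
    """Ensemble prediction: the mean of the individual predictions."""
    return mean(stump(x) for stump in stumps)


# Toy step-shaped data, used only for illustration.
xs = list(range(10))
ys = [0.0] * 5 + [10.0] * 5
stumps = fit_bagged_stumps(xs, ys, n_estimators=25, seed=0)
low = predict_mean(stumps, 0)
high = predict_mean(stumps, 9)
```

Because each stump sees a different bootstrap sample, the individual split points vary, and averaging them smooths the step between the two plateaus.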

Key hyperparameters include n_estimators, criterion, max_depth, min_samples_split, min_samples_leaf, max_features, bootstrap, and random_state. The implementation wraps scikit-learn's RandomForestRegressor.

References

[1] Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
[2] scikit-learn developers. "sklearn.ensemble.RandomForestRegressor." scikit-learn documentation. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

Parameters

n_estimators : integer, default=100: The number of trees in the forest.
criterion : string, default=squared_error: The function to measure the quality of a split.
max_depth, default=None: The maximum depth of the tree.
min_samples_split : integer, default=2: The minimum number of samples required to split an internal node.
min_samples_leaf : integer, default=1: The minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf : number, default=0.0: The minimum weighted fraction of the sum total of weights required to be at a leaf node.
max_features, default=sqrt: The number of features to consider when looking for the best split.
max_leaf_nodes, default=None: Grow trees with max_leaf_nodes in best-first fashion.
min_impurity_decrease : number, default=0.0: A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
bootstrap : boolean, default=True: Whether bootstrap samples are used when building trees.
oob_score : boolean, default=False: Whether to use out-of-bag samples to estimate the generalization score.
n_jobs, default=None: The number of jobs to run in parallel for both fit and predict.
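random_state, default=None: The seed of the pseudo-random number generator that controls the bootstrap sampling of the training data (when bootstrap is True) and the random choice of features considered at each split.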
warm_start : boolean, default=False: When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble.
ccp_alpha : number, default=0.0: Complexity parameter used for Minimal Cost-Complexity Pruning.
max_samples, default=None: If bootstrap is True, the number of samples to draw from X to train each base estimator.
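The interaction between bootstrap, oob_score, max_samples, and warm_start is easier to see in a short example. The sketch below calls scikit-learn's RandomForestRegressor directly on synthetic data. It assumes scikit-learn is installed and does not go through the DashAI wrapper, whose handling of these options is not shown in this page.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=300, n_features=6, noise=1.0, random_state=0)

# Out-of-bag scoring needs bootstrap=True: each tree is evaluated on the
# samples that were left out of its own bootstrap sample.
forest = RandomForestRegressor(
    n_estimators=100,
    bootstrap=True,
    oob_score=True,
    max_samples=0.8,  # each tree draws 80% of the rows (with replacement)
    random_state=0,
)
forest.fit(X, y)
oob_r2 = forest.oob_score_  # R^2 estimated from out-of-bag predictions

# warm_start=True keeps the already fitted trees and only adds new ones
# when n_estimators is increased before the next call to fit.
growing = RandomForestRegressor(n_estimators=20, warm_start=True, random_state=0)
growing.fit(X, y)
first_trees = list(growing.estimators_)
growing.set_params(n_estimators=40)
growing.fit(X, y)
kept_first_trees = all(a is b for a, b in zip(first_trees, growing.estimators_))
```

The out-of-bag score gives a generalization estimate without a separate validation split, at the cost of some extra computation during fit.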

Methods

calculate_metrics(self, split: DashAI.back.core.enums.metrics.SplitEnum = <SplitEnum.VALIDATION: 'validation'>, level: DashAI.back.core.enums.metrics.LevelEnum = <LevelEnum.LAST: 'last'>, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)

Defined on BaseModel

Calculate and save metrics for a given data split and level.

Parameters

split : SplitEnum: The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
level : LevelEnum: The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
log_index : int, optional: Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
x_data : DashAIDataset, optional: Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
y_data : DashAIDataset, optional: Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.

get_metadata(cls) -> Dict[str, Any]

Defined on BaseModel

Get metadata values for the current model.

Returns

Dict[str, Any]: Dictionary containing UI metadata such as the model icon used in the DashAI frontend.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema for the component.

Returns

dict: Dictionary representing the JSON Schema of the component.

load(filename: str) -> None

Defined on SklearnLikeModel

Deserialise a model from disk using joblib.

Parameters

filename : str: Path to the file previously written by save.

Returns

SklearnLikeModel: The loaded model instance.

predict(self, x_pred: 'DashAIDataset') -> 'ndarray'

Defined on SklearnLikeRegressor

Make a prediction with the model.

Parameters

x_pred : DashAIDataset: Dataset with the input data columns.

Returns

np.ndarray: Array with the predicted target values for x_pred.

prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on SklearnLikeModel

Apply the model transformations to the dataset.

Parameters

dataset : DashAIDataset: The dataset to be transformed.
is_fit : bool, optional: If True, the method fits the encoders on the data. If False, it applies the previously fitted encoders. Defaults to False.

Returns

DashAIDataset: The prepared dataset ready to be converted to an accepted format in the model.

prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on SklearnLikeModel

Prepare output targets using label encoding.

Parameters

dataset : DashAIDataset: The output dataset to be transformed.
is_fit : bool, optional: If True, fit the encoder. If False, use existing encodings.

Returns

DashAIDataset: Dataset with categorical columns converted to integers.

save(self, filename: str) -> None

Defined on SklearnLikeModel

Serialise the model to disk using joblib.

Parameters

filename : str: Destination file path where the model will be written.
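The save and load methods are described as using joblib. The DashAI internals are not shown here, so the following is only a sketch of a joblib round trip on a plain scikit-learn estimator, assuming joblib, NumPy, and scikit-learn are installed. The file name forest.joblib is a hypothetical choice.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=100, n_features=4, random_state=0)
estimator = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "forest.joblib")  # hypothetical file name
    joblib.dump(estimator, path)   # serialise the fitted estimator to disk
    restored = joblib.load(path)   # deserialise it into a new object

# The restored estimator should reproduce the original predictions exactly.
same_predictions = np.array_equal(estimator.predict(X), restored.predict(X))
```

Files written this way are pickle-based, so they should only be loaded from trusted sources and ideally with the same library versions that wrote them.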

train(self, x_train, y_train, x_validation=None, y_validation=None)

Defined on SklearnLikeModel

Train the sklearn model on the provided dataset.

Parameters

x_train : DashAIDataset: The input features for training.
y_train : DashAIDataset: The target labels for training.
x_validation : DashAIDataset, optional: Validation input features (unused in sklearn models). Defaults to None.
y_validation : DashAIDataset, optional: Validation target labels (unused in sklearn models). Defaults to None.

Returns

BaseModel: The fitted scikit-learn estimator (self).

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Take the data provided by the user to initialize the model and return it together with all the objects the model needs to work.

Parameters

raw_data : dict: A dictionary with the data provided by the user to initialize the model.

Returns

dict: A validated dictionary with the necessary objects.

Compatible with

RegressionTask
