GradientBoostingR
Gradient boosting regressor that builds an ensemble of decision trees sequentially.
Gradient Boosting builds an additive model in a forward stage-wise fashion. At
each stage a shallow decision tree is fitted to the negative gradient of the
chosen loss function with respect to the current ensemble prediction. A
learning_rate shrinkage factor scales the contribution of each new tree,
trading a slower learning process for better generalisation.
Key hyperparameters include n_estimators (number of boosting stages),
learning_rate, max_depth, subsample (fraction of training samples
per tree, enabling stochastic gradient boosting), loss, and
min_samples_split. The implementation wraps scikit-learn's
GradientBoostingRegressor.
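Because the class wraps scikit-learn's GradientBoostingRegressor, the stage-wise additive behaviour can be sketched with scikit-learn directly (a minimal sketch on synthetic data; DashAI's dataset wrappers are omitted):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

# Each boosting stage fits a shallow tree to the negative gradient of
# the loss; learning_rate shrinks each tree's contribution.
model = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
model.fit(X, y)

# staged_predict yields the ensemble prediction after each stage, so the
# training error should shrink as stages accumulate.
errors = [np.mean((y - pred) ** 2) for pred in model.staged_predict(X)]
print(len(errors))             # one entry per boosting stage
print(errors[-1] < errors[0])
```

Lowering learning_rate while raising n_estimators typically traces the same trade-off the paragraph above describes: slower per-stage progress, better generalisation.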
References
- [1] Friedman, J.H. (2001). "Greedy function approximation: a gradient boosting machine." Annals of Statistics, 29(5), 1189-1232. https://doi.org/10.1214/aos/1013203451
- [2] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html
Parameters
- loss : string, default=squared_error - Loss function to be optimized.
- learning_rate : number, default=0.1 - Learning rate shrinks the contribution of each tree.
- n_estimators : integer, default=100 - The number of boosting stages to be run.
- subsample : number, default=1.0 - The fraction of samples to be used for fitting the individual base learners.
- criterion : string, default=friedman_mse - The function to measure the quality of a split.
- min_samples_split : number, default=0.5 - The minimum number of samples required to split an internal node (a float is interpreted as a fraction of the samples).
- min_samples_leaf : number, default=1 - The minimum number of samples required to be at a leaf node.
- min_weight_fraction_leaf : number, default=0.0 - The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
- max_depth : integer, default=3 - The maximum depth of the individual regression estimators.
- min_impurity_decrease : number, default=0.0 - A node will be split if the split induces a decrease of the impurity greater than or equal to this value.
- random_state : integer or None, default=None - Controls the randomness of each boosting iteration (e.g. the subsampling of data and features).
- max_features, default=None - The number of features to consider when looking for the best split.
- alpha : number, default=0.9 - The alpha-quantile of the Huber loss function and the quantile loss function. Only relevant when loss is huber or quantile.
- verbose : integer, default=0 - Enable verbose output.
- max_leaf_nodes : integer or None, default=None - Grow trees with max_leaf_nodes in best-first fashion.
- warm_start : boolean, default=False - When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble.
- validation_fraction : number, default=0.1 - The proportion of training data to set aside as a validation set for early stopping.
- n_iter_no_change : integer or None, default=None - The number of iterations with no improvement to wait before stopping training. None disables early stopping.
- tol : number, default=0.0001 - Tolerance for early stopping.
- ccp_alpha : number, default=0.0 - Complexity parameter used for Minimal Cost-Complexity Pruning.
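Several of these hyperparameters interact. For instance, setting n_iter_no_change enables early stopping on an internal hold-out of size validation_fraction, and subsample < 1.0 turns on stochastic gradient boosting. A sketch using scikit-learn directly (outside the DashAI wrapper):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=500)

# n_iter_no_change holds out validation_fraction of the data and stops
# once the validation score fails to improve by tol for that many
# consecutive iterations; subsample=0.8 fits each tree on 80% of rows.
model = GradientBoostingRegressor(
    n_estimators=500,
    learning_rate=0.1,
    subsample=0.8,
    validation_fraction=0.1,
    n_iter_no_change=10,
    tol=1e-4,
    random_state=42,
).fit(X, y)

# n_estimators_ is the number of stages actually fitted, which can be
# smaller than n_estimators when early stopping triggers.
print(model.n_estimators_)
```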
Methods
calculate_metrics(self, split: SplitEnum = SplitEnum.VALIDATION, level: LevelEnum = LevelEnum.LAST, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)
Calculate and save metrics for a given data split and level. Inherited from BaseModel.
Parameters
- split : SplitEnum
- The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
- level : LevelEnum
- The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
- log_index : int, optional
- Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
- x_data : DashAIDataset, optional
- Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
- y_data : DashAIDataset, optional
- Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
get_metadata(cls) -> Dict[str, Any]
Get metadata values for the current model. Inherited from BaseModel.
Returns
- Dict[str, Any]
- Dictionary containing UI metadata such as the model icon used in the DashAI frontend.
get_schema(cls) -> dict
Generate the component's JSON Schema. Inherited from ConfigObject.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
load(filename: str) -> 'SklearnLikeModel'
Deserialise a model from disk using joblib. Inherited from SklearnLikeModel.
Parameters
- filename : str
- Path to a file previously written by save.
Returns
- SklearnLikeModel
- The loaded model instance.
predict(self, x_pred: 'DashAIDataset') -> 'ndarray'
Make a prediction with the model. Inherited from SklearnLikeRegressor.
Parameters
- x_pred : DashAIDataset
- Dataset with the input data columns.
Returns
- np.ndarray
- Array with the predicted target values for x_pred.
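The alpha parameter documented above only takes effect for the huber and quantile losses; with loss="quantile", predict returns an estimate of a conditional quantile rather than the conditional mean. A sketch with scikit-learn (not the DashAI predict wrapper, which takes a DashAIDataset):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(400, 1))
y = X[:, 0] + rng.normal(scale=1.0, size=400)

# Two models targeting the 10th and 90th percentile of y given X.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.1, random_state=7).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.9, random_state=7).fit(X, y)

x_new = np.array([[5.0]])
lower, upper = lo.predict(x_new)[0], hi.predict(x_new)[0]
print(lower < upper)  # the 90th-percentile prediction should exceed the 10th
```

Fitting two such models is a common way to obtain a prediction interval from gradient boosting.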
prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Apply the model transformations to the dataset. Inherited from SklearnLikeModel.
Parameters
- dataset : DashAIDataset
- The dataset to be transformed.
- is_fit : bool, optional
- If True, the method will fit encoders on the data. If False, will apply previously fitted encoders.
Returns
- DashAIDataset
- The prepared dataset ready to be converted to an accepted format in the model.
prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Prepare output targets using label encoding. Inherited from SklearnLikeModel.
Parameters
- dataset : DashAIDataset
- The output dataset to be transformed.
- is_fit : bool, optional
- If True, fit the encoder. If False, use existing encodings.
Returns
- DashAIDataset
- Dataset with categorical columns converted to integers.
save(self, filename: str) -> None
Serialise the model to disk using joblib. Inherited from SklearnLikeModel.
Parameters
- filename : str
- Destination file path where the model will be written.
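Since save and load are thin wrappers around joblib, a round trip should reproduce identical predictions. A sketch using joblib directly (the same mechanism the wrapper methods use, per the docs above):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X.sum(axis=1)

model = GradientBoostingRegressor(n_estimators=20, random_state=1).fit(X, y)

# Serialise the fitted estimator to disk and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gbr.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```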
train(self, x_train, y_train, x_validation=None, y_validation=None)
Train the scikit-learn model on the provided dataset. Inherited from SklearnLikeModel.
Parameters
- x_train : DashAIDataset
- The input features for training.
- y_train : DashAIDataset
- The target labels for training.
- x_validation : DashAIDataset, optional
- Validation input features (unused in sklearn models). Defaults to None.
- y_validation : DashAIDataset, optional
- Validation target labels (unused in sklearn models). Defaults to None.
Returns
- BaseModel
- The fitted scikit-learn estimator (self).
validate_and_transform(self, raw_data: dict) -> dict
Validate the user-provided initialization data and return it with all the objects the model needs to work. Inherited from ConfigObject.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.