HistGradientBoostingClassifier
Histogram-based gradient boosting classifier for large datasets.
This classifier is a gradient boosting variant that discretises features into integer-valued bins (histograms) before tree construction. The histogram representation reduces both the number of candidate split points and the memory footprint, allowing efficient training on datasets with tens of thousands of samples or more. The algorithm natively supports missing values and categorical features. It is inspired by the LightGBM algorithm.
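The binning idea described above can be sketched in a few lines. This is an illustrative toy, not DashAI's or scikit-learn's actual implementation: each feature value is mapped to an integer bin index derived from approximate quantiles, so that split finding only considers bin boundaries instead of every unique value.

```python
from bisect import bisect_right

def quantile_bin_edges(values, n_bins):
    """Compute n_bins - 1 interior edges from approximate quantiles."""
    ordered = sorted(values)
    edges = []
    for i in range(1, n_bins):
        # index of the i-th quantile in the sorted values
        idx = min(len(ordered) - 1, (i * len(ordered)) // n_bins)
        edges.append(ordered[idx])
    return edges

def bin_feature(values, edges):
    """Map each value to the number of edges it exceeds (its bin index)."""
    return [bisect_right(edges, v) for v in values]

feature = [0.1, 3.2, 1.7, 9.9, 4.4, 2.8, 7.5, 0.6]
edges = quantile_bin_edges(feature, n_bins=4)
binned = bin_feature(feature, edges)
# `binned` now contains integers in [0, 3]; trees split on bin indices,
# which is what keeps both memory use and candidate splits small.
```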
Key hyperparameters include learning_rate, max_iter (number of boosting
stages), max_depth, max_leaf_nodes, min_samples_leaf, and
l2_regularization. The implementation wraps scikit-learn's
HistGradientBoostingClassifier.
References
- [1] Ke, G. et al. (2017). "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." Advances in Neural Information Processing Systems 30. https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html
- [2] https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html
Parameters
- learning_rate : number, default=0.1
- The learning rate, also known as shrinkage. This is used as a multiplicative factor for the leaf values. Use 1 for no shrinkage.
- max_iter : integer, default=100
- The maximum number of iterations of the boosting process, i.e. the maximum number of trees for binary classification.
- max_depth : integer, default=1
- The maximum depth of each tree, i.e. the number of edges from the root to the deepest leaf. In scikit-learn, depth is unconstrained when this is set to None.
- max_leaf_nodes : integer, default=31
- The maximum number of leaves for each tree. Must be strictly greater than 1. If None, there is no limit.
- min_samples_leaf : integer, default=20
- The minimum number of samples required to be at a leaf node.
- l2_regularization : number, default=0.0
- The L2 regularization parameter. Use 0 for no regularization.
Methods
calculate_metrics(self, split: SplitEnum = SplitEnum.VALIDATION, level: LevelEnum = LevelEnum.LAST, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)
Inherited from BaseModel. Calculate and save metrics for a given data split and level.
Parameters
- split : SplitEnum
- The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
- level : LevelEnum
- The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
- log_index : int, optional
- Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
- x_data : DashAIDataset, optional
- Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
- y_data : DashAIDataset, optional
- Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
get_metadata(cls) -> Dict[str, Any]
Inherited from BaseModel. Get metadata values for the current model.
Returns
- Dict[str, Any]
- Dictionary containing UI metadata such as the model icon used in the DashAI frontend.
get_schema(cls) -> dict
Inherited from ConfigObject. Generate the JSON Schema associated with the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
load(filename: str) -> 'SklearnLikeModel'
Inherited from SklearnLikeModel. Deserialise a model from disk using joblib.
Parameters
- filename : str
- Path to the file previously written by :meth:`save`.
Returns
- SklearnLikeModel
- The loaded model instance.
predict(self, x_pred: 'DashAIDataset') -> 'ndarray'
Inherited from SklearnLikeClassifier. Make a prediction with the model.
Parameters
- x_pred : DashAIDataset
- Dataset with the input data columns.
Returns
- np.ndarray
- Array with the predicted target values for x_pred
prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Inherited from SklearnLikeModel. Apply the model transformations to the dataset.
Parameters
- dataset : DashAIDataset
- The dataset to be transformed.
- is_fit : bool, optional
- If True, the method fits encoders on the data; if False, it applies previously fitted encoders. Defaults to False.
Returns
- DashAIDataset
- The prepared dataset ready to be converted to an accepted format in the model.
prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Inherited from SklearnLikeModel. Prepare output targets using label encoding.
Parameters
- dataset : DashAIDataset
- The output dataset to be transformed.
- is_fit : bool, optional
- If True, fit the encoder. If False, use existing encodings.
Returns
- DashAIDataset
- Dataset with categorical columns converted to integers.
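The fit/transform pattern behind prepare_output can be sketched with a toy encoder. The class below is hypothetical and only illustrates the pattern: fitting learns the class-to-integer mapping once, and later splits reuse it rather than refitting.

```python
class SimpleLabelEncoder:
    """Toy label encoder illustrating the is_fit=True / is_fit=False paths."""

    def fit(self, labels):
        # Assign integers to classes in sorted order, mirroring
        # sklearn's LabelEncoder convention.
        self.classes_ = sorted(set(labels))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels):
        return [self._index[c] for c in labels]

train_targets = ["cat", "dog", "cat", "bird"]
enc = SimpleLabelEncoder().fit(train_targets)   # is_fit=True: fit the encoder
encoded = enc.transform(train_targets)          # bird -> 0, cat -> 1, dog -> 2
validation = enc.transform(["dog", "bird"])     # is_fit=False: reuse the mapping
```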
save(self, filename: str) -> None
Inherited from SklearnLikeModel. Serialise the model to disk using joblib.
Parameters
- filename : str
- Destination file path where the model will be written.
train(self, x_train, y_train, x_validation=None, y_validation=None)
Inherited from SklearnLikeModel. Train the sklearn model on the provided dataset.
Parameters
- x_train : DashAIDataset
- The input features for training.
- y_train : DashAIDataset
- The target labels for training.
- x_validation : DashAIDataset, optional
- Validation input features (unused in sklearn models). Defaults to None.
- y_validation : DashAIDataset, optional
- Validation target labels (unused in sklearn models). Defaults to None.
Returns
- BaseModel
- The fitted scikit-learn estimator (self).
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Validate the user-provided initialisation data and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.