SGDClassifier
SGD classifier with probability calibration for consistent predict_proba output.
SGDClassifier supports multiple loss functions that correspond to different
linear models (SVM with 'hinge', logistic regression with 'log_loss', etc.).
Stochastic gradient descent allows efficient training on large datasets. Because
not all loss functions expose predict_proba natively, this wrapper always
calibrates the model with CalibratedClassifierCV so that probability estimates
are consistently available.
Key hyperparameters include loss, alpha, max_iter, tol, and
learning_rate. The implementation wraps scikit-learn's SGDClassifier.
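The calibration step described above can be sketched with plain scikit-learn (this is not the DashAI class itself, just an illustration of why the wrapper calibrates):

```python
# Sketch of what this wrapper does internally, using plain scikit-learn.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 'hinge' (a linear SVM) has no native predict_proba...
base = SGDClassifier(loss="hinge", alpha=1e-4, max_iter=1000, tol=1e-3,
                     random_state=0)

# ...so wrapping it in CalibratedClassifierCV restores probability output.
clf = CalibratedClassifierCV(base, cv=3).fit(X, y)
proba = clf.predict_proba(X)  # one row per sample, one column per class
```

With `loss="hinge"` the bare estimator would raise on `predict_proba`; the calibrated wrapper exposes it regardless of the chosen loss.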
Parameters
- loss : string, default='hinge'
- The loss function to use. 'hinge' gives a linear SVM; 'log_loss' gives logistic regression; 'modified_huber' is a smoothed hinge loss; 'squared_hinge' is like hinge but quadratically penalised; 'perceptron' is the linear loss used by the perceptron algorithm.
- alpha : number, default=0.0001
- Regularisation strength. Higher values result in stronger regularisation.
- max_iter : integer, default=1000
- The maximum number of passes over the training data (epochs).
- tol : number, default=0.001
- The stopping criterion. Training stops when loss > best_loss - tol for a number of consecutive epochs (n_iter_no_change in scikit-learn).
- learning_rate : string, default='optimal'
- The learning rate schedule. 'optimal' uses eta = 1/(alpha*(t + t0)); 'constant' keeps eta0 fixed; 'invscaling' uses eta = eta0/t^power_t; 'adaptive' keeps eta0 while the loss keeps improving and divides it by 5 once the stopping criterion triggers.
- random_state : integer, optional, default=None
- The seed of the pseudo-random number generator. Pass an int for reproducible output, or None to not set a specific seed.
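As a rough illustration of the schedules above, the decay formulas can be evaluated directly (the fixed t0 below is hypothetical; scikit-learn chooses it with a heuristic):

```python
# Illustrative sketch of two learning-rate schedules described above.
# t is the update count; formulas follow scikit-learn's SGD documentation.
def invscaling(eta0, t, power_t=0.5):
    # eta = eta0 / t^power_t
    return eta0 / t ** power_t

def optimal(alpha, t, t0=1.0):
    # eta = 1 / (alpha * (t + t0)); t0 here is a hypothetical fixed value,
    # scikit-learn derives it from the data via a heuristic.
    return 1.0 / (alpha * (t + t0))

# Under both schedules the rate shrinks as training progresses.
rates = [invscaling(eta0=0.1, t=t) for t in (1, 4, 100)]
```

Evaluating `rates` shows the monotone decay: 0.1, then 0.05, then 0.01.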
Methods
predict(self, x_pred) -> 'ndarray'
Return the class-probability matrix using the calibrated model.
Parameters
- x_pred : DashAIDataset or pd.DataFrame
- Input data.
Returns
- np.ndarray
- Class probability matrix.
train(self, x_train, y_train, x_validation=None, y_validation=None)
Train the model using CalibratedClassifierCV to guarantee that predict_proba is available.
Parameters
- x_train : DashAIDataset
- The input features for training.
- y_train : DashAIDataset
- The target labels for training.
- x_validation : DashAIDataset, optional
- Unused (sklearn models ignore validation split).
- y_validation : DashAIDataset, optional
- Unused.
Returns
- self
calculate_metrics(self, split: SplitEnum = SplitEnum.VALIDATION, level: LevelEnum = LevelEnum.LAST, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)
Calculate and save metrics for a given data split and level. (Inherited from BaseModel.)
Parameters
- split : SplitEnum
- The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
- level : LevelEnum
- The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
- log_index : int, optional
- Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
- x_data : DashAIDataset, optional
- Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
- y_data : DashAIDataset, optional
- Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
get_metadata(cls) -> Dict[str, Any]
Get metadata values for the current model. (Inherited from BaseModel.)
Returns
- Dict[str, Any]
- Dictionary containing UI metadata such as the model icon used in the DashAI frontend.
get_schema(cls) -> dict
Generate the component's JSON Schema. (Inherited from ConfigObject.)
Returns
- dict
- Dictionary representing the Json Schema of the component.
load(filename: str) -> 'SklearnLikeModel'
Deserialise a model from disk using joblib. (Inherited from SklearnLikeModel.)
Parameters
- filename : str
- Path to the file previously written by :meth:`save`.
Returns
- SklearnLikeModel
- The loaded model instance.
prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Apply the model's transformations to the dataset. (Inherited from SklearnLikeModel.)
Parameters
- dataset : DashAIDataset
- The dataset to be transformed.
- is_fit : bool, optional
- If True, the method will fit encoders on the data. If False, will apply previously fitted encoders.
Returns
- DashAIDataset
- The prepared dataset ready to be converted to an accepted format in the model.
prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Prepare output targets using label encoding. (Inherited from SklearnLikeModel.)
Parameters
- dataset : DashAIDataset
- The output dataset to be transformed.
- is_fit : bool, optional
- If True, fit the encoder. If False, use existing encodings.
Returns
- DashAIDataset
- Dataset with categorical columns converted to integers.
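The fit/reuse behaviour controlled by is_fit can be sketched with scikit-learn's LabelEncoder (the encoder DashAI actually uses may differ):

```python
# Sketch of the label encoding prepare_output describes.
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()

# is_fit=True: fit the encoder on the training targets...
encoded = enc.fit_transform(["cat", "dog", "cat", "bird"])

# is_fit=False: ...then reuse the fitted encoder on later targets,
# so the same category always maps to the same integer.
reencoded = enc.transform(["dog", "bird"])
```

LabelEncoder assigns integers in sorted category order, so "bird" maps to 0, "cat" to 1, and "dog" to 2.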
save(self, filename: str) -> None
Serialise the model to disk using joblib. (Inherited from SklearnLikeModel.)
Parameters
- filename : str
- Destination file path where the model will be written.
validate_and_transform(self, raw_data: dict) -> dict
Validate the data given by the user to initialise the model and return it with all the objects the model needs to work. (Inherited from ConfigObject.)
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.