TfIdfLogRegTextClassificationModel
TF-IDF vectorizer combined with Logistic Regression for text classification.
This model converts raw text into TF-IDF feature vectors using scikit-learn's
TfidfVectorizer with a configurable n-gram range and IDF weighting, then
trains a LogisticRegression classifier on the resulting sparse matrix.
It is a strong baseline for text classification tasks, particularly when
training data is limited or computational resources are constrained.
References
- [1] https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
- [2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
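Conceptually, the model behaves like the scikit-learn pipeline sketched below. This is a hypothetical standalone illustration (the corpus, labels, and test texts are made up), not DashAI's actual implementation; the parameter values mirror the defaults documented in the Parameters section.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus (illustrative data only).
texts = [
    "great movie loved it",
    "terrible film waste of time",
    "wonderful acting great plot",
    "boring terrible script",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Defaults documented below: unigrams only, IDF on, plain TF, C=1.0, lbfgs.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1), use_idf=True,
                              sublinear_tf=False)),
    ("clf", LogisticRegression(C=1.0, max_iter=1000, solver="lbfgs")),
])
pipeline.fit(texts, labels)
predictions = pipeline.predict(["great plot", "terrible waste"])
```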
Parameters
- ngram_min_n : integer, default=1
- Minimum n-gram size for the TF-IDF vectorizer (≥ 1).
- ngram_max_n : integer, default=1
- Maximum n-gram size for the TF-IDF vectorizer (≥ 1).
- use_idf : boolean, default=True
- Enable inverse-document-frequency re-weighting.
- sublinear_tf : boolean, default=False
- Apply sublinear TF scaling (replace TF with 1 + log(TF)).
- C : number, default=1.0
- Inverse regularization strength for logistic regression. Smaller values mean stronger regularization.
- max_iter : integer, default=1000
- Maximum number of iterations for the logistic regression solver.
- solver : string, default="lbfgs"
- Optimization algorithm for logistic regression.
Methods
load(filename: Union[str, Path])
Restore a model instance from disk.
Parameters
- filename : str or Path
- Path where the model was previously saved.
Returns
- Any
- The restored model instance.
predict(self, x)
Predict target labels for the input features x.
prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False)
Hook for model-specific preprocessing of output targets.
Parameters
- dataset : DashAIDataset
- The output dataset (target labels) to preprocess.
- is_fit : bool
- Whether the call is part of a fitting phase. Defaults to False.
Returns
- DashAIDataset
- The preprocessed output dataset.
save(self, filename: Union[str, Path]) -> None
Store the model to disk.
Parameters
- filename : str or Path
- Path where the model will be saved.
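Outside DashAI, the save/load round trip can be emulated by pickling the underlying scikit-learn estimator. The sketch below only illustrates the round-trip idea with standard-library pickle and a toy pipeline; it is not the framework's actual serialization format.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data (illustrative only).
texts = ["good film", "bad film", "good plot", "bad plot"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

# Round-trip the fitted model through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should predict exactly like the original.
same_predictions = list(restored.predict(texts)) == list(pipeline.predict(texts))
```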
train(self, x, y, x_validation=None, y_validation=None)
Train the model with the provided data.
Parameters
- x : DashAIDataset
- The input features for training.
- y : DashAIDataset
- The target labels for training.
- x_validation : DashAIDataset, optional
- Input features for validation. Defaults to None.
- y_validation : DashAIDataset, optional
- Target labels for validation. Defaults to None.
Returns
- BaseModel
- The trained model instance.
calculate_metrics(self, split: SplitEnum = SplitEnum.VALIDATION, level: LevelEnum = LevelEnum.LAST, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)
Calculate and save metrics for a given data split and level. Inherited from BaseModel.
Parameters
- split : SplitEnum
- The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
- level : LevelEnum
- The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
- log_index : int, optional
- Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
- x_data : DashAIDataset, optional
- Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
- y_data : DashAIDataset, optional
- Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
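As a DashAI-independent illustration of the kind of computation this performs for a split, accuracy on a validation split reduces to a single scikit-learn call. The labels below are toy values, and accuracy is only one of the metrics the framework may record.

```python
from sklearn.metrics import accuracy_score

# Toy validation labels and predictions (illustrative only).
y_validation = [1, 0, 1, 1, 0, 1]
y_predicted = [1, 0, 0, 1, 0, 1]

# 5 of the 6 predictions match the true labels.
validation_accuracy = accuracy_score(y_validation, y_predicted)
```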
get_metadata(cls) -> Dict[str, Any]
Get metadata values for the current model. Inherited from BaseModel.
Returns
- Dict[str, Any]
- Dictionary containing UI metadata such as the model icon used in the DashAI frontend.
get_schema(cls) -> dict
Generate the JSON Schema associated with this component. Inherited from ConfigObject.
Returns
- dict
- Dictionary representing the Json Schema of the component.
prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Hook for model-specific preprocessing of input features. Inherited from BaseModel.
Parameters
- dataset : DashAIDataset
- The input dataset to preprocess.
- is_fit : bool
- Whether the call is part of a fitting phase. Defaults to False.
Returns
- DashAIDataset
- The preprocessed dataset ready to be fed into the model.
validate_and_transform(self, raw_data: dict) -> dict
Validate the user-provided initialization data and return it with all the objects the model needs to work. Inherited from ConfigObject.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.