TfIdfLogRegTextClassificationModel
TF-IDF vectorizer combined with Logistic Regression for text classification.
This model converts raw text into TF-IDF feature vectors using scikit-learn's
TfidfVectorizer with a configurable n-gram range and IDF weighting, then
trains a LogisticRegression classifier on the resulting sparse matrix.
It is a strong baseline for text classification tasks, particularly when
training data is limited or computational resources are constrained.
References
- [1] https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
- [2] https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
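Conceptually, the model behaves like the scikit-learn pipeline sketched below. This is a hypothetical standalone illustration (the corpus, labels, and test texts are made up), not DashAI's actual implementation; the parameter values mirror the defaults documented in the Parameters section.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus (illustrative data only).
texts = [
    "great movie loved it",
    "terrible film waste of time",
    "wonderful acting great plot",
    "boring terrible script",
]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Defaults documented below: unigrams only, IDF on, plain TF, C=1.0, lbfgs.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1), use_idf=True,
                              sublinear_tf=False)),
    ("clf", LogisticRegression(C=1.0, max_iter=1000, solver="lbfgs")),
])
pipeline.fit(texts, labels)
predictions = pipeline.predict(["great plot", "terrible waste"])
```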
Parameters
- ngram_min_n : integer, default=1
- Minimum n-gram size for the TF-IDF vectorizer (≥ 1).
- ngram_max_n : integer, default=1
- Maximum n-gram size for the TF-IDF vectorizer (≥ 1).
- use_idf : boolean, default=True
- Enable inverse-document-frequency re-weighting.
- sublinear_tf : boolean, default=False
- Apply sublinear TF scaling (replace TF with 1 + log(TF)).
- C : number, default=1.0
- Inverse regularization strength for logistic regression. Smaller values mean stronger regularization.
- max_iter : integer, default=1000
- Maximum number of iterations for the logistic regression solver.
- solver : string, default="lbfgs"
- Optimization algorithm for logistic regression.
Methods
load(filename: Union[str, Path])
Restore a model instance from disk.
Parameters
- filename : str or Path
- Path where the model was previously saved.
Returns
- Any
- The restored model instance.
predict(self, x)
Predict target labels for the input features x.
prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False)
Hook for model-specific preprocessing of output targets.
Parameters
- dataset : DashAIDataset
- The output dataset (target labels) to preprocess.
- is_fit : bool
- Whether the call is part of a fitting phase. Defaults to False.
Returns
- DashAIDataset
- The preprocessed output dataset.
save(self, filename: Union[str, Path]) -> None
Store the model to disk.
Parameters
- filename : str or Path
- Path where the model will be saved.
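Outside DashAI, the save/load round trip can be emulated by pickling the underlying scikit-learn estimator. The sketch below only illustrates the round-trip idea with standard-library pickle and a toy pipeline; it is not the framework's actual serialization format.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data (illustrative only).
texts = ["good film", "bad film", "good plot", "bad plot"]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

# Round-trip the fitted model through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

# The restored model should predict exactly like the original.
same_predictions = list(restored.predict(texts)) == list(pipeline.predict(texts))
```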
train(self, x, y, x_validation=None, y_validation=None)
Train the model with the provided data.
Parameters
- x : DashAIDataset
- The input features for training.
- y : DashAIDataset
- The target labels for training.
- x_validation : DashAIDataset, optional
- Input features for validation. Defaults to None.
- y_validation : DashAIDataset, optional
- Target labels for validation. Defaults to None.
Returns
- BaseModel
- The trained model instance.
calculate_metrics(self, split: SplitEnum = SplitEnum.VALIDATION, level: LevelEnum = LevelEnum.LAST, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)
Calculate and save metrics for a given data split and level. Inherited from BaseModel.
Parameters
- split : SplitEnum
- The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
- level : LevelEnum
- The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
- log_index : int, optional
- Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
- x_data : DashAIDataset, optional
- Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
- y_data : DashAIDataset, optional
- Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
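As a DashAI-independent illustration of the kind of computation this performs for a split, accuracy on a validation split reduces to a single scikit-learn call. The labels below are toy values, and accuracy is only one of the metrics the framework may record.

```python
from sklearn.metrics import accuracy_score

# Toy validation labels and predictions (illustrative only).
y_validation = [1, 0, 1, 1, 0, 1]
y_predicted = [1, 0, 0, 1, 0, 1]

# 5 of the 6 predictions match the true labels.
validation_accuracy = accuracy_score(y_validation, y_predicted)
```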
get_metadata(cls) -> Dict[str, Any]
Get metadata values for the current model. Inherited from BaseModel.
Returns
- Dict[str, Any]
- Dictionary containing UI metadata such as the model icon used in the DashAI frontend.
get_schema(cls) -> dict
Generate the JSON Schema associated with this component. Inherited from ConfigObject.
Returns
- dict
- Dictionary representing the Json Schema of the component.
prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'
Hook for model-specific preprocessing of input features. Inherited from BaseModel.
Parameters
- dataset : DashAIDataset
- The input dataset to preprocess.
- is_fit : bool
- Whether the call is part of a fitting phase. Defaults to False.
Returns
- DashAIDataset
- The preprocessed dataset ready to be fed into the model.
validate_and_transform(self, raw_data: dict) -> dict
Validate the user-provided initialization data and return it with all the objects the model needs to work. Inherited from ConfigObject.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.