DashAI.back.models.BagOfWordsTextClassificationModel
- class BagOfWordsTextClassificationModel(**kwargs)
Text classification meta-model.
The metamodel has two main components:
- Tabular classification model: the underlying model that processes the data and
provides the prediction.
- Vectorizer: a BagOfWords that vectorizes the text into a sparse matrix to give
the correct input to the underlying model.
The tabular_model and the vectorizer are created in the __init__ method and stored in the model.
To train the tabular_model, the vectorizer is fitted on the training dataset and then used to transform it.
To predict with the tabular_model, the fitted vectorizer is used to transform the input dataset.
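The train and predict flow described above can be sketched with the standard library alone. This is only an illustration of the metamodel pattern, not DashAI's implementation: TinyBagOfWords, TinyCentroidClassifier, and TinyTextMetaModel are hypothetical names, a nearest-centroid classifier stands in for whatever tabular model is configured, and dense lists are used where the real vectorizer produces a sparse matrix.

```python
from collections import Counter


class TinyBagOfWords:
    """Hypothetical stand-in for the vectorizer: maps texts to token-count rows."""

    def fit(self, texts):
        # Build a fixed vocabulary from the training texts only.
        tokens = sorted({tok for text in texts for tok in text.lower().split()})
        self.vocabulary_ = {tok: i for i, tok in enumerate(tokens)}
        return self

    def transform(self, texts):
        rows = []
        for text in texts:
            row = [0] * len(self.vocabulary_)
            for tok in text.lower().split():
                idx = self.vocabulary_.get(tok)
                if idx is not None:  # tokens unseen during fit are ignored
                    row[idx] += 1
            rows.append(row)
        return rows


class TinyCentroidClassifier:
    """Hypothetical stand-in for the tabular model: nearest class centroid."""

    def fit(self, rows, labels):
        sums, counts = {}, Counter(labels)
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        self.centroids_ = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, rows):
        def sq_dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in rows
        ]


class TinyTextMetaModel:
    """Hypothetical metamodel: a vectorizer in front of a tabular classifier."""

    def __init__(self):
        # Both components are created up front and stored on the model.
        self.vectorizer = TinyBagOfWords()
        self.tabular_model = TinyCentroidClassifier()

    def train(self, texts, labels):
        # Fit the vectorizer on the training texts, then train on the vectors.
        features = self.vectorizer.fit(texts).transform(texts)
        self.tabular_model.fit(features, labels)
        return self

    def predict(self, texts):
        # Reuse the already fitted vectorizer; it is not refitted here.
        return self.tabular_model.predict(self.vectorizer.transform(texts))


model = TinyTextMetaModel().train(
    ["great movie loved it", "loved the acting", "terrible plot", "boring and terrible"],
    ["pos", "pos", "neg", "neg"],
)
predictions = model.predict(["loved it", "terrible and boring"])
print(predictions)
```

The point of the sketch is the ordering: the vocabulary is learned only during training, so prediction inputs are projected onto the same columns the tabular model was trained on.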
- __init__(**kwargs) → None
Initialize the BagOfWordsTextClassificationModel.
- Parameters:
kwargs (dict) – A dictionary containing the parameters for the model, including:
- tabular_classifier: The tabular classification model from DashAI to be used.
- ngram_min_n: Minimum n-gram value.
- ngram_max_n: Maximum n-gram value.
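As a rough illustration of what the two n-gram parameters control, the sketch below extracts word n-grams for every size between a minimum and a maximum. It assumes the bounds are inclusive and apply to whitespace-separated words, which mirrors the common bag-of-words convention; the page above does not confirm how DashAI interprets them. The helper word_ngrams is a hypothetical name, not part of DashAI.

```python
def word_ngrams(text, ngram_min_n=1, ngram_max_n=1):
    """Return all word n-grams with ngram_min_n <= n <= ngram_max_n (assumed inclusive)."""
    if not 1 <= ngram_min_n <= ngram_max_n:
        raise ValueError("expected 1 <= ngram_min_n <= ngram_max_n")
    tokens = text.lower().split()
    grams = []
    for n in range(ngram_min_n, ngram_max_n + 1):
        # Slide a window of n tokens across the text.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


unigrams = word_ngrams("the cat sat", 1, 1)
uni_and_bigrams = word_ngrams("the cat sat", 1, 2)
print(unigrams)         # ['the', 'cat', 'sat']
print(uni_and_bigrams)  # ['the', 'cat', 'sat', 'the cat', 'cat sat']
```

Raising the maximum adds columns to the vectorized matrix for each new n-gram, so wider ranges capture more word-order information but produce larger, sparser inputs for the tabular model.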
Methods
__init__(**kwargs) – Initialize the BagOfWordsTextClassificationModel.
calculate_metrics([split, level, log_index, ...]) – Calculate and save metrics for a given data split and level.
get_schema() – Generate the component-related JSON Schema.
get_vectorizer(input_column[, output_column]) – Factory that returns a function to transform a text classification dataset into a tabular classification dataset.
load(filename) – Load the model from the specified path.
predict(x)
prepare_dataset(dataset[, is_fit]) – Apply the model transformations to the dataset.
prepare_output(dataset[, is_fit]) – Hook for model-specific preprocessing of output targets.
save(filename) – Save the model to the specified path.
train(x, y[, x_validation, y_validation]) – Train the model with the provided data.
validate_and_transform(raw_data) – Take the data given by the user to initialize the model and return it with all the objects that the model needs to work.
Attributes
COLOR
COMPATIBLE_COMPONENTS
DISPLAY_NAME
TYPE