DashAI.back.models.BagOfWordsTextClassificationModel

class BagOfWordsTextClassificationModel(**kwargs)

Text classification meta-model.

The meta-model has two main components:

  • Tabular classification model: the underlying model that processes the data and provides the prediction.

  • Vectorizer: a bag-of-words vectorizer that converts the text into a sparse matrix, the input format the underlying model expects.

The tabular_model and the vectorizer are created in the __init__ method and stored as attributes of the model.

To train, the vectorizer is first fitted on the training dataset and then used to transform it; the tabular_model is trained on the transformed data.

To predict, the already-fitted vectorizer transforms the input dataset and the tabular_model makes predictions on the result.
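The training and prediction flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DashAI's implementation: BagOfWordsVectorizer, NearestCentroidClassifier and BagOfWordsTextClassifierSketch are hypothetical names, tokenization is a lowercase whitespace split, rows are sparse {column: count} dicts rather than a real sparse matrix, and a nearest-centroid rule stands in for whatever tabular classifier is configured.

```python
from collections import Counter, defaultdict


class BagOfWordsVectorizer:
    """Hypothetical minimal bag-of-words vectorizer (lowercased unigrams)."""

    def fit(self, texts):
        # Give every distinct token seen during training its own column index.
        self.vocabulary_ = {}
        for text in texts:
            for token in text.lower().split():
                self.vocabulary_.setdefault(token, len(self.vocabulary_))
        return self

    def transform(self, texts):
        # Each row is a sparse {column_index: count} dict; unseen tokens are dropped.
        rows = []
        for text in texts:
            counts = Counter(text.lower().split())
            rows.append({self.vocabulary_[tok]: n
                         for tok, n in counts.items() if tok in self.vocabulary_})
        return rows


class NearestCentroidClassifier:
    """Hypothetical stand-in for the underlying tabular classifier."""

    def fit(self, rows, labels):
        sums, sizes = defaultdict(Counter), Counter(labels)
        for row, label in zip(rows, labels):
            sums[label].update(row)
        # Centroid = mean count of each feature within the class.
        self.centroids_ = {label: {j: v / sizes[label] for j, v in total.items()}
                           for label, total in sums.items()}
        return self

    def predict(self, rows):
        def sq_dist(row, centroid):
            keys = set(row) | set(centroid)
            return sum((row.get(k, 0) - centroid.get(k, 0)) ** 2 for k in keys)
        return [min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c]))
                for row in rows]


class BagOfWordsTextClassifierSketch:
    """Meta-model: vectorize the text, then delegate to a tabular classifier."""

    def __init__(self, tabular_model):
        # Both components are created up front and stored on the model.
        self.vectorizer = BagOfWordsVectorizer()
        self.tabular_model = tabular_model

    def fit(self, texts, labels):
        # Training: fit the vectorizer on the training text, then transform it.
        features = self.vectorizer.fit(texts).transform(texts)
        self.tabular_model.fit(features, labels)
        return self

    def predict(self, texts):
        # Prediction: reuse the already-fitted vectorizer; it is not refitted.
        return self.tabular_model.predict(self.vectorizer.transform(texts))


model = BagOfWordsTextClassifierSketch(NearestCentroidClassifier())
model.fit(["good great movie", "great fun", "bad awful movie", "awful boring"],
          ["pos", "pos", "neg", "neg"])
print(model.predict(["great movie", "boring awful"]))  # ['pos', 'neg']
```

The design point worth noting is that the vectorizer is fitted only in fit; predict reuses its vocabulary, so the tabular model always receives the same columns it was trained on.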

__init__(**kwargs) → None

Initialize the BagOfWordsTextClassificationModel.

Parameters:

kwargs (dict) – A dictionary containing the parameters for the model, including:

  • tabular_classifier: configuration for the underlying classifier.

  • ngram_min_n: minimum value of n for the n-grams extracted from the text.

  • ngram_max_n: maximum value of n for the n-grams extracted from the text.
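The ngram_min_n and ngram_max_n parameters bound the length of the word sequences the vectorizer counts. The sketch below assumes they act as an inclusive range of n, as scikit-learn's CountVectorizer does with ngram_range=(min_n, max_n); extract_ngrams is a hypothetical helper, and its lowercase whitespace tokenization is a simplification (CountVectorizer's default tokenizer differs).

```python
def extract_ngrams(text, ngram_min_n=1, ngram_max_n=1):
    """Hypothetical helper: every contiguous word n-gram with min_n <= n <= max_n."""
    if not 1 <= ngram_min_n <= ngram_max_n:
        raise ValueError("require 1 <= ngram_min_n <= ngram_max_n")
    tokens = text.lower().split()
    ngrams = []
    for n in range(ngram_min_n, ngram_max_n + 1):
        # Slide a window of width n over the token list.
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


# Hypothetical kwargs; "tabular_classifier" is omitted because its format is not shown here.
kwargs = {"ngram_min_n": 1, "ngram_max_n": 2}
print(extract_ngrams("the movie was great", kwargs["ngram_min_n"], kwargs["ngram_max_n"]))
# ['the', 'movie', 'was', 'great', 'the movie', 'movie was', 'was great']
```

Widening the range adds features (and some word-order context) at the cost of a larger, sparser vocabulary.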

Methods

__init__(**kwargs)

Initialize the BagOfWordsTextClassificationModel.

fit(x, y)

Fit the vectorizer on x, then train the tabular model on the vectorized x and the labels y.

get_schema()

Generate the component-related JSON Schema.

get_vectorizer(input_column[, output_column])

Factory that returns a function to transform a text classification dataset into a tabular classification dataset.

load(filename)

Load the model from the specified path.

predict(x)

Transform x with the fitted vectorizer and return the tabular model's predictions.

save(filename)

Save the model to the specified path.

validate_and_transform(raw_data)

Take the data given by the user to initialize the model, and return it together with all the objects the model needs to work.
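get_vectorizer is described as a factory that returns a function turning a text classification dataset into a tabular one. A minimal sketch of that pattern follows. It assumes a batch is a dict mapping column names to lists of values and that the vectorizer has already been fitted; make_vectorizer_fn, its vocabulary argument and the default output_column of "label" are hypothetical, not DashAI's signature.

```python
from collections import Counter


def make_vectorizer_fn(vocabulary, input_column, output_column="label"):
    """Hypothetical factory returning a text-batch -> tabular-batch function.

    `vocabulary` maps token -> column index, as produced by a fitted vectorizer.
    """
    # Fix the column order once, following the vectorizer's column indices.
    terms = sorted(vocabulary, key=vocabulary.get)

    def transform(batch):
        # One count column per vocabulary term; unseen tokens are ignored.
        counts = [Counter(text.lower().split()) for text in batch[input_column]]
        table = {term: [c[term] for c in counts] for term in terms}
        # The label column, if present, passes through unchanged.
        if output_column in batch:
            table[output_column] = list(batch[output_column])
        return table

    return transform


to_tabular = make_vectorizer_fn({"good": 0, "bad": 1}, input_column="text")
print(to_tabular({"text": ["good good", "bad"], "label": [1, 0]}))
# {'good': [2, 0], 'bad': [0, 1], 'label': [1, 0]}
```

Returning a closure lets the fitted vocabulary be captured once and the resulting function be handed to any batch-mapping utility.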

Attributes

COMPATIBLE_COMPONENTS

TYPE
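The save and load methods persist the fitted model. One common approach, assumed here rather than taken from DashAI's source, is to pickle the state of both components together: a model restored without its fitted vectorizer could not turn new text into the columns the classifier was trained on. PicklableModel is a hypothetical stand-in, and writing load as a class method is also an assumption.

```python
import pickle
import tempfile
from pathlib import Path


class PicklableModel:
    """Hypothetical stand-in holding the two fitted components."""

    def __init__(self, vocabulary, tabular_model):
        self.vocabulary = vocabulary
        self.tabular_model = tabular_model

    def save(self, filename):
        # Store both fitted components together in one file.
        state = {"vocabulary": self.vocabulary, "tabular_model": self.tabular_model}
        with open(filename, "wb") as f:
            pickle.dump(state, f)

    @classmethod
    def load(cls, filename):
        # Only unpickle trusted files: pickle can run arbitrary code on load.
        with open(filename, "rb") as f:
            state = pickle.load(f)
        return cls(state["vocabulary"], state["tabular_model"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    PicklableModel({"good": 0, "bad": 1}, {"weights": [0.5, -0.5]}).save(path)
    restored = PicklableModel.load(path)

print(restored.vocabulary)  # {'good': 0, 'bad': 1}
```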