DashAI.back.models.BagOfWordsTextClassificationModel

class BagOfWordsTextClassificationModel(**kwargs)

Text classification meta-model.

The meta-model has two main components:

  • Tabular classification model: the underlying model that processes the data and provides the prediction.

  • Vectorizer: a BagOfWords that vectorizes the text into a sparse matrix to give the correct input to the underlying model.

The tabular_model and vectorizer are created in the __init__ method and stored in the model.

To train the tabular_model, the vectorizer is fitted on the training dataset and then used to transform it.

To predict with the tabular_model, the fitted vectorizer is used to transform the input dataset.
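The fit-then-transform flow described above can be sketched in plain Python. This is only an illustration of the pattern, not DashAI's implementation: TinyBagOfWords, TinyTabularClassifier, and TinyTextClassifier are hypothetical stand-ins, the rows are dictionaries rather than a real sparse matrix, and the scoring rule is a deliberately simple placeholder for a real tabular classifier.

```python
from collections import Counter


class TinyBagOfWords:
    """Hypothetical stand-in for the vectorizer: learns a vocabulary, then counts words."""

    def fit(self, texts):
        vocab = sorted({word for text in texts for word in text.lower().split()})
        self.vocabulary_ = {word: idx for idx, word in enumerate(vocab)}
        return self

    def transform(self, texts):
        # Each row is a sparse {column_index: count} mapping; unseen words are dropped.
        rows = []
        for text in texts:
            counts = Counter(text.lower().split())
            rows.append(
                {self.vocabulary_[w]: c for w, c in counts.items() if w in self.vocabulary_}
            )
        return rows


class TinyTabularClassifier:
    """Hypothetical stand-in for the tabular model: scores labels by column overlap."""

    def fit(self, rows, labels):
        # Accumulate per-label column counts seen during training.
        self.totals_ = {}
        for row, label in zip(rows, labels):
            self.totals_.setdefault(label, Counter()).update(row)
        return self

    def predict(self, rows):
        def score(row, label):
            return sum(count * self.totals_[label][col] for col, count in row.items())

        return [max(sorted(self.totals_), key=lambda label: score(row, label)) for row in rows]


class TinyTextClassifier:
    """Meta-model sketch: creates both components in __init__ and chains them."""

    def __init__(self):
        self.vectorizer = TinyBagOfWords()
        self.tabular_model = TinyTabularClassifier()

    def train(self, texts, labels):
        # Fit the vectorizer on the training texts, then train on the transformed rows.
        rows = self.vectorizer.fit(texts).transform(texts)
        self.tabular_model.fit(rows, labels)
        return self

    def predict(self, texts):
        # Only transform here: the vocabulary learned during training is reused.
        return self.tabular_model.predict(self.vectorizer.transform(texts))


model = TinyTextClassifier().train(
    ["great movie", "loved this great film", "terrible movie", "awful boring film"],
    ["pos", "pos", "neg", "neg"],
)
predictions = model.predict(["great film", "boring and awful"])
```

The point of the sketch is that the vectorizer is fitted exactly once, during training. Prediction reuses the learned vocabulary, so the columns seen by the tabular model keep the same meaning in both phases.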

__init__(**kwargs) → None

Initialize the BagOfWordsTextClassificationModel.

Parameters:

kwargs (dict) – A dictionary containing the parameters for the model, including:

  • tabular_classifier: The tabular classification model from DashAI to be used.

  • ngram_min_n: Minimum n-gram value.

  • ngram_max_n: Maximum n-gram value.
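To make the effect of ngram_min_n and ngram_max_n concrete, here is a minimal sketch of n-gram extraction. It assumes the two values act as inclusive lower and upper bounds on word-level n-gram length, as in a typical bag-of-words vectorizer; the page does not confirm how DashAI tokenizes text, and word_ngrams is a hypothetical helper, not part of the DashAI API.

```python
def word_ngrams(text, ngram_min_n=1, ngram_max_n=1):
    """Return all word n-grams of `text` with ngram_min_n <= n <= ngram_max_n."""
    if not 1 <= ngram_min_n <= ngram_max_n:
        raise ValueError("expected 1 <= ngram_min_n <= ngram_max_n")
    # Assumption: lowercase, whitespace-delimited tokens.
    words = text.lower().split()
    return [
        " ".join(words[start:start + n])
        for n in range(ngram_min_n, ngram_max_n + 1)
        for start in range(len(words) - n + 1)
    ]


unigrams = word_ngrams("the cat sat", 1, 1)
up_to_bigrams = word_ngrams("the cat sat", 1, 2)
```

Under these assumptions, raising ngram_max_n adds longer word sequences to the vocabulary, which captures more local word order at the cost of a wider and sparser feature matrix.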

Methods

__init__(**kwargs)

Initialize the BagOfWordsTextClassificationModel.

calculate_metrics([split, level, log_index, ...])

Calculate and save metrics for a given data split and level.

get_schema()

Generate the component-related JSON Schema.

get_vectorizer(input_column[, output_column])

Factory that returns a function to transform a text classification dataset into a tabular classification dataset.

load(filename)

Load the model from the specified path.

predict(x)

prepare_dataset(dataset[, is_fit])

Apply the model transformations to the dataset.

prepare_output(dataset[, is_fit])

Hook for model-specific preprocessing of output targets.

save(filename)

Save the model to the specified path.

train(x, y[, x_validation, y_validation])

Train the model with the provided data.

validate_and_transform(raw_data)

Take the data given by the user to initialize the model and return it with all the objects that the model needs to work.

Attributes

COLOR

COMPATIBLE_COMPONENTS

DISPLAY_NAME

TYPE