TokenizerConverter

Converter
DashAI.back.converters.hugging_face.TokenizerConverter

Converter that tokenizes text and stores each token ID in a separate column.

Parameters

model_name : string, default=bert-base-uncased
Name of the pre-trained tokenizer model
max_length : integer, default=512
Maximum sequence length for tokenization
batch_size : integer, default=32
Number of samples to process at once
device : string, default=cpu
Device to use for computation
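
To make the "one token ID per column" layout concrete, here is a minimal sketch. It does not call DashAI or Hugging Face: a toy whitespace vocabulary stands in for the pre-trained tokenizer. The `token_{i}` column names, the `PAD_ID` and `UNK_ID` values, and the `tokens_to_columns` helper are assumptions made for illustration, not the converter's actual naming or behavior.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding ID
UNK_ID = 1  # hypothetical ID for words missing from the toy vocabulary


def tokens_to_columns(
    texts: List[str], vocab: Dict[str, int], max_length: int
) -> Dict[str, List[int]]:
    """Spread token IDs across one column per position (toy illustration)."""
    columns: Dict[str, List[int]] = {f"token_{i}": [] for i in range(max_length)}
    for text in texts:
        # Toy tokenizer: lowercase, split on whitespace, map words to IDs.
        ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
        # Truncate to max_length, then pad so every row has the same width.
        ids = ids[:max_length] + [PAD_ID] * max(0, max_length - len(ids))
        for i, token_id in enumerate(ids):
            columns[f"token_{i}"].append(token_id)
    return columns


vocab = {"hello": 2, "world": 3, "again": 4}
table = tokens_to_columns(
    ["Hello world", "hello again world now"], vocab, max_length=3
)
print(table)
```

Each row ends up with exactly `max_length` columns: short texts are padded and long texts are truncated, which is why `max_length` fixes the width of the output.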

Methods

changes_row_count(self) -> 'bool'

Defined on BaseConverter

Indicate whether this converter changes the number of dataset rows.

Returns

bool
True if the converter may add or remove rows, False otherwise.

fit(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> Type[DashAI.back.converters.base_converter.BaseConverter]

Defined on HuggingFaceWrapper

Validate the input dataset and load the HuggingFace model.

Parameters

x : DashAIDataset
Input dataset whose columns must all be string-typed.
y : DashAIDataset or None, optional
Ignored. Present for API compatibility. Default None.

Returns

HuggingFaceWrapper
The fitted converter instance (self).
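
The string-only requirement on `x` can be pictured with the sketch below. It works on a plain dict mapping column names to values rather than a `DashAIDataset`, and `check_all_string_columns` is a hypothetical helper, not part of DashAI. How the real check is implemented is not shown in this reference.

```python
from typing import Any, Dict, List


def check_all_string_columns(columns: Dict[str, List[Any]]) -> None:
    """Raise ValueError if any column holds a non-string value."""
    bad = [
        name
        for name, values in columns.items()
        if not all(isinstance(value, str) for value in values)
    ]
    if bad:
        raise ValueError(f"Non-string columns: {bad}")


# Passes silently: every value in every column is a string.
check_all_string_columns({"title": ["a", "b"], "body": ["c", "d"]})
```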

get_metadata(cls) -> 'Dict[str, Any]'

Defined on BaseConverter

Get metadata for the converter, used by the DashAI frontend.

Parameters

cls : type
The converter class (injected automatically by Python for classmethods).

Returns

Dict[str, Any]
Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.

get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType

Defined on HuggingFaceWrapper

Return the DashAI data type produced for the given output column.

Parameters

column_name : str or None, optional
Name of the output column whose type is requested. Default None.

Returns

DashAIDataType
The DashAI type assigned to the output column.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema for the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

transform(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> 'DashAIDataset'

Defined on HuggingFaceWrapper

Transform the input dataset by running inference in batches.

Parameters

x : DashAIDataset
The dataset to transform. The converter must have been fitted first.
y : DashAIDataset or None, optional
Ignored. Present for API compatibility. Default None.

Returns

DashAIDataset
Transformed dataset with output types set per column.
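
Processing "in batches" means the rows are split into chunks of at most `batch_size` before being passed to the model. The sketch below shows only that chunking step on a plain list; `iter_batches` is a hypothetical helper and is not taken from the DashAI source.

```python
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start : start + batch_size])


rows = [f"text {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(rows, batch_size=32)]
print(batch_sizes)  # [32, 32, 6]
```

The last batch is smaller whenever the row count is not a multiple of `batch_size`. A larger `batch_size` means fewer model calls but more memory per call.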

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Take the data provided by the user to initialize the model and return it validated, together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
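
As a rough illustration of validating user-supplied parameters, the sketch below fills in the defaults listed for `TokenizerConverter` and rejects wrong types. `PARAM_SPEC` and `validate_params` are hypothetical names. This is not the logic `ConfigObject` actually uses, only a simplified picture of the idea.

```python
from typing import Any, Dict, Tuple

# Types and defaults copied from the Parameters list of TokenizerConverter.
PARAM_SPEC: Dict[str, Tuple[type, Any]] = {
    "model_name": (str, "bert-base-uncased"),
    "max_length": (int, 512),
    "batch_size": (int, 32),
    "device": (str, "cpu"),
}


def validate_params(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults and type-check user-supplied parameters."""
    unknown = set(raw_data) - set(PARAM_SPEC)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    validated: Dict[str, Any] = {}
    for name, (expected_type, default) in PARAM_SPEC.items():
        value = raw_data.get(name, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
        validated[name] = value
    return validated


params = validate_params({"max_length": 128})
print(params)
```

Parameters the user omits fall back to the documented defaults, so the result always contains all four keys.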