Embedding

Converter
DashAI.back.converters.hugging_face.Embedding

HuggingFace embedding converter.

Parameters

model_name : string, default=sentence-transformers/all-MiniLM-L6-v2
Name of the pre-trained HuggingFace model to use.
max_length : integer, default=512
Maximum sequence length for tokenization.
batch_size : integer, default=32
Number of samples processed per inference batch.
device : string, default=cpu
Device used for computation.
pooling_strategy : string, default=mean
Strategy for pooling token embeddings into a single sentence embedding.
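The pooling_strategy parameter controls how per-token vectors are combined into one sentence vector. Below is a minimal pure-Python sketch of mean pooling, assuming the common convention used with sentence-transformers models of averaging only over non-padding tokens as marked by an attention mask. The helper mean_pool is hypothetical and not part of DashAI.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask value is 1 (hypothetical helper)."""
    if not token_embeddings:
        raise ValueError("token_embeddings must not be empty")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("token_embeddings and attention_mask must have equal length")
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # skip padding positions (mask value 0)
            for i, value in enumerate(vector):
                totals[i] += value
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [total / count for total in totals]


# Two real tokens and one padding token in a 2-dimensional space.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters because padded positions would otherwise pull every sentence vector toward the padding token's embedding.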

Methods

get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType

Defined on Embedding

Return Float32 as the output type for all embedding columns.

Parameters

column_name : str or None, optional
Name of the output column. Not used, because all embedding columns receive the same Float32 type. Default None.

Returns

Float
A DashAI Float type backed by pyarrow.float32().
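Because the output is backed by pyarrow.float32(), embedding values are stored in IEEE 754 single precision (about 7 significant decimal digits) rather than Python's 64-bit float. A small standard-library illustration of that rounding, using struct to round-trip a value through 4-byte float storage; the helper to_float32 is hypothetical and only demonstrates the precision of the type.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (64-bit) through single-precision storage."""
    return struct.unpack("f", struct.pack("f", value))[0]


# 0.5 is exactly representable in float32; 0.1 is not.
print(to_float32(0.5), to_float32(0.1))
```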

changes_row_count(self) -> 'bool'

Defined on BaseConverter

Indicate whether this converter changes the number of dataset rows.

Returns

bool
True if the converter may add or remove rows, False otherwise.

fit(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> Type[DashAI.back.converters.base_converter.BaseConverter]

Defined on HuggingFaceWrapper

Validate the input dataset and load the HuggingFace model.

Parameters

x : DashAIDataset
Input dataset whose columns must all be string-typed.
y : DashAIDataset or None, optional
Ignored. Present for API compatibility. Default None.

Returns

HuggingFaceWrapper
The fitted converter instance (self).
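fit requires every input column to be string-typed. A minimal sketch of that check, assuming the column types are available as a plain mapping from column name to a type label. The mapping, the "string" label, and validate_string_columns are hypothetical stand-ins, not DashAI's actual API.

```python
from typing import Dict


def validate_string_columns(column_types: Dict[str, str]) -> None:
    """Raise TypeError if any column is not string-typed (hypothetical check)."""
    bad = [name for name, kind in column_types.items() if kind != "string"]
    if bad:
        raise TypeError(f"Embedding expects string columns only; got non-string: {bad}")


validate_string_columns({"review": "string", "title": "string"})  # passes silently
try:
    validate_string_columns({"review": "string", "stars": "int64"})
except TypeError as err:
    print(err)
```

Failing early in fit keeps the model from being loaded for a dataset that transform could not embed.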

get_metadata(cls) -> 'Dict[str, Any]'

Defined on BaseConverter

Get metadata for the converter, used by the DashAI frontend.

Parameters

cls : type
The converter class (injected automatically by Python for classmethods).

Returns

Dict[str, Any]
Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema for the component.

Returns

dict
Dictionary representing the JSON Schema of the component.
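The exact schema that get_schema produces is not shown on this page. The sketch below only illustrates what a JSON Schema covering the documented parameters and defaults could look like as a plain dict; its layout (property types, absence of a required list) is an assumption, and DashAI's generated schema may contain additional keys.

```python
import json

# Hypothetical JSON Schema for the Embedding parameters documented above.
EMBEDDING_SCHEMA = {
    "type": "object",
    "properties": {
        "model_name": {
            "type": "string",
            "default": "sentence-transformers/all-MiniLM-L6-v2",
        },
        "max_length": {"type": "integer", "default": 512},
        "batch_size": {"type": "integer", "default": 32},
        "device": {"type": "string", "default": "cpu"},
        "pooling_strategy": {"type": "string", "default": "mean"},
    },
}

print(json.dumps(EMBEDDING_SCHEMA["properties"]["batch_size"]))
```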

transform(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> 'DashAIDataset'

Defined on HuggingFaceWrapper

Transform the input dataset by running inference in batches.

Parameters

x : DashAIDataset
The dataset to transform. Must have been fitted first.
y : DashAIDataset or None, optional
Ignored. Present for API compatibility. Default None.

Returns

DashAIDataset
Transformed dataset with output types set per column.
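transform runs inference in batches of batch_size rows. A minimal sketch of that loop, where embed_batch stands in for the HuggingFace model call. Both helpers and toy_embed are hypothetical; toy_embed returns a toy 2-dimensional vector per text so the sketch runs without downloading a model.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Call embed_batch on each batch and concatenate the resulting vectors."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def toy_embed(batch: Sequence[str]) -> List[List[float]]:
    """Toy stand-in for a model: [length, vowel count] per text."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in batch]


texts = ["hello", "world", "dash", "ai", "embed"]
print(embed_all(texts, toy_embed, batch_size=2))
```

Batching bounds peak memory on the chosen device, while the output still has exactly one vector per input row.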

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Validate the data provided by the user to initialize the model, and return it together with all the objects the model needs to run.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
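A minimal sketch of the kind of checking this step performs, assuming it merges user values over the documented defaults and rejects unknown keys or mistyped values. The helper validate_params and these rules are hypothetical; DashAI's actual logic, which also builds the objects the model needs, is not reproduced here.

```python
from typing import Any, Dict

# Defaults taken from the Parameters section of this page.
DEFAULTS: Dict[str, Any] = {
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "max_length": 512,
    "batch_size": 32,
    "device": "cpu",
    "pooling_strategy": "mean",
}


def validate_params(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Merge raw_data over DEFAULTS, checking names and types (hypothetical)."""
    unknown = set(raw_data) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **raw_data}
    for name, default in DEFAULTS.items():
        value = params[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, type(default)):
            raise TypeError(f"{name} must be {type(default).__name__}")
    return params


print(validate_params({"batch_size": 64, "device": "cuda"})["batch_size"])  # 64
```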