Embedding
HuggingFace embedding converter.
Parameters
- model_name : string, default=sentence-transformers/all-MiniLM-L6-v2 - Name of the pre-trained model to use
- max_length : integer, default=512 - Maximum sequence length for tokenization
- batch_size : integer, default=32 - Number of samples to process at once
- device : string, default=cpu - Device to use for computation
- pooling_strategy : string, default=mean - Strategy to pool token embeddings into a sentence embedding
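The default pooling_strategy of mean can be illustrated with a minimal sketch in plain Python. It assumes that mean pooling averages the token vectors and skips padding positions via an attention mask, which is the common convention; whether this converter masks padding in exactly this way is an assumption. The function name mean_pool and the sample vectors are hypothetical and not part of DashAI.

```python
from typing import List, Sequence


def mean_pool(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the token vectors whose mask entry is non-zero.

    Hypothetical helper for illustration only; not a DashAI function.
    """
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only real tokens; positions with mask 0 are treated as padding.
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    # Component-wise mean over the kept token vectors.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

The padded third vector does not influence the result, so the sentence vector depends only on the two real tokens.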
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
Embedding: Return Float32 as the output type for all embedding columns.
Parameters
- column_name : str or None, optional
- Name of the output column. Not used; all embedding columns receive the same Float32 type. Default None.
Returns
- Float
- A DashAI Float type backed by pyarrow.float32().
changes_row_count(self) -> 'bool'
BaseConverter: Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> Type[DashAI.back.converters.base_converter.BaseConverter]
HuggingFaceWrapper: Validate the input dataset and load the HuggingFace model.
Parameters
- x : DashAIDataset
- Input dataset whose columns must all be string-typed.
- y : DashAIDataset or None, optional
- Ignored. Present for API compatibility. Default None.
Returns
- HuggingFaceWrapper
- The fitted converter instance (self).
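The requirement that every input column be string-typed can be sketched as a simple pre-check. This is an illustration only: it stands in a plain dict of column name to values for a DashAIDataset, and the helper name non_string_columns is hypothetical. The actual converter presumably inspects the dataset's column types rather than individual values.

```python
from typing import Dict, List


def non_string_columns(columns: Dict[str, List[object]]) -> List[str]:
    """Return the names of columns that contain any non-string value.

    Hypothetical helper; a plain dict stands in for a DashAIDataset here.
    """
    return [
        name
        for name, values in columns.items()
        if not all(isinstance(value, str) for value in values)
    ]


sample = {
    "title": ["first text", "second text"],
    "year": [1999, 2001],  # integer column, so it fails the check
}
offending = non_string_columns(sample)
```

A fit-style method built on such a check would typically raise an error naming the offending columns before loading the model.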
get_metadata(cls) -> 'Dict[str, Any]'
BaseConverter: Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
ConfigObject: Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
transform(self, x: 'DashAIDataset', y: 'DashAIDataset' = None) -> 'DashAIDataset'
HuggingFaceWrapper: Transform the input dataset by running inference in batches.
Parameters
- x : DashAIDataset
- The dataset to transform. The converter must have been fitted first.
- y : DashAIDataset or None, optional
- Ignored. Present for API compatibility. Default None.
Returns
- DashAIDataset
- Transformed dataset with output types set per column.
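Running inference in batches amounts to slicing the input into chunks of at most batch_size rows. The sketch below shows only that slicing step in plain Python, assuming the rows are held in a sequence. The helper name iter_batches is hypothetical, and the real method may batch differently.

```python
from typing import Iterator, List, Sequence


def iter_batches(rows: Sequence[str], batch_size: int = 32) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size rows.

    Hypothetical helper for illustration; not part of DashAI.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        # The final chunk may be shorter than batch_size.
        yield list(rows[start:start + batch_size])


texts = [f"sentence {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(texts, batch_size=32)]
```

With 70 rows and the default batch_size of 32, the last batch holds the 6 remaining rows. A larger batch_size trades memory for fewer model calls.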
validate_and_transform(self, raw_data: dict) -> dict
ConfigObject: Take the data given by the user to initialize the model and return it with all the objects that the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.