OpusMtEsENTransformer

Model
DashAI.back.models.hugging_face.OpusMtEsENTransformer

Pre-trained transformer for Spanish-to-English translation.

This model fine-tunes the Helsinki-NLP opus-mt-es-en checkpoint, which is based on the MarianMT sequence-to-sequence architecture. The base model was trained on parallel Spanish-English corpora from the OPUS collection and supports direct translation without intermediate pivot languages.

Fine-tuning is performed with the HuggingFace Seq2SeqTrainer using the AdamW optimizer. Training and validation metrics are logged at configurable epoch and step intervals via a custom MetricsCallback.
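
The epoch and step intervals can be pictured with a small helper. This is a minimal sketch, not the code of MetricsCallback: the name should_log is hypothetical, counters are assumed to be 1-based, and an interval of None is assumed to disable logging at that granularity, as the parameter descriptions state.

```python
from typing import Optional


def should_log(counter: int, every_n: Optional[int]) -> bool:
    """Return True when metrics should be logged at this epoch or step.

    Hypothetical helper: `counter` is a 1-based epoch or step number and
    `every_n` is the configured interval (None disables logging).
    """
    if every_n is None:
        return False
    if every_n <= 0:
        raise ValueError("every_n must be a positive integer or None")
    return counter % every_n == 0


# With an epoch interval of 1 and a step interval of None (the documented
# defaults), every epoch is logged and no individual step is logged.
logged_epochs = [e for e in range(1, 4) if should_log(e, 1)]
logged_steps = [s for s in range(1, 11) if should_log(s, None)]
```

Setting a step interval of 5 in this sketch would log at steps 5, 10, 15, and so on, independently of the epoch interval.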

Parameters

num_train_epochs : integer, default=1
Total number of training epochs to perform.
batch_size : integer, default=4
The batch size per GPU, TPU core, or CPU used for training.
learning_rate : number, default=2e-05
The initial learning rate for the AdamW optimizer.
device : string, default=CPU
Hardware on which training runs. A GPU is recommended for efficiency when one is available; otherwise use the CPU. Selecting GPU uses all available GPUs.
weight_decay : number, default=0.01
Weight decay is a regularization technique that helps prevent overfitting when training neural networks. For the AdamW optimizer, weight_decay is the rate at which the weights of all layers are shrunk during training. A value of 0 disables weight decay.
log_train_every_n_epochs, default=1
Log metrics for the training split every n epochs during training. If None, per-epoch logging is disabled.
log_train_every_n_steps, default=None
Log metrics for the training split every n steps during training. If None, per-step logging is disabled.
log_validation_every_n_epochs, default=1
Log metrics for the validation split every n epochs during training. If None, per-epoch logging is disabled.
log_validation_every_n_steps, default=None
Log metrics for the validation split every n steps during training. If None, per-step logging is disabled.
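
To make the interaction between learning_rate and weight_decay concrete, the following sketch applies one AdamW update to a single scalar weight. It follows the standard decoupled weight decay formulation; the function name adamw_step is hypothetical, and the beta and epsilon values are commonly used defaults that this page does not state.

```python
import math


def adamw_step(w, grad, m, v, t, lr=2e-5, weight_decay=0.01,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a single scalar weight (illustrative only)."""
    # Decoupled weight decay: shrink the weight directly, independent of the gradient.
    w = w * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction for step t (1-based).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Adaptive gradient step.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero gradient only the decay term acts on the weight, so the
# weight is multiplied by (1 - lr * weight_decay).
w_after, m_after, v_after = adamw_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1)
```

With the documented defaults the per-step shrink factor is very close to 1, so the decay only has a noticeable effect over many steps.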

Methods

load(cls, filename: Union[str, ForwardRef('Path')])

Defined on OpusMtEsENTransformer

Restore an OpusMtEsENTransformer instance from disk.

Parameters

filename : str or Path
Directory path from which the model files will be read.

Returns

OpusMtEsENTransformer
The restored model instance with fitted set to the persisted value.

predict(self, x_pred: 'DashAIDataset') -> List

Defined on OpusMtEsENTransformer

Translate Spanish source texts to English.

Parameters

x_pred : DashAIDataset
Source-language dataset. Only the first column is used.

Returns

list of str
One translated string per input sample, in the same order as x_pred.
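
The input and output contract of predict can be illustrated without loading the model. In this sketch a dictionary of columns stands in for a DashAIDataset and a lookup table stands in for the translator; the name predict_first_column and the batching loop are assumptions made for illustration, not the library's implementation.

```python
from typing import Callable, Dict, List


def predict_first_column(
    columns: Dict[str, List[str]],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 4,
) -> List[str]:
    """Translate the first column batch by batch, preserving input order."""
    first_column = next(iter(columns.values()))  # later columns are ignored
    outputs: List[str] = []
    for start in range(0, len(first_column), batch_size):
        batch = first_column[start:start + batch_size]
        outputs.extend(translate_batch(batch))
    return outputs


# Stand-in translator: a lookup table instead of the real model.
toy_table = {"hola": "hello", "gracias": "thank you", "adiós": "goodbye"}
result = predict_first_column(
    {"text": ["hola", "gracias", "adiós"], "source_id": ["a", "b", "c"]},
    lambda batch: [toy_table[s] for s in batch],
    batch_size=2,
)
```

The result contains one string per input row, in input order, and the second column never reaches the translator.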

prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on OpusMtEsENTransformer

Return the dataset unchanged.

Parameters

dataset : DashAIDataset
The dataset to be prepared.
is_fit : bool, optional
Whether the call is made during fitting. Unused here. Default False.

Returns

DashAIDataset
The original dataset, unmodified.

save(self, filename: Union[str, ForwardRef('Path')]) -> None

Defined on OpusMtEsENTransformer

Store the fine-tuned model and its configuration to disk.

Parameters

filename : str or Path
Directory path where the model files will be written. If a file exists at that path it is removed and replaced by a directory.
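
The path handling described for filename can be sketched with the standard library. The helper name ensure_model_dir is hypothetical and the code only mirrors the documented behavior (a file at the target path is removed and a directory takes its place); it is not taken from the DashAI source.

```python
import tempfile
from pathlib import Path


def ensure_model_dir(filename) -> Path:
    """Make `filename` an existing directory, replacing a regular file if needed."""
    path = Path(filename)
    if path.is_file():
        path.unlink()  # a plain file at this path is removed
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "model_out"
    target.write_text("stale file")      # a file occupies the target path
    model_dir = ensure_model_dir(target)
    was_replaced = model_dir.is_dir()    # the path is now a directory
```

Because the target is a directory, the same path can later be passed to load, which reads the model files from a directory.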

tokenize_data(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'

Defined on OpusMtEsENTransformer

Tokenize input and optional target datasets for seq2seq training.

Parameters

x : DashAIDataset
Source-language dataset. Only the first column is used.
y : DashAIDataset, optional
Target-language dataset. When provided, tokenized targets are added as labels. When None, only input_ids and attention_mask are returned (inference mode).

Returns

DashAIDataset
Tokenized dataset with keys input_ids, attention_mask, and optionally labels.
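
The shape of the returned dataset can be shown with a toy tokenizer. This sketch splits on whitespace and pads each batch to its longest row; the real checkpoint uses its own subword tokenizer, and the name toy_tokenize, the padding id, and the id assignment are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def toy_tokenize(x: List[str], y: Optional[List[str]] = None,
                 pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Whitespace tokenizer that mimics the documented output keys."""
    vocab: Dict[str, int] = {}

    def encode(text: str) -> List[int]:
        # Ids start at 1 so that 0 can serve as the default padding id.
        return [vocab.setdefault(tok, len(vocab) + 1) for tok in text.split()]

    def pad(rows: List[List[int]], fill: int) -> List[List[int]]:
        width = max(len(r) for r in rows)
        return [r + [fill] * (width - len(r)) for r in rows]

    ids = [encode(t) for t in x]
    out = {
        "input_ids": pad(ids, pad_id),
        # 1 marks a real token, 0 marks padding.
        "attention_mask": pad([[1] * len(r) for r in ids], 0),
    }
    if y is not None:  # training mode: tokenized targets become labels
        out["labels"] = pad([encode(t) for t in y], pad_id)
    return out


inference = toy_tokenize(["hola mundo", "gracias"])
training = toy_tokenize(["hola mundo", "gracias"], ["hello world", "thanks"])
```

Calling the helper without y yields only input_ids and attention_mask, which corresponds to the inference mode described above; passing y adds labels.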

train(self, x_train: 'DashAIDataset', y_train: 'DashAIDataset', x_validation: 'DashAIDataset' = None, y_validation: 'DashAIDataset' = None) -> 'OpusMtEsENTransformer'

Defined on OpusMtEsENTransformer

Fine-tune the opus-mt-es-en model on Spanish-English translation data.

Parameters

x_train : DashAIDataset
Input Spanish text features for training.
y_train : DashAIDataset
Target English translation labels for training.
x_validation : DashAIDataset, optional
Input Spanish text features for validation. Default None.
y_validation : DashAIDataset, optional
Target English translation labels for validation. Default None.

Returns

OpusMtEsENTransformer
The fine-tuned model instance.
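
One plausible way the documented parameters could map onto HuggingFace training arguments is sketched below. The keyword names on the left are real Seq2SeqTrainingArguments fields; whether DashAI passes exactly these keywords is an assumption, and the helper to_training_kwargs is hypothetical. The sketch builds a plain dictionary so it runs without transformers installed.

```python
from typing import Any, Dict


def to_training_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate documented model parameters into trainer keyword arguments."""
    batch_size = params.get("batch_size", 4)
    return {
        "num_train_epochs": params.get("num_train_epochs", 1),
        # Assumed: the same batch size is used for training and evaluation.
        "per_device_train_batch_size": batch_size,
        "per_device_eval_batch_size": batch_size,
        "learning_rate": params.get("learning_rate", 2e-5),
        "weight_decay": params.get("weight_decay", 0.01),
    }


# Override one parameter and keep the documented defaults for the rest.
training_kwargs = to_training_kwargs({"batch_size": 8})
```

The resulting dictionary could be unpacked into Seq2SeqTrainingArguments together with an output directory.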

calculate_metrics(self, split: DashAI.back.core.enums.metrics.SplitEnum = <SplitEnum.VALIDATION: 'validation'>, level: DashAI.back.core.enums.metrics.LevelEnum = <LevelEnum.LAST: 'last'>, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)

Defined on BaseModel

Calculate and save metrics for a given data split and level.

Parameters

split : SplitEnum
The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
level : LevelEnum
The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
log_index : int, optional
Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
x_data : DashAIDataset, optional
Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
y_data : DashAIDataset, optional
Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
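
The log_index default can be pictured with a small in-memory log. Everything in this sketch is hypothetical: the store layout, the name record_metric, the use of plain strings in place of SplitEnum and LevelEnum members, the placeholder metric values, and the rule that the automatic index is one past the largest index stored so far, starting at 1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical store: (split, level) -> list of (index, value) entries.
MetricLog = Dict[Tuple[str, str], List[Tuple[int, float]]]


def record_metric(log: MetricLog, value: float, split: str = "validation",
                  level: str = "last", log_index: Optional[int] = None) -> int:
    """Append a metric value and return the index it was stored under."""
    entries = log.setdefault((split, level), [])
    if log_index is None:
        # Assumed rule: one past the largest index stored so far, starting at 1.
        log_index = max((i for i, _ in entries), default=0) + 1
    entries.append((log_index, value))
    return log_index


log: MetricLog = {}
# The metric values below are placeholders, not measured results.
first = record_metric(log, 0.1, level="step")
second = record_metric(log, 0.2, level="step")
explicit = record_metric(log, 0.3, level="step", log_index=10)
after_explicit = record_metric(log, 0.4, level="step")
```

Under these assumptions an explicit index shifts where automatic numbering resumes, which is one reason to leave log_index at None unless a specific position is required.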

get_metadata(cls) -> Dict[str, Any]

Defined on BaseModel

Get metadata values for the current model.

Returns

Dict[str, Any]
Dictionary containing UI metadata such as the model icon used in the DashAI frontend.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema of the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on BaseModel

Hook for model-specific preprocessing of output targets.

Parameters

dataset : DashAIDataset
The output dataset (target labels) to preprocess.
is_fit : bool
Whether the call is part of a fitting phase. Defaults to False.

Returns

DashAIDataset
The preprocessed output dataset.

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Validate the data supplied by the user to initialize the model and return it together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
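
A minimal sketch of this kind of validation is shown below, using only the defaults listed in the Parameters section. The helper validate_params and its specific checks (rejecting unknown keys, requiring positive integers, a positive learning rate, and a non-negative weight decay) are assumptions for illustration; the actual method validates against the component's JSON Schema, whose exact constraints are not given on this page.

```python
from typing import Any, Dict

# Defaults copied from the Parameters section of this page.
DEFAULTS: Dict[str, Any] = {
    "num_train_epochs": 1,
    "batch_size": 4,
    "learning_rate": 2e-5,
    "device": "CPU",
    "weight_decay": 0.01,
    "log_train_every_n_epochs": 1,
    "log_train_every_n_steps": None,
    "log_validation_every_n_epochs": 1,
    "log_validation_every_n_steps": None,
}


def validate_params(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in defaults and apply a few assumed sanity checks."""
    unknown = set(raw_data) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    params = {**DEFAULTS, **raw_data}
    for name in ("num_train_epochs", "batch_size"):
        value = params[name]
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer")
    if params["learning_rate"] <= 0:
        raise ValueError("learning_rate must be positive")
    if params["weight_decay"] < 0:
        raise ValueError("weight_decay must be non-negative")
    return params


validated = validate_params({"batch_size": 8, "learning_rate": 5e-5})
```

Keys that the caller omits fall back to the documented defaults, so the returned dictionary always contains every parameter.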
