
OpusMtEnDeTransformer

Model
DashAI.back.models.hugging_face.OpusMtEnDeTransformer

Pre-trained transformer for English-to-German translation.

Fine-tunes the Helsinki-NLP opus-mt-en-de checkpoint, a MarianMT seq2seq model trained on parallel English-German corpora from the OPUS collection.

Parameters

num_train_epochs : integer, default=1
Total number of training epochs to perform.
batch_size : integer, default=4
The batch size per GPU/TPU core/CPU for training.
learning_rate : number, default=2e-05
The initial learning rate for the AdamW optimizer.
device : string, default=CPU
Hardware on which training is run. GPU is recommended when available. If GPU is selected, all available GPUs are used.
weight_decay : number, default=0.01
Weight decay coefficient applied by the AdamW optimizer (decoupled weight decay, closely related to L2 regularization) to reduce overfitting.
log_train_every_n_epochs, default=1
Log train metrics every N epochs. None disables per-epoch logging.
log_train_every_n_steps, default=None
Log train metrics every N steps. None disables per-step logging.
log_validation_every_n_epochs, default=1
Log validation metrics every N epochs. None disables per-epoch logging.
log_validation_every_n_steps, default=None
Log validation metrics every N steps. None disables per-step logging.
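
As a rough illustration of the parameters above, the sketch below collects the documented defaults in a plain dictionary and shows one plausible reading of the logging intervals. The helper `should_log` is hypothetical and is not part of DashAI; treating epoch and step indices as 1-based is also an assumption.

```python
from typing import Optional

# Defaults copied from the parameter list above.
DEFAULT_PARAMS = {
    "num_train_epochs": 1,
    "batch_size": 4,
    "learning_rate": 2e-05,
    "device": "CPU",
    "weight_decay": 0.01,
    "log_train_every_n_epochs": 1,
    "log_train_every_n_steps": None,
    "log_validation_every_n_epochs": 1,
    "log_validation_every_n_steps": None,
}


def should_log(index: int, every_n: Optional[int]) -> bool:
    """Hypothetical helper: True when a 1-based index falls on the interval.

    None disables logging, mirroring the parameter descriptions.
    Assumes every_n is a positive integer when it is not None.
    """
    if every_n is None:
        return False
    return index % every_n == 0


# With an interval of 2, every second epoch is logged.
logged_epochs = [e for e in range(1, 7) if should_log(e, 2)]

# The default per-step interval is None, so no step is logged.
logged_steps = [
    s for s in range(1, 7)
    if should_log(s, DEFAULT_PARAMS["log_train_every_n_steps"])
]
```

With the defaults, metrics would be logged once per epoch and never per step, under the assumptions stated above.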

Methods

calculate_metrics(self, split: DashAI.back.core.enums.metrics.SplitEnum = <SplitEnum.VALIDATION: 'validation'>, level: DashAI.back.core.enums.metrics.LevelEnum = <LevelEnum.LAST: 'last'>, log_index: int = None, x_data: 'DashAIDataset' = None, y_data: 'DashAIDataset' = None)

Defined on BaseModel

Calculate and save metrics for a given data split and level.

Parameters

split : SplitEnum
The data split to evaluate (TRAIN, VALIDATION, or TEST). Defaults to SplitEnum.VALIDATION.
level : LevelEnum
The metric granularity level (LAST, TRIAL, STEP, or BATCH). Defaults to LevelEnum.LAST.
log_index : int, optional
Explicit step index for the metric entry. If None, the next step index is computed automatically. Defaults to None.
x_data : DashAIDataset, optional
Input features. If None, the dataset stored in the model for the given split is used. Defaults to None.
y_data : DashAIDataset, optional
Target labels. If None, the labels stored in the model for the given split are used. Defaults to None.
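
The `log_index` behavior described above can be sketched as follows. This is only one plausible interpretation: the function `resolve_log_index`, the list of previously recorded indices, and the choice to start counting at 0 are all assumptions made for illustration, not DashAI's actual implementation.

```python
from typing import List, Optional


def resolve_log_index(recorded: List[int], log_index: Optional[int] = None) -> int:
    """Hypothetical: prefer the explicit index, else continue after the last one."""
    if log_index is not None:
        # An explicit index is used as given.
        return log_index
    # Otherwise pick the index following the most recent entry (0 if none exist).
    return recorded[-1] + 1 if recorded else 0


first = resolve_log_index([])              # no entries recorded yet
following = resolve_log_index([0, 1, 2])   # continues after the last entry
explicit = resolve_log_index([0, 1, 2], log_index=10)
```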

get_metadata(cls) -> Dict[str, Any]

Defined on BaseModel

Get metadata values for the current model.

Returns

Dict[str, Any]
Dictionary containing UI metadata such as the model icon used in the DashAI frontend.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema for this component.

Returns

dict
Dictionary representing the JSON Schema of the component.
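
To give a sense of what such a schema could contain, the sketch below builds a JSON Schema fragment by hand for five of the documented parameters. The types and defaults come from the parameter list on this page, but the overall layout is an assumption; the dictionary actually returned by `get_schema` is not shown here and may differ.

```python
import json

# Illustrative fragment only; not actual get_schema() output.
schema_sketch = {
    "type": "object",
    "properties": {
        "num_train_epochs": {"type": "integer", "default": 1},
        "batch_size": {"type": "integer", "default": 4},
        "learning_rate": {"type": "number", "default": 2e-05},
        "device": {"type": "string", "default": "CPU"},
        "weight_decay": {"type": "number", "default": 0.01},
    },
}

# Collect the defaults, for example to pre-fill a configuration form.
defaults = {
    name: spec["default"]
    for name, spec in schema_sketch["properties"].items()
}

# The dictionary is JSON-serializable, so it survives a round trip.
round_tripped = json.loads(json.dumps(schema_sketch))
```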

load(cls, filename: Union[str, ForwardRef('Path')])

Defined on OpusMtTransformerMixin

Restore a model instance from disk.

predict(self, x_pred: 'DashAIDataset') -> List

Defined on OpusMtTransformerMixin

Translate source texts using the fine-tuned model.

prepare_dataset(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on OpusMtTransformerMixin

Return the dataset unchanged (no preprocessing required).

prepare_output(self, dataset: 'DashAIDataset', is_fit: bool = False) -> 'DashAIDataset'

Defined on BaseModel

Hook for model-specific preprocessing of output targets.

Parameters

dataset : DashAIDataset
The output dataset (target labels) to preprocess.
is_fit : bool
Whether the call is part of a fitting phase. Defaults to False.

Returns

DashAIDataset
The preprocessed output dataset.

save(self, filename: Union[str, ForwardRef('Path')]) -> None

Defined on OpusMtTransformerMixin

Persist model weights and hyperparameters to disk.

tokenize_data(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'

Defined on OpusMtTransformerMixin

Tokenize source (and optionally target) dataset for seq2seq training.

train(self, x_train: 'DashAIDataset', y_train: 'DashAIDataset', x_validation: 'DashAIDataset' = None, y_validation: 'DashAIDataset' = None)

Defined on OpusMtTransformerMixin

Fine-tune the Opus-MT model on translation data.

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Validate the data provided by the user to initialize the model and return it together with all the objects that the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
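
The validation step can be pictured with a minimal, standalone sketch. It only merges user input with the documented defaults and checks basic types for five parameters; the function `validate_sketch` and its error handling are hypothetical, and the real method may additionally build the objects the model needs.

```python
from typing import Any, Dict

# Expected Python types, following the documented parameter types.
FIELD_TYPES = {
    "num_train_epochs": int,
    "batch_size": int,
    "learning_rate": (int, float),
    "device": str,
    "weight_decay": (int, float),
}

# Defaults as documented in the parameter list.
FIELD_DEFAULTS = {
    "num_train_epochs": 1,
    "batch_size": 4,
    "learning_rate": 2e-05,
    "device": "CPU",
    "weight_decay": 0.01,
}


def validate_sketch(raw_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in: fill in defaults, then type-check known fields."""
    merged = {**FIELD_DEFAULTS, **raw_data}
    for name, expected in FIELD_TYPES.items():
        value = merged[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise TypeError(f"{name!r} has an unexpected type: {type(value).__name__}")
    return merged


# Only batch_size is supplied; the remaining fields fall back to defaults.
validated = validate_sketch({"batch_size": 8})
```

Passing a value of the wrong type, such as `{"batch_size": "8"}`, raises `TypeError` in this sketch.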

Compatible with