TranslationTask
Task for sequence-to-sequence machine translation between languages.
Translation tasks take a single Text input column (source language) and
produce a single Text output column (target language). The compatible
metrics are BLEU, which measures n-gram overlap with reference translations,
and TER (translation edit rate), which measures how many edits are needed to
turn a system translation into a reference translation.
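To make the edit-rate idea concrete, here is a minimal pure-Python sketch of a word-level edit rate: token-level Levenshtein distance divided by the reference length. It is a simplification of TER, which additionally counts block shifts as single edits; toolkits such as sacrebleu also apply their own tokenization and normalization. The function names are hypothetical and are not part of DashAI.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the hypothesis prefix seen so far and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # delete h from the hypothesis
                curr[j - 1] + 1,     # insert r into the hypothesis
                prev[j - 1] + cost,  # substitute h with r (free if equal)
            )
        prev = curr
    return prev[len(ref)]


def simple_edit_rate(hypothesis: str, reference: str) -> float:
    """Word edits divided by reference length (TER without block shifts)."""
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return word_edit_distance(hyp_tokens, ref_tokens) / len(ref_tokens)


# One missing word out of six reference words, so the rate is about 0.167.
print(simple_edit_rate("the cat sat on mat", "the cat sat on the mat"))
```

Lower is better: 0.0 means the hypothesis matches the reference exactly, while BLEU moves in the opposite direction (higher is better).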
Methods
num_labels(self, dataset: 'DashAIDataset', output_column: str) -> int | None
(TranslationTask) Get the number of unique labels in the output column.
Parameters
- dataset : DashAIDataset
- Dataset used for training
- output_column : str
- Output column
Returns
- int | None
- Number of unique labels or None if not applicable
prepare_for_task(self, dataset: Union['DatasetDict', 'DashAIDataset'], input_columns: List[str], output_columns: List[str]) -> 'DashAIDataset'
(TranslationTask) Convert the dataset to a DashAIDataset and check the column types.
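Parameters
- dataset : Union[DatasetDict, DashAIDataset]
- Dataset to convert
- input_columns : List[str]
- Names of the input columns
- output_columns : List[str]
- Names of the output columns
Returns
- DashAIDataset
- Dataset with the updated column types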
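The column check described above can be pictured with the following hypothetical sketch. It only mirrors the requirement stated at the top of this page (exactly one Text input column and one Text output column); representing column types as plain strings such as "Text", and the function name itself, are assumptions for illustration, not DashAI's actual implementation.

```python
from typing import Dict, List


def check_translation_columns(
    column_types: Dict[str, str],
    input_columns: List[str],
    output_columns: List[str],
) -> None:
    """Hypothetical check mirroring the translation task's column requirements.

    column_types maps each column name to a type name; using strings such as
    "Text" is an assumption made for this sketch.
    """
    # Translation expects one source-language column and one target-language column.
    if len(input_columns) != 1 or len(output_columns) != 1:
        raise ValueError("translation needs exactly one input and one output column")
    for name in input_columns + output_columns:
        if name not in column_types:
            raise ValueError(f"column {name!r} not found in dataset")
        if column_types[name] != "Text":
            raise TypeError(f"column {name!r} must be Text, got {column_types[name]}")


# English source column, Spanish target column: passes silently.
check_translation_columns({"en": "Text", "es": "Text"}, ["en"], ["es"])
```

Raising distinct exception types for a missing column and a wrong type lets a caller report the two problems differently.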
process_predictions(self, dataset: 'DashAIDataset', predictions: 'ndarray', output_column: str)
(TranslationTask) Process the model predictions.
Parameters
- dataset : DashAIDataset
- Dataset used for training
- predictions : np.ndarray
- Predictions from the model
- output_column : str
- Output column
Returns
- Processed predictions
get_metadata(cls) -> Dict[str, Any]
(BaseTask) Return serialisable metadata for the current task.
Parameters
- cls : type
- The task class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary with the keys "inputs_types", "outputs_types", "inputs_cardinality", and "outputs_cardinality".
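As an illustration of the shape of this dictionary, the sketch below uses a hypothetical stand-in class. The four key names come from the description above and the single Text input and output come from the task description; representing types as strings and cardinalities as integers is an assumption, since the real value types are not documented here.

```python
import copy
import json
from typing import Any, Dict


class SketchTranslationTask:
    """Hypothetical stand-in; not the real DashAI class."""

    # Assumed representations: type names as strings, cardinalities as ints.
    metadata: Dict[str, Any] = {
        "inputs_types": ["Text"],
        "outputs_types": ["Text"],
        "inputs_cardinality": 1,
        "outputs_cardinality": 1,
    }

    @classmethod
    def get_metadata(cls) -> Dict[str, Any]:
        # Return a deep copy so callers cannot mutate the class-level dictionary.
        return copy.deepcopy(cls.metadata)


meta = SketchTranslationTask.get_metadata()
print(json.dumps(meta))  # serialisable because it holds only lists, strings and ints
```

Because the method is a classmethod, it can be called on the class itself without creating a task instance.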
process_manual_input(self, manual_input: List[dict], dataset_path: str) -> 'DashAIDataset'
(BaseTask) Process manual input data into a DashAIDataset with type validation.
Parameters
- manual_input : List[dict]
- List of dictionaries representing manual input data.
- dataset_path : str
- Path to the training dataset (used to get the column specifications for validation).
Returns
- DashAIDataset
- Processed DashAIDataset from manual input.
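To show what a List[dict] manual input looks like and what "type validation" against column specifications can mean, here is a hypothetical sketch that validates each row and pivots the rows into columns. In DashAI the specifications are read from the training dataset at dataset_path; this sketch does not model that step and instead passes a plain mapping from column name to expected Python type, which is an assumption.

```python
from typing import Dict, List


def rows_to_columns(manual_input: List[dict], column_specs: Dict[str, type]) -> Dict[str, list]:
    """Hypothetical sketch: validate row dicts and pivot them into columns.

    column_specs maps column name -> expected Python type (an assumption;
    DashAI derives its specifications from the training dataset).
    """
    columns: Dict[str, list] = {name: [] for name in column_specs}
    for i, row in enumerate(manual_input):
        # Every specified column must be present in each row; extra keys are ignored.
        missing = set(column_specs) - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")
        for name, expected in column_specs.items():
            value = row[name]
            if not isinstance(value, expected):
                raise TypeError(f"row {i}, column {name!r}: expected {expected.__name__}")
            columns[name].append(value)
    return columns


# Two source-language sentences entered by hand.
rows = [{"en": "Good morning"}, {"en": "Thank you"}]
print(rows_to_columns(rows, {"en": str}))  # {'en': ['Good morning', 'Thank you']}
```

Pivoting to a column-oriented mapping matches how tabular dataset libraries usually construct a dataset from in-memory data.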
validate_dataset_for_task(self, dataset: 'DashAIDataset', dataset_name: str, input_columns: List[str], output_columns: List[str]) -> None
(BaseTask) Validate a dataset for the current task.
Parameters
- dataset : DashAIDataset
- Dataset to be validated
- dataset_name : str
- Dataset name