TextClassificationTask
Task for classifying a single text column into discrete categories.
Text classification takes one input column of type Text and maps it to
one categorical output column. The task covers any NLP scenario where a
raw or pre-processed text sequence must be assigned to one of a fixed set
of labels, such as sentiment analysis, spam detection, topic labelling, and
intent recognition. Compatible models consume the text directly and output
a predicted class label for each sample.
Methods
prepare_for_task(self, dataset: Union['DatasetDict', 'DashAIDataset'], input_columns: List[str], output_columns: List[str]) -> 'DashAIDataset'
(TextClassificationTask) Convert the dataset to a DashAIDataset and check the column types.
Parameters
- dataset : Union[DatasetDict, DashAIDataset]
- Dataset to be changed
Returns
- DashAIDataset
- Dataset with the new types
get_metadata(cls) -> Dict[str, Any]
(BaseTask) Return serialisable metadata for the current task.
Parameters
- cls : type
- The task class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary with keys "inputs_types", "outputs_types", "inputs_cardinality", and "outputs_cardinality".
num_labels(self, dataset: 'DashAIDataset', output_column: str) -> int | None
(ClassificationTask) Get the number of unique labels in the output column.
Parameters
- dataset : DashAIDataset
- Dataset used for training
- output_column : str
- Output column
Returns
- int | None
- Number of unique labels or None if not applicable
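As a rough illustration of what counting unique labels involves, the sketch below works on a plain Python sequence rather than a DashAIDataset. The helper name count_unique_labels is hypothetical, and treating an empty column as the "not applicable" case is an assumption; the library's own rule for returning None may differ.

```python
from typing import Optional, Sequence


def count_unique_labels(labels: Sequence[object]) -> Optional[int]:
    """Return the number of distinct labels in a column.

    Assumption: an empty column is treated as "not applicable" and yields None.
    """
    if len(labels) == 0:
        return None
    # A set keeps one copy of each distinct label value.
    return len(set(labels))


# Hypothetical output column for a spam-detection dataset.
example_column = ["spam", "ham", "spam", "ham", "promo"]
n_labels = count_unique_labels(example_column)
```

A classifier typically needs this count to size its output layer, which is presumably why the method is exposed on the task.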
process_manual_input(self, manual_input: List[dict], dataset_path: str) -> 'DashAIDataset'
(BaseTask) Process manual input data into a DashAIDataset with type validation.
Parameters
- manual_input : List[dict]
- List of dictionaries representing manual input data.
- dataset_path : str
- Path to the training dataset (used to get column specs for validation)
Returns
- DashAIDataset
- Processed DashAIDataset from manual input.
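The kind of type validation described above can be sketched on plain dictionaries. This is not the library's implementation: check_manual_rows, the expected_types mapping, and the "review" column are all hypothetical stand-ins for the column specs that would be read from dataset_path.

```python
from typing import Dict, List


def check_manual_rows(rows: List[dict], expected_types: Dict[str, type]) -> List[str]:
    """Collect a message for every missing or wrongly typed column value."""
    errors: List[str] = []
    for index, row in enumerate(rows):
        for column, expected in expected_types.items():
            if column not in row:
                errors.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected):
                errors.append(
                    f"row {index}: column '{column}' is not {expected.__name__}"
                )
    return errors


# Hypothetical manual input for a task with one text column named "review".
manual_input = [{"review": "Great product"}, {"review": 42}, {}]
errors = check_manual_rows(manual_input, {"review": str})
```

Collecting all messages before failing, instead of raising on the first bad row, lets a user correct every problem in one pass.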
process_predictions(self, dataset: 'DashAIDataset', predictions: 'ndarray', output_column: str) -> 'ndarray'
(ClassificationTask) Process the predictions to return the class labels.
Parameters
- dataset : DashAIDataset
- Dataset used for training
- predictions : np.ndarray
- Predictions from the model (probabilities for each class)
- output_column : str
- Output column
Returns
- np.ndarray
- Processed predictions with class labels
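Converting per-class probabilities into class labels is usually an argmax over the class axis followed by a lookup of the label names. The sketch below shows that idea with NumPy only; probabilities_to_labels and the class_names list are hypothetical, and it assumes that column i of the probability matrix corresponds to class_names[i]. How the library recovers the label names from the dataset and output_column is not shown here.

```python
import numpy as np


def probabilities_to_labels(probabilities: np.ndarray, class_names: list) -> np.ndarray:
    """Map each row of class probabilities to the name of its most likely class.

    Assumption: probabilities has shape (n_samples, n_classes) and column i
    corresponds to class_names[i].
    """
    # Index of the highest-probability class for every sample.
    indices = np.argmax(probabilities, axis=1)
    # Fancy indexing turns the integer indices into label names.
    return np.asarray(class_names)[indices]


# Hypothetical model output for three samples and two classes.
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = probabilities_to_labels(probs, ["negative", "positive"])
```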
validate_dataset_for_task(self, dataset: 'DashAIDataset', dataset_name: str, input_columns: List[str], output_columns: List[str]) -> None
(BaseTask) Validate a dataset for the current task.
Parameters
- dataset : DashAIDataset
- Dataset to be validated
- dataset_name : str
- Dataset name