ARFFDataLoader
Data loader that ingests tabular data from ARFF files into DashAI datasets.
Reads Weka ARFF files using scipy, decodes nominal attributes from bytes to UTF-8 strings, and converts the result into DashAI datasets. Handles multi-file uploads via ZIP archives containing train/test/val split folders.
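The read-and-decode step described above can be sketched as follows. This is a minimal illustration, not the loader's actual code: the helper name arff_to_dataframe and the sample ARFF text are hypothetical, and the sketch assumes pandas is used for the intermediate table. It relies on scipy.io.arff.loadarff, which returns nominal values as bytes.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Hypothetical sample file: one numeric and one nominal attribute.
ARFF_TEXT = """@relation demo
@attribute width numeric
@attribute color {red,blue}
@data
1.5,red
2.0,blue
"""


def arff_to_dataframe(source) -> pd.DataFrame:
    """Read an ARFF source and decode nominal columns from bytes to str."""
    data, meta = arff.loadarff(source)
    frame = pd.DataFrame(data)
    # meta.names() and meta.types() list the attributes in file order.
    for name, kind in zip(meta.names(), meta.types()):
        if kind == "nominal":
            frame[name] = frame[name].map(lambda raw: raw.decode("utf-8"))
    return frame


df = arff_to_dataframe(StringIO(ARFF_TEXT))
```

Using the attribute types from the ARFF header to pick the columns to decode avoids guessing from the DataFrame dtypes.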
Methods
load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'
Defined in: ARFFDataLoader
Load uploaded ARFF files into a DashAIDataset.
Parameters
- filepath_or_buffer : str
- Path or URL to an ARFF file or a ZIP archive with split folders.
- temp_path : str
- Temporary directory for file extraction.
- params : Dict[str, Any]
- Dataloader parameters (unused; ARFF is self-describing).
- n_sample : int | None
- Maximum rows to load, or None for all.
Returns
- DashAIDataset
- Dataset with loaded data.
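When the upload is a ZIP archive with split folders, the extracted directory has to be scanned for ARFF files per split. The sketch below shows one way to do that with the standard library. The helper find_split_files is hypothetical, and the folder names train, test, and val are taken from the class description; how the real loader walks the directory is not documented here.

```python
import tempfile
from pathlib import Path

# Assumed split folder names, following the class description.
SPLIT_NAMES = ("train", "test", "val")


def find_split_files(root: str) -> dict[str, list[str]]:
    """Map each split folder that exists under root to its .arff files."""
    found = {}
    for split in SPLIT_NAMES:
        folder = Path(root) / split
        if folder.is_dir():
            found[split] = sorted(str(path) for path in folder.glob("*.arff"))
    return found


# Demo on a throwaway directory with two of the three splits present.
with tempfile.TemporaryDirectory() as root:
    for split, name in (("train", "a.arff"), ("test", "b.arff")):
        (Path(root) / split).mkdir()
        (Path(root) / split / name).write_text("@relation demo\n")
    splits = find_split_files(root)
    split_names = sorted(splits)
```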
load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 100)
Defined in: ARFFDataLoader
Load a preview of the ARFF dataset.
Parameters
- filepath_or_buffer : str
- Path to the ARFF file.
- params : Dict[str, Any]
- Unused parameters.
- n_rows : int, optional
- Maximum rows to return. Default is 100.
Returns
- pd.DataFrame
- Preview DataFrame.
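A preview limited to n_rows could be produced along these lines. This is a sketch under assumptions: preview_arff is a hypothetical helper, and reading the whole file before slicing is a simplification that may differ from what load_preview actually does. Nominal columns would still hold bytes at this point.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Hypothetical sample file with three rows.
ARFF_TEXT = """@relation demo
@attribute x numeric
@data
1
2
3
"""


def preview_arff(source, n_rows: int = 100) -> pd.DataFrame:
    """Return at most n_rows rows of an ARFF source as a DataFrame."""
    data, _meta = arff.loadarff(source)
    # loadarff yields a NumPy record array, so slicing keeps the first rows.
    return pd.DataFrame(data[:n_rows])


preview = preview_arff(StringIO(ARFF_TEXT), n_rows=2)
```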
extract_files(self, file_path: str, temp_path: str) -> str
Defined in: BaseDataLoader
Extract a ZIP archive into a subdirectory of temp_path.
Parameters
- file_path : str
- Path to the ZIP archive to extract.
- temp_path : str
- Base temporary directory; the extraction target is <temp_path>/files/.
Returns
- str
- Path of the directory containing the extracted files (<temp_path>/files/).
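The extraction behaviour described above can be approximated with the standard zipfile module. The function extract_zip below is a hypothetical stand-in, not the BaseDataLoader implementation; it only illustrates the documented contract of extracting into <temp_path>/files/ and returning that directory.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_zip(file_path: str, temp_path: str) -> str:
    """Extract a ZIP archive into <temp_path>/files and return that path."""
    target = Path(temp_path) / "files"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(target)
    return str(target)


# Demo: build a small archive with a train/ folder, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / "upload.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("train/data.arff", "@relation demo\n")
    out_dir = extract_zip(str(zip_path), tmp)
    extracted_ok = (Path(out_dir) / "train" / "data.arff").is_file()
    out_dir_name = Path(out_dir).name
```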
get_metadata(cls) -> Dict[str, Any]
Defined in: BaseDataLoader
get_schema(cls) -> dict
Defined in: ConfigObject
Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
prepare_files(self, file_path: str, temp_path: str) -> str
Defined in: BaseDataLoader
Resolve a file path or URL into a local path suitable for loading.
Parameters
- file_path : str
- Path to a local file, a ZIP archive, or an HTTP(S) URL.
- temp_path : str
- Temporary directory used for extracted ZIP archives and URL downloads.
Returns
- tuple of (str, str)
- (path, type_path), where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
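The (path, type_path) contract can be illustrated for the two local cases. The function resolve_local is a hypothetical sketch and deliberately leaves out the HTTP(S) branch, whose download logic is not described here; it assumes ZIP archives are detected by content with zipfile.is_zipfile, which may not match the real check.

```python
import tempfile
import zipfile
from pathlib import Path


def resolve_local(file_path: str, temp_path: str) -> tuple[str, str]:
    """Return (path, type_path) for a local file or a local ZIP archive."""
    if zipfile.is_zipfile(file_path):
        target = Path(temp_path) / "files"
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(file_path) as archive:
            archive.extractall(target)
        return str(target), "dir"
    # Anything that is not a ZIP archive is passed through unchanged.
    return file_path, "file"


# Demo: one plain ARFF file and one ZIP archive.
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "data.arff"
    plain.write_text("@relation demo\n")
    zipped = Path(tmp) / "upload.zip"
    with zipfile.ZipFile(zipped, "w") as archive:
        archive.writestr("train/data.arff", "@relation demo\n")
    plain_path, plain_type = resolve_local(str(plain), tmp)
    zip_dir, zip_type = resolve_local(str(zipped), tmp)
    plain_unchanged = plain_path == str(plain)
    zip_dir_name = Path(zip_dir).name
```

A caller can then branch on type_path: scan the directory when it is "dir", or read the single file when it is "file".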
validate_and_transform(self, raw_data: dict) -> dict
Defined in: ConfigObject
Validate the data the user supplied to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.