OpenMLDatasetSource
Dataset source that fetches public datasets from OpenML.
Uses the OpenML Python library; no authentication is required.
Methods
download_dataset(self, dataset_id: str, temp_path: str) -> str
Defined in OpenMLDatasetSource.
Download an OpenML dataset's raw data file from its source URL.
Parameters
- dataset_id : str
- OpenML dataset ID (integer as string, e.g. "61").
- temp_path : str
- Local directory to download into.
Returns
- str
- Path to the downloaded data file inside temp_path.
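As a rough illustration of the contract described above (an integer-as-string ID goes in, a path inside temp_path comes out), the sketch below wraps a download call with two checks. `FakeSource`, `download_checked`, and the `.arff` file name are hypothetical stand-ins so the example runs without DashAI or network access; only the `download_dataset(dataset_id, temp_path)` signature comes from this page.

```python
import os
import tempfile


class FakeSource:
    """Hypothetical stand-in for OpenMLDatasetSource (assumption, not the real class)."""

    def download_dataset(self, dataset_id: str, temp_path: str) -> str:
        # Writes a placeholder file instead of downloading anything.
        path = os.path.join(temp_path, f"dataset_{dataset_id}.arff")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("% placeholder\n")
        return path


def download_checked(source, dataset_id: str, temp_path: str) -> str:
    """Check the ID format, download, and confirm the file is inside temp_path."""
    if not (dataset_id.isascii() and dataset_id.isdigit()):
        raise ValueError(f"expected an integer as a string, got {dataset_id!r}")
    path = source.download_dataset(dataset_id, temp_path)
    # The documented return value is a path inside temp_path; verify that here.
    root = os.path.realpath(temp_path)
    if os.path.commonpath([root, os.path.realpath(path)]) != root:
        raise RuntimeError("downloaded file is outside temp_path")
    return path


# A temporary directory is removed automatically when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    data_file = download_checked(FakeSource(), "61", tmp)
    file_name = os.path.basename(data_file)
    existed = os.path.isfile(data_file)
```

With the real class, `FakeSource()` would presumably be replaced by an `OpenMLDatasetSource` instance; the file must be read or copied before the temporary directory is cleaned up.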
search(self, query: str, limit: int = 20, cursor: str | None = None, **filters: Any) -> DashAI.back.dataset_sources.base_dataset_source.SearchPage
Defined in OpenMLDatasetSource.
Return active OpenML datasets matching a name query.
Parameters
- query : str
- Dataset name search string.
- limit : int, optional
- Maximum number of results, by default 20.
- cursor : str or None, optional
- Opaque pagination token (encodes the numeric offset). None fetches the first page.
- **filters : Any
- Unused; reserved for future filters.
Returns
- SearchPage
- Matching datasets and the cursor for the next page (or None).
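The cursor behaviour described above can be sketched with an in-memory stand-in. Everything here is an assumption made for illustration: `Page` and its field names stand in for SearchPage, `NAMES` is placeholder data, and the cursor is simply the offset rendered as a string, which may not match the real encoding. Callers should treat the real token as opaque and only pass back what a previous page returned.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical stand-in for SearchPage; field names are assumptions."""
    items: list
    next_cursor: str | None


# Placeholder dataset names, not real search results.
NAMES = ["iris", "iris-small", "iris_2class", "wine", "irish"]


def search(query: str, limit: int = 20, cursor: str | None = None) -> Page:
    """Return one page of name matches plus a cursor for the next page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    offset = 0 if cursor is None else int(cursor)  # None means first page
    matches = [n for n in NAMES if query.lower() in n.lower()]
    window = matches[offset:offset + limit]
    next_offset = offset + len(window)
    # No further results means no cursor.
    next_cursor = str(next_offset) if next_offset < len(matches) else None
    return Page(window, next_cursor)


def search_all(query: str, limit: int = 20) -> list:
    """Follow cursors until the source reports there is no next page."""
    results, cursor = [], None
    while True:
        page = search(query, limit=limit, cursor=cursor)
        results.extend(page.items)
        if page.next_cursor is None:
            return results
        cursor = page.next_cursor


all_hits = search_all("iris", limit=2)
```

The loop in `search_all` is the part that should carry over to a real source: start with `cursor=None`, pass back whatever cursor the last page returned, and stop when it is None.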
get_info(self, dataset_id: str) -> DashAI.back.dataset_sources.base_dataset_source.DatasetEntry | None
Defined in BaseDatasetSource.
Return full metadata for a single dataset, including description and tags.
Parameters
- dataset_id : str
- Source-specific dataset identifier.
Returns
- DatasetEntry or None
- Full metadata entry, or None if not available.
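Because the return type is `DatasetEntry | None`, callers need to handle the missing case explicitly. The sketch below shows one way to do that. `Entry` is a hypothetical stand-in for DatasetEntry (its `name`, `description`, and `tags` fields are assumptions), and `FakeSource` returns placeholder metadata rather than querying OpenML.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for DatasetEntry; the real fields may differ."""
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


class FakeSource:
    """Hypothetical source with one known dataset (placeholder metadata)."""

    _known = {"61": Entry("iris", "Placeholder description.", ["example"])}

    def get_info(self, dataset_id: str) -> Entry | None:
        # Mirrors the documented contract: an entry, or None if unavailable.
        return self._known.get(dataset_id)


def describe(source, dataset_id: str) -> str:
    """Build a one-line summary, falling back gracefully when metadata is missing."""
    entry = source.get_info(dataset_id)
    if entry is None:
        return f"{dataset_id}: no metadata available"
    tags = ", ".join(entry.tags) or "no tags"
    return f"{dataset_id}: {entry.name} ({tags})"


found = describe(FakeSource(), "61")
missing = describe(FakeSource(), "999999")
```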
get_schema(cls) -> dict
Defined in ConfigObject.
Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject.
Take the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.