HuggingFaceDatasetSource
Dataset source that fetches public datasets from HuggingFace Hub.
Uses huggingface_hub.HfApi; no authentication is required for public
datasets. HfApi.list_datasets exposes an iterator rather than native
cursors, so pagination is implemented by treating the cursor as a numeric
offset and slicing the iterator.
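The offset-based pagination described above can be sketched as follows. This is a minimal illustration rather than the class's actual code: the helper name paginate_iterator, the string-encoded offset, and the one-item look-ahead used to decide whether a next page exists are all assumptions.

```python
from itertools import islice
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def paginate_iterator(
    items: Iterable[T], limit: int = 20, cursor: Optional[str] = None
) -> Tuple[List[T], Optional[str]]:
    """Return one page of items and the cursor for the next page (hypothetical helper)."""
    # The cursor is a stringified offset; None means "start from the beginning".
    offset = int(cursor) if cursor is not None else 0
    # Read one item past the page so we can tell whether another page exists.
    window = list(islice(items, offset, offset + limit + 1))
    page = window[:limit]
    next_cursor = str(offset + limit) if len(window) > limit else None
    return page, next_cursor
```

With this approach every call consumes the iterator from the start up to the requested offset, so deeper pages cost more than the first one.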
Methods
download_dataset(self, dataset_id: str, temp_path: str) -> str
HuggingFaceDatasetSource: Download the raw dataset files from HuggingFace Hub.
Parameters
- dataset_id : str
- HuggingFace dataset identifier (e.g. "stanfordnlp/imdb").
- temp_path : str
- Local directory to download into.
Returns
- str
- Path to the directory containing the downloaded files.
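One plausible way to implement such a download is huggingface_hub.snapshot_download with repo_type="dataset". The sketch below is an assumption about the approach, not the method's confirmed implementation; the function name download_dataset_sketch and the injectable downloader argument are hypothetical and exist only so the example can run without network access.

```python
from typing import Callable, Optional


def download_dataset_sketch(
    dataset_id: str,
    temp_path: str,
    downloader: Optional[Callable[..., str]] = None,
) -> str:
    """Download every file of a dataset repo into temp_path and return the path."""
    if downloader is None:
        # Imported lazily so the sketch can be exercised without the package installed.
        from huggingface_hub import snapshot_download

        downloader = snapshot_download
    # repo_type="dataset" targets a dataset repository rather than a model repository.
    return downloader(repo_id=dataset_id, repo_type="dataset", local_dir=temp_path)
```

Called as download_dataset_sketch("stanfordnlp/imdb", "/tmp/imdb"), this would fetch the repository files into /tmp/imdb, assuming the dataset is public and the Hub is reachable.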
get_info(self, dataset_id: str) -> 'DatasetEntry | None'
HuggingFaceDatasetSource: Return full metadata for a single HuggingFace dataset, including size.
Parameters
- dataset_id : str
- HuggingFace dataset identifier in "namespace/repo" form.
Returns
- DatasetEntry or None
- Full metadata entry, or None on error.
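The "metadata or None on error" contract can be illustrated with a small sketch built on HfApi.dataset_info. The fields of DatasetEntry are not documented here, so the example returns a plain dict as a stand-in; get_info_sketch, the fetch argument, and the "size_bytes" key are hypothetical names, and summing per-file sizes is only one assumed way to obtain the dataset size.

```python
from typing import Any, Callable, Dict, Optional


def get_info_sketch(
    dataset_id: str, fetch: Optional[Callable[[str], Any]] = None
) -> Optional[Dict[str, Any]]:
    """Return a metadata dict for dataset_id, or None if the lookup fails."""
    if fetch is None:
        from huggingface_hub import HfApi

        api = HfApi()

        def fetch(repo_id: str) -> Any:
            # files_metadata=True asks the Hub to include per-file sizes.
            return api.dataset_info(repo_id, files_metadata=True)

    try:
        info = fetch(dataset_id)
    except Exception:
        # Any failure (missing repo, network error) is reported as None.
        return None
    siblings = getattr(info, "siblings", None) or []
    # Files without a reported size count as 0 bytes.
    size = sum(getattr(s, "size", None) or 0 for s in siblings)
    return {"id": getattr(info, "id", dataset_id), "size_bytes": size}
```

Returning None instead of raising keeps callers simple, at the cost of hiding why a lookup failed.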
search(self, query: str, limit: int = 20, cursor: str | None = None, **filters: Any) -> DashAI.back.dataset_sources.base_dataset_source.SearchPage
HuggingFaceDatasetSource: Return public HuggingFace datasets matching a query.
Parameters
- query : str
- Search string passed to HfApi.list_datasets.
- limit : int, optional
- Maximum number of results, by default 20.
- cursor : str or None, optional
- Pagination cursor returned by the previous call (encoded numeric offset). None fetches the first page.
- **filters : Any
- Unused; reserved for future tag/task filters.
Returns
- SearchPage
- Matching datasets and cursor for the next page (or None).
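A caller would typically feed the returned cursor back into search until it comes back as None. The loop below is a sketch of that pattern; the SearchPage attribute names items and next_cursor are assumptions (they are not documented in this section), and iter_search_results and max_pages are hypothetical names introduced for illustration.

```python
from typing import Any, Iterator


def iter_search_results(
    source: Any, query: str, page_size: int = 20, max_pages: int = 5
) -> Iterator[Any]:
    """Yield results page by page until the cursor is None or max_pages is reached."""
    cursor = None  # None requests the first page
    for _ in range(max_pages):
        page = source.search(query, limit=page_size, cursor=cursor)
        yield from page.items       # assumed attribute name
        cursor = page.next_cursor   # assumed attribute name
        if cursor is None:
            break
```

The max_pages cap is a safeguard for broad queries, where following every cursor could mean walking a very long result list.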
get_schema(cls) -> dict
ConfigObject: Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
validate_and_transform(self, raw_data: dict) -> dict
ConfigObject: Take the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.