JSONDataLoader
Data loader that ingests record-oriented JSON files into DashAI datasets.
Parses JSON files containing an array of record objects (one object per
row) and converts them to DashAIDataset train/validation/test splits.
An optional data_key parameter allows the records to be nested under a
top-level key (e.g. {"data": [{...}, ...]}) rather than at the root.
Multi-file uploads are concatenated before splitting, and the split ratios are validated before loading to provide early failure feedback.
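The early validation of split ratios can be pictured with a minimal sketch. The function name `validate_split_ratios`, its three arguments, and the rule that the ratios must sum to 1 are assumptions made for illustration; this page does not describe the actual check.

```python
import math


def validate_split_ratios(train: float, validation: float, test: float) -> None:
    """Hypothetical check: each ratio lies in [0, 1] and together they sum to 1."""
    ratios = {"train": train, "validation": validation, "test": test}
    for name, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} ratio must be in [0, 1], got {value}")
    total = sum(ratios.values())
    # Tolerate floating-point rounding such as 0.7 + 0.2 + 0.1.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"split ratios must sum to 1, got {total}")
```

Running a check like this before any file is parsed means a bad configuration fails immediately instead of after a potentially long load.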
Parameters
- name : string, default=
- Custom name to register your dataset. If no name is specified, the name of the uploaded file will be used.
- data_key, default=data
- Name of the top-level field that holds the list of record dictionaries when the file has the form {"data": [{"col1": val1, "col2": val2, ...}]} (known as the "table" orient in pandas). If the file is only a list of dictionaries (the "records" orient in pandas), set this value to null.
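The two layouts accepted through data_key can be illustrated with the standard library alone. The helper `extract_records` below is hypothetical and is not part of DashAI; it only shows how a key of "data" versus null (None in Python) selects where the record list is read from.

```python
import json
from typing import Any, Dict, List, Optional


def extract_records(text: str, data_key: Optional[str] = "data") -> List[Dict[str, Any]]:
    """Hypothetical helper: return the list of records from a JSON document."""
    payload = json.loads(text)
    if data_key is not None:
        # Nested layout: the records live under a top-level key.
        if not isinstance(payload, dict) or data_key not in payload:
            raise ValueError(f"top-level key {data_key!r} not found")
        payload = payload[data_key]
    if not isinstance(payload, list):
        raise ValueError("expected a list of record objects")
    return payload


nested = '{"data": [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]}'
flat = '[{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]'

rows_from_nested = extract_records(nested)                # data_key="data"
rows_from_flat = extract_records(flat, data_key=None)     # records at the root
```

Both calls yield the same two records; only the location of the list differs.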
Methods
load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'
Defined in JSONDataLoader. Load the uploaded JSON dataset into a DatasetDict.
Parameters
- filepath_or_buffer : str
- A URL where the dataset is located, or a FastAPI/Uvicorn uploaded file object.
- temp_path : str
- The temporary path where the files will be extracted and then uploaded.
- params : Dict[str, Any]
- Dict with the dataloader parameters. The options are:
  - data_key (str): The key in the JSON document under which the data is stored.
- n_sample : int | None
- Number of rows to load from the dataset; all rows are loaded if null.
Returns
- DatasetDict
- A Hugging Face dataset containing the loaded data.
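As a rough picture of how multi-file concatenation and n_sample interact, the sketch below reads every .json file in a directory, concatenates the records, and keeps the first n_sample rows. The helper `load_records` is hypothetical, and treating n_sample as "take the first rows" is an assumption; the actual loader may select rows differently and returns a dataset object rather than a plain list.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_records(directory: str, n_sample: Optional[int] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: concatenate root-level record lists from all JSON files."""
    records: List[Dict[str, Any]] = []
    # Sort for a deterministic concatenation order.
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.extend(json.load(handle))
    # None means "all rows"; otherwise keep only the first n_sample rows.
    return records if n_sample is None else records[:n_sample]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text(json.dumps([{"x": 1}, {"x": 2}]), encoding="utf-8")
    Path(tmp, "b.json").write_text(json.dumps([{"x": 3}]), encoding="utf-8")
    all_rows = load_records(tmp)
    first_two = load_records(tmp, n_sample=2)
```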
load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 100)
Defined in JSONDataLoader. Load a preview of the JSON dataset using streaming.
Parameters
- filepath_or_buffer : str
- Path to the JSON file.
- params : Dict[str, Any]
- Parameters for loading the JSON (data_key).
- n_rows : int, optional
- Number of rows to preview. Default is 100.
Returns
- pd.DataFrame
- A DataFrame containing the preview rows.
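One way to avoid decoding a whole file for a preview is to decode array elements one at a time and stop after n_rows. The sketch below does this with `json.JSONDecoder.raw_decode` from the standard library. It is an illustration only: it assumes a root-level array, still holds the raw text in memory, and returns a list of dictionaries instead of a pd.DataFrame. How the loader itself streams is not described on this page.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterator, List


def iter_array_items(text: str) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time."""
    decoder = json.JSONDecoder()
    pos = text.index("[") + 1
    while True:
        # Skip whitespace and the comma separating elements.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text) or text[pos] == "]":
            return
        # raw_decode parses one value starting at pos and reports where it ended.
        item, pos = decoder.raw_decode(text, pos)
        yield item


def preview_rows(text: str, n_rows: int = 100) -> List[Dict[str, Any]]:
    """Hypothetical helper: decode at most n_rows records."""
    return list(islice(iter_array_items(text), n_rows))


sample = '[{"a": 1}, {"a": 2}, {"a": 3}]'
head = preview_rows(sample, n_rows=2)
```

Because decoding stops as soon as n_rows items have been produced, anything after the previewed rows is never parsed.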
extract_files(self, file_path: str, temp_path: str) -> str
Defined in BaseDataLoader. Extract a ZIP archive into a subdirectory of temp_path.
Parameters
- file_path : str
- Path to the ZIP archive to extract.
- temp_path : str
- Base temporary directory; the extraction target is <temp_path>/files/.
Returns
- str
- Path of the directory containing the extracted files (<temp_path>/files/).
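The behaviour described for extract_files can be approximated with the standard zipfile module. The function `extract_zip` below is a stand-in written for this example, not the BaseDataLoader implementation; it only mirrors the documented contract of extracting into <temp_path>/files/ and returning that directory.

```python
import os
import tempfile
import zipfile


def extract_zip(file_path: str, temp_path: str) -> str:
    """Sketch: extract a ZIP archive into <temp_path>/files and return that path."""
    target = os.path.join(temp_path, "files")
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(target)
    return target


# Build a tiny archive and extract it to show the resulting layout.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "upload.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("records.json", "[]")
    extracted_dir = extract_zip(archive_path, tmp)
    extracted_names = sorted(os.listdir(extracted_dir))
    ends_with_files = os.path.basename(extracted_dir) == "files"
```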
get_schema(cls) -> dict
Defined in ConfigObject. Generate the JSON Schema of the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
prepare_files(self, file_path: str, temp_path: str) -> str
Defined in BaseDataLoader. Resolve a file path or URL into a local path suitable for loading.
Parameters
- file_path : str
- Path to a local file, a ZIP archive, or an HTTP(S) URL.
- temp_path : str
- Temporary directory used when extracting ZIP archives or URL downloads.
Returns
- tuple of (str, str)
- (path, type_path), where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
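The "dir" versus "file" distinction in the return value can be sketched as a small classifier. The function `expected_type_path` is hypothetical: it only predicts which type_path a given input would produce according to the description above and performs no download or extraction.

```python
import zipfile
from urllib.parse import urlparse


def expected_type_path(file_path: str) -> str:
    """Hypothetical helper: predict the type_path for a given input."""
    scheme = urlparse(file_path).scheme.lower()
    if scheme in ("http", "https"):
        # URL downloads are described as resolving to a directory.
        return "dir"
    if zipfile.is_zipfile(file_path):
        # ZIP archives are extracted, so the result is a directory.
        return "dir"
    # Anything else is treated as a plain local file.
    return "file"


url_kind = expected_type_path("https://example.com/data.json")
local_kind = expected_type_path("records.json")
```

Callers can branch on type_path to decide whether to iterate over a directory of files or read a single file.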
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject. Take the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.