
JSONDataLoader

DataLoader
DashAI.back.dataloaders.classes.JSONDataLoader

Data loader that ingests record-oriented JSON files into DashAI datasets.

Parses JSON files containing an array of record objects (one object per row) and converts them to DashAIDataset train/validation/test splits. An optional data_key parameter allows the records to be nested under a top-level key (e.g. {"data": [{...}, ...]}) rather than at the root.

Multi-file uploads are concatenated before splitting, and the split ratios are validated before any data is loaded, so an invalid configuration fails early.
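The concatenate-then-split behavior, with ratio validation up front, can be sketched with the standard library. This is a minimal illustration, not DashAI's implementation: the helper names `validate_splits` and `concat_and_split` are hypothetical, the rule that ratios lie in [0, 1] and sum to 1 is an assumption, and rows are split in order without shuffling.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def validate_splits(train: float, validation: float, test: float) -> None:
    """Hypothetical check: each ratio lies in [0, 1] and the three sum to 1."""
    ratios = {"train": train, "validation": validation, "test": test}
    for name, value in ratios.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} ratio must be in [0, 1], got {value}")
    total = sum(ratios.values())
    if not math.isclose(total, 1.0):
        raise ValueError(f"split ratios must sum to 1, got {total}")


def concat_and_split(
    files: Sequence[List[Row]], train: float, validation: float, test: float
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Concatenate the records of every file, then cut them into three splits."""
    validate_splits(train, validation, test)  # fail before touching any data
    rows = [row for records in files for row in records]
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    # Assumption: order-preserving split; the test split takes the remainder.
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]
```

For two uploaded files with 6 and 4 records and ratios 0.6/0.2/0.2, this yields splits of 6, 2 and 2 rows, while ratios such as 0.5/0.5/0.5 are rejected before any rows are combined.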

Parameters

name : string, default=
Custom name to register your dataset. If no name is specified, the name of the uploaded file will be used.
data_key : string or null, default=data
Name of the top-level field that holds the list of row dictionaries, for files shaped like {"data": [{"col1": val1, "col2": val2, ...}]} (similar to the "table" orient in pandas). If the file is only a list of dictionaries (the "records" orient in pandas), set this value to null.
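The two layouts that data_key distinguishes can be illustrated with Python's json module. The helper `extract_records` below is hypothetical and only mirrors the documented behavior: a string key selects the nested list, and null (None in Python) means the records sit at the root.

```python
import json
from typing import Any, Dict, List, Optional


def extract_records(obj: Any, data_key: Optional[str] = "data") -> List[Dict[str, Any]]:
    """Return the list of record objects, optionally nested under data_key."""
    if data_key is not None:
        if not isinstance(obj, dict) or data_key not in obj:
            raise KeyError(f"expected a top-level object with key {data_key!r}")
        obj = obj[data_key]
    if not isinstance(obj, list) or not all(isinstance(row, dict) for row in obj):
        raise ValueError("expected a list of record objects")
    return obj


# Nested layout, read with the default data_key="data".
nested = json.loads('{"data": [{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]}')
# Records layout, read with data_key=None (null in the loader parameters).
flat = json.loads('[{"col1": 1, "col2": "a"}, {"col1": 2, "col2": "b"}]')

# Both layouts describe the same two rows.
assert extract_records(nested) == extract_records(flat, data_key=None)
```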

Methods

load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'

Defined on JSONDataLoader

Load the uploaded JSON dataset into a DatasetDict.

Parameters

filepath_or_buffer : str
A URL where the dataset is located, or a FastAPI/Uvicorn uploaded file object.
temp_path : str
The temporary path where the files will be extracted and then uploaded.
params : Dict[str, Any]
Dictionary of data loader parameters. Supported options:
- data_key (str): the key of the JSON object under which the data is stored.
n_sample : int | None
Number of rows to load from the dataset; if null, all rows are loaded.
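How params and n_sample combine can be sketched with the standard library. This is not the method itself, which returns a dataset object: `load_rows` is a hypothetical helper that returns plain dictionaries, and falling back to "data" when data_key is absent from params is an assumption based on the documented default.

```python
import json
from typing import Any, Dict, List, Optional


def load_rows(
    path: str, params: Dict[str, Any], n_sample: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Read a JSON file, select the records via data_key, and cap them at n_sample."""
    with open(path, encoding="utf-8") as fh:
        content = json.load(fh)
    # Assumption: a missing data_key falls back to the documented default "data".
    data_key = params.get("data_key", "data")
    records = content if data_key is None else content[data_key]
    # n_sample=None means every row is loaded.
    return records if n_sample is None else records[:n_sample]
```

For example, `load_rows("rows.json", {"data_key": None}, n_sample=50)` would return at most the first 50 records of a root-level array.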

Returns

DatasetDict
A Hugging Face dataset containing the loaded data.

load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 100)

Defined on JSONDataLoader

Load a preview of the JSON dataset using streaming.

Parameters

filepath_or_buffer : str
Path to the JSON file.
params : Dict[str, Any]
Parameters for loading the JSON file, such as data_key.
n_rows : int, optional
Number of rows to preview. Default is 100.

Returns

pd.DataFrame
A DataFrame containing the preview rows.
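Streaming lets a preview stop after n_rows records instead of decoding the whole array. A minimal sketch with `json.JSONDecoder.raw_decode`, assuming the records sit in a root-level array (data_key null) and the text is already in memory; `preview_records` is a hypothetical name, and the documented method returns a pandas DataFrame rather than a list.

```python
import json
from typing import Any, Dict, List


def preview_records(text: str, n_rows: int = 100) -> List[Dict[str, Any]]:
    """Decode at most n_rows objects from a root-level JSON array, left to right."""
    decoder = json.JSONDecoder()
    rows: List[Dict[str, Any]] = []
    idx = text.index("[") + 1  # position just past the opening bracket
    while len(rows) < n_rows:
        # Skip whitespace and the comma separating array elements.
        while idx < len(text) and text[idx] in " \t\r\n,":
            idx += 1
        if idx >= len(text) or text[idx] == "]":
            break  # reached the end of the array
        obj, idx = decoder.raw_decode(text, idx)
        rows.append(obj)
    return rows


# Only the first two elements are decoded; the truncated tail is never parsed.
print(preview_records('[{"a": 1}, {"a": 2}, {"a": 3', n_rows=2))  # → [{'a': 1}, {'a': 2}]
```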

extract_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Extract a ZIP archive into a subdirectory of temp_path.

Parameters

file_path : str
Path to the ZIP archive to extract.
temp_path : str
Base temporary directory; extraction target is <temp_path>/files/.

Returns

str
Path of the directory containing the extracted files (<temp_path>/files/).
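The documented extraction target can be reproduced with the standard library's zipfile module. A minimal sketch under that assumption; `extract_zip` is a hypothetical stand-in and says nothing about how DashAI handles nested folders or name collisions.

```python
import os
import zipfile


def extract_zip(file_path: str, temp_path: str) -> str:
    """Extract a ZIP archive into <temp_path>/files and return that directory."""
    target = os.path.join(temp_path, "files")
    os.makedirs(target, exist_ok=True)  # reuse the directory if it already exists
    with zipfile.ZipFile(file_path) as archive:
        archive.extractall(target)
    return target
```

Python's zipfile documentation advises inspecting archives from untrusted sources before extracting them, which is relevant when the ZIP comes from a user upload.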

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema of the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

prepare_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Resolve a file path or URL into a local path suitable for loading.

Parameters

file_path : str
Path to a local file, a ZIP archive, or an HTTP(S) URL.
temp_path : str
Temporary directory used for extracting ZIP archives and storing URL downloads.

Returns

tuple of (str, str)
(path, type_path) where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
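The three-way dispatch described above can be sketched as follows. `resolve_path` is hypothetical; storing URL downloads in <temp_path>/files/ without extracting them, and detecting archives with `zipfile.is_zipfile` rather than by file extension, are assumptions.

```python
import os
import urllib.parse
import urllib.request
import zipfile
from typing import Tuple


def resolve_path(file_path: str, temp_path: str) -> Tuple[str, str]:
    """Return (path, type_path): "dir" for downloads/archives, "file" otherwise."""
    target = os.path.join(temp_path, "files")
    if file_path.startswith(("http://", "https://")):
        # Assumption: the download is saved as-is inside <temp_path>/files.
        os.makedirs(target, exist_ok=True)
        name = os.path.basename(urllib.parse.urlparse(file_path).path) or "download"
        urllib.request.urlretrieve(file_path, os.path.join(target, name))
        return target, "dir"
    if zipfile.is_zipfile(file_path):
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(file_path) as archive:
            archive.extractall(target)
        return target, "dir"
    # Plain local file: use it in place.
    return file_path, "file"
```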

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Take the data provided by the user to initialize the model and return it together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.
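A minimal sketch of the validate-then-fill pattern for this loader's two documented parameters. The schema dict, the `validate_params` helper, and the empty-string default for name are hypothetical; the real schema is whatever get_schema returns and may differ.

```python
from typing import Any, Dict

# Hypothetical schema mirroring the documented parameters, not DashAI's output.
SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "default": ""},
        "data_key": {"type": ["string", "null"], "default": "data"},
    },
}

# Map JSON Schema type names to the Python types used for isinstance checks.
_JSON_TYPES = {"string": str, "null": type(None)}


def validate_params(raw_data: Dict[str, Any], schema: Dict[str, Any] = SCHEMA) -> Dict[str, Any]:
    """Fill in defaults, type-check each value, and reject unknown keys."""
    unknown = set(raw_data) - set(schema["properties"])
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    result: Dict[str, Any] = {}
    for key, spec in schema["properties"].items():
        # An explicit None is kept; only a missing key takes the default.
        value = raw_data.get(key, spec.get("default"))
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if not isinstance(value, tuple(_JSON_TYPES[t] for t in allowed)):
            raise TypeError(f"{key!r} must be one of {allowed}, got {type(value).__name__}")
        result[key] = value
    return result
```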

Compatible with