ExcelDataLoader

DataLoader
DashAI.back.dataloaders.classes.ExcelDataLoader

Data loader that ingests tabular data from Excel workbooks into DashAI datasets.

Reads .xlsx / .xls files, optionally selecting a specific sheet, samples rows, and splits the result into train/validation/test DashAIDataset splits. Delegates to pandas.read_excel after normalising the schema parameters (sheet name/index, header row, column selection, row limits).

Handles multi-file uploads by concatenating all workbooks before splitting.

Parameters

name : string, default=
Custom name to register your dataset. If no name is specified, the name of the uploaded file will be used.
sheet, default=0
The name of the sheet to read or its zero-based index. If a string is provided, the reader will search for a sheet named exactly as the string. If an integer is provided, the reader will select the sheet at the corresponding index. By default, the first sheet will be read.
header, default=0
The row number where the column names are located, indexed from 0. If null, the file will be considered to have no column names.
usecols, default=None
If None, the reader will load all columns. If a string, it is a comma-separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges include both endpoints.
skiprows, default=None
Number of rows to skip at the start of the file. Leave empty to not skip any rows.
nrows, default=None
Number of rows to read. Leave empty to read all rows.
names, default=None
Comma-separated list of column names to use. Example: 'col1,col2,col3'. Leave empty to use header row.
na_values, default=None
Comma-separated additional strings to recognize as NA/NaN. Example: 'NA,N/A,null'.
keep_default_na : boolean, default=True
Whether to include the default NaN values when parsing the data.
true_values, default=None
Comma-separated values to consider as True. Example: 'yes,true,1'.
false_values, default=None
Comma-separated values to consider as False. Example: 'no,false,0'.
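
The description above says the schema parameters are normalised before being handed to pandas.read_excel. The following is a minimal sketch of what such a normalisation step could look like, not DashAI's actual implementation. The helper names (split_csv, build_read_excel_kwargs, column_index, expand_usecols) are hypothetical, and the mapping of sheet to the sheet_name keyword is an assumption. The usecols expansion is included only to illustrate that ranges are inclusive; pandas accepts the letter string directly.

```python
from typing import Any, Dict, List, Optional


def split_csv(value: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated string into a list; return None when empty."""
    if value is None or not value.strip():
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def build_read_excel_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map loader parameters to pandas.read_excel keywords."""
    kwargs: Dict[str, Any] = {
        "sheet_name": params.get("sheet", 0),    # assumed mapping: sheet -> sheet_name
        "header": params.get("header", 0),       # None means "no header row"
        "usecols": params.get("usecols") or None,  # letter string is passed through
        "skiprows": params.get("skiprows"),
        "nrows": params.get("nrows"),
        "keep_default_na": params.get("keep_default_na", True),
    }
    # These fields arrive as comma-separated strings and become lists.
    for key in ("names", "na_values", "true_values", "false_values"):
        kwargs[key] = split_csv(params.get(key))
    return kwargs


def column_index(letters: str) -> int:
    """Convert Excel column letters to a zero-based index (A -> 0, AA -> 26)."""
    index = 0
    for char in letters.strip().upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def expand_usecols(spec: str) -> List[int]:
    """Expand a spec such as 'A,C,E:F' into zero-based indices, ranges inclusive."""
    columns: List[int] = []
    for part in spec.split(","):
        start, _, end = part.partition(":")
        first = column_index(start)
        last = column_index(end) if end else first
        columns.extend(range(first, last + 1))
    return columns


kwargs_example = build_read_excel_kwargs(
    {"sheet": "Sales", "usecols": "A,C,E:F", "na_values": "NA, N/A,null"}
)
selected = expand_usecols("A,C,E:F")
```

With these inputs, selected is [0, 2, 4, 5], which shows that the range E:F covers both E and F. Fields left empty, such as names here, stay None so that pandas falls back to its defaults.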

Methods

load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'

Defined on ExcelDataLoader

Load the uploaded Excel files into a DatasetDict.

Parameters

filepath_or_buffer : str
A URL where the dataset is located, or a FastAPI/Uvicorn uploaded file object.
temp_path : str
The temporary path where the files will be extracted and then uploaded.
params : Dict[str, Any]
Dict with the dataloader parameters.
n_sample : int | None
Number of rows to load from the dataset. If null, all rows are loaded.

Returns

DatasetDict
A Hugging Face dataset containing the loaded data.

load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 10)

Defined on ExcelDataLoader

Load a preview of the Excel dataset.

Parameters

filepath_or_buffer : str
Path to the Excel file.
params : Dict[str, Any]
Parameters for loading Excel (sheet, header, etc.).
n_rows : int, optional
Number of rows to preview. Default is 10.

Returns

pd.DataFrame
A DataFrame containing the preview rows.

extract_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Extract a ZIP archive into a subdirectory of temp_path.

Parameters

file_path : str
Path to the ZIP archive to extract.
temp_path : str
Base temporary directory; extraction target is <temp_path>/files/.

Returns

str
Path of the directory containing the extracted files (<temp_path>/files/).
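
As a rough illustration of the behaviour described above, the sketch below extracts an archive into <temp_path>/files/ using the standard zipfile module. The function name extract_zip and the demo file names are hypothetical; the real extract_files method may differ in error handling and other details.

```python
import os
import tempfile
import zipfile


def extract_zip(file_path: str, temp_path: str) -> str:
    """Extract a ZIP archive into <temp_path>/files/ and return that directory."""
    target_dir = os.path.join(temp_path, "files")
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(file_path, "r") as archive:
        archive.extractall(target_dir)
    return target_dir


# Demo: build a small archive with two placeholder workbooks, then extract it.
with tempfile.TemporaryDirectory() as workdir:
    archive_path = os.path.join(workdir, "upload.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("book1.xlsx", b"placeholder")
        archive.writestr("book2.xlsx", b"placeholder")
    extracted_dir = extract_zip(archive_path, workdir)
    extracted_names = sorted(os.listdir(extracted_dir))
```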

get_schema(cls) -> dict

Defined on ConfigObject

Generates the JSON Schema for the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

prepare_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Resolve a file path or URL into a local path suitable for loading.

Parameters

file_path : str
Path to a local file, a ZIP archive, or an HTTP(S) URL.
temp_path : str
Temporary directory used when extracting ZIP archives or URL downloads.

Returns

tuple of (str, str)
(path, type_path) where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
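
The (path, type_path) contract can be sketched as follows. This is an assumption-laden illustration rather than the BaseDataLoader code: resolve_input is a hypothetical name, URL downloading is deliberately left out, and archives are detected by file extension. The extension check matters because .xlsx workbooks are themselves ZIP containers, so zipfile.is_zipfile would also return True for a plain workbook.

```python
import os
import tempfile
import zipfile
from typing import Tuple


def resolve_input(file_path: str, temp_path: str) -> Tuple[str, str]:
    """Return (path, type_path): 'dir' for extracted archives, 'file' otherwise."""
    if file_path.startswith(("http://", "https://")):
        # Downloading is out of scope for this sketch.
        raise NotImplementedError("URL download is omitted from this sketch")
    # Check the extension rather than the file contents: an .xlsx workbook
    # is also a valid ZIP container and must not be unpacked here.
    if file_path.lower().endswith(".zip"):
        target_dir = os.path.join(temp_path, "files")
        os.makedirs(target_dir, exist_ok=True)
        with zipfile.ZipFile(file_path) as archive:
            archive.extractall(target_dir)
        return target_dir, "dir"
    return file_path, "file"


# Demo: one plain file and one archive that wraps it.
with tempfile.TemporaryDirectory() as workdir:
    plain_path = os.path.join(workdir, "data.xlsx")
    with open(plain_path, "wb") as handle:
        handle.write(b"placeholder")
    zip_path = os.path.join(workdir, "upload.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.write(plain_path, arcname="data.xlsx")
    plain_result = resolve_input(plain_path, workdir)
    zip_result = resolve_input(zip_path, workdir)
```

A caller can then branch on the second element: iterate over the directory when it is "dir", or read the single workbook when it is "file".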

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Takes the data provided by the user to initialize the model and returns it together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.

Compatible with