ExcelDataLoader
Data loader that ingests tabular data from Excel workbooks into DashAI datasets.
Reads .xlsx / .xls files, optionally selecting a specific sheet,
samples rows, and splits the result into train/validation/test
DashAIDataset splits. Delegates to pandas.read_excel after
normalising the schema parameters (sheet name/index, header row, column
selection, row limits).
Handles multi-file uploads by concatenating all workbooks before splitting.
Parameters
- name : string, default=
- Custom name to register your dataset. If no name is specified, the name of the uploaded file will be used.
- sheet, default=0
- The name of the sheet to read or its zero-based index. If a string is provided, the reader looks for a sheet with exactly that name. If an integer is provided, the reader selects the sheet at that index. By default, the first sheet is read.
- header, default=0
- The zero-based row number that contains the column names. If null, the file is treated as having no column names.
- usecols, default=None
- If None, the reader loads all columns. If a string, a comma-separated list of Excel column letters and column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive on both ends.
- skiprows, default=None
- Number of rows to skip at the start of the file. Leave empty to not skip any rows.
- nrows, default=None
- Number of rows to read. Leave empty to read all rows.
- names, default=None
- Comma-separated list of column names to use. Example: 'col1,col2,col3'. Leave empty to use the header row.
- na_values, default=None
- Comma-separated additional strings to recognize as NA/NaN. Example: 'NA,N/A,null'.
- keep_default_na : boolean, default=True
- Whether to include the default NaN values when parsing the data.
- true_values, default=None
- Comma-separated values to consider as True. Example: 'yes,true,1'.
- false_values, default=None
- Comma-separated values to consider as False. Example: 'no,false,0'.
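The normalisation step described above can be sketched as follows. This is a minimal illustration, not DashAI's actual code: `split_csv` and `to_read_excel_kwargs` are hypothetical helpers, and the mapping assumes each loader parameter is passed to the `pandas.read_excel` keyword of the same name (with `sheet` becoming `sheet_name`).

```python
from typing import Any, Dict, List, Optional


def split_csv(value: Optional[str]) -> Optional[List[str]]:
    """Turn 'a, b,c' into ['a', 'b', 'c']; empty or None input stays None."""
    if value is None or not value.strip():
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


def to_read_excel_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map loader parameters to pandas.read_excel keywords."""
    return {
        "sheet_name": params.get("sheet", 0),  # pandas names this sheet_name
        "header": params.get("header", 0),  # an explicit None means no header row
        "usecols": params.get("usecols") or None,  # e.g. "A:E" or "A,C,E:F"
        "skiprows": params.get("skiprows"),
        "nrows": params.get("nrows"),
        "names": split_csv(params.get("names")),
        "na_values": split_csv(params.get("na_values")),
        "keep_default_na": params.get("keep_default_na", True),
        "true_values": split_csv(params.get("true_values")),
        "false_values": split_csv(params.get("false_values")),
    }


kwargs = to_read_excel_kwargs(
    {"sheet": "Sales", "usecols": "A:E", "na_values": "NA, N/A,null", "nrows": 100}
)
# With pandas and an Excel engine installed, the call would then be:
#     pandas.read_excel("workbook.xlsx", **kwargs)
```

Splitting the comma-separated strings up front keeps the form-style inputs (plain text fields) separate from the list-valued arguments that pandas expects.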
Methods
load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'
Defined in ExcelDataLoader. Load the uploaded Excel files into a DatasetDict.
Parameters
- filepath_or_buffer : str
- A URL where the dataset is located or a FastAPI/Uvicorn uploaded file object.
- temp_path : str
- The temporary path where the files will be extracted and then uploaded.
- params : Dict[str, Any]
- Dict with the dataloader parameters.
- n_sample : int | None
- Number of rows to load from the dataset; all rows are loaded if null.
Returns
- DatasetDict
- A Hugging Face dataset with the loaded data.
load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 10)
Defined in ExcelDataLoader. Load a preview of the Excel dataset.
Parameters
- filepath_or_buffer : str
- Path to the Excel file.
- params : Dict[str, Any]
- Parameters for loading Excel (sheet, header, etc.).
- n_rows : int, optional
- Number of rows to preview. Default is 10.
Returns
- pd.DataFrame
- A DataFrame containing the preview rows.
extract_files(self, file_path: str, temp_path: str) -> str
Defined in BaseDataLoader. Extract a ZIP archive into a subdirectory of temp_path.
Parameters
- file_path : str
- Path to the ZIP archive to extract.
- temp_path : str
- Base temporary directory; the extraction target is <temp_path>/files/.
Returns
- str
- Path of the directory containing the extracted files (<temp_path>/files/).
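The extraction behaviour described above can be approximated with the standard library. This is a sketch under the stated contract only (extract into `<temp_path>/files/` and return that directory); `extract_zip` is a hypothetical stand-in, not the real `extract_files` implementation.

```python
import os
import tempfile
import zipfile


def extract_zip(file_path: str, temp_path: str) -> str:
    """Extract a ZIP archive into <temp_path>/files/ and return that directory."""
    target_dir = os.path.join(temp_path, "files")
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(file_path, "r") as archive:
        archive.extractall(target_dir)
    return target_dir


# Usage with a throwaway archive built on the fly.
with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "upload.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        # Placeholder bytes, not a real workbook; only the file name matters here.
        archive.writestr("sales.xlsx", b"placeholder")
    extracted_dir = extract_zip(archive_path, tmp)
    extracted_names = sorted(os.listdir(extracted_dir))
```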
get_schema(cls) -> dict
Defined in ConfigObject. Generates the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
prepare_files(self, file_path: str, temp_path: str) -> str
Defined in BaseDataLoader. Resolve a file path or URL into a local path suitable for loading.
Parameters
- file_path : str
- Path to a local file, a ZIP archive, or an HTTP(S) URL.
- temp_path : str
- Temporary directory used to extract ZIP archives and to store URL downloads.
Returns
- tuple of (str, str)
- (path, type_path), where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
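The `type_path` distinction above can be illustrated with a small classifier. This is a hedged sketch: `classify_source` is a hypothetical name, and deciding by URL scheme and `.zip` extension is an assumption about how the three input kinds might be told apart, not the library's confirmed logic.

```python
from urllib.parse import urlparse


def classify_source(file_path: str) -> str:
    """Return "dir" for HTTP(S) URLs and ZIP archives, "file" for plain local files."""
    scheme = urlparse(file_path).scheme.lower()
    if scheme in ("http", "https"):
        return "dir"
    # Check the extension rather than the file contents: .xlsx workbooks are
    # themselves ZIP containers, so a content-based test would misclassify them.
    if file_path.lower().endswith(".zip"):
        return "dir"
    return "file"


url_kind = classify_source("https://example.com/workbooks.zip")
zip_kind = classify_source("uploads/workbooks.zip")
xlsx_kind = classify_source("uploads/sales.xlsx")
```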
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject. Takes the data given by the user to initialize the model and returns it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.