
CSVDataLoader

DataLoader
DashAI.back.dataloaders.classes.CSVDataLoader

Data loader that ingests tabular data from CSV files into DashAI datasets.

Reads one or more CSV files, optionally samples rows, and splits the result into train/validation/test DashAIDataset splits according to the ratios specified in the schema. The separator is normalised from human-readable aliases ("blank space", "tab") to Python character literals before delegating to pandas.read_csv.
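The separator normalisation can be sketched as follows. This is a minimal illustration, not DashAI's code: SEPARATOR_ALIASES, normalize_separator and read_with_separator are hypothetical names, and only the two aliases named above are mapped.

```python
import io

import pandas as pd

# Hypothetical alias table: only the aliases named in the description are mapped.
SEPARATOR_ALIASES = {"blank space": " ", "tab": "\t"}


def normalize_separator(separator: str) -> str:
    """Map a human-readable alias to the literal character pandas expects."""
    return SEPARATOR_ALIASES.get(separator, separator)


def read_with_separator(source, separator: str) -> pd.DataFrame:
    """Normalise the separator, then delegate parsing to pandas.read_csv."""
    return pd.read_csv(source, sep=normalize_separator(separator))


df = read_with_separator(io.StringIO("a\tb\n1\t2\n"), "tab")
print(df.columns.tolist())  # ['a', 'b']
```

Unknown values fall through unchanged, so ordinary separators such as ";" still reach pandas as-is.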

Handles multi-file uploads by concatenating all CSVs before splitting, and supports header detection, column selection, and row skipping via the CSVDataloaderSchema parameters.
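A rough sketch of the concatenate-then-split flow, using plain pandas DataFrames in place of DashAIDataset. The function concat_and_split, the train_size/val_size ratio names and the shuffle-then-slice strategy are assumptions for illustration; in DashAI the ratios come from the schema and the actual splitting logic may differ.

```python
import io

import pandas as pd


def concat_and_split(sources, train_size=0.7, val_size=0.15, seed=0):
    """Concatenate CSV sources, shuffle, and slice into three splits.

    Hypothetical helper: ratio names and strategy are illustrative only.
    """
    data = pd.concat([pd.read_csv(src) for src in sources], ignore_index=True)
    # Shuffle once so that every split can draw rows from every file.
    data = data.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = round(len(data) * train_size)
    n_val = round(len(data) * val_size)
    return {
        "train": data.iloc[:n_train],
        "validation": data.iloc[n_train:n_train + n_val],
        "test": data.iloc[n_train + n_val:],
    }


def make_csv(start, stop):
    """Build an in-memory CSV with rows start..stop-1 for the demo."""
    rows = "".join(f"{i},{i % 2}\n" for i in range(start, stop))
    return io.StringIO("x,y\n" + rows)


splits = concat_and_split([make_csv(0, 10), make_csv(10, 20)])
print({name: len(part) for name, part in splits.items()})
# {'train': 14, 'validation': 3, 'test': 3}
```

Concatenating before splitting means the split ratios apply to the combined rows rather than to each file separately.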

Parameters

name : string, default=
Custom name to register your dataset. If no name is specified, the name of the uploaded file will be used.
separator : string, default=,
The character that delimits fields in the CSV file. The human-readable aliases "blank space" and "tab" are also accepted.
header : string, default=infer
Row number(s) containing the column labels and marking the start of the data (zero-indexed). By default, column names are inferred from the first line of the file. If the file has a header row and you pass column names explicitly through names, set this to '0' so the file's header is replaced. Header can also be a list of integers specifying the rows that form a MultiIndex on the columns.
names, default=None
Comma-separated list of column names to use. If the file contains a header row, explicitly set header to '0' so these names override it. Example: 'col1,col2,col3'. Leave empty to use the file's headers.
encoding : string, default=utf-8
Character encoding used to read the file (e.g. 'utf-8'). The most common encodings are provided as options.
na_values, default=None
Comma-separated additional strings to recognize as NA/NaN. Example: 'NULL,missing,n/a'
keep_default_na : boolean, default=True
Whether to include the default NaN values when parsing the data (True recommended).
true_values, default=None
Comma-separated values to consider as True. Example: 'yes,true,1,on'
false_values, default=None
Comma-separated values to consider as False. Example: 'no,false,0,off'
skip_blank_lines : boolean, default=True
If True, skip over blank lines rather than interpreting them as NaN values.
skiprows, default=None
Number of data rows to skip after reading the header. Leave empty to skip none.
nrows, default=None
Number of rows to read from the file. Leave empty to read all rows.

Methods

load_data(self, filepath_or_buffer: str, temp_path: str, params: Dict[str, Any], n_sample: int | None = None) -> 'DashAIDataset'

Defined on CSVDataLoader

Load the uploaded CSV files into a DashAIDataset.

Parameters

filepath_or_buffer : str
A URL where the dataset is located or a FastAPI/Uvicorn uploaded file object.
temp_path : str
Temporary directory where the files are extracted before being loaded.
params : Dict[str, Any]
Dictionary with the data loader parameters described under Parameters above, e.g. separator (str): the character that delimits the CSV data.
n_sample : int | None
Number of rows to load from the dataset; if None, all rows are loaded.

Returns

DashAIDataset
The loaded data as a DashAIDataset.

load_preview(self, filepath_or_buffer: str, params: Dict[str, Any], n_rows: int = 100)

Defined on CSVDataLoader

Load a preview of the CSV dataset using streaming.

Parameters

filepath_or_buffer : str
Path to the CSV file.
params : Dict[str, Any]
Parameters for loading the CSV (separator, encoding, etc.).
n_rows : int, optional
Number of rows to preview. Default is 100.

Returns

pd.DataFrame
A DataFrame containing the preview rows.
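One way to produce a streamed preview with pandas alone is to request an iterator via chunksize and keep only the first chunk. preview_csv is a hypothetical helper and read_kwargs stands in for the translated params; the loader's real streaming mechanism is not documented here.

```python
import io

import pandas as pd


def preview_csv(source, read_kwargs: dict, n_rows: int = 100) -> pd.DataFrame:
    """Return at most n_rows rows by reading the CSV in chunks of n_rows.

    Only the first chunk is turned into a DataFrame, so the remainder of a
    large file is never parsed.
    """
    with pd.read_csv(source, chunksize=n_rows, **read_kwargs) as reader:
        for chunk in reader:
            return chunk
    # No data rows at all: fall back to an empty frame.
    return pd.DataFrame()


text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(250))
preview = preview_csv(io.StringIO(text), {"sep": ","})
print(len(preview))  # 100
```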

extract_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Extract a ZIP archive into a subdirectory of temp_path.

Parameters

file_path : str
Path to the ZIP archive to extract.
temp_path : str
Base temporary directory; extraction target is <temp_path>/files/.

Returns

str
Path of the directory containing the extracted files (<temp_path>/files/).
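A minimal sketch of the extraction step using the standard zipfile module, assuming the layout described above (<temp_path>/files/). extract_zip is a hypothetical stand-in for extract_files, not its actual implementation.

```python
import os
import tempfile
import zipfile


def extract_zip(file_path: str, temp_path: str) -> str:
    """Extract a ZIP archive into <temp_path>/files/ and return that directory."""
    target = os.path.join(temp_path, "files")
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(file_path) as archive:
        # extractall strips absolute paths and '..' components from member names.
        archive.extractall(target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    archive_path = os.path.join(tmp, "upload.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("train.csv", "x,y\n1,0\n")
    extracted = extract_zip(archive_path, tmp)
    listing = sorted(os.listdir(extracted))
    print(listing)  # ['train.csv']
```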

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema of the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

prepare_files(self, file_path: str, temp_path: str) -> str

Defined on BaseDataLoader

Resolve a file path or URL into a local path suitable for loading.

Parameters

file_path : str
Path to a local file, a ZIP archive, or an HTTP(S) URL.
temp_path : str
Temporary directory used to extract ZIP archives and to store URL downloads.

Returns

tuple of (str, str)
(path, type_path) where type_path is "dir" for extracted archives/URLs or "file" for plain local files.
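The resolution rules above can be sketched with the standard library. This is a hedged illustration: resolve_path is a hypothetical name, and downloading URLs into <temp_path>/files/ (then unpacking them if they are ZIPs) is an assumption modelled on extract_files, not DashAI's confirmed behaviour.

```python
import os
import tempfile
import urllib.request
import zipfile
from urllib.parse import urlparse


def resolve_path(file_path: str, temp_path: str) -> tuple[str, str]:
    """Return (path, type_path): "dir" for archives/URLs, "file" otherwise."""
    target = os.path.join(temp_path, "files")
    if file_path.startswith(("http://", "https://")):
        # Download into the target directory, then unpack it if it is a ZIP.
        os.makedirs(target, exist_ok=True)
        name = os.path.basename(urlparse(file_path).path) or "download"
        local_path, _ = urllib.request.urlretrieve(file_path, os.path.join(target, name))
        if zipfile.is_zipfile(local_path):
            with zipfile.ZipFile(local_path) as archive:
                archive.extractall(target)
            os.remove(local_path)
        return target, "dir"
    if zipfile.is_zipfile(file_path):
        os.makedirs(target, exist_ok=True)
        with zipfile.ZipFile(file_path) as archive:
            archive.extractall(target)
        return target, "dir"
    # Plain local file: use it in place.
    return file_path, "file"


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w") as fh:
        fh.write("a,b\n1,2\n")
    zip_path = os.path.join(tmp, "data.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.write(csv_path, arcname="data.csv")
    csv_result = resolve_path(csv_path, tmp)
    zip_result = resolve_path(zip_path, tmp)
    extracted = os.listdir(zip_result[0])
print(csv_result[1], zip_result[1], extracted)  # file dir ['data.csv']
```

Returning a (path, type_path) pair lets the caller decide whether to read one file or iterate over a directory of extracted CSVs.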

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Take the data provided by the user to initialize the model and return it together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.

Compatible with