IncrementalPCA

Converter
DashAI.back.converters.scikit_learn.IncrementalPCA

Reduce dimensionality using PCA computed incrementally over mini-batches.

IncrementalPCA (IPCA) implements an online variant of PCA that processes data one batch at a time and updates the component estimates after each batch using a singular value merging strategy. This allows the algorithm to fit datasets that are too large to hold in memory simultaneously, while still converging to results that closely approximate full-batch PCA.

The algorithm maintains a running estimate of the mean and the principal components, merging each new batch with the accumulated SVD from previous batches. When batch_size is None, it defaults to 5 * n_features.

Key properties:

  • Constant memory footprint regardless of dataset size.
  • Supports the partial_fit API for true out-of-core usage.
  • The whiten option rescales components to unit variance, which can improve downstream estimators that assume spherical features.
  • Produces output numerically close to full-batch PCA when the batch size is reasonably large relative to the number of components.

Wraps scikit-learn's IncrementalPCA.
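Since the converter wraps scikit-learn's `IncrementalPCA`, the out-of-core behavior described above can be sketched directly against the underlying estimator. This is a minimal illustration using plain NumPy arrays rather than DashAI datasets; the batch sizes and data here are arbitrary.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))

ipca = IncrementalPCA(n_components=2)
# Feed the data in mini-batches via partial_fit, as if streaming
# batches from disk that never fit in memory all at once.
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print(X_reduced.shape)  # (1000, 2)
```

Each `partial_fit` call merges the batch's SVD with the accumulated component estimate, so memory use stays proportional to the batch size, not the dataset size.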

References

Parameters

n_components : int, default=2
Number of components to keep.
whiten : bool, default=False
When True, the components_ are scaled to ensure uncorrelated outputs with unit component-wise variances.
use_copy : bool, default=True
If False, data passed to fit are overwritten; running fit(X).transform(X) will then not yield the expected results, so use fit_transform(X) instead.
batch_size : int, default=None
The number of samples to use for each batch. If None, the batch size is inferred as 5 * n_features.
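The two less obvious parameters can be checked empirically against the wrapped scikit-learn estimator. A small sketch (the data is arbitrary): with `batch_size=None` the fitted estimator exposes the inferred batch size as `batch_size_`, and with `whiten=True` the transformed components come out with approximately unit variance.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# batch_size=None falls back to 5 * n_features = 40 samples per batch.
ipca = IncrementalPCA(n_components=2, whiten=True, batch_size=None)
X_white = ipca.fit_transform(X)

print(ipca.batch_size_)                 # 40
print(X_white.std(axis=0, ddof=1))      # each component is ~1.0
```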

Methods

changes_row_count(self) -> 'bool'

Defined on BaseConverter

Indicate whether this converter changes the number of dataset rows.

Returns

bool
True if the converter may add or remove rows, False otherwise.

fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter

Defined on SklearnWrapper

Fit the scikit-learn transformer to the data.

Parameters

x : DashAIDataset
The input dataset to fit the transformer on.
y : DashAIDataset, optional
Target values for supervised transformers. Defaults to None.

Returns

BaseConverter
The fitted transformer instance (self).

get_metadata(cls) -> 'Dict[str, Any]'

Defined on BaseConverter

Get metadata for the converter, used by the DashAI frontend.

Parameters

cls : type
The converter class (injected automatically by Python for classmethods).

Returns

Dict[str, Any]
Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.

get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType

Defined on SklearnWrapper

Return the DashAI data type produced by this transformer for a column.

Parameters

column_name : str, optional
The name of the column. Defaults to None.

Returns

DashAIDataType
The DashAI data type for the output column.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema associated with the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'

Defined on SklearnWrapper

Transform the data using the fitted scikit-learn transformer.

Parameters

x : DashAIDataset
The input dataset to transform.
y : DashAIDataset, optional
Not used. Present for API consistency. Defaults to None.

Returns

DashAIDataset
The transformed dataset with updated DashAI column types.
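Because fit() returns the fitted transformer itself, fitting and transforming can be chained. The DashAI-specific dataset types are omitted here; this sketch shows the equivalent flow on the wrapped scikit-learn estimator with plain NumPy arrays (an assumption, since the converter operates on DashAIDataset objects).

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(42)
X_train = rng.normal(size=(300, 10))
X_new = rng.normal(size=(50, 10))

# fit() returns self, so the fitted transformer can be chained into
# transform(), mirroring the converter's fit(...).transform(...) contract.
ipca = IncrementalPCA(n_components=2)
X_out = ipca.fit(X_train).transform(X_new)
print(X_out.shape)  # (50, 2)
```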

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Validate the data provided by the user to initialize the model, and return it populated with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.