IncrementalPCA
Reduce dimensionality using PCA computed incrementally over mini-batches.
IncrementalPCA (IPCA) implements an online variant of PCA that processes data one batch at a time and updates the component estimates after each batch using a singular value merging strategy. This allows the algorithm to fit datasets that are too large to hold in memory simultaneously, while still converging to results that closely approximate full-batch PCA.
The algorithm maintains a running estimate of the mean and the principal
components, merging each new batch with the accumulated SVD from previous
batches. When batch_size is None, it defaults to 5 * n_features.
Key properties:
- Constant memory footprint regardless of dataset size.
- Supports the partial_fit API for true out-of-core usage.
- The whiten option rescales components to unit variance, which can improve downstream estimators that assume spherical features.
- Produces output numerically close to full-batch PCA when the batch size is reasonably large relative to the number of components.
Wraps scikit-learn's IncrementalPCA.
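A minimal out-of-core sketch using the wrapped scikit-learn estimator directly (the in-memory split below stands in for reading chunks from disk):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))

ipca = IncrementalPCA(n_components=2)
# Feed the data in mini-batches; each call to partial_fit updates the
# running mean and component estimates without keeping all rows in memory.
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print(X_reduced.shape)  # (1000, 2)
```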
Parameters
- n_components, default=2
  - Number of components to keep.
- whiten : boolean, default=False
  - When True, the components_ are scaled to ensure uncorrelated outputs with unit variances.
- use_copy : boolean, default=True
  - If False, data passed to fit may be overwritten in place; call fit_transform(X) instead of fit(X) followed by transform(X).
- batch_size, default=None
  - The number of samples to use for each batch. If None, 5 * n_features is used.
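The effect of whiten can be sketched with the wrapped scikit-learn estimator (an illustration, not the DashAI wrapper API): after whitening, each output component has approximately unit variance even when the input features live on very different scales.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(1)
# Features with deliberately different scales along the coordinate axes.
X = rng.normal(size=(500, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])

ipca = IncrementalPCA(n_components=3, whiten=True, batch_size=100)
X_white = ipca.fit_transform(X)

# With whiten=True the projected components are rescaled by the inverse
# square root of their explained variance, so each column is ~unit variance.
print(X_white.std(axis=0))  # all approximately 1.0
```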
Methods
changes_row_count(self) -> 'bool'
(from BaseConverter) Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: Optional['DashAIDataset'] = None) -> 'BaseConverter'
(from SklearnWrapper) Fit the scikit-learn transformer to the data.
Parameters
- x : DashAIDataset
- The input dataset to fit the transformer on.
- y : DashAIDataset, optional
- Target values for supervised transformers. Defaults to None.
Returns
- BaseConverter
- The fitted transformer instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
(from BaseConverter) Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_output_type(self, column_name: Optional[str] = None) -> 'DashAIDataType'
(from SklearnWrapper) Return the DashAI data type produced by this transformer for a column.
Parameters
- column_name : str, optional
- The name of the column. Defaults to None.
Returns
- DashAIDataType
- The DashAI data type for the output column.
get_schema(cls) -> dict
(from ConfigObject) Generate the JSON Schema describing this component.
Returns
- dict
- Dictionary representing the Json Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional['DashAIDataset'] = None) -> 'DashAIDataset'
(from SklearnWrapper) Transform the data using the fitted scikit-learn transformer.
Parameters
- x : DashAIDataset
- The input dataset to transform.
- y : DashAIDataset, optional
- Not used. Present for API consistency. Defaults to None.
Returns
- DashAIDataset
- The transformed dataset with updated DashAI column types.
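Since transform delegates to the fitted scikit-learn estimator, its output tracks full-batch PCA closely when batches are large enough. A hedged sanity check using scikit-learn alone, comparing the learned subspaces (components may differ in sign, so we compare absolute dot products of corresponding unit-norm components):

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.default_rng(2)
# Data with a clear variance spectrum so the top components are well separated.
X = rng.normal(size=(400, 8)) * np.array([8.0, 4.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0])

pca = PCA(n_components=2).fit(X)
ipca = IncrementalPCA(n_components=2, batch_size=100).fit(X)

# |cosine| near 1 means each incremental component spans nearly the same
# direction as its full-batch counterpart.
cosines = [abs(a @ b) for a, b in zip(pca.components_, ipca.components_)]
print(cosines)
```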
validate_and_transform(self, raw_data: dict) -> dict
(from ConfigObject) Validate the data provided by the user to initialize the model, and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.