PCA

Converter
DashAI.back.converters.scikit_learn.PCA

Reduce dimensionality using Principal Component Analysis (PCA).

PCA finds a set of orthogonal axes (principal components) that successively capture the greatest amount of variance in the data. Given a centered data matrix X of shape (n_samples, n_features), the method computes the eigendecomposition of the covariance matrix X^T X / (n_samples - 1) and retains only the n_components eigenvectors with the largest eigenvalues. The data are then projected onto this lower-dimensional subspace.
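The procedure described above can be sketched directly in NumPy. This is a minimal illustration, not the code path used by the converter: scikit-learn typically works through an SVD of the centered data rather than forming the covariance matrix explicitly. The helper name `pca_eig` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def pca_eig(X, n_components):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    # Center each feature so the covariance formula X^T X / (n_samples - 1) applies.
    X_centered = X - X.mean(axis=0)
    n_samples = X_centered.shape[0]
    cov = X_centered.T @ X_centered / (n_samples - 1)

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the eigenvectors with the largest eigenvalues, in descending order.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T          # shape (n_components, n_features)
    explained_variance = eigvals[order]

    # Project the centered data onto the retained components.
    projected = X_centered @ components.T
    return projected, components, explained_variance


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated synthetic features
Z, components, explained_variance = pca_eig(X, n_components=2)
```

The sample variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues are reported as explained variance.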

PCA is well suited for preprocessing high-dimensional continuous data before applying machine learning models, for visualisation of multivariate datasets, and for noise reduction. The whiten option rescales each component to unit variance, which can improve the performance of downstream estimators that assume spherical features (e.g. RBF-kernel SVMs).
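The effect of whitening can be checked with the wrapped scikit-learn class. The sketch below calls `sklearn.decomposition.PCA` directly rather than the DashAI converter, and the synthetic data and variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Four independent features with very different scales (illustrative data).
X = rng.normal(size=(300, 4)) * np.array([10.0, 5.0, 1.0, 0.1])

# Same projection with and without whitening.
plain = PCA(n_components=2).fit_transform(X)
whitened = PCA(n_components=2, whiten=True).fit_transform(X)

# Without whitening the components keep their own variances;
# with whitening each component is rescaled to unit variance.
plain_var = plain.var(axis=0, ddof=1)
whitened_var = whitened.var(axis=0, ddof=1)
```

Whitening discards the relative variance scales of the components, so it is best reserved for downstream estimators that benefit from equally scaled inputs.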

Key properties:

  • Linear, unsupervised transformation.
  • Components are ordered by descending explained variance.
  • Setting n_components to a float in (0, 1) automatically selects the number of components needed to explain that fraction of total variance.
  • n_components='mle' uses Minka's MLE to estimate the intrinsic dimensionality of the data.
  • Supports full, randomized, and ARPACK solvers for scalability.
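As a rough illustration of the ordering and variance-fraction properties listed above, the sketch below uses scikit-learn's PCA directly (not the DashAI converter) on synthetic data built from three latent factors. The data and the 0.95 threshold are assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Ten observed features driven by three latent factors plus a little noise.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

# A float in (0, 1) asks for as many components as needed to reach
# that fraction of the total variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X)

n_selected = pca.n_components_            # number of components actually kept
ratios = pca.explained_variance_ratio_    # sorted in descending order
```

Because almost all of the variance lives in the three latent directions, the selected number of components should not exceed three here.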

Wraps scikit-learn's PCA.

Parameters

n_components, default=2
Number of components to keep. If None, all components are kept.
use_copy : boolean, default=True
If False, data passed to fit are overwritten, and running fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
whiten : boolean, default=False
When True, the components_ vectors are rescaled so that the outputs are uncorrelated with unit variances. This discards the relative variance scales of the components but may improve downstream estimators.
svd_solver : string, default=auto
Solver to use for the decomposition. 'auto' selects the most appropriate solver based on the data.
tol : number, default=0.0
Tolerance for singular values when svd_solver == 'arpack'.
iterated_power, default=auto
Number of iterations for the power method when svd_solver == 'randomized'.
n_oversamples : integer, default=10
Number of additional random vectors used to sample the range of the data, to ensure proper conditioning. Only relevant when svd_solver == 'randomized'.
power_iteration_normalizer, default=auto
How the power iteration normalizer should be computed: 'auto', 'QR', or 'LU'. Not used by ARPACK.
random_state, default=None
Used when svd_solver is 'arpack' or 'randomized'. Pass an int for reproducible results.

Methods

get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType

Defined on PCA

Return the DashAI data type produced by this converter for a column.

Parameters

column_name : str, optional
Not used; all output columns share the same type. Defaults to None.

Returns

DashAIDataType
A Float type backed by pyarrow.float64().

changes_row_count(self) -> 'bool'

Defined on BaseConverter

Indicate whether this converter changes the number of dataset rows.

Returns

bool
True if the converter may add or remove rows, False otherwise.

fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter

Defined on SklearnWrapper

Fit the scikit-learn transformer to the data.

Parameters

x : DashAIDataset
The input dataset to fit the transformer on.
y : DashAIDataset, optional
Target values for supervised transformers. Defaults to None.

Returns

BaseConverter
The fitted transformer instance (self).

get_metadata(cls) -> 'Dict[str, Any]'

Defined on BaseConverter

Get metadata for the converter, used by the DashAI frontend.

Parameters

cls : type
The converter class (injected automatically by Python for classmethods).

Returns

Dict[str, Any]
Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.

get_schema(cls) -> dict

Defined on ConfigObject

Generate the JSON Schema for the component.

Returns

dict
Dictionary representing the JSON Schema of the component.

transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'

Defined on SklearnWrapper

Transform the data using the fitted scikit-learn transformer.

Parameters

x : DashAIDataset
The input dataset to transform.
y : DashAIDataset, optional
Not used. Present for API consistency. Defaults to None.

Returns

DashAIDataset
The transformed dataset with updated DashAI column types.

validate_and_transform(self, raw_data: dict) -> dict

Defined on ConfigObject

Take the data provided by the user to initialize the model and return it together with all the objects the model needs to work.

Parameters

raw_data : dict
A dictionary with the data provided by the user to initialize the model.

Returns

dict
A validated dictionary with the necessary objects.