PCA
Reduce dimensionality using Principal Component Analysis (PCA).
PCA finds a set of orthogonal axes (principal components) that successively
capture the greatest amount of variance in the data. Given a centered data
matrix X of shape (n_samples, n_features), the method computes the
eigendecomposition of the covariance matrix X^T X / (n_samples - 1) and
retains only the n_components eigenvectors with the largest eigenvalues. The
data are then projected onto this lower-dimensional subspace.
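The procedure described above can be sketched in plain NumPy. This is a minimal illustration of the covariance eigendecomposition, not the code this converter runs (scikit-learn's PCA is SVD-based); the helper name pca_eig and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, n_components):
    """Illustrative PCA via eigendecomposition of the sample covariance."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # center each feature
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)       # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    components = eigvecs[:, order].T          # rows are principal axes
    projected = Xc @ components.T             # (n_samples, n_components)
    return projected, components, eigvals[order]

# Hypothetical data: 5 features with very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
Z, components, explained_variance = pca_eig(X, n_components=2)
```

The variance of each projected column equals the corresponding eigenvalue, which is why the components come out ordered by explained variance.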
PCA is well suited for preprocessing high-dimensional continuous data before
applying machine learning models, for visualisation of multivariate datasets,
and for noise reduction. The whiten option rescales each component to
unit variance, which can improve the performance of downstream estimators that
assume spherical features (e.g. RBF-kernel SVMs).
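The effect of whitening can be checked directly against scikit-learn's PCA, which this converter wraps. The sketch below assumes scikit-learn is installed and uses synthetic data; it calls the sklearn estimator directly rather than the DashAI converter.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data with strongly unequal feature scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4)) * np.array([10.0, 3.0, 1.0, 0.2])

plain = PCA(n_components=2).fit_transform(X)
white = PCA(n_components=2, whiten=True).fit_transform(X)

# Sample variance (ddof=1) of each output component.
plain_var = plain.var(axis=0, ddof=1)
white_var = white.var(axis=0, ddof=1)
```

Without whitening the output variances follow the explained variance of each component; with whitening each output component has unit sample variance.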
Key properties:
- Linear, unsupervised transformation.
- Components are ordered by descending explained variance.
- Setting n_components to a float in (0, 1) automatically selects the number of components needed to explain that fraction of total variance. n_components='mle' uses Minka's MLE to estimate the intrinsic dimensionality of the data.
- Supports full, randomized, and ARPACK solvers for scalability.
Wraps scikit-learn's PCA.
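As a rough illustration of the fractional n_components behaviour, the following sketch uses the underlying scikit-learn estimator directly. The data and the 0.95 threshold are assumptions chosen for the example, and svd_solver="full" is set explicitly because the fractional form is handled by the full solver.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 6 features with decreasing scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6)) * np.array([8.0, 4.0, 2.0, 1.0, 0.5, 0.25])

# Keep as many components as needed to explain 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X)

n_kept = pca.n_components_                        # chosen during fit
cumulative = np.cumsum(pca.explained_variance_ratio_)
```

After fitting, n_components_ holds the number of components actually retained, and the last entry of cumulative is the fraction of variance they explain.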
References
- [1] https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
- [2] Pearson, K. (1901). "On lines and planes of closest fit to systems of points in space." Philosophical Magazine, 2(11), 559-572.
- [3] Hotelling, H. (1933). "Analysis of a complex of statistical variables into principal components." Journal of Educational Psychology, 24(6), 417-441.
Parameters
- n_components, default=2
- Number of components to keep. If None, all components are kept.
- use_copy : boolean, default=True
- If False, the data passed to fit are overwritten, and fit(X).transform(X) will not yield the expected results; use fit_transform(X) instead.
- whiten : boolean, default=False
- When True, the components_ vectors are scaled to ensure uncorrelated outputs with unit component-wise variances. May improve downstream estimators.
- svd_solver : string, default=auto
- Solver used to compute the decomposition. 'auto' picks the most appropriate solver based on the data.
- tol : number, default=0.0
- Tolerance for singular values computed when svd_solver == 'arpack'.
- iterated_power, default=auto
- Number of iterations for the power method when svd_solver == 'randomized'.
- n_oversamples : integer, default=10
- Number of additional random vectors used to sample the range of the data when svd_solver == 'randomized', to ensure proper conditioning.
- power_iteration_normalizer, default=auto
- Normalizer used for the power iterations: 'auto', 'QR', or 'LU'. Not used by ARPACK.
- random_state, default=None
- Used when the 'arpack' or 'randomized' solvers are used. Pass an int for reproducible results.
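The solver-related parameters can be illustrated with the underlying scikit-learn estimator. This is a sketch under assumptions: it uses scikit-learn's own argument names (svd_solver, iterated_power, tol, random_state), synthetic data, and a hypothetical helper fit_randomized; how the DashAI converter forwards its parameters is not shown here.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 400 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 50))

def fit_randomized(seed):
    # Randomized solver with a fixed number of power iterations and a fixed seed.
    pca = PCA(n_components=5, svd_solver="randomized",
              iterated_power=4, random_state=seed)
    return pca.fit(X)

a = fit_randomized(0)
b = fit_randomized(0)
same = np.allclose(a.components_, b.components_)  # same seed, same result

# ARPACK solver; tol=0.0 is the default tolerance.
arpack = PCA(n_components=5, svd_solver="arpack", tol=0.0, random_state=0).fit(X)
```

Fixing random_state makes the randomized solver repeatable across runs, which matters when a fitted converter has to be reproduced later.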
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
Defined in PCA. Return the DashAI data type produced by this converter for a column.
Parameters
- column_name : str, optional
- Not used; all output columns share the same type. Defaults to None.
Returns
- DashAIDataType
- A Float type backed by pyarrow.float64().
changes_row_count(self) -> 'bool'
Defined in BaseConverter. Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter
Defined in SklearnWrapper. Fit the scikit-learn transformer to the data.
Parameters
- x : DashAIDataset
- The input dataset to fit the transformer on.
- y : DashAIDataset, optional
- Target values for supervised transformers. Defaults to None.
Returns
- BaseConverter
- The fitted transformer instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
Defined in BaseConverter. Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
Defined in ConfigObject. Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the Json Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'
Defined in SklearnWrapper. Transform the data using the fitted scikit-learn transformer.
Parameters
- x : DashAIDataset
- The input dataset to transform.
- y : DashAIDataset, optional
- Not used. Present for API consistency. Defaults to None.
Returns
- DashAIDataset
- The transformed dataset with updated DashAI column types.
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject. Take the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.