TruncatedSVD
Reduce dimensionality using Truncated Singular Value Decomposition (LSA).
TruncatedSVD performs linear dimensionality reduction by computing the
thin SVD of the data matrix X, retaining only the top n_components
singular values and their associated left and right singular vectors:
X ≈ U_k Σ_k V_k^T. The transformed data is X_new = X V_k.
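The projection above can be checked with a plain NumPy sketch (illustrative only, not part of the wrapper): projecting X onto the top-k right singular vectors gives exactly U_k Σ_k.

```python
import numpy as np

# Illustrative sketch: compute a rank-k truncated SVD with NumPy and
# verify that X_new = X V_k coincides with U_k Σ_k.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

X_new = X @ Vt_k.T                      # project onto top-k right singular vectors
assert np.allclose(X_new, U_k * s_k)    # equals U_k Σ_k (column-wise scaling)
```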
Unlike PCA, TruncatedSVD does not center the data before decomposition. This is crucial for sparse matrices such as TF-IDF or bag-of-words representations, where centering would introduce a dense intermediate matrix and destroy memory efficiency. In the text-mining community this algorithm is often called Latent Semantic Analysis (LSA).
Key properties:
- Works on both dense and sparse input matrices.
- No mean-centering: safe for high-dimensional sparse data.
- Supports a randomized solver (fast, approximate) and ARPACK (exact).
- The n_oversamples and power_iteration_normalizer parameters control the accuracy-speed trade-off of the randomized solver.
Wraps scikit-learn's TruncatedSVD.
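A minimal sketch (not DashAI code) of the wrapped scikit-learn estimator applied directly to a sparse matrix, as in an LSA/TF-IDF pipeline; the input is never densified:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Random sparse matrix standing in for a TF-IDF document-term matrix.
X = sparse_random(100, 50, density=0.05, random_state=0)

svd = TruncatedSVD(n_components=2, algorithm="randomized",
                   n_iter=5, random_state=0)
X_reduced = svd.fit_transform(X)  # dense (100, 2) output; X stays sparse

print(X_reduced.shape)  # (100, 2)
```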
Parameters
- n_components : integer, default=2 - Desired dimensionality of output data.
- algorithm : string, default=randomized - SVD solver to use, either "arpack" or "randomized".
- n_iter : integer, default=5 - Number of iterations for the randomized SVD solver.
- n_oversamples : integer, default=10 - Number of oversamples for the randomized SVD solver.
- power_iteration_normalizer : string, default=auto - Power iteration normalizer for the randomized SVD solver.
- random_state : integer, default=None - Seed used during randomized SVD. Pass an int for reproducible results across multiple function calls.
- tol : number, default=0.0 - Tolerance for ARPACK; ignored by the randomized solver.
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
Return the DashAI data type produced by this converter for a column.
Parameters
- column_name : str, optional
- Not used; all output columns share the same type. Defaults to None.
Returns
- DashAIDataType
- A Float type backed by pyarrow.float64().
changes_row_count(self) -> 'bool'
Inherited from BaseConverter. Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter
Inherited from SklearnWrapper. Fit the scikit-learn transformer to the data.
Parameters
- x : DashAIDataset
- The input dataset to fit the transformer on.
- y : DashAIDataset, optional
- Target values for supervised transformers. Defaults to None.
Returns
- BaseConverter
- The fitted transformer instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
Inherited from BaseConverter. Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
Inherited from ConfigObject. Generate the JSON Schema associated with the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'
Inherited from SklearnWrapper. Transform the data using the fitted scikit-learn transformer.
Parameters
- x : DashAIDataset
- The input dataset to transform.
- y : DashAIDataset, optional
- Not used. Present for API consistency. Defaults to None.
Returns
- DashAIDataset
- The transformed dataset with updated DashAI column types.
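The fit/transform contract can be illustrated with the wrapped scikit-learn estimator directly (the DashAIDataset plumbing is omitted here): components learned during fit are reused to project new data.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 10))
X_new = rng.standard_normal((5, 10))

# Fit on training data, then transform unseen rows with the same components.
svd = TruncatedSVD(n_components=2, random_state=0).fit(X_train)
Z = svd.transform(X_new)

# transform() is exactly the projection onto the learned components.
assert np.allclose(Z, X_new @ svd.components_.T)
```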
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Validate the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.