TruncatedSVD
Reduce dimensionality using Truncated Singular Value Decomposition (LSA).
TruncatedSVD performs linear dimensionality reduction by computing the
thin SVD of the data matrix X, retaining only the top n_components
singular values and their associated left and right singular vectors:
X ≈ U_k Σ_k V_k^T. The transformed data is X_new = X V_k.
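The projection above can be checked with a plain NumPy sketch (illustrative only, not part of the wrapper): projecting X onto the top-k right singular vectors gives exactly U_k Σ_k.

```python
import numpy as np

# Illustrative sketch: compute a rank-k truncated SVD with NumPy and
# verify that X_new = X V_k coincides with U_k Σ_k.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 8))
k = 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

X_new = X @ Vt_k.T                      # project onto top-k right singular vectors
assert np.allclose(X_new, U_k * s_k)    # equals U_k Σ_k (column-wise scaling)
```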
Unlike PCA, TruncatedSVD does not center the data before decomposition. This is crucial for sparse matrices such as TF-IDF or bag-of-words representations, where centering would introduce a dense intermediate matrix and destroy memory efficiency. In the text-mining community this algorithm is often called Latent Semantic Analysis (LSA).
Key properties:
- Works on both dense and sparse input matrices.
- No mean-centering: safe for high-dimensional sparse data.
- Supports a randomized solver (fast, approximate) and ARPACK (exact).
- The n_oversamples and power_iteration_normalizer parameters control the accuracy-speed trade-off of the randomized solver.
Wraps scikit-learn's TruncatedSVD.
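A minimal sketch (not DashAI code) of the wrapped scikit-learn estimator applied directly to a sparse matrix, as in an LSA/TF-IDF pipeline; the input is never densified:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Random sparse matrix standing in for a TF-IDF document-term matrix.
X = sparse_random(100, 50, density=0.05, random_state=0)

svd = TruncatedSVD(n_components=2, algorithm="randomized",
                   n_iter=5, random_state=0)
X_reduced = svd.fit_transform(X)  # dense (100, 2) output; X stays sparse

print(X_reduced.shape)  # (100, 2)
```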
Parameters
- n_components : integer, default=2 - Desired dimensionality of output data.
- algorithm : string, default=randomized - SVD solver to use, either "arpack" or "randomized".
- n_iter : integer, default=5 - Number of iterations for the randomized SVD solver.
- n_oversamples : integer, default=10 - Number of oversamples for the randomized SVD solver.
- power_iteration_normalizer : string, default=auto - Power iteration normalizer for the randomized SVD solver.
- random_state : integer, default=None - Seed used during randomized SVD. Pass an int for reproducible results across multiple function calls.
- tol : number, default=0.0 - Tolerance for ARPACK; ignored by the randomized solver.
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
Return the DashAI data type produced by this converter for a column.
Parameters
- column_name : str, optional
- Not used; all output columns share the same type. Defaults to None.
Returns
- DashAIDataType
- A Float type backed by pyarrow.float64().
changes_row_count(self) -> 'bool'
Inherited from BaseConverter. Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter
Inherited from SklearnWrapper. Fit the scikit-learn transformer to the data.
Parameters
- x : DashAIDataset
- The input dataset to fit the transformer on.
- y : DashAIDataset, optional
- Target values for supervised transformers. Defaults to None.
Returns
- BaseConverter
- The fitted transformer instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
Inherited from BaseConverter. Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
Inherited from ConfigObject. Generate the JSON Schema associated with the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'
Inherited from SklearnWrapper. Transform the data using the fitted scikit-learn transformer.
Parameters
- x : DashAIDataset
- The input dataset to transform.
- y : DashAIDataset, optional
- Not used. Present for API consistency. Defaults to None.
Returns
- DashAIDataset
- The transformed dataset with updated DashAI column types.
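The fit/transform contract can be illustrated with the wrapped scikit-learn estimator directly (the DashAIDataset plumbing is omitted here): components learned during fit are reused to project new data.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_train = rng.standard_normal((50, 10))
X_new = rng.standard_normal((5, 10))

# Fit on training data, then transform unseen rows with the same components.
svd = TruncatedSVD(n_components=2, random_state=0).fit(X_train)
Z = svd.transform(X_new)

# transform() is exactly the projection onto the learned components.
assert np.allclose(Z, X_new @ svd.components_.T)
```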
validate_and_transform(self, raw_data: dict) -> dict
Inherited from ConfigObject. Validate the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.