OneHotEncoder
Encode categorical columns as binary indicator (one-hot) vectors.
For each input feature column, every unique category value becomes a
separate binary output column. Given a feature with k categories, the
encoding produces k columns (or k - 1 when drop is set). In each row
exactly one of these columns is 1 and the rest are 0; a row whose category
was dropped is all 0. Typical uses and related options:
- Nominal categories without order: one-hot encoding treats all categories as equidistant, which is appropriate for unordered labels such as city names or product types.
- Avoiding the dummy-variable trap: the drop parameter can remove one indicator column per feature so that the resulting matrix has full rank, which is required by unregularized linear models.
- Infrequent categories: min_frequency and max_categories can group rare values into a single infrequent_categories bin, reducing dimensionality.
The total number of output columns equals the sum of unique category counts across all encoded input columns (minus dropped columns).
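The column arithmetic described above can be illustrated with a minimal pure-Python sketch. It is not the DashAI or scikit-learn implementation: the helper `one_hot` and its `drop_first` flag are hypothetical names used only for illustration, and sorting the categories alphabetically is an assumption.

```python
def one_hot(values, drop_first=False):
    """Encode a list of category labels as binary indicator rows.

    Hypothetical helper for illustration only; not part of DashAI.
    """
    # Learn the k unique categories in a deterministic (sorted) order.
    categories = sorted(set(values))
    # Optionally drop the first category, leaving k - 1 indicator columns.
    kept = categories[1:] if drop_first else categories
    rows = [[1 if value == cat else 0 for cat in kept] for value in values]
    return kept, rows


cities = ["Lima", "Quito", "Lima", "Bogota"]

columns, encoded = one_hot(cities)
print(columns)   # ['Bogota', 'Lima', 'Quito']
print(encoded)   # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]

columns_d, encoded_d = one_hot(cities, drop_first=True)
print(columns_d)  # ['Lima', 'Quito']
print(encoded_d)  # [[1, 0], [0, 1], [1, 0], [0, 0]]
```

With `drop_first=True` the dropped category ("Bogota" here) is represented by an all-zero row, which is what removes the linear dependence between the indicator columns.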
Parameters
- categories : string, default=auto
- The categories of each feature.
- drop, default=None
- Specifies a methodology to drop one of the categories per feature.
- dtype : string, default=np.float64
- Desired dtype of output.
- handle_unknown : string, default=error
- How to handle unknown categories during transform.
- min_frequency, default=None
- Minimum frequency of a category to be considered as frequent.
- max_categories, default=None
- Maximum number of categories to encode.
- feature_name_combiner : string, default=concat
- Method used to combine feature names.
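How min_frequency might fold rare values into one bin can be sketched as follows. This is a simplified stand-in that assumes min_frequency is an absolute count; the function name `group_infrequent` and the bin label `"infrequent"` are hypothetical, and the actual converter may name or order the bin differently.

```python
from collections import Counter


def group_infrequent(values, min_frequency):
    """Replace categories seen fewer than min_frequency times with one bin.

    Hypothetical helper; the label "infrequent" is an assumption.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_frequency else "infrequent" for v in values]


products = ["book", "book", "book", "pen", "pen", "lamp", "sofa"]
grouped = group_infrequent(products, min_frequency=2)
print(grouped)
# ['book', 'book', 'book', 'pen', 'pen', 'infrequent', 'infrequent']

# Four distinct categories shrink to three, so one-hot encoding the grouped
# column would yield three indicator columns instead of four.
print(len(set(products)), len(set(grouped)))  # 4 3
```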
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
OneHotEncoder
Return the DashAI data type produced by this converter for a column.
Parameters
- column_name : str, optional
- Not used; all output columns share the same type. Defaults to None.
Returns
- DashAIDataType
- An Integer type backed by pyarrow.int64(), representing the binary indicator values (0 or 1).
changes_row_count(self) -> 'bool'
BaseConverter
Indicate whether this converter changes the number of dataset rows.
Returns
- bool
- True if the converter may add or remove rows, False otherwise.
fit(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> DashAI.back.converters.base_converter.BaseConverter
SklearnWrapper
Fit the scikit-learn transformer to the data.
Parameters
- x : DashAIDataset
- The input dataset to fit the transformer on.
- y : DashAIDataset, optional
- Target values for supervised transformers. Defaults to None.
Returns
- BaseConverter
- The fitted transformer instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
BaseConverter
Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
ConfigObject
Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'
SklearnWrapper
Transform the data using the fitted scikit-learn transformer.
Parameters
- x : DashAIDataset
- The input dataset to transform.
- y : DashAIDataset, optional
- Not used. Present for API consistency. Defaults to None.
Returns
- DashAIDataset
- The transformed dataset with updated DashAI column types.
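The fit-then-transform contract, together with the handle_unknown parameter, can be sketched with a small stand-in class. `TinyOneHot` is hypothetical and works on plain Python lists rather than a DashAIDataset; the "ignore" behaviour shown (an all-zero row for an unseen category) follows scikit-learn's convention and is assumed, not confirmed, for this converter.

```python
class TinyOneHot:
    """Hypothetical stand-in illustrating fit/transform; not DashAI's class."""

    def __init__(self, handle_unknown="error"):
        self.handle_unknown = handle_unknown
        self.categories_ = None

    def fit(self, values):
        # Learn the category vocabulary from the training data only.
        self.categories_ = sorted(set(values))
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, values):
        if self.categories_ is None:
            raise RuntimeError("call fit before transform")
        rows = []
        for value in values:
            if value not in self.categories_ and self.handle_unknown == "error":
                raise ValueError(f"unknown category: {value!r}")
            # Unknown values fall through to an all-zero row when ignored.
            rows.append([1 if value == cat else 0 for cat in self.categories_])
        return rows


encoder = TinyOneHot(handle_unknown="ignore").fit(["red", "green", "red"])
print(encoder.categories_)                 # ['green', 'red']
print(encoder.transform(["red", "blue"]))  # [[0, 1], [0, 0]]
```

Fitting on training data and reusing the learned categories at transform time keeps the output columns identical across datasets, which is why unseen values need an explicit policy.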
validate_and_transform(self, raw_data: dict) -> dict
ConfigObject
Validate the data supplied by the user to initialize the model and return it together with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.