SMOTEENNConverter
Hybrid sampler combining SMOTE oversampling with Edited Nearest Neighbours cleaning.
SMOTE-ENN is a two-stage resampling strategy for imbalanced classification:
- Over-sampling — SMOTE generates synthetic minority-class examples by interpolating between each minority sample and its k-nearest minority neighbours, increasing the minority class size.
- Cleaning — Edited Nearest Neighbours (ENN) removes any sample (from either class) whose class label disagrees with the majority vote of its nearest neighbours, reducing class overlap and borderline noise.
The combined effect is a more balanced training set with less class overlap
near the decision boundary than either technique produces alone. All schema
parameters are forwarded to imbalanced-learn's SMOTEENN and its internal SMOTE sub-estimator.
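The two stages described above can be illustrated with a small standard-library sketch. This is not imbalanced-learn's implementation: the helper names (`nearest`, `smote`, `enn`), the toy data, and the plain majority-vote rule are assumptions made for illustration, and the library's own defaults may differ.

```python
import math
import random
from collections import Counter


def nearest(idx, points, k, candidates):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(points, labels, minority, n_new, k=5, seed=0):
    """Stage 1: append n_new synthetic minority points created by interpolation."""
    rng = random.Random(seed)
    minority_idx = [i for i, lab in enumerate(labels) if lab == minority]
    new_points, new_labels = list(points), list(labels)
    for _ in range(n_new):
        i = rng.choice(minority_idx)
        j = rng.choice(nearest(i, points, k, minority_idx))
        gap = rng.random()  # position along the segment from point i to point j
        new_points.append(
            tuple(a + gap * (b - a) for a, b in zip(points[i], points[j]))
        )
        new_labels.append(minority)
    return new_points, new_labels


def enn(points, labels, k=3):
    """Stage 2: drop samples whose label differs from their neighbours' majority."""
    everyone = range(len(points))
    keep = []
    for i in everyone:
        votes = Counter(labels[j] for j in nearest(i, points, k, everyone))
        if votes.most_common(1)[0][0] == labels[i]:
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data: six majority points, three clustered minority points, and one
# minority point sitting inside the majority cluster (borderline noise).
points = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.3), (0.1, 0.0),
    (2.0, 2.0), (2.1, 2.2), (2.2, 2.1),
    (0.15, 0.15),
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

oversampled_points, oversampled_labels = smote(points, labels, minority=1, n_new=2, k=3)
resampled_points, resampled_labels = enn(oversampled_points, oversampled_labels, k=3)
```

Running `enn` on the original toy data alone removes only the minority point at `(0.15, 0.15)`, because all three of its nearest neighbours belong to the majority class. That is the cleaning behaviour the second stage is meant to provide.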
References
- [1] Chawla, N.V. et al. (2002). "SMOTE: Synthetic Minority Over-sampling Technique." JAIR, 16, 321-357. https://arxiv.org/abs/1106.1813
- [2] Batista, G.E.A.P.A. et al. (2004). "A study of the behaviour of several methods for balancing machine learning training data." ACM SIGKDD Explorations, 6(1), 20-29.
- [3] https://imbalanced-learn.org/stable/references/generated/imblearn.combine.SMOTEENN.html
Parameters
- sampling_strategy, default=auto - Sampling strategy used when applying SMOTE and cleaning the dataset.
- random_state, default=None - Seed used for reproducibility.
- k_neighbors : integer, default=5 - Number of neighbors used by SMOTE.
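One plausible way to route these three parameters to imbalanced-learn is sketched below. The helper names `split_params` and `build_sampler` are hypothetical, and the exact forwarding logic inside the converter is an assumption. The only library calls used are the `SMOTEENN` and `SMOTE` constructors, which are imported lazily so the parameter-splitting step can be inspected without imbalanced-learn installed.

```python
def split_params(sampling_strategy="auto", random_state=None, k_neighbors=5):
    """Split the schema parameters into per-estimator keyword arguments.

    Hypothetical helper: the real converter may organise this differently.
    """
    shared = {"sampling_strategy": sampling_strategy, "random_state": random_state}
    smoteenn_kwargs = dict(shared)
    # k_neighbors only exists on SMOTE; the shared settings are set on the
    # SMOTE object as well so the over-sampling step sees the same values.
    smote_kwargs = dict(shared, k_neighbors=k_neighbors)
    return smoteenn_kwargs, smote_kwargs


def build_sampler(**params):
    """Build a SMOTEENN sampler (requires imbalanced-learn to be installed)."""
    from imblearn.combine import SMOTEENN
    from imblearn.over_sampling import SMOTE

    smoteenn_kwargs, smote_kwargs = split_params(**params)
    return SMOTEENN(smote=SMOTE(**smote_kwargs), **smoteenn_kwargs)
```

With imbalanced-learn available, `build_sampler(k_neighbors=3, random_state=42).fit_resample(X, y)` would return the resampled features and labels.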
Methods
get_output_type(self, column_name: str = None) -> DashAI.back.types.dashai_data_type.DashAIDataType
Defined in SMOTEENNConverter. Not implemented; type preservation is handled in transform.
Parameters
- column_name : str or None, optional
- Name of the column whose output type is queried. Ignored because this method always raises. Defaults to None.
changes_row_count(self) -> bool
Defined in ImbalancedLearnWrapper. Return True because all samplers add or remove rows.
Returns
- bool
- Always True.
fit(self, x: 'DashAIDataset', y: 'DashAIDataset') -> Type[DashAI.back.converters.base_converter.BaseConverter]
Defined in ImbalancedLearnWrapper. Resample the dataset by calling fit_resample and store the result.
Parameters
- x : DashAIDataset
- The input feature dataset.
- y : DashAIDataset
- The target label dataset (required; must be non-empty).
Returns
- Type[BaseConverter]
- The fitted sampler instance (self).
get_metadata(cls) -> 'Dict[str, Any]'
Defined in BaseConverter. Get metadata for the converter, used by the DashAI frontend.
Parameters
- cls : type
- The converter class (injected automatically by Python for classmethods).
Returns
- Dict[str, Any]
- Dictionary containing display name, short description, image preview path, category, icon, color, and whether the converter is supervised.
get_schema(cls) -> dict
Defined in ConfigObject. Generate the JSON Schema for the component.
Returns
- dict
- Dictionary representing the JSON Schema of the component.
transform(self, x: 'DashAIDataset', y: Optional[ForwardRef('DashAIDataset')] = None) -> 'DashAIDataset'
Defined in ImbalancedLearnWrapper. Return the resampled dataset stored during fit.
Parameters
- x : DashAIDataset
- The original feature dataset (used only for type information).
- y : DashAIDataset, optional
- The original target dataset (used only for type information). Defaults to None.
Returns
- DashAIDataset
- The combined resampled dataset (features + target) produced by fit.
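The fit and transform contract described above (resample once in fit, store the result, return it from transform) can be sketched with plain Python lists. Both classes here are hypothetical stand-ins: `DuplicateMinoritySampler` only mimics the `fit_resample` interface, and `ResamplingConverter` is not the DashAI wrapper. The error types raised for an empty `y` or an unfitted converter are assumptions.

```python
class DuplicateMinoritySampler:
    """Hypothetical stand-in exposing the fit_resample interface."""

    def fit_resample(self, X, y):
        counts = {label: y.count(label) for label in set(y)}
        minority = min(counts, key=counts.get)
        extra = [(row, lab) for row, lab in zip(X, y) if lab == minority]
        # Append one copy of every minority row (a crude form of over-sampling).
        return X + [row for row, _ in extra], y + [lab for _, lab in extra]


class ResamplingConverter:
    """Minimal sketch of a converter that resamples in fit and replays in transform."""

    def __init__(self, sampler):
        self.sampler = sampler
        self._resampled = None

    def changes_row_count(self):
        return True  # samplers add or remove rows

    def fit(self, x, y):
        if not y:
            # Assumed behaviour: the target is required and must be non-empty.
            raise ValueError("y is required and must be non-empty")
        x_res, y_res = self.sampler.fit_resample(x, y)
        # Store features and target together, one combined row per sample.
        self._resampled = [row + [label] for row, label in zip(x_res, y_res)]
        return self

    def transform(self, x, y=None):
        if self._resampled is None:
            raise RuntimeError("call fit before transform")
        # x and y are not resampled again; the stored result is returned.
        return self._resampled


X = [[1.0], [2.0], [3.0], [10.0]]
y = [0, 0, 0, 1]
converter = ResamplingConverter(DuplicateMinoritySampler())
resampled = converter.fit(X, y).transform(X)
```

Because the output has a different number of rows than the input, the result of `transform` cannot be aligned row by row with the original `x`, which is why `changes_row_count` reports True.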
validate_and_transform(self, raw_data: dict) -> dict
Defined in ConfigObject. Take the data provided by the user to initialize the model and return it with all the objects the model needs to work.
Parameters
- raw_data : dict
- A dictionary with the data provided by the user to initialize the model.
Returns
- dict
- A validated dictionary with the necessary objects.