SelectFpr
Select features whose p-value is below a False Positive Rate threshold.
SelectFpr retains every feature whose p-value, computed by a univariate
scoring function against the target, is strictly less than alpha. Under
the null hypothesis that a feature is independent of the target, each
such feature passes the test with probability at most alpha, so the
expected proportion of irrelevant features that are retained (false
positives) is at most alpha. For example, with 1,000 irrelevant features
and alpha = 0.05, up to about 50 are expected to pass by chance. No
multiple-testing correction is applied; each feature is tested at the
raw significance level.
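The selection rule above can be sketched directly against scikit-learn's SelectFpr, which this selector wraps. The example is a minimal illustration, not this wrapper's own interface: the synthetic data, the choice of f_classif as the scoring function, and the variable names are all assumptions made for the sketch.

```python
import numpy as np
from sklearn.feature_selection import SelectFpr, f_classif

# Synthetic data (assumption for illustration): 30 features, of which
# only the first three influence the binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)

alpha = 0.05
selector = SelectFpr(score_func=f_classif, alpha=alpha).fit(X, y)

# The same rule applied by hand: keep features with p-value strictly below alpha.
_, pvalues = f_classif(X, y)
manual_mask = pvalues < alpha

# The fitted selector's support mask matches the manual threshold.
print(np.array_equal(selector.get_support(), manual_mask))

# transform keeps only the selected columns.
X_selected = selector.transform(X)
print(X_selected.shape)
```

Because no correction is applied, some of the 27 noise columns may pass the threshold by chance; how many depends on the random draw.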
This selector is the most permissive of the three p-value-based filters (FPR, FDR, FWE). It is appropriate when the cost of missing a true feature outweighs the cost of including a small number of irrelevant ones, and when the number of features is moderate enough that the uncorrected type-I error rate is acceptable.
Key properties:
- Supervised: requires the target array y at fit time.
- alpha is the significance threshold in [0, 1]; typical values are 0.05 or 0.10.
- No correction for multiple comparisons: more liberal than FDR and FWE.
- The number of retained features is data-driven and not fixed in advance.
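To make the comparison with FDR and FWE concrete, the sketch below fits scikit-learn's SelectFpr, SelectFdr, and SelectFwe on the same data at the same alpha and counts the retained features. The synthetic data and the f_classif scoring function are assumptions chosen for illustration; the counts will differ on other data.

```python
import numpy as np
from sklearn.feature_selection import SelectFdr, SelectFpr, SelectFwe, f_classif

# Synthetic data (assumption for illustration): 50 features, of which
# only the first three influence the binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = (X[:, 0] + X[:, 1] + X[:, 2] > 0).astype(int)

alpha = 0.05
counts = {}
for name, cls in [("FPR", SelectFpr), ("FDR", SelectFdr), ("FWE", SelectFwe)]:
    selector = cls(score_func=f_classif, alpha=alpha).fit(X, y)
    # Number of features retained is read from the boolean support mask.
    counts[name] = int(selector.get_support().sum())

# FPR applies no correction, so it retains at least as many features
# as FDR, which in turn retains at least as many as FWE.
print(counts)
```

The retained count is an outcome of the data and alpha rather than a parameter, which is why it cannot be fixed in advance.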
Wraps scikit-learn's SelectFpr.
Parameters
- alpha : number, default=0.05 - The highest p-value for features to be kept.