by Zhimei Ren (University of Pennsylvania)
How can we combine discoveries from multiple FDR-controlling rejection sets without losing statistical validity? In joint work with Rina Foygel Barber, we show that the knockoff procedure can be represented through e-values and e-BH, allowing rejection sets from multiple randomized runs to be aggregated by averaging e-values while preserving FDR control. More broadly, this e-value perspective provides a general framework for merging FDR-controlling procedures and opens new directions for understanding, improving, and aggregating large-scale testing methods.
Modern statistical applications often require testing multiple hypotheses simultaneously – for example, testing which features from a large pool are associated with an outcome of interest, or testing which outcomes are significantly affected by an exposure. In such large-scale testing problems, it is desirable to control the errors among the rejected hypotheses to improve reliability and replicability. A widely used error metric is the false discovery rate (FDR), defined as the expected proportion of false rejections among all rejections, with the proportion taken to be zero when nothing is rejected.
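As a compact restatement of this definition, the FDR can be written as follows. The symbols are notation introduced here for illustration: $\mathcal{R}$ is the rejection set and $\mathcal{H}_0$ is the set of true null hypotheses.

```latex
\mathrm{FDR} \;=\; \mathbb{E}\!\left[ \frac{|\mathcal{R} \cap \mathcal{H}_0|}{\max(|\mathcal{R}|,\, 1)} \right]
```

A procedure controls the FDR at level $\alpha$ if this quantity is at most $\alpha$.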
Since the introduction of FDR, many multiple testing methods have been developed to produce rejection sets with FDR control guarantees. However, it remains unclear how to combine separately obtained FDR-controlling rejection sets while preserving FDR control for the aggregated set. This problem arises in many practical settings: for instance, a randomized multiple testing procedure may produce different rejection sets under different random seeds, or multiple research teams may test the same hypotheses and report different discoveries. How can these discoveries be aggregated in a statistically valid way?
A natural approach is to combine the test statistics underlying the individual rejection sets and then apply a multiple testing procedure to the combined statistics. The challenge is that these test statistics across different runs or teams may exhibit complex dependence structures – for example, when they arise from multiple runs of a randomized procedure or when different teams use overlapping samples in their studies. As a result, the distribution of the combined statistics is generally unclear, making FDR control difficult to establish.
In joint work with Rina Foygel Barber [1], we study this problem in the context of feature selection with FDR control. Specifically, we consider the knockoff procedure [2], a randomized multiple testing procedure that produces rejection sets with FDR control. To combine rejection sets from different random knockoff runs, we first show that the knockoff procedure can be equivalently represented as the e-BH procedure [3] applied to a set of special e-values. We then average these e-values across runs for each hypothesis and finally apply the e-BH procedure to the resulting aggregated e-values.
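The e-value representation of a single knockoff run can be sketched roughly as below. This is a minimal illustration, assuming the knockoff feature statistics W_1, ..., W_m have already been computed (large positive values suggest a real signal); the function names and the parameter `alpha_kn` are hypothetical and not taken from any particular software package.

```python
import math

def knockoff_threshold(W, alpha_kn):
    """Smallest t > 0 among the |W_j| whose estimated false discovery
    proportion (1 + #{W_j <= -t}) / max(#{W_j >= t}, 1) is at most alpha_kn."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in W if w <= -t)
        n_pos = sum(1 for w in W if w >= t)
        if (1 + n_neg) / max(n_pos, 1) <= alpha_kn:
            return t
    return math.inf  # no feasible threshold: the run selects nothing

def knockoff_evalues(W, alpha_kn):
    """e_j = m * 1{W_j >= T} / (1 + #{k : W_k <= -T}) for one knockoff run."""
    m = len(W)
    T = knockoff_threshold(W, alpha_kn)
    if math.isinf(T):
        return [0.0] * m
    n_neg = sum(1 for w in W if w <= -T)
    return [m / (1 + n_neg) if w >= T else 0.0 for w in W]

# Illustrative statistics only (not real data).
W = [5, 4, 3, 2, -1, 0.5]
evalues = knockoff_evalues(W, alpha_kn=0.5)
```

In this sketch, features selected by the knockoff run share one common positive e-value and all other features get zero, which is why applying e-BH to these e-values at the same level returns the run's own selection.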
Since e-values are defined as nonnegative random variables with expectation at most one under the null hypothesis, an average of valid e-values remains an e-value by linearity of expectation, no matter how the individual runs depend on one another. In addition, the e-BH procedure is guaranteed to produce rejection sets with FDR control whenever the individual e-values are valid, again without assumptions on their dependence. Therefore, since averaged knockoff e-values are still valid e-values, the final rejection set obtained by applying e-BH also has FDR control guarantees.
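The averaging and e-BH steps can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and the per-run e-values in the example are made-up numbers. e-BH at level alpha rejects the k* hypotheses with the largest e-values, where k* is the largest k such that the k-th largest e-value is at least m / (alpha * k).

```python
def average_evalues(runs):
    """Average e-values across runs, hypothesis by hypothesis.
    `runs` is a list of equal-length lists, one list per run."""
    n_runs = len(runs)
    return [sum(column) / n_runs for column in zip(*runs)]

def ebh(evalues, alpha):
    """e-BH: return the sorted indices of the rejected hypotheses."""
    m = len(evalues)
    # Indices ordered from largest to smallest e-value.
    order = sorted(range(m), key=lambda j: evalues[j], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if evalues[order[k - 1]] >= m / (alpha * k):
            k_star = k  # keep the largest k that passes the check
    return sorted(order[:k_star])

# Three hypothetical runs over m = 4 hypotheses (illustrative numbers).
runs = [
    [8.0, 8.0, 0.0, 0.0],
    [8.0, 0.0, 8.0, 0.0],
    [8.0, 8.0, 0.0, 0.0],
]
averaged = average_evalues(runs)
rejected = ebh(averaged, alpha=0.4)
```

In this toy example the first two hypotheses are rejected: hypothesis 0 is supported in every run, hypothesis 1 in two of three, while hypothesis 2 appears in only one run and its averaged e-value falls below the e-BH cutoff.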
This workflow turns out to extend well beyond the knockoff procedure: it can be shown that any FDR-controlling multiple testing procedure can be equivalently represented as e-BH applied to a suitable collection of e-values. This perspective provides a flexible framework for merging rejection sets without sacrificing FDR control. It also opens several research directions, including the design of more powerful e-value representations for existing procedures, principled methods for efficiently aggregating discoveries across randomized or distributed analyses, and new ways to understand the structure and limitations of FDR-controlling procedures through the lens of e-values.
References:
[1] Z. Ren and R. F. Barber, “Derandomised knockoffs: Leveraging e-values for false discovery rate control”, J. Roy. Stat. Soc. Ser. B (Stat. Methodol.), vol. 86, no. 1, pp. 122–154, 2024.
[2] E. Candès et al., “Panning for gold: ‘Model-X’ knockoffs for high-dimensional controlled variable selection”, J. Roy. Stat. Soc. Ser. B (Stat. Methodol.), vol. 80, no. 3, pp. 551–577, 2018.
[3] R. Wang and A. Ramdas, “False discovery rate control with e-values”, J. Roy. Stat. Soc. Ser. B (Stat. Methodol.), vol. 84, no. 3, pp. 822–852, 2022.
Please contact:
Zhimei Ren
University of Pennsylvania, United States of America

