E-values: Statistical Testing for the 21st Century - Introduction to the Special Theme

by the guest editors Peter Grünwald (CWI and Leiden University), Wouter Koolen (CWI and University of Twente) and Johanna Ziegel (ETH Zurich)

Cover illustration for the special theme on *e-values*.

As new measurements become available over time, we face the classic problem of updating our information state. In science, this typically means refining our view of hypotheses based on experimental outcomes – either determining if the data allow us to reject a null hypothesis, or estimating which parameter values remain statistically plausible. Anytime-valid methods allow us to reliably refine these assessments sequentially while guaranteeing at most a controlled fraction of mistakes.

How do we quantify and sequentially combine this incoming evidence? E-values provide a concrete answer. In their most basic form, we use them to quantify evidence against a null hypothesis. As is standard in statistics, the null hypothesis is a probabilistic formalization of “nothing special is going on”: the medication under consideration has no effect; the pairs of outcomes we see are independent; the coin we study is unbiased, and so on. An e-variable for such a null hypothesis is a nonnegative random variable with an expectation of at most 1 under the null; its observed value is called an e-value. When batches of data come in sequentially, we can calculate an e-value for each batch and multiply them. Provided each e-value remains valid given the data seen before it, the running product forms what mathematicians call a supermartingale, which quantifies the total evidence in all data seen so far. A comprehensive statistical theory can be built from these ingredients. One cornerstone of this framework is Ville’s inequality, which bounds the probability that this supermartingale ever exceeds a large threshold under the null: for a threshold of 1/α, that probability is at most α. Complementing this guarantee, ideas from learning theory guide us in designing powerful e-values and supermartingales with optimal growth rates under alternative hypotheses. Based on such e-values and supermartingales, one can, for example, construct anytime-valid confidence intervals, which quantify the uncertainty about a quantity of interest such as the strength of an effect (in clinical trials) or the difference in click-through rate between web page layouts (in A/B testing).
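To make the multiplication of e-values concrete, here is a minimal Python sketch for the unbiased-coin null mentioned above. It is an illustration under stated assumptions, not a method taken from the articles in this issue: the function names, the alternative bias `q = 0.7`, the level `alpha = 0.05` and the simulation sizes are all hypothetical choices. Each flip contributes a likelihood-ratio e-value, the running product is the accumulated evidence, and the null is rejected as soon as the product reaches 1/alpha.

```python
import random


def coin_e_value(outcome: int, q: float = 0.7) -> float:
    """E-value for one flip under the null 'fair coin' (p = 0.5).

    It is the likelihood ratio of an assumed alternative bias q against
    the null, so its expectation under the null equals 1.
    """
    return (q if outcome == 1 else 1.0 - q) / 0.5


def run_test(flips, alpha: float = 0.05, q: float = 0.7):
    """Multiply e-values flip by flip; stop once the product reaches 1/alpha.

    Returns (rejected, number_of_flips_used, final_product).
    """
    wealth = 1.0
    for t, x in enumerate(flips, start=1):
        wealth *= coin_e_value(x, q)
        if wealth >= 1.0 / alpha:
            return True, t, wealth
    return False, len(flips), wealth


# Expectation of a single e-value under the null (heads and tails equally likely).
null_expectation = 0.5 * coin_e_value(1) + 0.5 * coin_e_value(0)

# Simulate fair coins with optional stopping and count false rejections.
rng = random.Random(0)
n_runs = 2000
false_rejections = sum(
    run_test([rng.randint(0, 1) for _ in range(200)])[0] for _ in range(n_runs)
)
false_rejection_rate = false_rejections / n_runs
print(null_expectation, false_rejection_rate)
```

Because the test may stop at any flip, the relevant guarantee is the one from Ville’s inequality: under the null, the probability of ever reaching 1/alpha is at most alpha, so the simulated false rejection rate should stay close to or below that level, up to simulation noise.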

E-values, named for both evidence and expectation, are a recent label for ideas with a much longer history. Closely related objects have appeared repeatedly across several disciplines: as nonnegative martingales and sequential tests in probability and statistics (Ville, Doob, Wald, Robbins); as likelihood ratios in statistics, universal codes in information theory, and Martin-Löf randomness tests in computability theory; and as betting capital in the game-theoretic probability framework of Shafer and Vovk. Parallel developments in concentration inequalities, self-normalized processes, online learning, and adaptive experimentation supplied increasingly powerful methods for constructing and analyzing such processes. The modern theory of e-values may be viewed as a synthesis of these traditions, revealing a common mathematical structure underlying sequential inference, evidence accumulation, and prediction.

While this special issue reflects the highly international nature of the field, researchers at CWI have made major contributions to the development of the modern theory of e-values since the term “e-value” was first introduced in 2019.

The justification for e-values is most sharply revealed in general decision-making problems, where feasibility and optimality considerations force their emergence. We next highlight two remarkable instances of this phenomenon. The first characterization arises in post-hoc decision making. If evidence is to remain valid when reused for arbitrary future analyses and decisions while retaining frequentist guarantees, then it must be an e-variable. In this sense, e-values provide the unique representation of reusable evidence. The second characterization arises in active sequential testing, where learning and experimentation form a feedback loop: current evidence determines the next experiment, whose outcome updates the evidence, and so on. If the goal is to reach a reliable conclusion after as few experiments as possible, then matching lower and upper bounds show that the minimal expected sample size is characterized by e-value growth rates. Thus, e-values also arise as the solution to optimal sequential design problems.

However, e-values offer more than anytime-validity. Any weighted average of e-values is again an e-value. This simple closure property provides a principled way to combine evidence from different sources, even when the data are not generated sequentially in time or are partially dependent, making e-values a fundamental tool in multiple testing. As such, they are considerably more flexible than the classical notion of statistical evidence based on p-values, for which combination over time or under dependence is significantly more delicate.
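The closure property can be checked by hand in a toy case. The Python sketch below again uses the fair-coin null; the helper names and the alternative biases 0.7 and 0.8 are hypothetical choices made only for illustration. Two e-values are computed from the same flip, so they are fully dependent. Their weighted average, with weights fixed before seeing the data, still has expectation 1 under the null, whereas their product does not.

```python
def lr_e_value(x: int, q: float) -> float:
    """Likelihood-ratio e-value for one flip: bias q against the fair-coin null."""
    return (q if x == 1 else 1.0 - q) / 0.5


def weighted_average(e_values, weights):
    """Convex combination of e-values, with weights chosen before seeing the data."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(w * e for w, e in zip(weights, e_values))


def pair(x: int):
    """Two e-values computed from the SAME flip, hence fully dependent."""
    return lr_e_value(x, 0.7), lr_e_value(x, 0.8)


# Exact expectations under the null: each outcome has probability 1/2.
mean_of_average = sum(0.5 * weighted_average(pair(x), [0.5, 0.5]) for x in (0, 1))
mean_of_product = sum(0.5 * pair(x)[0] * pair(x)[1] for x in (0, 1))
print(mean_of_average, mean_of_product)
```

In this example the average has null expectation 1 and is therefore a valid e-value, while the product has null expectation of about 1.24 and is not. Multiplication is the right operation for sequentially generated evidence; averaging is the safe choice when the dependence between sources is unknown.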

This flexibility is also reflected in the fact that, unlike classical methods, e-values can provide risk guarantees even when the Type-I loss (roughly, the cost of rejecting a true null hypothesis; strong Type-I guarantees correspond to small significance levels) is not fixed in advance but may itself depend on the data or the decision process.

Of course, flexibility may come at a price. Analyses based on e-values may sometimes require more data than classical fixed-sample procedures. Moreover, for a given statistical problem, there are typically many valid e-values, and it is not always easy to compute the optimal one in terms of the growth rates discussed above. More generally, different tasks lead to different notions of optimality, and no single choice is uniformly best. Which notion is most relevant therefore depends on the context.

Research into e-variables is rapidly advancing the core ideas underlying anytime-valid inference and, more generally, principled combination of evidence. New goals have been introduced, such as e-detectors and notions of asymptotic safety. New optimality criteria have been proposed, including those based on Rényi divergence and concave utility functions. Admissible and optimal e-variables are being characterized more precisely. In addition, connections have been established to other areas, including probability and measure theory, group invariance, nonparametric statistics, game theory and convex duality, information theory, and online learning. This special issue brings together recent developments in the field, highlighting advances across these methodological, conceptual, and technical dimensions.

From Foundations to Applications
More specifically, this special issue covers both foundational and applied aspects of e-values. On the foundational side, contributions address testing by betting (Shafer), the search for optimal e-values in a general setting (Larsson, Ramdas, and Ruf), the relationship between e-values and p-values (Clerico), the role of e-values in resolving counterfactuals (Grünwald), objective probability through sequential mixture learning (Dixit and Martin), and the combination of e-values with conformal prediction to obtain prediction rather than confidence intervals (Gauthier).

Several articles focus on methodology, including safe procedures for multiple testing (de Heide; Ren), the derandomization of such procedures (Monti and Filzmoser), tests for stochastic dominance (Choe and Arnold), tests for sub-Gaussianity (Koolen, Larsson and Agrawal), and near-optimal e-values for testing association (Dickhaus, Giuffrida and Wang).

Applications range from clinical trials (Long and Van Zwet) and replication meta-analyses in psychology (Arias, Ly, Meziu and Reyero-Lobo) to the specification of clinically relevant effect sizes (Koobs and Koning), the analysis of preference data in elections (Tuynman and Mathieu), the assessment of AI systems (Dhillon, Pandeva and Curth) and reinforcement-learning agents (Bongers), and the backtesting of financial risk assessments (Wang). Finally, Michael Lindon (Netflix) discusses the use of e-values in the technology industry.

Together, these developments point toward a unified view of statistical evidence in which e-values play the central role. They provide new ways to address old problems, and inspire new challenges to be tackled head-on. We are excited by this prospect, and hope you will be too. Please enjoy this special issue.

Please contact:
Peter Grünwald
CWI and Leiden University, The Netherlands

Wouter Koolen
CWI and University of Twente, The Netherlands

Johanna Ziegel
ETH Zurich, Switzerland