Foreword: From P-values to E-values

by Michael I. Jordan

What is an “e-value” and why has it become an object of intense study in statistics and in the allied fields of machine learning, signal processing, and econometrics? To briefly introduce the basic idea, let us consider one of the core problems in statistics – the “hypothesis testing problem” of deciding whether observed data is consistent with some particular data-generating mechanism (often referred to as a “null hypothesis”) or is better explained by another mechanism (referred to as an “alternative hypothesis”). This problem is addressed by defining some function of the data (a “statistic”) whose distribution is as different as possible under the null and the alternative. Given an observed value of such a statistic, one then makes a choice between the two distributions, doing so in a way that minimizes the probability of errors. Classical statistical theory provides a unifying framework – the “p-value” – by which the choice between the null and alternative hypotheses reduces to a thresholding procedure.
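To make the thresholding concrete, here is a minimal sketch of a classical fixed-sample test. It assumes a coin-flipping example that is not in the original: the null hypothesis is a fair coin, the statistic is the number of heads in a pre-chosen number of flips, and the hypothetical helpers `binomial_p_value` and `batch_test` reject when the one-sided binomial p-value is at most α = 0.05.

```python
from math import comb


def binomial_p_value(heads: int, n: int, p0: float = 0.5) -> float:
    """One-sided p-value P(X >= heads) for X ~ Binomial(n, p0), the null model."""
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(heads, n + 1))


def batch_test(xs: list[int], alpha: float = 0.05) -> tuple[float, bool]:
    """Fixed-sample test: the sample size len(xs) is fixed before any data arrive."""
    heads = sum(xs)  # the test statistic: number of heads (xs holds 0/1 flips)
    p = binomial_p_value(heads, len(xs))
    return p, p <= alpha  # threshold the p-value at alpha; one decision, no revisiting


p, reject = batch_test([1, 1, 1, 1, 0, 1, 1, 1, 1, 1])
print(f"p-value = {p:.5f}, reject = {reject}")  # prints "p-value = 0.01074, reject = True"
```

Nine heads in ten flips gives a p-value of 11/1024, below 0.05, so the null is rejected once and for all.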

E-values provide an alternative to the classical p-value paradigm. Rather than being tail probabilities under the null hypothesis, e-values are nonnegative random variables whose expectation is less than or equal to one under the null hypothesis. Given that tail probabilities and expectations are related by Markov’s inequality, it may seem that p-values and e-values are not so very different. But the key point is that they arise from different perspectives and in different problem settings, and they accordingly have different strengths and weaknesses.
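The link through Markov's inequality can be written out in one line. For an e-value E and a level α in (0, 1), the null guarantee on the expectation of E gives the first statement below; the second, a standard consequence shown here as an illustration, turns an e-value into a valid p-value:

```latex
\Pr_{H_0}\!\left(E \ge 1/\alpha\right) \;\le\; \alpha\,\mathbb{E}_{H_0}[E] \;\le\; \alpha,
\qquad
P := \min\{1,\, 1/E\} \;\Longrightarrow\; \Pr_{H_0}(P \le \alpha) \le \alpha .
```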

To elaborate and to set the stage for the current collection, note that the focus of p-values is the analysis of an experiment in which a sample size is chosen, a batch of data having that sample size is collected, a p-value is computed, and a decision is made – with no possibility of revisiting the decision. E-values, on the other hand, focus on an online framework in which a stream of data is observed, with no a priori choice of the number of data points. The statistician is viewed as accruing evidence over time with the possibility of making a decision at any point in time. The decisions are thus tentative and revocable.

This online perspective is turned into mathematics by making use of the machinery of martingale theory. One views the statistician as placing bets over time that aim to reveal whether the null hypothesis or the alternative hypothesis is the source of the data stream. The null hypothesis can be interpreted in terms of a casino in which the odds are not in the statistician’s favor, such that her wealth dwindles over time, no matter how the bets are placed. This dwindling behavior can be modeled mathematically with martingale theory – in particular as a nonnegative supermartingale. On the other hand, the alternative hypothesis corresponds to a casino in which the odds are in the statistician’s favor, so that if she places her bets wisely, her wealth will increase over time. Thus, under the alternative, the wealth is not a martingale; indeed, it can be made to increase exponentially. Statistical hypothesis testing is thereby viewed as the discrimination between a losing betting process that dwindles to zero and a winning process that grows exponentially.
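A minimal sketch of the betting picture, assuming a coin-flip example that is not in the original: the null is a fair coin, the bettor stakes a fixed fraction `lam` of her current wealth on heads each round, and `wealth_process` and `log_growth_rate` are hypothetical helper names. Under the null, each round's wealth multiplier has expectation one, so the wealth is a nonnegative martingale (a special case of a supermartingale). Its expected log-growth is nonetheless negative, so it dwindles toward zero. Under a coin with heads probability 0.7, the log-growth is positive and the wealth grows exponentially.

```python
import math
import random


def wealth_process(xs, lam):
    """Wealth path of a bettor staking a fixed fraction lam of wealth on heads.

    Each flip x in {0, 1} multiplies wealth by 1 + lam * (2x - 1): a gain of
    lam on heads, a loss of lam on tails. Requires 0 <= lam < 1 so that
    wealth stays strictly positive. Starting wealth is 1.
    """
    wealth, path = 1.0, []
    for x in xs:
        wealth *= 1.0 + lam * (2 * x - 1)
        path.append(wealth)
    return path


def log_growth_rate(p, lam):
    """Expected log-multiplier per flip when heads has probability p."""
    return p * math.log(1 + lam) + (1 - p) * math.log(1 - lam)


rng = random.Random(0)
lam = 0.4
null_flips = [int(rng.random() < 0.5) for _ in range(1000)]  # fair coin: the null
alt_flips = [int(rng.random() < 0.7) for _ in range(1000)]   # biased coin: an alternative

print("null final wealth:", wealth_process(null_flips, lam)[-1])
print("alt final wealth: ", wealth_process(alt_flips, lam)[-1])
print("null log-growth:", log_growth_rate(0.5, lam))  # negative: wealth dwindles
print("alt log-growth: ", log_growth_rate(0.7, lam))  # positive: exponential growth
```

A fixed fraction is the simplest choice. The null property still holds if the fraction is chosen adaptively from past flips only, which is how practical betting strategies try to track the alternative.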

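E-values are then defined as the value of the wealth process at a stopping time of the process. It is a consequence of the optional stopping theorem that e-values are bounded in expectation by one under the null hypothesis. Decisions can be made by thresholding e-values: by Markov’s inequality, rejecting the null hypothesis when the e-value is at least 1/α yields a probability of false rejection of at most α.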
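A minimal sketch of such a stopping rule, under a hypothetical coin-flip setup: a fair coin is the null, the betting fraction is fixed at `lam = 0.5`, and the level is `alpha = 0.05`. The names `sequential_test` and `null_rejection_rate` are illustrative, not from any library. The test stops the first time the wealth reaches 1/α, and the wealth at that moment is the e-value. Ville’s inequality, a time-uniform counterpart of Markov’s inequality for nonnegative supermartingales, bounds by α the chance that a fair coin ever triggers rejection, however long the stream runs.

```python
import random


def sequential_test(flips, lam=0.5, alpha=0.05):
    """Bet on heads each round; stop as soon as wealth reaches 1/alpha.

    Returns (decision, stopping_time, e_value). If the threshold is never
    reached, the wealth at the end of the stream is reported and the null is kept.
    """
    wealth = 1.0
    for t, x in enumerate(flips, start=1):
        wealth *= 1.0 + lam * (2 * x - 1)  # same multiplicative bet as before
        if wealth >= 1.0 / alpha:
            return "reject", t, wealth
    return "keep", len(flips), wealth


def null_rejection_rate(n_streams=2000, length=200, seed=1):
    """Monte Carlo estimate of how often a fair coin ever triggers rejection."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_streams):
        flips = [int(rng.random() < 0.5) for _ in range(length)]
        if sequential_test(flips)[0] == "reject":
            rejections += 1
    return rejections / n_streams


print(sequential_test([1] * 20))
print("estimated false-rejection rate:", null_rejection_rate())
```

For example, `sequential_test([1] * 20)` stops at the eighth flip with wealth 1.5^8 ≈ 25.6, the first power of 1.5 above 20. The Monte Carlo estimate should land at or below roughly α, up to simulation error, even though the stream is checked after every flip.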

This online perspective on hypothesis testing arose slowly over many decades in statistics, in parallel with the rise of p-value-based statistics. Its early development came from stochastic processes (Ville, Doob), information theory (Kelly, Breiman), and sequential analysis (Wald, Robbins, Darling). Recent years have seen a flowering of research that not only picks up on classical themes, but also revisits a wide range of other problems in statistics that have traditionally been solved via p-values, tackles challenges that p-value-based statistics have not solved convincingly, and, perhaps most importantly, develops e-value-based solutions to modern problems that have arisen in high-dimensional statistics and machine learning. The articles in the current collection provide an appealing point of entry into this fast-moving and impactful literature.
