by Etienne Gauthier (Inria, Ecole Normale Supérieure, PSL Research University)
How can we trust a model’s predictions in the presence of uncertainty? Conformal prediction provides a principled framework for attaching reliable confidence guarantees to machine learning outputs. By incorporating e-values, this framework moves beyond rigid, pre-specified guarantees. It enables finer control over the predictions in a dynamic way, adapting the reliability of AI systems to the constraints of real-world applications.
As machine learning models are increasingly deployed in high-stakes environments, such as autonomous navigation, finance, and healthcare, quantifying their uncertainty is just as critical as the predictions themselves. Conformal Prediction (CP) has become a gold standard for this task. Practically, CP acts as a rigorous mathematical wrapper applicable to any predictive model. It relies on a dedicated “calibration set” of held-out data. A non-conformity score, which measures how poorly a candidate label fits an input, is evaluated on both the calibration and test data. Under the foundational assumption that these data points are exchangeable, the prediction set is obtained by keeping every candidate label whose score does not exceed a suitable quantile of the calibration scores.
Instead of generating a single, potentially overconfident guess, CP provides a guaranteed set of possibilities. For instance, rather than simply predicting “dog,” an image classifier using CP with a 99% confidence level might output {dog, cat}, guaranteeing that the true label falls within the set at least 99% of the time on average.
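The calibration recipe described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the non-conformity score (one minus the predicted probability of a label), the function names, and the use of NumPy are all assumptions made for the example.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Threshold for split conformal prediction at miscoverage level alpha.

    cal_scores: non-conformity scores of the TRUE labels on the calibration set.
    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or +inf when the calibration set is too small for the requested level.
    """
    cal_scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = cal_scores.size
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        return np.inf  # no finite threshold: every label must be kept
    return cal_scores[rank - 1]

def prediction_set(test_probs, qhat):
    """Indices of the labels kept for one test input.

    test_probs: predicted class probabilities for the test input.
    Assumed score for illustration: 1 - predicted probability of the label.
    """
    scores = 1.0 - np.asarray(test_probs, dtype=float)
    return np.flatnonzero(scores <= qhat)
```

In use, `conformal_quantile` is called once on the calibration scores, and `prediction_set` is then applied to each new input. A more uncertain model spreads probability over more labels, so more of them pass the threshold and the set grows.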
While classical conformal prediction is highly effective, it enforces a fixed confidence level by allowing the size of the prediction set to adapt to the model’s underlying uncertainty. This adaptive behaviour is essential to its guarantees, but in some applications it may lead to unpredictable or undesirable variation in prediction set size.
Consider a motivating example in the medical field: a doctor using an AI diagnostic tool. If the model is highly uncertain about a complex case, classical conformal prediction might output a set of 15 possible diseases to maintain a strict 99% coverage guarantee. Yet in practice, the doctor faces physical, financial, and temporal constraints that make such a broad set difficult to act upon. Slightly relaxing the target coverage level (for instance, to 98%) would already lead to a much smaller, more actionable prediction set of around five candidates (see Figure 1). This illustrates the need for more flexible approaches that explicitly trade off coverage and set size in a controlled way, while remaining useful in real-world decision-making.
To address this, we turn to a more flexible statistical framework based on e-values, which enables post-hoc control of uncertainty. This flexibility allows us to move beyond fixed confidence levels and adapt inference after observing the data.
Unlike traditional p-values, which lose their statistical validity if the significance level is modified after observing the data, e-values allow us to audit and adjust our statistical guarantees retrospectively. Importantly, e-values let us derive adaptive coverage levels on the fly, without additional data or complex data splitting: the same calibration set already used in classical CP suffices. They give us the mathematical right to look at our predictions first, apply constraints, and then modify our confidence levels post-hoc while retaining a valid theoretical guarantee.
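The reason e-values tolerate such after-the-fact choices can be stated in two lines. The notation below ($E$ for the e-value, $H$ for the hypothesis being tested, $\alpha$ for a fixed level, $\tilde{\alpha}$ for a level chosen after seeing the data) is introduced here for illustration and does not appear in the article. An e-value is a non-negative statistic whose expectation under $H$ is at most one. Markov's inequality then gives the usual fixed-level statement, and the pointwise bound $\mathbf{1}\{E \ge 1/\tilde{\alpha}\}/\tilde{\alpha} \le E$ gives a post-hoc analogue:

```latex
% E is an e-value for H:  E >= 0  and  \mathbb{E}_H[E] <= 1.
\[
\mathbb{P}_H\!\left(E \ge \tfrac{1}{\alpha}\right)
  \;\le\; \alpha\,\mathbb{E}_H[E] \;\le\; \alpha
  \qquad \text{for any fixed } \alpha \in (0,1),
\]
\[
\mathbb{E}_H\!\left[\frac{\mathbf{1}\{E \ge 1/\tilde{\alpha}\}}{\tilde{\alpha}}\right]
  \;\le\; \mathbb{E}_H[E] \;\le\; 1
  \qquad \text{for any data-dependent } \tilde{\alpha} > 0 .
\]
```

The second statement is weaker in form than a fixed-level guarantee, since it controls the error relative to the chosen level in expectation, but it remains valid however $\tilde{\alpha}$ depends on the data.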
Leveraging this post-hoc power, our project has introduced an approach called Backward Conformal Prediction (BCP) [1]. This method allows practitioners to impose a strict “hard constraint”: a maximum allowed size for the prediction set.
If our doctor can only reasonably manage a maximum of three clinical exams, BCP ensures the AI will never suggest more than three diseases. Because e-values permit post-hoc significance testing without penalty, we can first limit the set to match our physical constraints, and then retrospectively calculate the highest mathematical confidence level we can safely attribute to that specific, constrained prediction.
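The two steps just described, capping the set size and then reading off a confidence level, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the published algorithm: it assumes strictly positive non-conformity scores and a particular e-value (the candidate's score divided by the average of all n + 1 scores, which has expectation one under exchangeability). The function names are hypothetical, and the construction in [1] may differ in its details.

```python
import numpy as np

def e_values(cal_scores, test_scores):
    """One e-value per candidate label (assumed construction).

    cal_scores: scores of the true labels on the n calibration points.
    test_scores: scores of every candidate label for the test input.
    Scores are assumed strictly positive; larger means less plausible.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    test_scores = np.asarray(test_scores, dtype=float)
    n = cal_scores.size
    # Candidate score divided by the mean of the n + 1 scores.
    return (n + 1) * test_scores / (cal_scores.sum() + test_scores)

def backward_conformal(cal_scores, test_scores, max_size):
    """Return (labels, alpha): a set of at most max_size labels and the
    smallest data-dependent miscoverage level alpha compatible with it."""
    e = e_values(cal_scores, test_scores)
    if max_size >= e.size:
        # The constraint does not bind: keep every label.
        return np.arange(e.size), 0.0
    # The set {y : e_y < 1/alpha} has at most max_size elements exactly
    # when 1/alpha is at most the (max_size + 1)-th smallest e-value.
    threshold = np.sort(e)[max_size]
    labels = np.flatnonzero(e < threshold)
    # alpha = 1 means the guarantee is vacuous for this input.
    alpha = min(1.0, 1.0 / threshold)
    return labels, alpha
```

The returned level is random because it depends on the test input, so the accompanying guarantee is of the post-hoc kind that e-values support, not a fixed 1 − α statement. When the model is confident, the excluded labels have large e-values and the reported alpha is small; when it is not, the size cap still holds and the cost shows up as a larger alpha.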
In broader operational scenarios, a strict limit on every single prediction might be unnecessarily rigid. This is where adaptive coverage policies [2] come in. Using the same theoretical foundation, this approach applies a “soft constraint,” which instead provides a guarantee on the expected (or average) set size at test time.
For example, a hospital network might require the AI to recommend an average of 3 tests per patient to align with laboratory capacity over a fiscal quarter. E-values allow us to dynamically adjust the post-hoc certainty so that the predictions seamlessly adapt, ensuring the average size constraint is rigorously met across the entire patient population.
We initiated this research in 2025 at Inria, Ecole Normale Supérieure, PSL University in collaboration with my PhD supervisors Francis Bach and Michael I. Jordan. This work represents a step forward from purely theoretical safety to actionable, real-world reliability. Classical conformal prediction provides statistical guarantees, but the resulting prediction sets may sometimes be too large to act upon. Our methods invert this paradigm, starting from actual physical constraints and using the post-hoc flexibility of e-values to extract the best possible statistical guarantee for those specific limits.
Moving forward, we are eager to explore practical deployments of this method across various industrial and clinical pipelines. By bridging the gap between rigorous statistics and operational realities, our ultimate goal is to improve what people do in practice, making trustworthy AI accessible, scalable, and genuinely effective for heavily constrained environments.
References:
[1] E. Gauthier et al., “Backward Conformal Prediction”, in Advances in Neural Information Processing Systems, 2025.
[2] E. Gauthier, F. Bach, and M. I. Jordan, “Adaptive coverage policies in conformal prediction”, in International Conference on Artificial Intelligence and Statistics, 2026.
Please contact:
Etienne Gauthier
Inria, Ecole Normale Supérieure, PSL University, Paris, France

