The Salto Project: Static Analysis of OCaml Programs by Abstract Interpretation

by Pierre Lermusiaux and Benoît Montagu (Inria)

Functional programming languages, such as OCaml, take advantage of strong static guarantees provided by their type checkers that ensure that well-typed programs cannot “go wrong”. However many correctness and safety properties escape the scope of the guarantees provided by the type system. In order to provide some additional guarantees, the Salto project leverages static analysis techniques, mainly based on abstract interpretation, to develop tools that help OCaml programmers increase the confidence in their programs.

With the wide-spread use of cyber-systems, the programs controlling these systems have become the keepers of our personal data. Malfunctioning programs expose all these systems to attacks and faults that can put our data and our lives at risk. Thus, the questions of safety and correctness of the programs and protocols used in critical systems are now essential issues for the future of information technologies. Providing formal guarantees on the behaviour of programs is, however, a challenging problem that often requires advanced scientific knowledge, which limits the adoption of software analysis tools in industry.

While dynamic techniques based on tests and simulations have been preferred so far in industry, the adoption of abstract-interpretation-based tools, such as Astrée [1], a static analyser for C programs, have proven the interest and applicability of static and formal analysis techniques. Recently, NomadicLabs [L3] has shown interest in using software analysis tools to improve the confidence on the large OCaml code base behind the Tezos crypto-currency. This prompted a collaboration between NomadicLabs and Inria to fund the Salto project [L1] that officially started in November 2022.

OCaml [L2] is a functional programming language with automatic memory management, which is developed and maintained by Inria researchers. It’s strong type system guarantees that a well-typed OCaml programs are memory safe, i.e. that data is used in a consistent way by the program. However, potential errors, such as mishandled exceptions, arithmetic overflow, out-of-bound accesses, and the detection of undefined behaviours escape the scope of these guarantees. The goal of the Salto project is to define sound static analysis techniques to formally detect such cases.

The formal verification of non-trivial properties of a program is a notoriously undecidable problem, meaning that there is no general method able to systematically prove or disprove semantics properties of a program. To circumvent this formal limitation of static analyses, the abstract interpretation framework proposes to compute an over-approximation of a program’s reachable states. An abstract interpreter can do so efficiently by relying on the notion of abstract domains that define over-approximations of the sets of possible states of a program, called abstract values, and the abstractions of the semantic operations of a program on these abstract values.

This approach allows abstract interpreters, like Astrée and Salto, to accurately detect all executions that might contradict the desired semantic properties (i.e. the analysis is sound), while sometimes raising false alarms about executions that cannot happen in reality (see Figure 1). The goal of an abstract interpreter is therefore to be as precise as possible to minimise the number of false alarms, while being efficient enough to analyse programs in a reasonable time. In the context of functional languages such as OCaml, applying the abstract interpretation framework is particularly challenging, because data flow and control flow are interdependent: the control flow of programs is dynamically derived from the application of functional terms, whose values generally depend on all evaluations performed before. While other work [3] has relied on techniques such as inference of types and effects to statically analyse OCaml programs, our work on Salto constitutes one of the first efforts to propose a wide-scale analysis of OCaml programs based on abstract interpretation.

Figure 1: Representation of three possible analysis results.

To define and implement this static analyser, the Salto project has led to the definition of an intermediate language, where redundant features of the OCaml language are conflated, and disambiguation of pattern-matching is performed. This intermediate language simplifies both the definition of the semantics of programs, and the implementation of our abstract interpreter. The abstract interpreter relies on a novel abstract domain that can represent recursively defined sets of values, including function closures. This abstract domain features a dual representation of abstract values, that can be viewed both as tree terms, which can be hash-consed and memoised to ensure the analysis performance, as well as graphs, in order to implement tree-automaton-inspired operations on the set of values they represent.

The Salto project has reached a first milestone with the implementation of a static analyser for OCaml programs, parameterised over the abstract domains of OCaml primitive data-types, that can detect uncaught exceptions accurately [2], and arithmetical errors, such as integer overflow. This analyser has been tested on a large test suite, and already supports a large subset of features of OCaml, including dynamically extensible data-types, mutable records, and the module system. The support in Salto of more advanced OCaml features is under active development (such as recursive non-functional terms and modules), that will permit to analyse the Tezos code base ultimately. Finally, we believe that the Salto abstract interpreter constitutes a solid basis to experiment with analyses that detect more advanced security issues in OCaml programs, such as undesirable side effects, resource leaks, or cases where the behaviour of a program may depend on the evaluation order chosen by the compiler.

Links:
[L1] Salto: https://salto.gitlabpages.inria.fr
[L2] OCaml: https://ocaml.org
[L3] NomadicLabs: https://www.nomadic-labs.com

References:
[1] P. Cousot et al: “The ASTREÉ Analyzer,” ESOP 2005.
[2] P. Lermusiaux and B. Montagu, “Detection of uncaught exceptions in functional programs by abstract interpretation,” ESOP 2024.
[3] X. Leroy and F. Pessaux, “Type-based analysis of uncaught exceptions,” ACM TOPLAS, vol. 22 Issue 2, March 2000.

Please contact:
Pierre Lermusiaux, Inria, France
This email address is being protected from spambots. You need JavaScript enabled to view it.

Benoît Montagu, Inria, France
This email address is being protected from spambots. You need JavaScript enabled to view it.