by Leonel Sousa and Aleksandar Ilic (INESC-ID)
Researchers from INESC-ID, Instituto Superior Técnico, University of Lisbon proposed a set of fundamental Cache-aware Roofline models, which provide a simple and intuitive way of visually representing the limits of parallel processing on multi-core processors.
As computing systems evolve towards complex multi-core designs with deep and diverse memory hierarchies, improving the performance and optimising the execution of real-world applications become of fundamental importance. In high-performance computing environments, it is crucial to determine which hardware resources are the main execution bottlenecks limiting application performance, especially when deciding which software optimisation technique to apply. In this process, simple but insightful models are particularly useful, since they provide the means to quickly and easily assess the main characteristics of the architectures and the features of the applications.
To support this decision process, Aleksandar Ilic and Leonel Sousa, researchers from INESC-ID, Instituto Superior Técnico (IST), University of Lisbon, together with Frederico Pratas (who holds a PhD from IST and is now with Imagination Technologies), proposed a set of fundamental Cache-aware Roofline models [1,2], which provide a simple and intuitive way of visually representing the limits of parallel processing on contemporary multi-core processors. These Cache-aware Roofline models evaluate how key micro-architectural aspects, such as accessing different functional units or different memory hierarchy levels, affect the realistically achievable upper bounds for performance, power consumption and energy efficiency on a given multi-core architecture.
In 2017, a team of Intel software developers (led by Zakhar Matveev, Roman Belenov and Philippe Thierry) successfully integrated the performance Cache-aware Roofline model as an official feature of Intel® Advisor, which is part of the Parallel Studio XE suite (Intel’s main application development framework) [3,4]. Within Intel® Advisor, building the roofline plots and characterising the application in depth are fully automated for the hardware platform on which the applications are executed. A wide range of Intel devices is supported, from all contemporary Intel CPU micro-architectures (Nehalem to Skylake) up to massively parallel coprocessors (e.g., Intel Xeon Phi Knights Landing).
A brief overview of the Cache-aware Roofline in Intel® Advisor
The performance Cache-aware Roofline is plotted with arithmetic intensity (measured in FLOPs/byte) on the X axis and performance (in GFLOP/s) on the Y axis, both in logarithmic scale. Before collecting data from a specific application, Intel® Advisor automatically runs a set of quick benchmarks to measure the hardware limits of the processor in use, which it then plots as lines on the chart, called roofs (see Figure 1). The horizontal lines represent the number of floating-point operations (of a given type) that the underlying hardware can perform per second. The diagonal lines represent how many bytes of data a given memory hierarchy level can deliver per second.
Figure 1: Cache-aware Roofline in Intel® Advisor.
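The roofs described above can be read as a single bound: at a given arithmetic intensity, attainable performance is the lower of a compute roof and the bandwidth of a memory level multiplied by that intensity. The following Python sketch illustrates this under stated assumptions. The roof values and the names `COMPUTE_ROOFS_GFLOPS`, `MEMORY_ROOFS_GBS`, `attainable_gflops` and `ridge_point` are hypothetical and chosen for illustration; they are not measurements or APIs from Intel® Advisor.

```python
# Hypothetical roof values for illustration only; Intel Advisor measures
# the real ones with its own benchmarks on the target machine.
COMPUTE_ROOFS_GFLOPS = {"Scalar Add Peak": 10.0, "Vector Add Peak": 80.0}
MEMORY_ROOFS_GBS = {"L1": 400.0, "L2": 200.0, "L3": 100.0, "DRAM": 25.0}


def attainable_gflops(intensity, bandwidth_gbs, peak_gflops):
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(bandwidth_gbs, peak_gflops):
    """Arithmetic intensity (FLOPs/byte) where a diagonal roof meets a horizontal one."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    peak = COMPUTE_ROOFS_GFLOPS["Vector Add Peak"]
    for level, bandwidth in MEMORY_ROOFS_GBS.items():
        bound = attainable_gflops(0.5, bandwidth, peak)
        ridge = ridge_point(bandwidth, peak)
        print(f"{level}: bound {bound:.1f} GFLOP/s at 0.5 FLOPs/byte, ridge at {ridge:.2f}")
```

With these assumed numbers, a kernel at 0.5 FLOPs/byte is bandwidth-bound when its data comes from DRAM, but reaches the compute roof when served from L1.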
Each dot represents a loop or function in the program, and its position in the Roofline plot indicates its performance and arithmetic intensity. The size and colour of the dots in Intel® Advisor’s Roofline chart indicate how much of the total program time a loop or function takes: small, green dots take up relatively little time, so they are likely not worth optimising; large, red dots take up the most time, so they are the best candidates for optimisation, especially those with a large gap to the topmost attainable roofs. In general, the further a dot is from the topmost roofs, the more room for improvement there is. For example, the Scalar Add Peak represents the maximum possible performance without taking advantage of vectorisation, and the next roof up, the Vector Add Peak, shows what vectorisation could add.
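The reading of the chart described above (large time share plus a large gap to the topmost roof means a good optimisation candidate) can be sketched in a few lines of Python. This is an illustrative heuristic, not how Intel® Advisor ranks loops: the `Dot` class, the `headroom` and `rank_candidates` functions, and the choice to weight time share by headroom are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Dot:
    """One loop or function on the roofline chart (hypothetical structure)."""
    name: str
    intensity: float   # arithmetic intensity in FLOPs/byte
    gflops: float      # measured performance in GFLOP/s
    time_share: float  # fraction of total program time, between 0 and 1


def headroom(dot, top_bandwidth_gbs, top_peak_gflops):
    """Ratio of the topmost attainable roof to the measured performance."""
    roof = min(top_peak_gflops, top_bandwidth_gbs * dot.intensity)
    return roof / dot.gflops


def rank_candidates(dots, top_bandwidth_gbs, top_peak_gflops):
    """Sort dots by time share weighted by headroom, largest first (assumed heuristic)."""
    return sorted(
        dots,
        key=lambda d: d.time_share * headroom(d, top_bandwidth_gbs, top_peak_gflops),
        reverse=True,
    )


if __name__ == "__main__":
    # Made-up dots and roofs, for illustration only.
    dots = [
        Dot("small_green_loop", intensity=1.0, gflops=5.0, time_share=0.02),
        Dot("large_red_loop", intensity=0.1, gflops=4.0, time_share=0.60),
    ]
    for dot in rank_candidates(dots, top_bandwidth_gbs=400.0, top_peak_gflops=80.0):
        print(dot.name, round(headroom(dot, 400.0, 80.0), 1))
```

Here the small loop has the larger gap to the roof, yet the large loop ranks first because it dominates execution time, which matches the guidance that small dots are likely not worth optimising.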
Where can I get Intel® Advisor with Cache-aware Roofline?
As stated in the Intel early access program: “The Intel Advisor offers a great step forward in memory performance optimization with a new vivid Advisor ‘Roofline’ bounds and bottlenecks analysis.” Cache-aware Roofline has officially been a feature of Intel® Advisor since version 2017 Update 2, which is part of the Parallel Studio XE suite (Cluster Edition and Professional Edition).
[1] A. Ilic, F. Pratas, and L. Sousa: “Cache-aware Roofline model: Upgrading the loft,” IEEE Computer Architecture Letters, vol. 13, no. 1, pp. 21-24, 2014.
[2] A. Ilic, F. Pratas, and L. Sousa: “Beyond the Roofline: Cache-aware Power and Energy-Efficiency Modeling for Multi-cores,” IEEE Transactions on Computers, vol. 66, no. 1, pp. 52-58, 2017.
[3] Intel. (2017) Intel® Advisor Roofline. http://tiny.cc/1i82ly
Leonel Sousa and Aleksandar Ilic, INESC-ID, Portugal