Scaling Future Software: The Manycore Challenge

by Frank S. de Boer, Einar Broch Johnsen, Dave Clarke, Sophia Drossopoulou, Nobuko Yoshida and Tobias Wrigstad

Existing software cannot benefit from the revolutionary potential increases in computational power provided by manycore chips unless its design and code are polluted by an unprecedented amount of low-level, fine-grained concurrency detail. As a consequence, the advent of manycore chips threatens to make current mainstream programming approaches obsolete, and thereby jeopardizes the benefits gained from the last 20 years of development in industrial software engineering. In this article we put forward an argument for a fundamental breakthrough in how parallelism and concurrency are integrated into the software of the future.

A chip processor wafer. Chip manufacturers are moving from single-processor chips to new architectures that utilise the same silicon real estate for a conglomerate of multiple independent processors known as multicores. Photo: Intel Corporation

A radical paradigm shift is currently taking place in the computer industry: chip manufacturers are moving from single-processor chips to new architectures that utilise the same silicon real estate for a conglomerate of multiple independent processors known as multicores. It is predicted that this development will continue and in the near future multicores will become manycores. These chips will feature an estimated one million cores. How will this hardware development affect the software? The dominant programming paradigm in the industry today is object-orientation. The use of objects to structure data and drive processing operations in software programs has proven to be a powerful concept for mastering the increasing complexity of software. As a consequence, in the last few decades, industry has invested heavily in object-oriented software engineering practices.

The concurrency model most commonly used for object-oriented programs in industry today is multithreading. A thread is a sequential flow of control which processes data by invoking the operations of the objects storing the data. Multithreading is provided through small syntactic additions to the programming language which allow several such threads to run in parallel. However, developing efficient and correct concurrent programs for multicore processors is very demanding. Errors arise easily because different parallel threads can interfere with each other, simultaneously reading and writing the data of a single object and thus undoing each other’s work. To control such interference, programmers have to use low-level synchronization mechanisms, such as locks or fences, which have subtle and intricate semantics and are error-prone to use. These mechanisms avoid interference, but they introduce overhead because threads that must wait for one another cannot run in parallel. Overhead also arises because the data are distributed across different parts of the architecture (e.g., caches and memory). If the data access pattern of the various threads does not match this distribution, the program spends a large amount of time transferring data across processors, caches and memory.
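The interference described above can be illustrated with a minimal Python sketch. It is not taken from the Upscale project; the `Counter` class, the `run` helper and the thread and iteration counts are hypothetical names and values chosen for illustration. Several threads update one shared object, once without synchronization and once under a lock.

```python
import threading


class Counter:
    """A shared object whose state is mutated by several threads (illustrative only)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read-modify-write without synchronization: two threads may read the
        # same old value, so one of the two updates can be lost.
        self.value += 1

    def increment_locked(self):
        # The lock serializes the read-modify-write, so no update is lost,
        # but threads now wait for one another instead of running in parallel.
        with self._lock:
            self.value += 1


def run(method_name, n_threads=4, n_increments=10_000):
    """Let n_threads threads each call the named method n_increments times."""
    counter = Counter()
    method = getattr(counter, method_name)

    def worker():
        for _ in range(n_increments):
            method()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


locked_total = run("increment_locked")   # always n_threads * n_increments
unsafe_total = run("increment_unsafe")   # may be lower if updates were lost
```

Whether the unsynchronized variant actually loses updates in a given run depends on the interpreter and on thread scheduling, which is part of what makes such errors hard to reproduce. The locked variant is always correct, at the price of the waiting overhead discussed above.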

To address these issues, increasingly advanced language extensions, concurrency libraries and program analysis techniques are currently being developed to explicitly control thread concurrency and synchronization. However, despite these advances in programming support, concurrent programming remains a difficult task. Only the most capable programmers can explicitly control concurrency and make efficient use of the relatively small number of cores readily available today.

Thus, manycore processors require radically new software abstractions to coordinate interactions among the concurrent processes and between the processing and storage units. This task requires a fundamental breakthrough in how parallelism and concurrency are integrated into programming languages, substantiated by a complete inversion of the current canonical language design. By inverting design decisions, which have largely evolved in a sequential setting, new programming models can be developed that are suitable for mainstream concurrent programming and deployment onto parallel hardware. This could be achieved without imposing a heavy syntactic overhead on the programmer.

The authors of this article are the principal investigators of the three-year EU Upscale project (From Inherent Concurrency to Massive Parallelism through Type-Based Optimizations), which started in March 2014. In this project we take existing actor-based languages [1] and libraries (e.g., the Akka Actor API; see Links section) as the starting point of the inverted language design. In contrast to an object, an actor executes its own thread of control, in which it processes the operations it provides as they are requested by other actors running in parallel. These requests are processed according to a particular scheduling policy, e.g., in order of their arrival. Sending a request to execute a provided operation involves the asynchronous passing of a corresponding message; that is, the actor that sends the message continues the execution of its own thread. Both concurrency and the features which typically make concurrency easier to exploit, such as immutability, locality and asynchronous message passing, will be the default behaviour of the actors. This inversion produces a programming language that can be easily analysed, because properties which may potentially inhibit parallelism (e.g., synchronous communication and shared mutable state) must be explicitly declared.
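The actor model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it is neither Encore code nor the Akka API, and the names `Actor`, `CounterActor`, `send`, `receive` and `stop` are hypothetical. Each actor owns a mailbox and a thread of control, processes requests in order of arrival, and is the only party that touches its own state.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel message that ends an actor's loop


class Actor:
    """Minimal actor: a private mailbox plus its own thread of control."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        # Asynchronous message passing: enqueue the request and return
        # immediately, so the sender continues its own execution.
        self._mailbox.put(message)

    def stop(self):
        # Ask the actor to finish after all earlier messages, then wait for it.
        self._mailbox.put(_STOP)
        self._thread.join()

    def _run(self):
        # Scheduling policy of this sketch: process requests in arrival order.
        while True:
            message = self._mailbox.get()
            if message is _STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    """State is local to the actor; only the actor's own thread mutates it."""

    def __init__(self):
        self.total = 0  # initialised before the actor thread starts
        super().__init__()

    def receive(self, message):
        self.total += message


counter = CounterActor()


def sender():
    for _ in range(1000):
        counter.send(1)


senders = [threading.Thread(target=sender) for _ in range(4)]
for s in senders:
    s.start()
for s in senders:
    s.join()
counter.stop()
final_total = counter.total  # read only after the actor has stopped
```

No lock appears in the user-level code: because the state is local and requests arrive as messages, concurrent senders cannot interfere with each other's updates. The only synchronization is inside the mailbox, which corresponds to the default behaviour (locality and asynchronous message passing) described in the text.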

The key feature of the Encore language, which is currently under development, is that everything will be designed with deployment in mind. Deployment is the mapping of computations to processors and the scheduling of such computations. The main rationale of the inverted language design is to support the automated analysis of the code so that deployment-related information can be obtained. This information can then be used to facilitate optimisations both by the compiler and at run-time. These automated optimisations will ease the design of parallel applications for manycore architectures and thus will make the potential computing power of this hardware available to mainstream developers.

Links:
Upscale project: http://www.upscale-project.eu
Akka Actor API: http://akka.io/docs/

Reference:
[1] Gul A. Agha: “ACTORS - a model of concurrent computation in distributed systems”, MIT Press series in artificial intelligence, MIT Press 1990, ISBN 978-0-262-01092-4, pp. I-IX, 1-144

Please contact:
Frank S. de Boer
CWI, The Netherlands
Tel: +31 20 5924139
