by Manuel Le Gallo, Abu Sebastian and Evangelos Eleftheriou (IBM Research Zurich)
There is a pressing need for energy-efficient non-von Neumann computing systems for highly data-centric applications related to artificial intelligence. We have developed an approach that efficiently performs a wide range of machine learning tasks such as compressed sensing, unsupervised learning, solving systems of linear equations, and deep learning.
We are on the cusp of a revolution in artificial intelligence (AI) and cognitive computing. The computing systems that run today’s AI algorithms are based on the von Neumann architecture, which is inefficient at shuttling huge amounts of data back and forth at high speeds. It is therefore becoming increasingly clear that, to build efficient cognitive computers, we need to transition to novel architectures in which memory and processing are better collocated. As a first step, the idea is to build computing units where memory and processing co-exist in some form. In-memory computing is one such approach, in which the physical attributes and state dynamics of memory devices are exploited to perform certain computational tasks in place with very high areal and energy efficiency.
In a conventional computer, the processing and memory units are physically separated. Consequently, a significant amount of data needs to be shuttled back and forth during computation, which creates a performance bottleneck commonly referred to as the “von Neumann bottleneck” (see Figure 1). This physical separation and the associated data transfers are arguably among the main hurdles of traditional computers, as a memory access typically consumes 100 to 1000 times more energy than a processor operation. In in-memory computing, computation is performed in place by exploiting the physical attributes of memory devices organised as a “computational memory” unit. For example, if data A is stored in a computational memory unit and we would like to compute f(A), then A does not need to be brought to the processing unit (see Figure 1). This is more energy- and time-efficient than the process performed by a conventional computing system.
Figure 1: Comparison between a conventional computing architecture (left) and in-memory computing (right). Adapted from .
However, significant challenges need to be overcome to enable the practical use of in-memory computing for AI algorithms. Only a limited set of operations can be performed in the memory units, whereas the general-purpose processors used in conventional computing systems can perform any type of computation. Moreover, in-memory computing can only offer limited precision because the operations performed in the memory units are analog in nature, whereas digital computing can offer arbitrarily high precision (typically 64-bit in conventional computers). These aspects call for a radical rethink of the way an algorithm is designed if it is to solve a given problem efficiently using in-memory computing.
In the last few years, we have been working towards overcoming those challenges by designing algorithms that can take advantage of in-memory computing hardware to efficiently solve AI-related tasks. We have developed prototype in-memory computing hardware based on phase-change memory comprising one million memory cells, with the aim of understanding which types of algorithm can be implemented with the set of operations and the precision of computation achievable on this chip. With this hardware, we were able to successfully demonstrate compression and reconstruction of images [1], unsupervised learning of temporal correlations [2], and solving systems of linear equations with arbitrarily high accuracy [3]. In each of these three applications, we estimate that we achieve energy savings of at least one order of magnitude compared with a conventional computer performing the same tasks.
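To illustrate how an imprecise computing unit can still deliver a highly accurate solution of a linear system, here is a minimal Python sketch of classical iterative refinement. It is not the implementation used on the chip: the noisy multiply (`noisy_matvec`), the Jacobi inner solver, the 2% noise level and all function names are assumptions made purely for illustration. The low-precision part produces a rough correction, and the residual is recomputed in ordinary double precision.

```python
import random

def matvec(A, x):
    """Exact (double-precision) matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def noisy_matvec(A, x, rel_noise, rng):
    """Hypothetical stand-in for an analog multiply: each product is off by a few percent."""
    return [sum(a * xi * (1.0 + rng.gauss(0.0, rel_noise)) for a, xi in zip(row, x))
            for row in A]

def inexact_solve(A, r, rel_noise, rng, sweeps=10):
    """Low-precision inner solver (assumed): Jacobi sweeps built on the noisy multiply."""
    z = [0.0] * len(r)
    for _ in range(sweeps):
        Az = noisy_matvec(A, z, rel_noise, rng)
        z = [zi + (ri - azi) / A[i][i]
             for i, (zi, ri, azi) in enumerate(zip(z, r, Az))]
    return z

def mixed_precision_solve(A, b, rel_noise=0.02, outer_iters=25, seed=0):
    """Iterative refinement: exact residual, noisy correction, repeated."""
    rng = random.Random(seed)
    x = [0.0] * len(b)
    for _ in range(outer_iters):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]  # high-precision residual
        z = inexact_solve(A, r, rel_noise, rng)             # rough solution of A z = r
        x = [xi + zi for xi, zi in zip(x, z)]               # refine the estimate
    return x

def residual_norm(A, x, b):
    """Largest absolute entry of b - A x, computed in double precision."""
    return max(abs(bi - axi) for bi, axi in zip(b, matvec(A, x)))
```

For a small diagonally dominant matrix such as [[4, 1, 0], [1, 4, 1], [0, 1, 4]], a single call to `inexact_solve` leaves an error at roughly the noise level, whereas the refinement loop keeps shrinking the residual because each pass only has to correct what the previous one got wrong.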
One additional application that has been of utmost interest in recent years is the training of neural networks. Deep artificial neural networks have shown remarkable human-like performance in tasks such as image processing and voice recognition. Deep neural networks are loosely inspired by biological neural networks. Parallel processing units called neurons are interconnected by plastic synapses. By tuning the weights of these interconnections, these networks are able to solve certain problems remarkably well. However, because of the need to repeatedly show very large datasets to very large neural networks, it can take multiple days or weeks to train state-of-the-art networks on conventional computing systems. In-memory computing could greatly accelerate the training of neural networks by eliminating the need to move the weight data back and forth between memory and processor.
The key idea in our approach is to encode the synaptic weights as the conductance values of phase-change memory devices organised in a computational memory unit, and to use this unit to perform the forward and backward propagation, while the weight changes are accumulated in high precision (see Figure 2). This mixed-precision approach enables the network to be trained to high classification accuracy while the bulk of the computation is performed in memory. Figure 2 shows a simulation result of training a multi-layer perceptron to recognise handwritten digits. The accuracy achieved with our approach is less than one percent lower than that obtained using a conventional computer. Most importantly, the trained weights are retained in the computational memory for many months or even years without any power supply, thanks to the non-volatility of the phase-change memory devices. A chip trained in this way can be used for inference tasks within sensor devices at a fraction (less than 1%) of the power that a conventional computer would use. More details can be found in our recent paper presented at the ISCAS 2018 conference [4].
Figure 2: Neural network training algorithm using in-memory computing (top), and simulation results of training a multi-layer perceptron to recognise handwritten digits (bottom). Adapted from .
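The accumulation step of this mixed-precision scheme can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' simulator: the model is a single linear unit rather than a multi-layer perceptron, the device weights are assumed to change only in fixed steps of size `eps`, device noise is ignored, and the names `w_dev`, `chi`, `eps` and `lr` are hypothetical.

```python
import random

def train_mixed_precision(target, eps=0.125, lr=0.05, steps=2000, seed=0):
    """Fit y = w . x with device weights restricted to multiples of eps (assumed granularity)."""
    rng = random.Random(seed)
    n = len(target)
    w_dev = [0.0] * n   # low-precision weights, standing in for device conductances
    chi = [0.0] * n     # high-precision accumulator for pending weight changes
    transfers = 0       # number of times a device weight was actually reprogrammed
    for _ in range(steps):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        y = sum(t * xi for t, xi in zip(target, x))      # training label
        pred = sum(w * xi for w, xi in zip(w_dev, x))    # forward pass on device weights
        err = pred - y
        for i in range(n):
            chi[i] -= lr * err * x[i]       # accumulate the desired update in high precision
            pulses = int(chi[i] / eps)      # whole device steps available (truncates toward zero)
            if pulses != 0:
                w_dev[i] += pulses * eps    # coarse update applied to the device
                chi[i] -= pulses * eps      # keep the remainder for later
                transfers += 1
    return w_dev, chi, transfers
```

Calling `train_mixed_precision([0.5])` shows the behaviour in its simplest form: the device weight is reprogrammed only four times, climbing in steps of 0.125 until it equals the target, while all the small gradient contributions in between are collected in `chi`. The point of the design is that the device is touched rarely and coarsely, yet no update information is thrown away.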
[1] M. Le Gallo et al.: “Compressed sensing recovery using computational memory”, in Proc. of the IEEE International Electron Devices Meeting (IEDM), 2017.
[2] A. Sebastian et al.: “Temporal correlation detection using computational phase-change memory”, Nature Communications 8, 1115, 2017.
[3] M. Le Gallo et al.: “Mixed-precision in-memory computing”, Nature Electronics 1, 246-253, 2018.
[4] S.R. Nandakumar et al.: “Mixed-precision architecture based on computational memory for training deep neural networks”, in Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS), 1-5, 2018.
Manuel Le Gallo,
IBM Research – Zurich, Switzerland