by Lyes Khacef (University of Groningen), Laurent Rodriguez and Benoît Miramond (Université Côte d’Azur, CNRS)
Local plasticity mechanisms enable our brains to self-organize, both in structure and function, in order to adapt to the environment. This unique property is the inspiration for this study: we propose a brain-inspired computational model for self-organization, then discuss its impact on the classification accuracy and the energy efficiency of an unsupervised multimodal association task.
Our brain-inspired computing approach attempts to reconsider both AI and the von Neumann architecture at the same time. Both are formidable tools responsible for digital and societal revolutions, but they are also intellectual bottlenecks linked to the ever-present desire to keep the system under control. The brain remains our only reference in terms of intelligence: we are still learning how it works, but it seems to be built on a very different paradigm, in which its developmental autonomy gives it an efficiency that we haven’t yet attained in computing.
Our research focuses on cortical plasticity, the fundamental mechanism that enables the self-organization of the brain, which in turn leads to the emergence of consistent representations of the world. According to the neurobiologist F. Varela, self-organization can be defined as a global behaviour emerging from local and dynamic interactions, i.e., unifying structure and function in a single process: the plasticity mechanism. Plasticity is hence the key to our ability to build a representation of the environment from our experiences, so that we can adapt to it. It is also the basis of an extremely interesting characteristic of the human brain: multimodal association.
In fact, most processes and phenomena in the natural environment are expressed under different physical guises, which we refer to as different modalities. Multimodality is considered a fundamental principle for the development of embodied intelligence, as pointed out by the neuroscientist A. Damasio, who proposed the Convergence-Divergence Zone framework [1]. This framework models the neural mechanisms of memorisation and recollection. Despite the diversity of sensory modalities, such as sight, sound and touch, the brain arrives at similar representations and concepts (convergence). Conversely, biological observations show that one modality can activate the internal representation of another. For example, when we watch a specific lip movement without any sound, the activity pattern induced in the early visual cortices activates, in the early auditory cortices, the representation of the sound that usually accompanies that lip movement (divergence).
Here we summarise our work on the Reentrant Self-Organizing Map (ReSOM) [2], a brain-inspired computational neural system based on the reentry theory of G. Edelman [3] and J.P. Changeux, which uses Kohonen Self-Organizing Maps (SOMs) and Hebbian-like learning to perform multimodal association (see Figure 1).
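To make the unimodal building block more concrete, the following is a minimal sketch of online Kohonen SOM training in plain Python. It is not the ReSOM implementation: the grid size, the Gaussian neighbourhood, the exponential decay of the learning rate and neighbourhood width, and the names `train_som` and `best_matching_unit` are all illustrative assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the neuron whose prototype is closest (squared Euclidean) to x."""
    return min(
        range(len(weights)),
        key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)),
    )


def train_som(data, rows, cols, epochs=30, lr0=0.5, lr_end=0.01, sigma_end=0.5, seed=0):
    """Online Kohonen training of a rows x cols map on a list of equal-length vectors.

    Assumed schedule: learning rate and neighbourhood width both decay
    exponentially from their initial to their final values.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # One randomly initialised prototype per neuron, stored row-major.
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            t = step / total_steps  # training progress in [0, 1)
            lr = lr0 * (lr_end / lr0) ** t  # decaying learning rate
            sigma = sigma0 * (sigma_end / sigma0) ** t  # shrinking neighbourhood
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for n, w in enumerate(weights):
                r, c = divmod(n, cols)
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])  # pull prototype towards x
            step += 1
    return weights


# Two well-separated toy clusters (hypothetical data, no labels used).
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=3, cols=3)
print(best_matching_unit(som, [0.05, 0.05]), best_matching_unit(som, [0.95, 0.95]))
```

On such data the two probes should be won by different neurons, i.e. distinct regions of the map come to represent each cluster without any label.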
Figure 1: Reentrant Self-Organizing Map: (left) processing pipeline from data acquisition at the input to multimodal association for decision making at the output, with unimodal and multimodal accuracies for a hand-gesture recognition task based on a DVS camera and an EMG sensor; (right) FPGA-based neuromorphic implementation of the proposed self-organizing artificial neural network on multiple SCALP boards for real-time and energy-efficient processing.
The brain’s plasticity can be divided into two distinct forms: (i) structural plasticity, which, according to the Theory of Neuronal Group Selection [3], changes the neurons’ connectivity by sprouting (creating) or pruning (deleting) synaptic connections; and (ii) synaptic plasticity, which modifies (increases or decreases) the strength of existing synapses. We explore both mechanisms for multimodal association through Hebbian-like learning. In the resulting network, activity in one part spreads to all the others, and a fragment of a memory is enough to awaken the entire memorised experience. The network becomes both a detector and a producer of signals.
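Both forms of plasticity can be sketched on the lateral links between two maps. In this hypothetical example, `hebbian_sprouting` creates a link the first time two neurons are co-active (sprouting) and then strengthens it in proportion to their co-activity (a plain Hebbian rule that only potentiates; depression is omitted), while `prune` keeps only the strongest 10% of each map-A neuron’s possible links, mirroring the pruning of up to 90% of connections used in ReSOM. The learning rate, threshold and one-hot activities are assumptions, not the published ReSOM rule.

```python
def hebbian_sprouting(links, act_a, act_b, eta=0.1, threshold=0.0):
    """Create and strengthen links between co-active neurons of maps A and B.

    links maps (i, j) -> strength, for neuron i of map A and neuron j of map B.
    """
    for i, ai in enumerate(act_a):
        if ai <= threshold:
            continue
        for j, bj in enumerate(act_b):
            if bj <= threshold:
                continue
            # Sprouting: a missing synapse starts at 0.0 and is created here.
            # Synaptic plasticity: Hebbian increment proportional to co-activity.
            links[(i, j)] = links.get((i, j), 0.0) + eta * ai * bj
    return links


def prune(links, n_b, keep_fraction=0.1):
    """Keep, for every neuron of map A, only its strongest links to map B."""
    keep = max(1, round(keep_fraction * n_b))  # e.g. 10% of the possible links
    rows = {}
    for (i, j), w in links.items():
        rows.setdefault(i, []).append((w, j))
    pruned = {}
    for i, row in rows.items():
        row.sort(key=lambda t: t[0], reverse=True)  # strongest first
        for w, j in row[:keep]:
            pruned[(i, j)] = w
    return pruned


def one_hot(k, n):
    """Activity of an n-neuron map where only the best-matching unit k fires."""
    return [1.0 if m == k else 0.0 for m in range(n)]


# Hypothetical co-occurrences: neuron 0 of map A fires with neuron 3 of map B
# five times, and once (noise) with neuron 5.
links = {}
for _ in range(5):
    hebbian_sprouting(links, one_hot(0, 10), one_hot(3, 10))
hebbian_sprouting(links, one_hot(0, 10), one_hot(5, 10))
pruned = prune(links, n_b=10)
print(sorted(pruned))  # [(0, 3)]
```

The frequently co-occurring pair survives while the spurious link is removed, which is also what reduces the number of inter-map connections to be communicated.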
First, unimodal learning is performed independently for each modality using a SOM, a brain-inspired artificial neural network that learns in an unsupervised manner (without labels). Second, based on co-occurring multimodal inputs, the neurons of the different SOMs create and reinforce the reentrant multimodal association via sprouting and Hebbian-like learning. At the end of the multimodal binding, neural group selection takes place: each neuron prunes up to 90% of its possible connections to keep only the strongest ones.
The third step is to give meaning to these self-associating groups of neurons. This is done by labelling one of the SOMs using very few labels (typically 1%), so that each neuron is assigned the class it represents. The fourth step is to label the rest of the network (the other maps) using the divergent activity from the first labelled map. In this way, the system departs from classical machine learning, which usually relies on large amounts of labelled data: it exploits the strength of the multimodal association and the coherence of the data it experiences to build, incrementally, a robust representation of the environment. From an application point of view, this means that the system needs only a few annotations from a single modality to label the maps of all the other modalities.
Finally, once multimodal learning is complete and all neurons of all SOMs are labelled, the system computes the convergence of the information from all the modalities to achieve a better representation of the multi-sensory input. This global behaviour emerges from local interactions among connected neurons.
Results and discussion
Our experiments [2] show that divergence labelling yields approximately the same unimodal accuracy as labelling each map directly with labels, while the convergence mechanism leads to a multimodal accuracy gain of +8.03% for a written/spoken digits database [L1] and +5.67% for a DVS/EMG hand-gesture database [L2]. We also observed a gain of +5.75% when associating visual hand gestures with spoken digits, illustrating the McGurk effect. Indeed, studies in cognitive and developmental psychology show that spoken labels, and the auditory modality in general, add complementary information that improves object categorisation.
In summary, the ReSOM model exploits the natural complementarity between different modalities so that they complete each other and improve multimodal classification. Furthermore, it induces a form of hardware plasticity in which the system’s topology is not fixed by the user but learned from the system’s experience through self-organization. Because pruning keeps only the strongest connections between maps, it reduces inter-map communication and thus the system’s energy consumption. This result could open up many new directions, inspired by the brain’s plasticity, for the design and implementation of self-organizing hardware architectures in autonomous systems such as vehicles, robots, drones or even cortical prostheses.
[1] A. Damasio: “Time-locked multiregional retroactivation: A systems level proposal for the neural substrates of recall and recognition”.
[2] L. Khacef et al.: “Brain-Inspired Self-Organization with Cellular Neuromorphic Computing for Multimodal Unsupervised Learning”.
[3] G. Edelman: “Group selection and phasic reentrant signaling: a theory of higher brain function”.