The evolution of electronic technology and the growing presence of computer science in the music field have greatly transformed ways of 'making music', involving various aspects from creation to production and performance, and leading to the appearance of new artistic forms. The audio functionalities are often closely intertwined with the world of graphics, video, performance, virtual reality and telecommunications, creating artistic and cultural multimedia products. This means that an efficient data-processing system will play an essential role in ensuring that all the operations planned during the conceptualization and design of a performance can be realized rapidly and smoothly, allowing the performance to take place in real time with a high level of interactivity.
Researchers at the ISTI computerART Lab and the DSP Audio team [1,2,3] have focused on developing systems that extract features from body actions in real time during interactive artistic multimedia performances. Two relevant examples are the 'Palm Driver' (in which the movement of a player's hands controls real-time synthesized music via an infrared interface), and the 'PAGe' (in which the movement of a painter's hands in front of a video camera produces a painting on a virtual canvas projected by a video projector). Other recent methods for feature extraction from audio signals have also been proposed in the framework of the MUSCLE-NoE EU Project (see E-Team7: Semantic from Audio and Genre Classification for Music).
The Pandora system tracks the audio parameters of a live musical performance in order to control video effects, following a pre-designed storyboard for a movie, a 3D sequence or other video content. The project was proposed by the musician Enrico Cerretti, and the video effects have been developed jointly with Infobyte SPA (Rome). Pandora monitors the performer (actor) and, optionally, a video operator (director) who can also modify the execution flow, thus setting up bidirectional feedback between the two, i.e. between music and video. The sequence of the main system computations is represented in Figure 2.
Music (or other sounds) produced by the performer is acquired by microphones; the corresponding analogue signals are sent to an audio interface and processed on a typical Windows platform in order to dynamically compute the two parameters of interest: energy and fundamental frequency. The amplitude is measured by means of an envelope follower, from which the effective (RMS) energy of the signal is subsequently obtained. Detection and tracking of the fundamental frequency, our second parameter of interest, is a well-known and non-trivial problem in the literature, and many methods have been proposed to tackle it. The algorithm we used works in the time domain and implements the 'Average Magnitude Difference Function' (AMDF), sometimes referred to as a 'fast autocorrelation function' because it uses sums and differences of signal samples rather than products.
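The two measurements described above can be illustrated with a minimal Python sketch. It is not the Pandora implementation: the function names, the attack and release times, the frequency search range and the 10% threshold used to avoid octave errors are all assumptions made for illustration.

```python
import math

def envelope_follower(samples, sample_rate, attack_s=0.005, release_s=0.050):
    """One-pole envelope follower applied to the rectified signal.
    Attack and release times are illustrative assumptions."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Rise quickly when the signal grows, decay slowly when it falls.
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def rms(samples):
    """Root-mean-square (effective) value of a frame."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def amdf(frame, lag):
    """Average Magnitude Difference Function at one lag:
    only subtractions and absolute values, no products."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n

def amdf_pitch(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) of a frame.
    Picks the first deep local minimum of the AMDF, which reduces
    the risk of locking onto a multiple of the true period."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) // 2)
    lags = list(range(min_lag, max_lag + 1))
    values = [amdf(frame, lag) for lag in lags]
    lowest, highest = min(values), max(values)
    threshold = lowest + 0.1 * (highest - lowest)  # assumed tolerance
    for k in range(1, len(values) - 1):
        is_local_min = values[k] <= values[k - 1] and values[k] <= values[k + 1]
        if is_local_min and values[k] <= threshold:
            return sample_rate / lags[k]
    # Fallback: global minimum over the searched lags.
    return sample_rate / lags[values.index(lowest)]
```

For example, calling `amdf_pitch(frame, 8000)` on a frame of a steady tone returns an estimate of its fundamental frequency, limited in resolution by the integer lag. A real-time system would typically add frame overlap, voicing detection and smoothing across frames, none of which is shown here.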
The association between sound parameters and the 3D/2D video sequences they control is usually defined during the planning phase of the performance, using a special Multimedia Editor. The extracted values are used either directly or after applying a suitable mapping. Once the system had been implemented, various applications were developed to test its functionalities and performance. The experiments confirmed that the sound parameters produced by traditional instruments (clarinet, flute) are tracked and extracted correctly, with low latency and good accuracy. Users can easily link these parameters to video effects in various ways (3D shape transformations, colours, shading, etc.).
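As an illustration of what such a mapping might look like, the following Python sketch converts an energy value and a fundamental frequency into two video parameters. The parameter names (`scale`, `hue_degrees`), the input ranges and the choice of a logarithmic pitch mapping are hypothetical and are not taken from the Pandora Multimedia Editor.

```python
import math

def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))

def map_linear(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from the input range to the output range."""
    t = clamp((value - in_min) / (in_max - in_min), 0.0, 1.0)
    return out_min + t * (out_max - out_min)

def map_pitch_log(freq_hz, f_min, f_max, out_min, out_max):
    """Rescale a frequency on a logarithmic axis, so that each octave
    covers the same share of the output range."""
    if freq_hz <= 0.0:
        return out_min  # no pitch detected
    t = math.log(freq_hz / f_min) / math.log(f_max / f_min)
    return out_min + clamp(t, 0.0, 1.0) * (out_max - out_min)

def video_controls(energy, freq_hz):
    """Hypothetical mapping: louder sound enlarges a 3D shape,
    higher pitch rotates its colour around the hue circle."""
    return {
        "scale": map_linear(energy, 0.0, 1.0, 0.5, 2.0),
        "hue_degrees": map_pitch_log(freq_hz, 110.0, 880.0, 0.0, 360.0),
    }
```

Clamping keeps the video parameters within a safe range when the performer plays louder or higher than anticipated during planning.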
The system is suitable not only for interactive artistic multimedia performances but also for non-artistic applications such as multimedia authoring, company presentations and musical rehabilitation therapy. In the future, the system will be tested with a variety of instruments in order to tune the settings of the algorithm for a wide range of applications.
Graziano Bertini, ISTI-CNR
Leonello Tarabella, ISTI-CNR