
by Lea Landucci, Stefano Baraldi and Nicola Torpei

The advent of new technology means we are continuously offered high-tech devices that should make our lives safer, more agreeable and more pleasant. Do they succeed? In his book "The Psychology of Everyday Things", Donald Norman, a leading cognitive psychologist, warns us about the technology paradox: technological innovation risks making our lives more complex every day. This warning is meant to make us conscious of the importance of 'human-centred design', particularly when we talk about human-computer interaction.

The aim of Natural Human-Computer Interaction (NHCI) research is to create new interactive frameworks that integrate human language and behaviour into tech applications, focusing on the way we live, work, play and interact with each other. Such frameworks must be easy to use, intuitive, entertaining and non-intrusive. They must support interaction with computerized systems without the need for special external input-output equipment such as mice, keyboards, remote controls or data gloves; these are replaced by hand gestures, speech, context awareness and body movement.

An interesting challenge for NHCI is to make such systems self-explanatory by working on their 'affordance' and introducing simple and intuitive interaction languages. Ultimately, according to M. Cohen, "Instead of making computer interfaces for people, it is of more fundamental value to make people interfaces for computers".

Figure 1: VIDIFACE interactive bookshop, a tabletop multi-user natural interaction system.

Our research team at the Media Integration and Communication Center (MICC) is working on natural interactive systems that exploit Computer Vision (CV) techniques. The main advantage of visual input in this context is that it allows users to communicate with computerized equipment at a distance, without any physical contact with the equipment to be controlled. Compared to speech commands, hand gestures are advantageous in noisy environments, in situations where spoken commands would be a disturbance, and for communicating quantitative information and spatial relationships.

We have developed a framework called VIDIFACE to detect hand gestures and to analyse smart objects (detection and tracking) in order to understand their position and movement in the interactive area. VIDIFACE exploits a monochrome camera equipped with a near-infrared band-pass filter that captures the interaction scene at thirty frames per second. Because it works in the infrared band, the computer vision is largely insensitive to changes in visible light and is therefore robust enough to be used in public spaces. A chain of image-processing operations is applied to remove noise, adapt to the background and measure feature dimensions, optimized to obtain a real-time flow of execution.
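
As a rough illustration of such a chain, the following Python/OpenCV sketch removes noise, adapts a background model and filters candidate blobs by size in real time. It is a hypothetical reconstruction under our own assumptions (camera index, thresholds, blob-size cut-off), not the actual VIDIFACE implementation:

import cv2

# Sketch of a VIDIFACE-style processing chain (illustrative, not the
# MICC code). Assumes a camera delivering near-infrared frames at
# roughly thirty frames per second.
cap = cv2.VideoCapture(0)  # index of the IR camera (assumption)
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) if frame.ndim == 3 else frame
    blur = cv2.GaussianBlur(gray, (5, 5), 0)   # noise removal
    mask = bg.apply(blur)                      # background adaptation
    mask = cv2.morphologyEx(                   # drop speckle noise
        mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    # Keep only blobs large enough to be hands or smart objects
    # ("feature dimensions"); the area threshold is an assumption.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in (c for c in contours if cv2.contourArea(c) > 800):
        x, y, w, h = cv2.boundingRect(c)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 255), 1)
    cv2.imshow("interactive area", frame)
    if cv2.waitKey(1) == 27:                   # Esc quits
        break

cap.release()
cv2.destroyAllWindows()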

What Kind of Interfaces?
Our early results concerned vertical screens showing multimedia content. Users point at displayed content in order to access information: one significant installation is the PointAt system at the Palazzo Medici Riccardi museum in Florence. Considered a pioneering experiment in museum didactics, it has been operating successfully since 2004. With the PointAt system, users view a replica of the Benozzo Gozzoli fresco 'Cavalcata dei Magi' in the palace chapel and request information on the displayed figures simply by pointing at them; the information is delivered as audio.
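
The mapping from a pointing position to a figure and its narration can be thought of as simple hit testing. The sketch below is purely illustrative: the region names, coordinates and audio paths are invented, not PointAt's actual data or API.

from dataclasses import dataclass

@dataclass
class Region:
    name: str
    x: int
    y: int
    w: int
    h: int
    audio: str  # path to the narration clip (invented)

# Hand-annotated screen regions of the fresco; values are invented.
REGIONS = [
    Region("young king", 120, 80, 90, 160, "audio/young_king.wav"),
    Region("procession", 300, 60, 220, 180, "audio/procession.wav"),
]

def hit_test(px: int, py: int):
    """Return the figure under the estimated pointing position, if any."""
    for r in REGIONS:
        if r.x <= px < r.x + r.w and r.y <= py < r.y + r.h:
            return r
    return None

region = hit_test(150, 140)
if region is not None:
    print(f"Playing narration for '{region.name}': {region.audio}")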

More recently, we have focused our research on tabletop multi-user frameworks that exploit table-like horizontal screens. This decision was informed by the concept of 'affordance', which according to D. Norman is the ability of an object to suggest the correct way in which it should be used: a table is a natural place to share content with other people, and it can also hold physical objects.

Multi-user systems often involve interfaces that are rich in interactive actions and multimedia content. Our early solutions therefore adopted richer interaction languages, developing CV algorithms able to recognize different gestures and hand postures. However, we observed that this increases the cognitive load on users, who are forced to learn the various gestures before using the system; this goes against the concept of a self-explanatory system.
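
For illustration, a classic way to recognize hand postures of this kind with OpenCV is to count extended fingers from the convexity defects of the hand contour. The sketch below follows that textbook heuristic and is not the MICC algorithm; the depth threshold is an assumption.

import cv2
import numpy as np

def count_fingers(contour: np.ndarray) -> int:
    """Estimate extended fingers from a hand contour (textbook heuristic)."""
    hull = cv2.convexHull(contour, returnPoints=False)
    if hull is None or len(hull) < 4:
        return 0
    defects = cv2.convexityDefects(contour, hull)
    if defects is None:
        return 0
    # Deep convexity defects correspond to the gaps between fingers;
    # the 20-pixel depth threshold is an assumption.
    gaps = sum(1 for d in defects[:, 0] if d[3] / 256.0 > 20.0)
    return gaps + 1 if gaps else 0

Mapping such finger counts to a vocabulary of commands is precisely the kind of learning burden described above.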

Figure 2: PointAt, an interactive replica of the Benozzo Gozzoli fresco 'Cavalcata dei Magi', installed in the Palazzo Medici Riccardi Museum in Florence, Italy.

During the last year we have worked on different solutions for developing new natural interaction frameworks that exploit a minimal set of natural object-related operations (e.g. zoom for 2D images, zoom and rotate for 3D objects, open and turn page for books, unroll for foulards, and so on). An interesting result is our Interactive Bookshop, which has recently been prototyped and will soon be installed in the bookshop of Palazzo Medici Riccardi in Florence to display artistic objects as digital replicas.
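
Conceptually, each kind of digital replica exposes only the handful of operations that make sense for it. The following sketch shows one way to express such a minimal vocabulary; the names are illustrative, not the Interactive Bookshop API.

# Each replica kind exposes only the gestures that make sense for it.
OPERATIONS = {
    "2d_image": {"zoom_in", "zoom_out"},
    "3d_object": {"zoom", "rotate"},
    "book": {"open", "turn_page"},
    "foulard": {"unroll"},
}

def apply_gesture(kind: str, gesture: str) -> None:
    if gesture in OPERATIONS.get(kind, set()):
        print(f"{kind}: performing '{gesture}'")
    else:
        # Unrecognized gestures are ignored rather than flagged as
        # errors, keeping the interface forgiving and self-explanatory.
        print(f"{kind}: '{gesture}' is not part of its vocabulary")

apply_gesture("book", "turn_page")   # book: performing 'turn_page'
apply_gesture("foulard", "rotate")   # foulard: 'rotate' is not part of its vocabulary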

Another solution exploits Tangible User Interfaces (TUIs) to further enrich the interaction: users employ 'smart' physical objects in order to interact with tabletop interfaces, and the use of such objects must obviously be as intuitive as possible. The TANGerINE project, developed in collaboration with the Micrel Lab of the University of Bologna, exploits a 'smart cube': a wireless Bluetooth wooden object equipped with a sensor node (tri-axial accelerometer), a vibration motor, an infrared LED matrix on every face and a microcontroller. We chose a cube once again because of its clear affordance: users intuitively consider the uppermost face 'active' (as when reading the face of a die), thus conceiving the object as able to embody six different actions or roles.
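
For illustration, the uppermost face can be inferred from the tri-axial accelerometer alone: when the cube is at rest, gravity dominates exactly one axis, and its sign selects one of the six faces. The sketch below assumes readings in g; the rest-detection tolerance and face numbering are our assumptions, not TANGerINE's.

def active_face(ax: float, ay: float, az: float):
    """Map a resting gravity vector (in g) to a face index 1-6, or None."""
    g = (ax * ax + ay * ay + az * az) ** 0.5
    if abs(g - 1.0) > 0.2:               # cube is being moved or shaken
        return None
    readings = (ax, ay, az)
    axis = max(range(3), key=lambda i: abs(readings[i]))
    # Faces 1/2 on +/-x, 3/4 on +/-y, 5/6 on +/-z (arbitrary numbering).
    return axis * 2 + (1 if readings[axis] > 0 else 2)

print(active_face(0.0, 0.0, 1.0))        # 5: the +z face is up
print(active_face(-0.98, 0.05, 0.10))    # 2: the -x face is up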

Links:
http://www.micc.unifi.it
http://www.micc.unifi.it/vidiface
http://www.palazzo-medici.it/ita/sperimenta.htm
http://www.micc.unifi.it/interactive-bookshop
http://www.tangerineproject.org

Please contact:
Lea Landucci, Stefano Baraldi and Nicola Torpei
MICC Media Integration and Communication Center, University of Florence, Italy
E-mail: lea.landucci@gmail.com, stefano.baraldi@gmail.com, nicolatorpei@gmail.com
