by Michal Haindl and Josef Kittler

Vision is the most important sense on which the majority of organisms depend for life. Scene reflectance properties in various spectral bands provide invaluable information about an object’s characteristics, including its shape, material, temperature, illumination and motion. This information, however, is very difficult to capture with an electronic device. A real visual scene is subject to variable illumination as well as variable observation conditions. Furthermore, individual objects of interest can be partially occluded or shaded, may be positioned at various distances from the capturing device, and the data can be noisy and/or incomplete. Successful interpretation of imaging sensor data therefore requires sophisticated analytical methods and considerable computing power.

by Paula Crăciun and Josiane Zerubia

Earth observation satellites represent a significant resource when it comes to acquiring data about the Earth. Satellite data is used in a range of fields, including environmental monitoring, map updating and meteorology. Since the launch of the first Earth observation satellite, the resolution of the optical sensors installed on board has greatly improved: nowadays, panchromatic images can be acquired at resolutions of 0.7 m or finer (e.g., GeoEye, Pleiades). This makes it possible to recognize small objects, such as boats and cars.

by Benjamin Risse, Xiaoyi Jiang, and Christian Klämbt

Video-based imaging of animal behaviour is commonly used in biomedical studies. Imaging small and translucent organisms, such as worms or larvae, however, tends to require sophisticated illumination strategies. We developed a novel technique to image the contact surface between organisms and substrate utilizing Frustrated Total Internal Reflection (FTIR). This technique has a wide range of potential applications.
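The optical principle behind FTIR is standard optics rather than anything specific to this project: light coupled into a transparent substrate of refractive index \(n_1\) is totally internally reflected at the interface with a medium of lower index \(n_2\) (such as air) whenever the angle of incidence exceeds the critical angle

\[ \theta_c = \arcsin\!\left(\frac{n_2}{n_1}\right), \qquad n_1 > n_2 . \]

Where an animal touches the surface, the higher local refractive index frustrates the reflection, so light escapes, and can be imaged, only at the points of contact.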

by Peggy van der Kreeft, Kay Macquarrie and Martijn Kleppe

Searching for clips or segments of videos in large archives can be a daunting task. In which clip was a person mentioned and where in the clip is he or she shown? Even after you locate the correct video, you still need to watch the entire video to find that one segment containing the person that you are looking for. The novel technologies being developed by AXES make finding what you are looking for in large archives and libraries significantly easier.

by Marie-Colette van Lieshout

Arak and Surgailis [1] introduced a class of random mosaics with remarkable mathematical properties. A collaborative project between researchers in Poland and at CWI shows that such models are useful for intermediate level image analysis because they can capture global aspects of an image without requiring a detailed description of the objects within it.

by Tomáš Suk, Petr Novotný and Jan Flusser

Plant identification is an important task in botany and related areas, such as agriculture, forestry, and nature conservation. It is also of interest to the general public. While botanists usually have no problem identifying a species, non-specialists would often welcome a computer-aided system for species recognition. Creating such a system is a challenge that we have resolved using visual pattern recognition methods.

by Adrien Gaidon, Zaid Harchaoui and Cordelia Schmid

Automatic video understanding is a growing need for many applications in order to manage and exploit the enormous – and ever-increasing – volume of available video data. In particular, recognition of human activities is important, since videos are often about people doing something. Modelling and recognizing actions is as yet an unsolved issue. We have developed original methods that yield significant performance improvements by leveraging both the content and the spatio-temporal structure of videos.

by Grégory Rogez, Deva Ramanan and J. M. M. Montiel

Camera miniaturization and mobile computing now make it feasible to capture and process videos from body-worn cameras such as the Google Glass headset. This egocentric perspective is particularly well-suited to recognizing objects being handled or observed by the wearer, as well as analysing the gestures and tracking the activities of the wearer. Egovision4Health is a joint research project between the University of Zaragoza, Spain and the University of California, Irvine, USA. The objective of this three-year project, currently in its first year, is to investigate new egocentric computer vision techniques to automatically provide health professionals with an assessment of their patients’ ability to manipulate objects and perform daily activities.

by Na Li, Martin Crane, Cathal Gurrin and Heather J. Ruskin

Even though Microsoft’s SenseCam can be effective as a memory-aid device, effectively managing the vast number of images it accumulates remains a substantial challenge; in particular, deconstructing a sizeable collection of images into meaningful events for users is a significant task. Such events may be identified by applying Random Matrix Theory (RMT) to a cross-correlation matrix C constructed from SenseCam lifelog data streams. Overall, the RMT technique proves promising for major event detection in SenseCam images.
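As a rough illustration of this kind of analysis, the sketch below uses synthetic data in place of the SenseCam streams (all variable names and sizes are illustrative, not taken from the project): it builds the cross-correlation matrix C and compares its eigenvalue spectrum with the Marchenko-Pastur bounds that RMT predicts for purely random data; eigenvalues escaping those bounds indicate shared structure across streams, i.e., candidate events.

```python
import numpy as np

# Synthetic stand-in for N lifelog data streams of length T
# (e.g., per-image feature time series); illustrative only.
rng = np.random.default_rng(0)
N, T = 50, 500
streams = rng.standard_normal((N, T))
streams[:10] += 0.5 * rng.standard_normal(T)   # inject a shared "event" component

# Normalise each stream and build the cross-correlation matrix C
Z = (streams - streams.mean(axis=1, keepdims=True)) / streams.std(axis=1, keepdims=True)
C = (Z @ Z.T) / T

# Marchenko-Pastur bounds for the eigenvalues of a purely random correlation matrix
Q = T / N
lam_min = (1 - np.sqrt(1 / Q)) ** 2
lam_max = (1 + np.sqrt(1 / Q)) ** 2

eigvals = np.linalg.eigvalsh(C)
deviating = eigvals[(eigvals < lam_min) | (eigvals > lam_max)]
print(f"MP bounds: [{lam_min:.2f}, {lam_max:.2f}]")
print("Eigenvalues outside the random-matrix prediction:", np.round(deviating, 2))
```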

by Sergio Escalera Guerrero

The Human Pose Recovery and Behaviour Analysis group (HuPBA), University of Barcelona, is developing a line of research on multi-modal analysis of humans in visual data. The novel technology is being applied in several scenarios with high social impact, including sign language recognition, assistive technology and diagnostic support for the elderly and people with mental/physical disabilities, fitness conditioning, and human-computer interaction.

by Iason Oikonomidis, Nikolaos Kyriazis and Antonis A. Argyros

The FORTH 3D hand tracker recovers the articulated motion of human hands robustly, accurately and in real time (20Hz). This is achieved by employing a carefully designed model-based approach that capitalizes on a powerful optimization framework, GPU processing and the visual information provided by the Kinect sensor.
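The abstract does not spell out the optimizer, but the general hypothesize-render-score structure of model-based articulated tracking can be sketched as follows. Everything here is a toy stand-in: `render_depth` substitutes random values for a real rendering of the hand model, and the local random search merely stands in for the GPU-accelerated optimization framework used by the actual tracker.

```python
import numpy as np

def render_depth(pose, shape=(64, 64)):
    """Hypothetical stand-in: a real tracker would rasterise the articulated
    3D hand model at the given pose vector into a synthetic depth map."""
    rng = np.random.default_rng(abs(hash(pose.tobytes())) % (2**32))
    return rng.random(shape)

def score(pose, observed_depth):
    """Discrepancy between a rendered hypothesis and the observed depth map."""
    return np.abs(render_depth(pose, observed_depth.shape) - observed_depth).sum()

def track_frame(observed_depth, prev_pose, iterations=100, sigma=0.05):
    """Toy local search around the previous frame's pose; a real system would
    use a stronger stochastic optimizer with hypotheses scored in parallel."""
    best_pose, best_cost = prev_pose, score(prev_pose, observed_depth)
    rng = np.random.default_rng(1)
    for _ in range(iterations):
        candidate = best_pose + sigma * rng.standard_normal(best_pose.shape)
        cost = score(candidate, observed_depth)
        if cost < best_cost:
            best_pose, best_cost = candidate, cost
    return best_pose

observed = np.random.default_rng(2).random((64, 64))   # stand-in for a Kinect depth frame
pose = track_frame(observed, prev_pose=np.zeros(27))   # 27 DoF chosen arbitrarily here
```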

by Sébastien Piérard, Samir Azrour, Rémy Phan-Ba and Marc Van Droogenbroeck

Gait observation and analysis can provide invaluable information about an individual [1]. Studies that have interpreted gait using traditional imaging devices have demonstrated that it is difficult to make reliable measurements with colour cameras. GAIMS, our new system, which results from a multidisciplinary collaboration between engineers and neurologists, aims to provide non-intrusive and reliable quantitative measures of gait together with interpretations of the acquired data. Following a current trend in imaging, it takes advantage of imaging sensors that measure distance instead of colour. While its principles are general, GAIMS is currently used for the diagnosis of multiple sclerosis (MS) and the continued evaluation of disease progression [2]. It is the first available system that fully satisfies the constraints of clinical routine.

by Csaba Benedek, Zsolt Jankó, Dmitry Chetverikov and Tamás Szirányi

Two labs of SZTAKI have jointly developed a system for creation and visualization of mixed reality by combining the spatio-temporal model of a real outdoor environment with the models of people acting in a studio. We use a LIDAR sensor to measure an outdoor scene with walking pedestrians, detect and track them, then reconstruct the static part of the scene. The scene is then modified and populated by human avatars created in a 4D reconstruction studio.

by Thomas Kadiofsky, Robert Rößler and Christian Zinner

In the foreseeable future it will be commonplace for various land vehicles to be equipped with 3D sensors and systems that reconstruct the surrounding area in 3D. This technology can be used as part of an advanced driver assistance system (ADAS) for semi-autonomous operation (auto-pilot), or for fully autonomous operation, depending on the level of technological maturity and legal regulations. Existing robotic systems are mostly equipped with active 3D sensors such as laser scanning devices or time-of-flight (TOF) sensors. 3D sensors based on stereo cameras cost less and work well even in bright ambient light, but the 3D reconstruction process is more complex. We present recent results from our visual 3D reconstruction and mapping system based on stereo vision, which has been developed within the scope of several research projects.
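To illustrate why stereo-based 3D sensing requires more computation than an active sensor that measures range directly, the sketch below (OpenCV, with a synthetic image pair and placeholder calibration values, none of which come from the system described here) estimates a disparity map by semi-global matching and converts it to metric depth via Z = f·B/d.

```python
import cv2
import numpy as np

# Synthetic rectified pair: a textured left image and a right image shifted
# by a constant disparity (placeholders for real calibrated camera frames).
rng = np.random.default_rng(0)
left = (rng.random((240, 320)) * 255).astype(np.uint8)
left = cv2.GaussianBlur(left, (5, 5), 0)
true_disp = 16
right = np.roll(left, -true_disp, axis=1)

# Semi-global block matching: the extra computation stereo needs compared
# with an active sensor that measures range directly.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Triangulation: depth Z = f * B / d (focal length in pixels, baseline in metres)
f_px, baseline_m = 700.0, 0.12            # placeholder calibration values
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f_px * baseline_m / disparity[valid]
print("Median estimated disparity:", float(np.median(disparity[valid])))
```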

by Loredana Murino, Umberto Amato and Bruno Alfano

Involvement and morphological changes of brain structures both in aging processes and in neurodegenerative diseases can be analysed using Magnetic Resonance imaging. Our aim is to automate the procedure through supervised brain tissue classification.
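As a minimal illustration of the idea, and not the authors' actual pipeline, the sketch below classifies synthetic voxel feature vectors, standing in for multi-spectral MR intensities, into tissue classes with a generic supervised classifier.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: each voxel is described by intensities in several MR
# sequences (e.g., T1, T2, PD); labels are tissue classes (GM, WM, CSF).
rng = np.random.default_rng(0)
n_voxels = 3000
X = np.vstack([
    rng.normal(loc=[0.8, 0.3, 0.5], scale=0.1, size=(n_voxels, 3)),  # "grey matter"
    rng.normal(loc=[1.0, 0.2, 0.4], scale=0.1, size=(n_voxels, 3)),  # "white matter"
    rng.normal(loc=[0.2, 0.9, 0.6], scale=0.1, size=(n_voxels, 3)),  # "CSF"
])
y = np.repeat([0, 1, 2], n_voxels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Held-out voxel classification accuracy:", clf.score(X_test, y_test))
```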

by Jos Roerdink

The processing, analysis, and visualization of tensor images has become very important in many application domains, such as brain imaging and seismology. In a tensor image the value at each pixel is not just a scalar (as in a grey scale image), but a matrix or tensor, hence the name. In the project COMOTI – Connected Morphological Operators for Tensor Images, funded by the Dutch National Science Foundation (NWO), we address the development of techniques for morphological filtering and visualization of tensor fields. Potentially, this could lead to new tools for the analysis of brain connectivity and diagnosis of connectivity-related disorders.
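For readers unfamiliar with tensor images, the following sketch (purely illustrative, and unrelated to the COMOTI operators themselves) builds a small image with one symmetric 3x3 matrix per pixel and derives a scalar summary, fractional anisotropy, from the per-pixel eigenvalues, as is commonly done for diffusion tensor data.

```python
import numpy as np

# A 16x16 "tensor image": one symmetric positive-definite 3x3 matrix per pixel.
rng = np.random.default_rng(0)
H, W = 16, 16
A = rng.standard_normal((H, W, 3, 3))
tensors = A @ np.swapaxes(A, -1, -2) + 0.1 * np.eye(3)   # make each pixel SPD

# Per-pixel eigenvalues and fractional anisotropy (a common scalar summary in DTI)
eigvals = np.linalg.eigvalsh(tensors)                     # shape (H, W, 3)
mean_lam = eigvals.mean(axis=-1, keepdims=True)
fa = np.sqrt(1.5 * ((eigvals - mean_lam) ** 2).sum(-1)
             / (eigvals ** 2).sum(-1))
print("FA image shape:", fa.shape, "range:", fa.min().round(2), fa.max().round(2))
```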

by Slawomir Bak and François Bremond

A retrieval tool that helps a human operator browse a network of cameras is being developed at Inria Sophia Antipolis. This tool addresses the problem of person re-identification: determining whether a particular individual has already appeared over a network of cameras.

by Kaspar Riesen and Darko Brodic

Many libraries around the world have started digitizing their most valuable old manuscripts in order to preserve the world's cultural heritage. To improve the accessibility of the large number of available handwritten document images, they must be made amenable to searching and browsing. A recent research project aims at a novel graph-based keyword spotting framework applicable to historical documents. To test the novel framework, isolated word images from the Miroslav Gospels (one of the oldest surviving documents written in Old Church Slavonic) will be represented by graphs.
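A toy illustration of graph-based word matching, not the project's actual framework, is given below; the choice of keypoints as nodes and of NetworkX's graph edit distance are assumptions made purely for brevity. Each word image is represented by a graph whose nodes carry the positions of characteristic points of the handwriting and whose edges follow the strokes connecting them; a query keyword graph is then compared with document word graphs by graph edit distance.

```python
import networkx as nx

def word_graph(keypoints, strokes):
    """Build a graph from characteristic points of a word image:
    nodes store (x, y) positions, edges follow the handwriting strokes."""
    g = nx.Graph()
    for i, (x, y) in enumerate(keypoints):
        g.add_node(i, x=x, y=y)
    g.add_edges_from(strokes)
    return g

# Toy keypoint sets standing in for two segmented word images
query = word_graph([(0, 0), (1, 2), (2, 0)], [(0, 1), (1, 2)])
candidate = word_graph([(0, 0), (1, 2), (2, 1), (3, 0)], [(0, 1), (1, 2), (2, 3)])

def node_cost(a, b):
    """Node substitution cost: spatial distance between the two keypoints."""
    return ((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2) ** 0.5

ged = nx.graph_edit_distance(query, candidate, node_subst_cost=node_cost)
print("Graph edit distance between query and candidate word:", ged)
```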

by Emanuele Salerno and Anna Tonazzini

The ITACA project (Innovative tools for cultural heritage archiving and restoration) is investigating new approaches to treat severe back-to-front interference in digital images of two-sided documents. This work is part of a vast research program on the study and preservation of historical documents, which, since 2004, has been supported in various forms by European funds.
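As background, and not as a description of the ITACA algorithms themselves, back-to-front interference is often reasoned about with a simple linear observation model in which each observed side of the document is the ideal, interference-free image of that side plus an attenuated, horizontally mirrored contribution of the opposite side:

\[ s_r(x,y) = f_r(x,y) + q_v\, f_v(W - x,\, y), \qquad s_v(x,y) = f_v(x,y) + q_r\, f_r(W - x,\, y), \]

where \(s_r, s_v\) are the scanned recto and verso, \(f_r, f_v\) the ideal sides, \(q_r, q_v\) attenuation factors and \(W\) the page width. Removing the interference then amounts to estimating \(f_r\) and \(f_v\) from the two observations, a (blind) source separation problem.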
