by Furqan M. Khan and Francois Bremond (Inria)

Computers now excel at face recognition in tightly constrained environments; however, most surveillance networks capture unconstrained data, in which person (re-)identification remains a challenging task. The STARS team at Inria is making considerable progress towards solving the person re-identification problem in a traditional visual surveillance setup.

Imagine that a surveillance system operator wants to quickly examine the recent activities of a particular individual, perhaps in an airport, a grocery store or an amusement park. To do so, the operator has to manually re-identify (locate) the person in different camera feeds, which may take significant time if crowd density is high and there are many cameras. As part of the EU project CENTAUR, the STARS team at Inria Sophia Antipolis is developing technology that reduces the operator's workload in re-identifying the individual of interest across the network of cameras.

Person re-identification is a challenging task because individuals move in all directions, including away from the camera and across the field of view (see Figure 1). Therefore, biometric cues such as face or iris cannot be reliably extracted. Instead, the holistic appearance of the person (clothing) or gait is used, which is inherently less discriminative. Furthermore, due to low resolution, subtleties in gait are difficult to measure. In addition, a person’s appearance in a video varies with illumination, occlusion, camera properties and viewing angle. Finally, for a fully automated system, individuals must be localised using a person detection and tracking algorithm before their appearance models can be built. Existing detection and tracking algorithms are imperfect: they introduce noise into the appearance models and hence degrade model matching.

Figure 1: Three groups of people captured from two different cameras in a surveillance network. Persons are often facing away from the camera and their appearance changes from one camera to another due to illumination.

A semi-automated system was developed by the STARS group last year, which shows a short list of candidate matches to the operator for browsing. However, owing to the challenges described above, the gap between the desired and achieved performance of the underlying algorithm was large. This gap has now been significantly reduced by improving multiple aspects of the re-identification algorithm.

The main catalysts for improvement are the novel modelling of a person’s appearance as a set of parametric probability densities of low-level features over different body regions, and the bi-directional matching of appearance models. Combined with recently proposed low-level features and metric learning techniques, the algorithm achieves significant advances in both precision and recall of retrieval. The algorithm has been benchmarked against a number of competing methods on multiple publicly available datasets, where it comprehensively outperformed all other methods [1]. The improved algorithm has been transferred to a local startup, which is developing an end-to-end re-identification tool for the retail domain.
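To make the two ideas above concrete, the following is a minimal sketch and not the algorithm of [1]. It assumes one diagonal Gaussian per body region as the parametric density, a symmetric Kullback-Leibler divergence as the distance between densities, and a nearest-component rule applied in both directions as the bi-directional matching. All function names and these choices are illustrative assumptions, not details taken from the published method.

```python
from statistics import fmean, pvariance


def fit_region_gaussians(samples_by_region, eps=1e-3):
    """Fit one diagonal Gaussian per body region (illustrative assumption).

    samples_by_region: one list of feature vectors per body region.
    Regions without samples (e.g. occluded) are skipped.
    Returns a list of (means, variances) tuples.
    """
    model = []
    for samples in samples_by_region:
        if not samples:
            continue
        columns = list(zip(*samples))  # one tuple per feature dimension
        means = [fmean(c) for c in columns]
        # eps keeps every variance strictly positive
        variances = [pvariance(c) + eps for c in columns]
        model.append((means, variances))
    return model


def symmetric_kl(g1, g2):
    """KL(g1||g2) + KL(g2||g1) for two diagonal Gaussians.

    The log-determinant terms of the two directions cancel, leaving
    0.5 * sum(v1/v2 + v2/v1 + d^2/v2 + d^2/v1 - 2) over dimensions.
    """
    (m1, v1), (m2, v2) = g1, g2
    total = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        d2 = (a - b) ** 2
        total += 0.5 * (va / vb + vb / va + d2 / vb + d2 / va - 2.0)
    return total


def directed_distance(model_a, model_b):
    """Mean, over components of A, of the distance to the closest component of B."""
    return fmean(min(symmetric_kl(ga, gb) for gb in model_b) for ga in model_a)


def bidirectional_distance(model_a, model_b):
    """Match A against B and B against A, then add the two directed scores."""
    return directed_distance(model_a, model_b) + directed_distance(model_b, model_a)
```

In such a sketch, a gallery would be ranked by `bidirectional_distance` to the query model, and the smallest distances would form the short candidate list shown to the operator. Matching in both directions matters when the two models have different numbers of components, for example when a body region is occluded in one camera: a single directed score can then look good simply because the smaller model finds close partners, while the reverse direction exposes the unmatched components.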

Another aspect of this work deals with reducing the dependency on manual annotations for metric learning. More so than object detection, fully supervised metric learning is not scalable in real-world applications due to re-training requirements. Therefore, a novel strategy to automatically label data is employed to learn the metric in an unsupervised manner without considerable degradation in performance [2].

Advancements in the re-identification algorithm were made possible by support received under the European project CENTAUR. Going forward, the group plans to further improve the performance of the re-identification algorithm and to benchmark it on larger datasets with noisy inputs. We are also interested in investigating how deep networks can be employed successfully in this domain when labelled training data is limited.


[1] F. M. Khan, F. Bremond: “Person Re-identification for Real-world Surveillance Systems”, arXiv 2016.
[2] F. M. Khan, F. Bremond: “Unsupervised data association for Metric Learning in the context of Multi-shot Person Re-identification”, AVSS 2016.

Please contact:
Furqan M Khan, Inria, France
+33 4 9238 7634