by Stephan Veigl, Hartwig Fronthaler and Bernhard Strobl

Exciting perspectives are emerging in the field of visual surveillance. Due to the rapidly growing number of cameras and volume of video data, there is an increasing need for a method that enables quick pinpointing of significant data within the “sea” of irrelevance.

Today’s visual surveillance systems reduce the input data considerably by simple motion detection; nevertheless, the amount of data such systems produce remains unmanageable, so further automated means of analysis are required. For this purpose we resort to state-of-the-art visual surveillance algorithms for object detection, tracking and activity analysis. We have developed a surveillance data analysis framework that efficiently assists in exploring data from distributed video archives with different data formats, and that allows rule-based and/or content-based object and event searches in large surveillance datasets.

With our system it is possible to seek a distinctive object in a huge archive of videos (multiple cameras, 24 hours). A typical use case is one or more surveillance cameras watching the entrance of a parking lot. We can, for example, establish a rule to detect all cars entering or exiting the parking lot while ignoring all vehicles that merely pass by. The system detects all objects in the video and filters the results according to user-defined rules. It presents a separate list of matching objects for every rule. Additionally, we can search for a given sample image, which yields a list sorted by match score (see Figure 1).

Figure 1: Detection results of user defined rule.

System Architecture
The design of our distributed high-performance archive search system defines the following components (see Figure 2):

  • Multiple Video Archives / Camera Metadata Databases
  • Analytics Core
  • Configuration and Detection Database
  • Several GUI Clients.

Figure 2: System overview.

Each of the above services is intended to run on a dedicated machine optimized for the respective task. For demonstration purposes, however, the whole system can also run on a single computer.

The analytics core (see Figure 2) is a three-stage system following a modular programming paradigm:
1. detection and tracking modules
2. filtering modules
3. matching modules.
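The three stages can be pictured as a simple chain. The following is only a minimal sketch of that idea, not the framework's actual interface: the names `Detection` and `run_pipeline`, and the convention that each module is a plain callable, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Detection:
    """Hypothetical record for one detected object."""
    object_id: int
    label: str          # e.g. "blob" or "person"
    score: float = 0.0  # filled in by the matching stage


def run_pipeline(
    frames: Iterable[object],
    detectors: List[Callable[[Iterable[object]], List[Detection]]],
    filters: List[Callable[[List[Detection]], List[Detection]]],
    matcher: Callable[[Detection], float],
) -> List[Detection]:
    # Stage 1: every detection/tracking module contributes objects.
    detections = [d for detect in detectors for d in detect(frames)]
    # Stage 2: each filtering module narrows the candidate list.
    for keep in filters:
        detections = keep(detections)
    # Stage 3: score the remaining objects and sort best match first.
    for d in detections:
        d.score = matcher(d)
    return sorted(detections, key=lambda d: d.score, reverse=True)
```

Because each stage only depends on the output of the previous one, a new detector or filter can be added to its list without touching the other stages.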

At the moment we have implemented a blob tracking module (Moving Objects) and a person detection module in the first stage. For the filtering stage we use an Event Engine module, which is the core of our rule-based approach. A generic appearance-based matching module sorts the results in the matching stage. These modules are described in more detail below:

Blob Tracker (Moving Objects): This module does not specialize in any particular type of object, but rather detects every moving region (blob) in a scene. For this purpose, we employ a robust background model that bases its decisions (foreground or background) on a compressed history of each pixel, referred to as a codebook. For every new frame, each pixel is compared with its corresponding codebook and classified as either foreground or background. This decision is based on the distribution of the previously observed pixel values.
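To make the codebook idea concrete, here is a deliberately simplified sketch for grayscale frames. It assumes that each codeword is just an intensity range and that a fixed `tolerance` decides whether a value matches; the class name `CodebookModel` and these parameters are illustrative, and the actual model is likely richer (colour, codeword ageing, GPU execution).

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale image as rows of 0..255 intensities


class CodebookModel:
    """Simplified per-pixel codebook background model (illustrative only)."""

    def __init__(self, width: int, height: int, tolerance: int = 10):
        self.tolerance = tolerance
        # One codebook per pixel; each codeword is a [low, high] intensity range.
        self.codebooks = [[[] for _ in range(width)] for _ in range(height)]

    def _match(self, codebook: List[List[int]], value: int) -> Optional[List[int]]:
        # Return the first codeword whose range (plus tolerance) covers value.
        for word in codebook:
            if word[0] - self.tolerance <= value <= word[1] + self.tolerance:
                return word
        return None

    def train(self, frame: Frame) -> None:
        # Fold one background frame into the compressed per-pixel history.
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                codebook = self.codebooks[y][x]
                word = self._match(codebook, value)
                if word is None:
                    codebook.append([value, value])  # new background mode
                else:
                    word[0] = min(word[0], value)    # widen the matched range
                    word[1] = max(word[1], value)

    def foreground_mask(self, frame: Frame) -> List[List[bool]]:
        # A pixel is foreground if no codeword explains its current value.
        return [
            [self._match(self.codebooks[y][x], value) is None
             for x, value in enumerate(row)]
            for y, row in enumerate(frame)
        ]
```

Several codewords per pixel let the model represent backgrounds that alternate between a few states, such as a flickering light, without storing every past frame. Connected foreground pixels in the mask would then be grouped into blobs, a step this sketch leaves out.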

Person Detector: Blob-based object detection suffers from a sudden drop in performance when the density of image objects becomes high and dynamic occlusions are frequent. To overcome this problem, we have developed a human detection framework incorporating Bayesian statistics, which is capable of detecting and coarsely segmenting humans in moderately crowded scenes in real time. Shape and motion cues are combined to obtain a maximum a posteriori solution for human configurations consisting of many, possibly occluded, pedestrians viewed by a stationary camera.

The computations of both the blob tracker and the person detector are highly parallel, which enables an efficient implementation on graphics hardware. This yields real-time performance for the person detector and more than 100 times real-time speed for blob tracking.

Event Engine: This is the central filtering module. The detected objects can be filtered by a highly flexible combination of user-configurable events (e.g. crossing a tripwire or entering an area) and properties (e.g. object height or width). All rules for one video are processed in parallel. It is therefore possible, for instance, to statistically analyse the whole traffic flow in a roundabout or a similar scene in a single pass over the video.
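A rule of this kind can be thought of as a list of conditions that must all hold for an object. The sketch below assumes that each tracked object is available as a sequence of centre points plus a width and height; the names `Track`, `crosses_tripwire`, `enters_area`, `min_width` and `evaluate` are hypothetical and do not reflect the Event Engine's real configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    """Hypothetical tracked object: centre per frame plus size properties."""
    object_id: int
    path: List[Point]
    width: float
    height: float


def _side(a: Point, b: Point, p: Point) -> float:
    # Sign of the cross product: which side of line a->b the point p lies on.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def segments_cross(p: Point, q: Point, a: Point, b: Point) -> bool:
    # Proper intersection: endpoints of each segment lie on opposite sides
    # of the other segment.
    return (_side(a, b, p) * _side(a, b, q) < 0
            and _side(p, q, a) * _side(p, q, b) < 0)


def crosses_tripwire(a: Point, b: Point) -> Callable[[Track], bool]:
    # Event: any step of the track crosses the line segment a-b.
    def event(track: Track) -> bool:
        steps = zip(track.path, track.path[1:])
        return any(segments_cross(p, q, a, b) for p, q in steps)
    return event


def enters_area(x0: float, y0: float, x1: float, y1: float) -> Callable[[Track], bool]:
    # Event: the track moves from outside to inside an axis-aligned rectangle.
    def inside(p: Point) -> bool:
        return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

    def event(track: Track) -> bool:
        steps = zip(track.path, track.path[1:])
        return any(not inside(p) and inside(q) for p, q in steps)
    return event


def min_width(w: float) -> Callable[[Track], bool]:
    # Property: the object is at least w units wide.
    return lambda track: track.width >= w


def evaluate(tracks: List[Track],
             rules: Dict[str, List[Callable[[Track], bool]]]) -> Dict[str, List[int]]:
    # Single pass over the tracks; every rule is checked for each object,
    # and each rule collects its own result list.
    results: Dict[str, List[int]] = {name: [] for name in rules}
    for track in tracks:
        for name, conditions in rules.items():
            if all(cond(track) for cond in conditions):
                results[name].append(track.object_id)
    return results
```

For the parking lot example, a rule combining `crosses_tripwire` at the gate with `min_width` would list cars driving through the entrance while leaving out narrower objects and vehicles that pass by without crossing the line.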

Generic Appearance-Based Matcher: An appearance-based matching functionality allows the user to search for an object with specific appearance attributes. The search is initiated by specifying a query image, which can be provided either by selecting an image region in a video or by uploading a bitmap. An object descriptor is computed from the covariance matrix of the image feature vectors. This target descriptor is compared with the descriptors of the detected objects, producing a ranking of the objects.
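The following sketch shows how a covariance descriptor can be computed and used for ranking. The article does not state which per-pixel features or which distance the matcher uses, so both are assumptions here: the features are intensity and absolute gradients, and the distance is a plain Frobenius norm chosen for brevity. Covariance matrices are often compared with an eigenvalue-based metric instead. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale region, at least 4x4 pixels


def pixel_features(img: Image) -> List[Tuple[float, float, float]]:
    # Assumed feature vector per interior pixel: (intensity, |dI/dx|, |dI/dy|),
    # with gradients from central differences.
    feats = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            feats.append((img[y][x], abs(gx), abs(gy)))
    return feats


def covariance_descriptor(img: Image) -> List[List[float]]:
    # Sample covariance matrix of the feature vectors over the region.
    feats = pixel_features(img)
    n, d = len(feats), len(feats[0])
    mean = [sum(f[i] for f in feats) / n for i in range(d)]
    return [
        [sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in feats) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def descriptor_distance(a: List[List[float]], b: List[List[float]]) -> float:
    # Frobenius norm of the difference (a simplification, see lead-in).
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def rank_objects(query: Image, candidates: Dict[str, Image]) -> List[str]:
    # Sort candidate objects by descriptor distance to the query, closest first.
    target = covariance_descriptor(query)
    scored = [(descriptor_distance(target, covariance_descriptor(img)), name)
              for name, img in candidates.items()]
    return [name for _, name in sorted(scored)]
```

The descriptor has a fixed size regardless of the region's dimensions, so a cropped query image and detected objects of different sizes can be compared directly.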

In the future, we plan to augment the analytics core (see Figure 2) with a face detector and a license plate detector, together with the corresponding matching modules.

Please contact:
Stephan Veigl
AIT Austrian Institute of Technology GmbH
Tel: +43 50550-4270
