Wednesday, September 1


8:00 — 9:00


9:00 — 9:10


9:10 — 10:00


10:00 — 10:20


10:20 — 12:00

Oral Session — Tracking

Establishing correspondence in distributed cameras by observing humans

Roland Mörzinger and Marcus Thaler JOANNEUM RESEARCH Forschungsgesellschaft mbH

Correspondence between distributed cameras with overlapping views is needed for several tasks in surveillance and smart environments. This paper proposes an adaptive correspondence estimation technique based on observing humans in a planar scene. In contrast to many other approaches, it does not require prior information about corresponding features. The proposed technique uses only results from a person detector and scene-specific detection filtering to estimate the inter-image homography. The method is self-configuring, adaptive, and provides robustness over time by exploiting temporal and geometric constraints. The correspondence is accurately estimated in spite of error sources such as missed detections, false detections, and non-overlapping fields of view. Results on a variety of datasets demonstrate the general applicability, and experiments show that the proposed correspondence estimation approach outperforms a common baseline approach.
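The inter-image homography at the heart of this abstract can be sketched with the standard Direct Linear Transform, assuming point correspondences (e.g., matched foot positions of the same person in two views) are already available; the paper's detector-based matching and temporal filtering are not reproduced here, and all function names are illustrative:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: fit H so that dst ~ H @ src in homogeneous
    coordinates. src, dst: (N, 2) arrays of matched points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H and de-homogenize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

In practice the detections are noisy, so a robust wrapper (e.g., RANSAC over detection pairs) would select inliers before this least-squares fit.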

Distributed Tracking in a Large-Scale Network of Smart Cameras

Honggab Kim and Marilyn Wolf Georgia Institute of Technology

In recent years, there has been increasing interest in the use of distributed smart camera networks for wide-area surveillance. The nature of distributed visual sensor networks gives rise to new challenges, such as frequent gaps between disjoint camera views, limited communication capability, and massive data growth. Tracking objects in such a network requires a scalable distributed algorithm that can tackle all of these challenges. This paper presents a tracking algorithm suitable for a distributed network of many cameras spread over a wide area. In our approach, a distributed camera is more than a smart sensor that performs on-board image processing and transmits information to a server where tracking is actually performed; it participates in tracking as a distributed computing component. Without a central processor or a powerful camera node that collects all the measurements over the network, each camera collaborates with a small number of neighbors. Sharing measurements with neighbors enables a non-overlapping camera to secure a local measurement set that partially overlaps with those of its neighbors. Then, using this measurement set and the local probabilistic transition model of its neighborhood, each camera independently estimates local paths. Locally estimated paths are put to a vote, and the winning local paths are combined into global paths. Our experiments with simulated data demonstrate that the proposed distributed tracking algorithm is accurate, scalable, and fast.

Distributed Target Tracking using Self Localizing Smart Camera Networks

Babak Shirmohammadi and Camillo Taylor University of Pennsylvania

This paper describes a novel decentralized target tracking scheme for distributed smart cameras. This approach is built on top of a distributed localization protocol which allows the smart camera nodes to automatically identify neighboring sensors with overlapping fields of regard and to establish a communication graph reflecting how the nodes will interact to fuse measurements in the network. The new protocol distributes the detection and tracking problems evenly throughout the network, accounting for sensor handoffs in a seamless manner. The approach also distributes knowledge about the state of tracked objects amongst the nodes in the network. This information can then be harvested through distributed queries, which allow network participants to subscribe to the kinds of events they are interested in. The proposed scheme has been used to track targets in real time using a collection of custom-designed smart camera nodes, and results from these experiments are presented.

3D Target Tracking in Distributed Smart Camera Networks with In-Network Aggregation

Manish Kushwaha and Xenofon Koutsoukos Vanderbilt University

With the technology advancements in wireless sensor networks and embedded cameras, distributed smart camera networks are emerging for surveillance applications. Wireless networks, however, introduce bandwidth constraints that must be considered. Existing approaches to target tracking typically rely on target handover mechanisms between cameras or combine results from 2D trackers into a 3D target estimate. Such approaches suffer from drawbacks associated with 2D tracking, such as scale selection, target rotation, and occlusion. This paper presents an approach for tracking multiple targets in 3D space using a network of smart cameras. The approach employs multiview histograms to characterize targets in 3D space, using color and texture as the visual features. The visual features from each camera, along with the target models, are used in a probabilistic tracker to estimate the target state. One of the main innovations of the proposed tracker is in-network aggregation to reduce communication cost. The effectiveness of the proposed approach is demonstrated using a camera network deployed in a building.
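The color part of such a target model can be illustrated with a fixed-size normalized histogram, which is what makes in-network aggregation cheap: histograms from different cameras have identical payload sizes and can be fused by simple averaging. A minimal sketch, where the bin count and RGB color space are illustrative assumptions rather than the paper's choices:

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """Normalized joint RGB histogram of a target region.

    pixels: (N, 3) uint8 array of RGB values sampled from one camera's
    view of the target. The result has a fixed length of bins**3 and
    sums to one, so histograms from several cameras can be averaged."""
    idx = (pixels // (256 // bins)).astype(int)          # per-channel bin index
    flat = (idx[:, 0] * bins + idx[:, 1]) * bins + idx[:, 2]
    hist = np.bincount(flat, minlength=bins**3).astype(float)
    return hist / hist.sum()
```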

Object Tracking on FPGA-based Smart Cameras using Local Oriented Energy and Phase Features

Ehsan Norouznezhad, Abbas Bigdeli, Adam C. Postula and Brian C. Lovell NICTA, The University of Queensland

This paper presents the use of local oriented energy and phase features for real-time object tracking on smart cameras. In our proposed system, local energy features serve as the spatial feature set representing the target region, while local phase information is used to estimate the motion pattern of the target region, which in turn drives the displacement of the search area. Local energy and phase features are extracted by filtering the incoming images with a bank of complex Gabor filters. The effectiveness of the chosen feature set is tested using a mean-shift tracker. Our experiments show that the proposed system can significantly enhance the performance of the tracker in the presence of photometric variations and geometric transformations. The real-time implementation of the system is also described. To achieve the desired performance, a hardware/software co-design approach is pursued: apart from the mean-shift vector calculation, all blocks are implemented in hardware. The system was synthesized onto a Xilinx Virtex-5 XC5VSX50T using the Xilinx ML506 development board, and implementation results are presented.
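The energy and phase features can be sketched as the magnitude and argument of a complex Gabor filter response. The following is a plain software illustration; kernel size, frequency, and orientation count are assumed values, and the paper's FPGA pipeline is not reflected here:

```python
import numpy as np

def gabor_kernel(freq, theta, size=9, sigma=2.5):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid
    oriented at angle theta with spatial frequency freq (cycles/pixel)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.exp(2j * np.pi * freq * rot)

def filter2(image, kern):
    """Same-size filtering with edge padding (correlation with a small kernel)."""
    kh, kw = kern.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(image.shape, dtype=complex)
    for i in range(kh):
        for j in range(kw):
            out += kern[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def oriented_features(image, freq=0.25, orientations=4):
    """Local oriented energy (magnitude) and phase maps per orientation."""
    feats = []
    for k in range(orientations):
        resp = filter2(image, gabor_kernel(freq, np.pi * k / orientations))
        feats.append((np.abs(resp), np.angle(resp)))
    return feats
```

The energy maps would feed the spatial target representation, while frame-to-frame phase differences would indicate motion, per the abstract.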

12:00 — 14:20


14:20 — 16:00

Oral Session — Object Detection and Recognition in Smart Cameras

Dictionary Learning based Object Detection and Counting in Traffic Scenes

Ravishankar Sivalingam, Guruprasad Somasundaram, Vassilios Morellas, Nikolaos Papanikolopoulos, Osama Lotfallah and Youngchoon Park University of Minnesota

The objective of object recognition algorithms in computer vision is to quantify the presence or absence of a certain class of object, e.g., bicycles, cars, or people, which is highly useful in traffic estimation applications. Sparse signal models and dictionary learning techniques can be utilized not only to classify images as belonging to one class or another, but also to detect, with the help of augmented dictionaries, the case when two or more of these classes co-occur. We present results comparing the classification accuracy when different image classes occur together. Practical scenarios where such an approach can be applied include forms of intrusion detection, i.e., where an object of class B should not co-occur with objects of class A. Examples are bicyclists riding on prohibited sidewalks, or a person trespassing in a hazardous area. Mixed-class detection in terms of determining semantic content can be performed in a global manner on downscaled versions of images or thumbnails. However, to accurately classify an image as belonging to one class or the other, we resort to higher-resolution images and localized content examination. With the help of blob tracking, we can use this classification method to count objects in traffic videos. The method of feature extraction illustrated in this paper is highly suited to images obtained in practical cases, which are usually of poor quality and lack enough texture for the popular gradient-based methods to produce adequate feature points. We demonstrate that by training different types of dictionaries appropriately, we can perform the various tasks required for traffic monitoring.
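The residual-based classification idea behind class-specific dictionaries can be sketched as follows. This is a simplification: plain least squares stands in for the sparse coding step, and the dictionaries here are toy placeholders rather than learned ones:

```python
import numpy as np

def classify_by_residual(x, dictionaries):
    """Assign feature vector x to the class whose dictionary reconstructs
    it with the smallest residual.

    dictionaries: {label: (d, n_atoms) array whose columns are atoms}.
    A sparse coder (e.g., orthogonal matching pursuit) would normally
    replace the unconstrained least-squares fit used here."""
    best_label, best_err = None, np.inf
    for label, D in dictionaries.items():
        coef, *_ = np.linalg.lstsq(D, x, rcond=None)
        err = np.linalg.norm(x - D @ coef)
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

An augmented dictionary for detecting co-occurring classes, as the abstract describes, would concatenate the atoms of two class dictionaries into one additional candidate.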

Traffic Pattern Modeling and Prediction with Sensor Networks

Zaihong Shuai, Songhwai Oh and Ming-Hsuan Yang University of California at Merced

We propose a Bayesian framework for modeling and predicting traffic patterns using information obtained from wireless sensor networks. For concreteness, we apply the proposed framework to a smart building application in which traffic patterns of humans are modeled and predicted through detection and matching of their images taken from cameras at different locations. Experiments with more than 2,500 images of 20 subjects demonstrate promising results in traffic pattern prediction using the proposed algorithm. The algorithm can also be applied to other applications including surveillance, traffic monitoring, abnormality detection, and location-based services. In addition, the long-term deployment of the network can be used for security, energy conservation, and utilization improvement of smart buildings.

Path Recovery of a Disappearing Target in a Large Network of Cameras

Amir Lev-Tov and Yael Moses IDC (the Inter-Disciplinary Center)

A large network of cameras is necessary for covering large areas in surveillance applications. In such systems, gaps between the fields of view of different cameras are often unavoidable. We present a method for path recovery of a single target in such a network of cameras. The solution is robust, efficient, and scalable with the network size; it is probably the first that can cope with hundreds of cameras and thousands of objects. The spatio-temporal topology of the network is assumed to be given, and object identities are computed by an available local tracker. Due to low video quality and limitations of the local tracker, possible confusion between the target and other objects is assumed. The suggested method overcomes this challenge using a modified particle filtering framework that produces, at each time step, a small set of candidate solutions represented by states, where each state consists of an object location and identity. Since invisible locations are explicitly modeled by states, the detection of disappearing and reappearing targets is inherent in the algorithm. A second phase recovers the path using a dynamic programming algorithm on a layered graph consisting of the computed candidate states. A synthetic system with hundreds of cameras and thousands of moving objects is generated and used to demonstrate the efficiency and robustness of the method. The results depend, as expected, on the network topology and the level of confusion between objects; for challenging cases our method obtained good results.
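The second-phase dynamic program over the layered graph of candidate states admits a compact Viterbi-style sketch. The state representation and transition scores below are placeholders for the paper's location/identity states and its topology-based transition model:

```python
def recover_path(layers, transition):
    """Best path through a layered graph of candidate states.

    layers: one list per time step of (state, score) candidates.
    transition(a, b): additive compatibility score for moving a -> b.
    Returns the highest-scoring state sequence, one state per layer."""
    scores = [s for _, s in layers[0]]   # best cumulative score per state
    backptrs = []                        # chosen predecessor per layer/state
    for t in range(1, len(layers)):
        new_scores, ptrs = [], []
        for state, score in layers[t]:
            cands = [scores[j] + transition(layers[t - 1][j][0], state)
                     for j in range(len(layers[t - 1]))]
            j = max(range(len(cands)), key=cands.__getitem__)
            new_scores.append(cands[j] + score)
            ptrs.append(j)
        scores = new_scores
        backptrs.append(ptrs)
    # Trace back from the best final state.
    i = max(range(len(scores)), key=scores.__getitem__)
    path = [layers[-1][i][0]]
    for t in range(len(layers) - 1, 0, -1):
        i = backptrs[t - 1][i]
        path.append(layers[t - 1][i][0])
    return path[::-1]
```

Because the candidate sets are kept small by the particle filtering phase, each step is cheap even with hundreds of cameras.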

Mutual Calibration of Camera Motes and RFIDs for People Localization and Identification

Rita Cucchiara, Michele Fornaciari, Andrea Prati and Paolo Santinelli DII - University of Modena and Reggio Emilia

Achieving both localization and identification of people in a wide open area using only cameras is a challenging task with competing requirements: high resolution is needed for identification, whereas low resolution suffices for wide coverage in localization. Consequently, this paper proposes the joint use of cameras (devoted only to localization) and RFID sensors (devoted to identification), with the final objective of detecting and localizing intruders. To ground the observations in a common coordinate system, a calibration procedure is defined. This procedure only requires a training phase with a single person moving in the scene holding an RFID tag. Although preliminary, the results demonstrate that this calibration is sufficiently accurate to be applied in different scenarios where the area of overlap between the field of view (FoV) of a camera and the "field of sense" (FoS) of a (blind) sensor must be efficiently determined.

Distributed Object Recognition via Feature Unmixing

Jiajia Luo and Hairong Qi University of Tennessee, Knoxville

Performing multi-view object recognition in distributed camera networks is of great importance but also highly challenging, since the scarce resources within the network prohibit large amounts of data transfer. In this paper, we study the problem of feature-based distributed object recognition, where redundancy in SIFT features across multiple views is exploited without requiring any known statistics of the environment. We present a novel concept that interprets SIFT features from a spectral unmixing point of view: SIFT features from local views of an object are modeled as linear mixtures of a small set of signature vectors, referred to as endmembers, with associated weight vectors satisfying two conditions, nonnegativity and summing to one. We show, through empirical study, that this set of endmembers is unique and sufficient to recognize individual objects, and yet the number of endmembers is much smaller than the number of SIFT feature points detected, thus dramatically saving network bandwidth. We perform this feature unmixing process in a two-layer scheme to realize distributed object recognition: unmixing is first applied at individual camera nodes to extract “local endmembers” based on local views, and only these local endmembers need to be transferred to the base station for further processing. At the base station, the ensemble of local endmembers undergoes another unmixing process to extract so-called “global endmembers” for object recognition. Experimental results show that feature-unmixing-based distributed object recognition can achieve the same level of recognition accuracy as the original set of SIFT features, but with much reduced data transmission.
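The weight-estimation side of unmixing (finding nonnegative, sum-to-one weights for given endmembers) can be sketched with projected gradient descent onto the probability simplex. The endmember extraction itself and the two-layer node/base-station scheme are not reproduced; step count and learning rate are illustrative:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : w >= 0, sum(w) = 1} (the standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def unmix(feature, endmembers, steps=200, lr=0.1):
    """Weights w with feature ~ w @ endmembers, w nonnegative and summing
    to one. endmembers: (k, d) array, one endmember per row."""
    k = endmembers.shape[0]
    w = np.full(k, 1.0 / k)                       # start at the simplex center
    for _ in range(steps):
        grad = (w @ endmembers - feature) @ endmembers.T
        w = project_simplex(w - lr * grad)
    return w
```

A camera node would run this per feature and ship only the endmembers (and optionally the weights), which is where the bandwidth saving comes from.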

16:00 — 16:20


16:20 — 18:00

Challenge Session — Mobile Computer Vision

18:30 — 21:00

Welcome Reception