Abstract |
Tracking of human body movements is an important problem in computer
vision with applications in visual surveillance and human-computer interaction.
Tracking of a single hand moving in space is addressed and a set of applications
in human-computer interaction are presented. In this approach, a disparity map
and motion fields extracted from a stereo camera set are modeled using a robust
estimation method. Then, the absolute position and orientation of the hand in
space are estimated and the central region of the hand is tracked over time. Virtual
drawing in space, a virtual marble game, and 3D object construction are shown as
the applications of the single hand tracking.
Algorithms are presented for tracking the hands and head of a person or several
interacting people viewed by a set of cameras in 3D. The problem is first defined as
a general multiple object tracking problem in a multiple sensor environment and a
two layered solution is proposed. The proposed solution includes a low-level particle
filtering layer to track individual targets in parallel, and a finite state machine to
analyze the interactions between the targets and apply application specific heuristics.
A set of activity recognition experiments in visual surveillance show the usefulness
of the system. The recognized activities involve interactions between the hands and
head of people and objects. A color analysis scheme and a technique for combining
information from different cameras are presented. They are used to detect carried
objects and exchanges between the hands.
|
Introduction |
One of the major subjects of research in the field of computer vision is understanding
human movement. Extensive work has been performed on tracking
the entire body to estimate motion paths or body gestures. This is especially useful
in the area of visual surveillance to recognize activities. On the other hand, much research has focused on a single or a
few body limbs. The most important limbs are the hands and head. Most work
on visual analysis of the head focuses on modeling the face and its parts for face
recognition, but head gaze tracking is also an important problem that has received
considerable attention.
|
Visual Tracking of a Single Hand
|
Introduction
|
The human hand serves a dual purpose as a communication and manipulation
device. This part of the research is focused on employing a single hand as an interface device to
a computer. It presents applications that require accurate estimation of the position
and orientation of the hand in space with respect to a camera system. I describe
a real-time stereo system to estimate the position and orientation of the hand in
the camera and world coordinate systems and also track its spatial trajectory over
time. I also demonstrate the utility of our system in virtual and real spaces using
three applications:
1. A virtual drawing application, in which a user can write letters or draw on a
virtual plane in space.
2. A 3D model construction application, in which the user runs his hand along
the edges of a physical polyhedral object, and the system constructs a 3D
model of that object.
3. A 3D virtual marble game, in which the user controls the inclination of a
virtual plane through hand motions to manipulate the movement of a ball
through a maze.
The first two applications demonstrate the accuracy of the position and orientation
estimation algorithms, while the third demonstrates the real time capabilities
of the algorithms.
|
Approach
|
 |
The above figure shows the block diagram of the proposed system and its main steps.
|
Details
|
 |
Further details of the approach, including region-of-interest segmentation, parametric disparity map and motion field estimation,
estimating 3D palm position and orientation, tracking a reference point in 3D, and experimental results showing the accuracy of the method, can be found in the publications. A few video clips are available at the following links.
|
Sample Hand Tracking (20.6MB) | Low-Textured Hand Tracking (16.8 MB) | Disparity and Motion Field Modeling (32.0MB)  |
Palm Segmentation (21.5MB) | First Joint Motion Tracking (10.3 MB)  |
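As a rough illustration of the parametric disparity modeling and palm orientation estimation mentioned above, the sketch below fits a planar model d = a*x + b*y + c to disparity samples and converts it to a 3D plane normal. It makes several assumptions that are not stated on this page: a rectified pinhole stereo pair, image coordinates measured from the principal point, and ordinary least squares as a simple stand-in for the robust estimator used in the actual system. The function names `fit_disparity_plane` and `palm_normal` are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_disparity_plane(samples):
    """Ordinary least-squares fit of d = a*x + b*y + c to (x, y, d) samples.

    Assumption: plain least squares, not the robust estimator of the
    original system (outliers are not down-weighted here).
    """
    sxx = sxy = sx = syy = sy = 0.0
    sxd = syd = sd = 0.0
    n = 0
    for x, y, d in samples:
        sxx += x * x
        sxy += x * y
        sx += x
        syy += y * y
        sy += y
        sxd += x * d
        syd += y * d
        sd += d
        n += 1
    # Normal equations (A^T A) p = A^T d for p = (a, b, c).
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxd, syd, sd]
    det = _det3(normal)
    if abs(det) < 1e-9:
        raise ValueError("degenerate sample layout; cannot fit a plane")
    params = []
    for k in range(3):  # Cramer's rule: swap column k for the right-hand side
        replaced = [row[:] for row in normal]
        for i in range(3):
            replaced[i][k] = rhs[i]
        params.append(_det3(replaced) / det)
    return tuple(params)


def palm_normal(a, b, c, focal_length):
    """Unit normal (up to sign) of the 3D plane implied by the disparity plane.

    With x = f*X/Z, y = f*Y/Z and d = f*B/Z, the model d = a*x + b*y + c
    becomes a*f*X + b*f*Y + c*Z = f*B, so the normal is (a*f, b*f, c).
    """
    nx, ny, nz = a * focal_length, b * focal_length, c
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)
```

A fronto-parallel palm gives a = b = 0, so the normal points along the optical axis. Tilting the palm shows up as non-zero disparity gradients.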
Sample Applications |
1. Virtual Drawing in Space
|
 |
Employing the hand as a means for human-computer interaction has been explored extensively in the past few
years. Using the hand as a 3D mouse, a virtual gun, and a remote controller are just a few
examples. Communicating letters to a computer through hand movements is a powerful way of entering
information. Much research has been performed on interpreting hand gestures as sign language alphabets, a
method useful mostly for people with disabilities. However, people typically input information by writing
in natural language or by typing at a keyboard if one is available. Typing in space requires a visible virtual keyboard
so that the user can move a hand to press the desired letter, whereas writing a letter needs no such
feedback. Moreover, shapes other than letters can be entered into the system in the same way.
In this approach, we employ parametric models for fitting disparity in stereo pairs and for tracking the hand
region in 3D for recognition of writing or drawing on a virtual board and communicating the drawn letters or shapes
to a computer. Our system does not require the user to maintain the hand in any particular pose (e.g., stretched
finger). It tracks the hand in the natural pose people typically use while writing with a pen. The underlying
idea is that when a person writes (especially in large letters, as when writing on a board), he/she usually keeps
the hand almost rigid and maintains the hand pose throughout the writing. As a result, the offset between a
particular point on the hand and the pen point is almost constant. Hence, we can track a fixed point (in 3D) on the
hand to determine what the person is writing or drawing. We describe a vision-based system for virtual drawing
in the air without pen and paper (or board).
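To make the idea of drawing on a virtual plane concrete, here is a minimal sketch that maps a tracked 3D hand trajectory to 2D stroke coordinates on a board. The board parameterization (an origin plus two orthonormal in-plane axes) and the name `project_to_board` are assumptions made for illustration. The page does not specify how the virtual plane is defined.

```python
def _dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def project_to_board(trajectory, origin, u_axis, v_axis):
    """Map 3D points to 2D board coordinates.

    Assumption: u_axis and v_axis are orthonormal vectors spanning the
    virtual board, and origin is a point on it. The component of each
    point along the board normal (depth jitter) is simply dropped.
    """
    stroke = []
    for point in trajectory:
        rel = tuple(p - o for p, o in zip(point, origin))
        stroke.append((_dot(rel, u_axis), _dot(rel, v_axis)))
    return stroke
```

For a board facing the camera, `u_axis = (1, 0, 0)` and `v_axis = (0, 1, 0)` would reduce this to discarding the depth coordinate.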
|
Writing English Letters in Space (10.9MB) | Multi-Stroke Drawing in Space (25.2 MB)  |
2. 3D Model Construction
|
 |
In this application, a user moves the hand over the edges of a physical 3-D
object and the system tracks the hand as a means for measuring the dimensions
of the object and uses this information to render the object virtually. The system assumes that the user's hand is held rigid with respect to the edges
of the object and that the back of the hand remains visible throughout the scanning
process. The following figure shows sample frames of the system where a user traverses
three orthogonal sides of a box. Measurements performed in this example
demonstrate the accuracy of the proposed hand tracking method.
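One simple way to turn such edge traversals into box dimensions is sketched below. Each edge is assumed to be given as a list of tracked 3D points, its length is taken as the distance between the first and last point, and the angle between two edges serves as a sanity check on orthogonality. This is an illustrative simplification, not the measurement procedure of the original system, and both function names are hypothetical.

```python
import math


def edge_length(trace):
    """Length of an edge, taken as the distance between the trace endpoints."""
    return math.dist(trace[0], trace[-1])


def edge_angle_deg(trace_a, trace_b):
    """Angle in degrees between the endpoint-to-endpoint directions of two traces."""
    da = [e - s for s, e in zip(trace_a[0], trace_a[-1])]
    db = [e - s for s, e in zip(trace_b[0], trace_b[-1])]
    cos_angle = sum(p * q for p, q in zip(da, db)) / (
        math.hypot(*da) * math.hypot(*db))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
```

For a box, the three measured lengths give the dimensions to render, and angles close to 90 degrees indicate that the traced edges are consistent with a rectangular shape.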
|
Tracking Edges of a Box (10.8MB)  |
3. Virtual Marble Game
|
 |
One of the applications of a hand tracking system is manipulating virtual objects.
Such objects can not only eliminate the need for building expensive physical simulators but also offer new flexibility. Virtual marble, which resembles a physical toy marble game, is an example of such a virtual object.
In this game, the user tries to move a ball through the hallways of a maze to reach a predefined location.
In the physical game, the user does this by tilting the board to make a suitable ramp, so that the ball moves due
to gravity. In the virtual marble game, the user rotates the hand while the system tracks the hand orientation and
simulates the corresponding tilt of the virtual board. The system also provides visual feedback showing the virtual board and the current position
of the ball, so that the user can adjust the hand orientation to steer the virtual ball in the desired
direction.
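The link between hand orientation and ball motion can be sketched with a frictionless point-mass model, in which the in-plane acceleration along each axis is g times the sine of the tilt about the other axis. The axis conventions, the Euler integration, and the omission of maze walls, friction, and rolling inertia are all assumptions of this sketch. The page does not describe the physics used in the actual game.

```python
import math

GRAVITY = 9.81  # m/s^2


def step_ball(position, velocity, pitch, roll, dt):
    """Advance the ball by one time step on a tilted plane.

    Assumptions: roll (radians) tilts the board so the ball accelerates
    along +x, pitch tilts it so the ball accelerates along +y; the ball is
    a frictionless point mass and there are no walls.
    """
    ax = GRAVITY * math.sin(roll)
    ay = GRAVITY * math.sin(pitch)
    vx = velocity[0] + ax * dt
    vy = velocity[1] + ay * dt
    x = position[0] + vx * dt
    y = position[1] + vy * dt
    return (x, y), (vx, vy)
```

Calling `step_ball` once per video frame with the latest estimated hand orientation would give the feedback loop described above.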
|
A Sample Virtual Marble Game (10.8MB)  |
Multiple Hand/Head Tracking using Multiple Cameras
|
Introduction
|
In this section, I address the problem of tracking the hands and head of a
person or multiple people interacting with each other in a scene viewed by a set
of cameras. We pose a multiple target tracking problem and propose a two-layer
solution consisting of a particle filtering layer and a finite state machine. Also,
I discuss the activity recognition problem for the set of activities involving heads
and hands of human subjects. Color analysis of the area surrounding the hands is
presented to determine whether a person holds an object or not. A new approach
is suggested to determine the reliability of each image and to combine the color
information extracted from different cameras.
|
Approach
|
The proposed tracking algorithm consists of two layers:
1. Low level layer: Here, a set of parallel particle filters are deployed. Each
particle filter tracks an individual target in the tracking space. In this layer, no
interaction is considered between these filters. The only relationship between
the filters is through the shared observation space where a set of common
observations are made.
2. High level layer: Here, the particle filters are assigned a state based on their
likelihood levels as well as their interactions. Each particle filter can be in
one of the following states: uninitialized, unlabelled, normal, combined, or
lost. Assigning a state to each target enables handling situations where a few
targets join and separate or disappear and reappear in the scene. The identity
of the targets being tracked is also determined in this layer. Labeling the
targets in the scene is essential in activity recognition as addressed in this
section.
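The two layers above can be sketched as follows. The low-level function runs one predict/weight/resample cycle of a basic bootstrap particle filter for a single 3D target, and the high-level function assigns one of the five states from the filter's likelihood level and its distance to the nearest other target. The random-walk motion model, the Gaussian observation likelihood, all threshold values, and the specific transition rules are assumptions chosen for illustration. The actual models and heuristics are described in the thesis, not on this page.

```python
import math
import random

STATES = ("uninitialized", "unlabelled", "normal", "combined", "lost")


def particle_filter_step(particles, observation, rng,
                         motion_sigma=0.05, obs_sigma=0.1):
    """One predict/weight/resample cycle for a single 3D target.

    Returns the resampled particles and the mean particle likelihood,
    which the high-level layer can use as a confidence measure.
    """
    # Predict: diffuse each particle with a random-walk motion model.
    predicted = [tuple(c + rng.gauss(0.0, motion_sigma) for c in p)
                 for p in particles]
    # Weight: Gaussian likelihood of the observed 3D point per particle.
    weights = [
        math.exp(-math.dist(p, observation) ** 2 / (2.0 * obs_sigma ** 2))
        for p in predicted
    ]
    mean_likelihood = sum(weights) / len(weights)
    if mean_likelihood == 0.0:
        return predicted, 0.0  # observation unsupported; skip resampling
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, mean_likelihood


def assign_state(previous, mean_likelihood, nearest_target_dist, has_label,
                 lost_below=0.01, combined_within=0.1):
    """Hypothetical state rule; thresholds and transitions are illustrative."""
    if mean_likelihood < lost_below:
        # A filter that never locked onto a target stays uninitialized.
        return "uninitialized" if previous == "uninitialized" else "lost"
    if nearest_target_dist < combined_within:
        return "combined"  # e.g. two hands touching or overlapping
    return "normal" if has_label else "unlabelled"
```

In a full system, one `particle_filter_step` call per target would run in parallel on the shared observations, followed by one `assign_state` call per filter.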
|
Details
|
This part of my research is based on Bayesian target tracking using particle filtering. Different particle filtering methods and how they are used for multiple target tracking are discussed in my thesis. In addition,
a novel approach for hand/head tracking is presented. This approach includes the following steps:
1. Pre-processing Steps including Camera Calibration, Background Modeling and Skin-Colored Regions Segmentation.
|
 |
2. Image Observations and Addressing Accuracy Problem
|
 |
3. Computing 3D Candidate Points
|
 |
4. Prior Probability Estimation for 3D Candidate Points
|
 |
For more details, refer to my Ph.D. thesis. A few demo video clips can be seen in the following links. You need the appropriate codec to watch the videos.
|
Tracking Hand Clap (15.4MB) | Tracking Two-People Hand Shake (6.3 MB) | Particle Filters Tracking Hands and Head at a Desk (54.0MB)  |
Sample Application: Activity Recognition for Visual Surveillance |
 |
The proposed hand/head tracking system can be applied to visual surveillance
and human-computer interaction. In visual surveillance applications, we are usually interested in recognizing the type of activities happening in the scene. Although
this problem is extremely difficult in general, in my research I showed how to use the information generated by the system to recognize and classify
a certain class of activities. The goal is to classify activities merely by knowing the
trajectories of the hands and head throughout the sequence, estimated using
our tracking method.
|
 |
There are two classes of activities we are interested in: the first class includes
activities which involve only the motions of the hands and/or the head and interactions
between them. Examples of these activities are clapping and hand shaking. The second class of activities involves carrying
objects with the hand. Examples are object exchanges between hands of a person
or two people, picking up an object from the scene, and placing an object in the
scene. For the activities of this class, we need
to detect at certain time instants during the act whether the person holds an object
or not. For more details, refer to my Ph.D. thesis.
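As a toy example of how trajectories alone can hint at an activity of the first class, the sketch below counts how many times two tracked hands come into contact. Repeated contacts between the two hands of one person would be consistent with clapping. The contact distance and the function name `count_contacts` are hypothetical, and this cue is only an illustration, not the classifier described in the thesis.

```python
import math


def count_contacts(traj_a, traj_b, contact_dist=0.08):
    """Count apart-to-touching transitions between two synchronized 3D trajectories.

    Assumption: both trajectories are sampled at the same time instants and
    two targets are "touching" when closer than contact_dist (in metres).
    """
    contacts = 0
    touching = False
    for a, b in zip(traj_a, traj_b):
        now_touching = math.dist(a, b) < contact_dist
        if now_touching and not touching:
            contacts += 1  # a new contact event begins at this frame
        touching = now_touching
    return contacts
```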
|
Small Target Detection in Night-Time Videos Using Persistence Filter
|
 |
This research work proposes a reliable method for detecting
small targets in noisy and/or low-contrast videos. Examples
of this kind are night-time videos in low-lighting conditions.
Traditional background subtraction methods, which rely on
the difference between the image and the background model,
are sensitive to a set of difference thresholds, which
results in either a high rate of false alarms or a high rate of missed
detections. The proposed method tracks objects in the scene
and models their persistence over time using a probabilistic
model called Persistence Filter. An adaptive detection
threshold is selected for each object based on the global
noise level of the scene as well as properties of that object
including area, contrast and speed. Experimental results
show the effectiveness of this algorithm especially in low-contrast
and noisy situations where classical background
subtraction methods fail.
I conducted this research as a summer intern at ObjectVideo.
|
Publications |
A. Sepehri, "Visual Tracking of Human Hand and Head Movements and Its Applications", Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Maryland, February 2007.  |
A. Sepehri, Y. Yacoob, L.S. Davis, "3D Tracking of Human Hands and Head Using Multiple Cameras", Submitted to Journal of Computer Vision and Image Understanding, 2007.  |
A. Sepehri, N. Haering, Y. Yacoob, L.S. Davis, "Small Target Detection in Night-Time Videos Using Persistence Filter", Submitted to IEEE Conference on Computer Vision and Pattern Recognition, 2007.  |
A. Sepehri, Y. Yacoob, L.S. Davis, "Employing the Hand as an Interface Device Through Parametric Models", Journal of Multimedia, November/December 2006.  |
A. Sepehri, Y. Yacoob, L.S. Davis, "Parametric Hand Tracking for Recognition of Virtual Drawings", IEEE International Conference on Computer Vision Systems, New York, January 2006.  |
A. Sepehri, "Parametric Hand Tracking and Its Applications", Dissertation Proposal Submitted to the Faculty of the Graduate Program in the Department of Electrical and Computer Engineering, University of Maryland, October 2005.  |
A. Sepehri, Y. Yacoob, L.S. Davis, "Estimating 3D Hand Position and Orientation Using Stereo", Fourth Indian Conference on Computer Vision, Graphics and Image Processing, India, December 2004.  |
A. Sepehri, M.H. Zand, "Robot Motion Planning using Spherical Modeling and Intelligent Search Methods", ICEE, Tehran, Iran, 1998. |
A. Sepehri, C. Lucas, "Evolution of a Fuzzy Rule Base for Pattern Recognition using Genetic Algorithms", Biocomputing Conference, Tehran, Iran, 1997. |
Presentation |
"Visual Tracking of Human Hand and Head Movements and Its Applications", Ph.D. Thesis Defense, Department of Electrical and Computer Engineering, University of Maryland, February 2007.  |