Projects
Computer Vision
3D Object Localization using Finger Pointing Gestures
Advisor - Prof. Kristin Dana (Rutgers),
Feb 2009 to April 2009
The goal of this project was to obtain the 3D position of an object
from a 2D image of a human pointing at the object from a distance.
We perform image segmentation using cascaded classifiers and skin
colour tone to locate the eye and the pointing finger, and thereby
obtain the line-of-sight vector. We search along this vector for a
salient object using a pixel-matching algorithm. Using a calibrated
stereo camera setup, we estimate the pixel disparity and recover
the depth of the object. A Lego robot is then given the task of
planning a path to the object from its 3D position.
You can find the report here.
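The final depth-recovery step above can be sketched with the standard rectified-stereo relation Z = f·B/d. This is an illustrative toy, not the project's code; the function name and parameter values are hypothetical.

```python
# Hypothetical sketch of depth from pixel disparity for a calibrated,
# rectified stereo pair: Z = f * B / d, with f in pixels, B in metres.
def depth_from_disparity(f_pixels, baseline_m, disparity_pixels):
    """Return depth (metres) of a point seen with the given disparity."""
    if disparity_pixels <= 0:
        raise ValueError("disparity must be positive")
    return f_pixels * baseline_m / disparity_pixels

# Example: f = 700 px, baseline = 0.1 m, disparity = 35 px
print(depth_from_disparity(700, 0.1, 35))  # 2.0 metres
```

Larger disparities correspond to nearer objects, which is why the matched pixel offset between the two views is enough to place the object in 3D.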
Object Recognition using Corner Descriptors
Advisor - Prof. Lawrence Rabiner (Rutgers),
Oct 2008 to Dec 2008
Consider the task of recognizing an object in a complex 2D scene.
We first reduce the image to a set of salient corner points and a
connectivity matrix. The corners are chosen such that a square
yields four corner points and a circle yields equally spaced corner
points along its circumference. These corners, together with the
connectivity matrix, are then given attributes based on their
distance and angle with respect to neighbouring corners. These
attributes are searched and matched against a database that
contains the object attributes alone, without any background. We
find that the corner attributes are not perfectly accurate but
nevertheless produce reasonable results. The main advantage is the
reduced number of corner points used for matching compared to the
SIFT algorithm.
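The attribute step described above can be sketched as follows: for each corner, record the distance and angle to each connected neighbour. This is a minimal illustration, not the project's implementation; the data layout (coordinate list plus adjacency dict) is an assumption.

```python
import math

def corner_attributes(corners, connectivity):
    """For each corner, list (neighbour index, distance, angle) tuples
    for every neighbour given by the connectivity structure."""
    attrs = {}
    for i, (xi, yi) in enumerate(corners):
        attrs[i] = []
        for j in connectivity[i]:
            xj, yj = corners[j]
            dist = math.hypot(xj - xi, yj - yi)      # Euclidean distance
            angle = math.atan2(yj - yi, xj - xi)     # direction to neighbour
            attrs[i].append((j, dist, angle))
    return attrs

# A unit square: each corner is connected to its two edge neighbours.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
conn = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
attrs = corner_attributes(square, conn)
# Every edge of the unit square has length 1.0.
```

Because distances and relative angles are unchanged by translation, the same attribute tuples can be matched against a clean, background-free database entry.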
H.264 Video Compression for Mobile Video Applications
Advisor - Mr. Ajit Gupte (Texas Instruments),
May 2008 to July 2008
We explore reference frame compression in video encoders using 2D
orthogonal-transform-based compression techniques. In a typical
video encoder loop, a reference frame compression block compresses
the reconstructed frames before sending them to the DDR SDRAM. The
reconstructed data is split into compressed reference data and
error data. The compressed reference data is fetched repeatedly
from the DDR during motion estimation, while the error data is
needed only during motion compensation, so a net saving in DDR
bandwidth is achieved. Since the motion estimation stage then
operates on lossy data, we developed an efficient compression
technique based on Hadamard transforms that keeps the energy
content of the error data plane small.
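As an illustration of the transform involved, a 2D Hadamard transform of a pixel block is just H·X·H with a ±1 Sylvester-construction matrix, so it needs only additions and subtractions in hardware. This sketch is mine, not the project's code, and the 4x4 block size is an assumption.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of 2):
    H_{2m} = [[H_m, H_m], [H_m, -H_m]]."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def hadamard_2d(block):
    """Unnormalised 2D transform of a square block: H * block * H."""
    n = len(block)
    H = hadamard(n)
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    return matmul(matmul(H, block), H)

# A flat 4x4 block: all energy lands in the single DC coefficient,
# which is why smooth reference blocks compress well under this transform.
flat = [[5] * 4 for _ in range(4)]
coeffs = hadamard_2d(flat)
```

Since natural video blocks are locally smooth, most coefficients are near zero and the residual (error) plane carries little energy, which is the property the compression scheme relies on.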
Implementation of FPGA based Object Tracking Algorithm
Advisor - Prof. N. Venkateswaran (SVCE, India),
Jan 2008 to Apr 2008
Undergraduate Dissertation
In this project we use image processing algorithms for object
recognition and tracking and implement them on an FPGA (Spartan
3E), taking advantage of the parallelism, low cost, and low power
consumption that FPGAs offer. The individual frames acquired from
the target video are fed into the FPGA offline and passed through
segmentation, thresholding, and filtering stages. The object is
then tracked by comparing the background frame with the processed
updated frame containing the new location of the target. The FPGA
implementation tracked a moving object reliably in our tests.
Download the report here.
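The background-comparison step above can be sketched in software as a thresholded frame difference followed by a centroid computation. This is a hypothetical software analogue of the hardware pipeline, with made-up frame data; the real design runs these stages in FPGA logic.

```python
def track_object(background, frame, threshold):
    """Threshold |frame - background| per pixel and return the centroid
    (x, y) of the changed pixels, or None if nothing moved."""
    xs, ys, count = 0, 0, 0
    for y, (brow, frow) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                xs += x
                ys += y
                count += 1
    if count == 0:
        return None
    return (xs // count, ys // count)

# Toy 8x8 greyscale frames: one bright target pixel appears at (x=5, y=3).
bg = [[0] * 8 for _ in range(8)]
fr = [row[:] for row in bg]
fr[3][5] = 255
print(track_object(bg, fr, 50))  # (5, 3)
```

Each pixel's comparison is independent, which is exactly what makes this stage a good fit for the FPGA's parallelism.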
Higher-Order Gabor Spectra, A Mathematical Model for Signal
Processing
Advisor - Prof. Nagarajan Venkateswaran (WARFT),
Aug 2006 to May 2008
This work was part of the two-year Research Training Program in
Signal Processing at the Waran Research Foundation (WARFT) in
India. You can find the wiki link here. The proposal describes a
novel approach for computationally efficient image and speech
feature extraction using the higher-order statistics of the Gabor
transform - Gabor polyspectra. The computational complexity of
conventional Gabor transforms using the FFT is O(N log N). To
reduce this further, the Gabor coefficients are obtained through
the Arithmetic Fourier Transform (AFT), which requires only O(N)
real multiplications. The higher-order statistics obtained from the
available signal information are transformed to a multidimensional
space using the proposed Gabor transform, and a feature vector
consisting of a set of dominant harmonics and the associated Gabor
phase components is extracted.
The primary purpose of the framework is to formulate a database and
the associated neural system required to model vision networks. The
proposed system takes advantage of the fact that the operations of
the human vision network closely resemble those of the Gabor
elementary functions.
You can find the detailed report here.
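For intuition, a single Gabor coefficient is the inner product of the signal with a Gaussian-windowed complex exponential, so it responds strongly only when the signal contains energy near the window's centre frequency. This is an illustrative sketch of that elementary function, not the AFT-based scheme described above; the signal and parameter values are made up.

```python
import cmath
import math

def gabor_coefficient(signal, t0, f0, sigma):
    """Inner product of a discrete signal with the Gabor elementary function
    g(t) = exp(-(t - t0)^2 / (2 sigma^2)) * exp(-2j*pi*f0*t)."""
    acc = 0j
    for t, x in enumerate(signal):
        window = math.exp(-((t - t0) ** 2) / (2 * sigma ** 2))
        acc += x * window * cmath.exp(-2j * math.pi * f0 * t)
    return acc

# A pure tone at f0 = 0.125 cycles/sample: the coefficient at the matching
# frequency dominates the one at a mismatched frequency.
N = 64
tone = [math.cos(2 * math.pi * 0.125 * t) for t in range(N)]
c_match = gabor_coefficient(tone, N // 2, 0.125, 8.0)
c_miss = gabor_coefficient(tone, N // 2, 0.4, 8.0)
```

The feature vector of dominant harmonics and phases mentioned above is built from exactly such coefficients, keeping only the (t0, f0) cells where the magnitude is large.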