Sunday 3 August 2014

An Appearance-Based Technique for Detecting Obstacles

A majority of upcoming smart cars employ safety systems that detect pedestrians, obstacles, and vehicles in their path. For example, the car detection algorithm employed by Vicomtech (Video Link) tracks the cars in front and calculates the time to collision. Beyond cars, an obstacle detection system can be incorporated into smartphones and used by the blind for safely navigating indoor environments.

During my recent internship at Soliton Technologies, Bangalore, a company known for its work on smart cameras and factory automation using vision systems, I worked on developing a system that detects obstacles in the video from a monocular camera and segments the walkable floor region. The technique may be used for assisting the blind or for moving robots indoors.

We concentrated mainly on using color cues to find the floor region. Once the floor region is detected, we simply highlight all other regions in the image as obstacles. Our technique was inspired by the research work of Iwan Ulrich and Illah Nourbakhsh in their paper “Appearance-Based Obstacle Detection with Monocular Color Vision” (Paper Link).

We begin by assuming that the bottommost region of the image contains floor, as shown below. The region inside the red rectangle is taken as the reference region containing the floor.

Figure 1: The marked rectangle (in red) is assumed to contain the floor region.
In the next step, we calculate the histogram of this region and, for every pixel in the entire image, find the probability of that pixel being similar to the pixels in the red rectangle (an application of Bayes' theorem). Interested readers can look into the "backprojection" method implemented in OpenCV.
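As a rough illustration, here is a minimal sketch of this step in Python with OpenCV; the file name, rectangle coordinates, and threshold are placeholders rather than the values from our system.

    import cv2

    # Read one video frame (BGR); the file name is a placeholder.
    frame = cv2.imread("frame.png")
    h, w = frame.shape[:2]

    # Reference rectangle assumed to contain only floor (a strip at the bottom).
    floor = frame[int(0.85 * h):h, int(0.25 * w):int(0.75 * w)]

    # Color histogram of the reference region over all three channels.
    hist = cv2.calcHist([floor], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

    # For every pixel, the likelihood of belonging to the floor histogram.
    backproj = cv2.calcBackProject([frame], [0, 1, 2], hist,
                                   [0, 256, 0, 256, 0, 256], 1)

    # Pixels with a low floor likelihood are marked as obstacles.
    obstacle_mask = backproj < 16   # threshold chosen empirically

The back-projected image is essentially a per-pixel likelihood map; thresholding it separates floor from everything else.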

Finally, all pixels dissimilar in color to the pixels in the rectangular region are highlighted as obstacles.

The first approach was based on comparing the RGB values of the pixels. Pixels with a high probability of belonging to the histogram of the rectangular region in RGB space were considered non-obstacles.

Figure 2: Pixels that did not match those inside the rectangular region are highlighted in red.
To get a cleaner result, we modified the algorithm to consider a larger non-obstacle area for a better histogram estimate.

Figure 3: Improved detection results on the same video frame
This technique still suffered from some major drawbacks: floor regions with specular reflections (due to light shining off the surface) and floor regions containing shadows were still highlighted as obstacles.

Hence, we tried the same algorithm using the HSV color model, as it handles specular reflection better. The technique is similar to the histogram back-projection used by the Mean-Shift algorithm for tracking moving objects.

Figure 4: Obstacles highlighted using the HSV model
The HSV model handles specular reflection and shadows well, but fails at many other locations. In particular, regions with darker intensities are not detected as obstacles.

We switched back to the RGB model, this time detecting regions of specular reflection (based on intensity and gradient magnitude) and marking them as non-obstacles, while regions of shadow were blurred to soften their effect; a rough sketch of this step is shown below. A technique for storing and retrieving previous histograms was also implemented for better performance.
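The sketch below assumes, as the description above suggests, that highlights are very bright regions with small gradient magnitude; the thresholds are illustrative only, not the values we used.

    import cv2
    import numpy as np

    frame = cv2.imread("frame.png")                       # placeholder frame
    obstacle_mask = np.ones(frame.shape[:2], dtype=bool)  # from the back-projection step

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad_mag = cv2.magnitude(gx, gy)

    # Bright, locally flat pixels are treated as specular highlights, i.e. floor.
    specular = (gray > 230) & (grad_mag < 40)
    obstacle_mask[specular] = False

    # Shadowed regions can be softened by blurring before histogram matching.
    softened = cv2.GaussianBlur(frame, (9, 9), 0)

The algorithm was tested on various videos in both indoor and outdoor settings. Following are the results: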

Figure 5 a): Results in an outdoor setting
Figure 5 b): Results in an indoor setting
Figure 5 c): Results in an indoor setting

A segmentation-based technique was also tried but was rejected owing to its huge computation time. The algorithm is currently being turned into a viable product that can be used for navigation with only a video input and no other sensors.

My sincere advice for people looking to work in the field of computer vision is to get in touch with the folks at Soliton Technologies, known for its awesome start-up-like culture and great office environment.

Friday 4 July 2014

Renovating Glucometers

A great chunk of today's biological research focuses on improving existing technologies, for example, using image processing for cell counting in fluorescent-stained leaf stems, or developing efficient computational methods for studying molecular, structural, or cellular biology. Biologists today are also going the smart way by using smartphones as a computing platform. EyeNetra (Link: www.eyenetra.com), a handheld device that integrates with a smartphone and provides vision correction technology to the masses, is a great example.

Similar research is being conducted to replace traditional glucometers with handheld devices, wherein a simple image capture of the blood-impregnated strip deduces the glucose content of the blood.

Microfluidic paper, when saturated with blood, yields varied intensities of color, where the intensity of the color developed is inversely proportional to the glucose content of the blood. Most of us will be familiar with the Accu-Chek strips used by diabetic patients to test their blood glucose levels. In the particular case of Accu-Chek strips, a greenish color develops when the strip is saturated with human blood. The luminosity of the developed color is higher for lower glucose content (simply put, the higher the glucose concentration, the darker the color developed).

Following is the plot obtained for glucose concentration vs. the color developed on the Accu-Chek strips:

We found that glucose concentration correlated better with the luminosity of the developed color than with various other combinations of the red, green, and blue components.

We imagined an app that could capture an image of the strip and find the luminosity of the circular region on it. A person could thus find his or her blood glucose level without a traditional glucometer. The app could also suggest remedies and digitally transmit the result, along with the phone's location, to a central database, which could be used to estimate the demographics of people with abnormal glucose levels.

A technique was required for segmenting the circular disk on the Accu-Chek strips. We used snakuscules for their ability to segment circular contours. Following is the sequence in which a snakuscule captures the circular disk on the strip.


A patient capturing the strip's image may be in a different environment and under varied ambient lighting conditions. Although the color produced on the strip will be similar for similar glucose levels, it may be perceived differently due to external ambient lighting. It was thus necessary to normalize the colors using a color constancy algorithm.

The von Kries coefficient law for color constancy gave the following results, where two different strips with the same glucose content, but under different ambient lighting, were converted into images with similar colors.

Figure: RGB correction using the von Kries coefficient law. Panels (b) and (d) are obtained from panels (a) and (c), respectively; the law can also be used to add illumination to dark images.
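As a rough illustration, the correction amounts to a diagonal scaling of the color channels. In this sketch the illuminant is estimated with the grey-world assumption, which is one simple choice and not necessarily the estimate we used; the file name is a placeholder.

    import cv2
    import numpy as np

    img = cv2.imread("strip.png").astype(np.float64)  # H x W x 3 image

    # Grey-world estimate of the illuminant: the mean of each color channel.
    illuminant = img.reshape(-1, 3).mean(axis=0)
    target = np.full(3, illuminant.mean())            # neutral grey target

    # Von Kries correction: scale each channel independently.
    corrected = img * (target / illuminant)
    corrected = np.clip(corrected, 0, 255).astype(np.uint8)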
After normalizing the illumination and segmenting the disk, we calculated the luminosity of the developed color by averaging the luminosity of all the pixels inside the disk. A final curve was fitted for glucose concentration vs. luminosity (complete data not shown).
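A sketch of this final measurement, assuming the snakuscule returns the disk centre (cx, cy) and radius r; here luminosity is taken as HSL lightness, one of the definitions we compared, and the numbers are placeholders.

    import cv2
    import numpy as np

    img = cv2.imread("strip_corrected.png").astype(np.float64)
    cx, cy, r = 120, 120, 40                     # placeholder snakuscule output

    # Boolean mask of the pixels inside the segmented disk.
    yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    disk = (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2

    # HSL lightness = (max(R, G, B) + min(R, G, B)) / 2, averaged over the disk.
    lightness = (img.max(axis=2) + img.min(axis=2)) / 2.0
    mean_luminosity = lightness[disk].mean()

The mean luminosity is then read off the calibration curve to give the glucose concentration.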


The above strategy can be used to find the glucose concentration from any strip, given the relation between glucose level and the color developed upon blood impregnation. A smartphone app built on such an algorithm could greatly benefit third-world countries, where low-cost, portable devices can reach the masses. With more than 347 million people suffering from diabetes worldwide, such technology can be used for getting quick and reliable results.


Thursday 1 May 2014

Pedestrian Detection: Why Dalal and Triggs are the godfathers of today's computer vision family!

Detecting objects in an image has always been a hot trend among computer vision enthusiasts. What initially began as the task of detecting a single object in an image has today extended to large-scale competitions that utilize millions of images for training classifiers that can detect more than a hundred categories of objects in a single image; for example, ILSVRC2014 (the ImageNet Large Scale Visual Recognition Challenge) dares competitors to detect up to 200 object categories in a single image.

Lowe's SIFT (Scale-Invariant Feature Transform) was one of the earliest attempts at matching objects in an unknown image against a training image. SIFT, although still considered one of the best methods for object matching, fails when the object of interest suffers from in-class variation. An alternative was suggested by Dalal and Triggs in their seminal research work on human detection, "Histograms of Oriented Gradients for Human Detection". The original paper can be found here.

The paper describes an algorithm that handles variation in human posture, differently colored clothing, and viewing angle while detecting human figures in an image. Put simply, the algorithm can identify humans (or any other object) irrespective of posture and color variation. Here I explain the implementation in detail.

Creating the HOG feature descriptor

The authors compute weighted histograms of gradient orientations over small spatial neighborhoods (cells), gather these neighboring histograms into local groups (blocks), and contrast-normalize them.

Following are the steps: 

a) Compute centered horizontal and vertical gradients with no smoothing.
b) Compute gradient orientation and magnitudes. 

  • For color images, pick, at each pixel, the color channel with the highest gradient magnitude.

c) For a 64x128 image,

  • Divide the image into 16x16 blocks with 50% overlap (7x15 = 105 blocks in total).
  • Each block consists of 2x2 cells of 8x8 pixels each.

d) Quantize the gradient orientations into 9 bins (over 0°-180°, i.e. unsigned gradients)

  • The vote is the gradient magnitude.
  • Interpolate votes tri-linearly between neighbouring bin centers.
  • Votes can also be weighted by a Gaussian to down-weight pixels near the edges of the block.

e) Concatenate the histograms (feature dimension: 105 blocks x 4 cells x 9 bins = 3,780); a sketch of the whole computation follows.
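As a sketch, OpenCV's HOGDescriptor reproduces this configuration; the parameters below match the steps listed above, and the file name is a placeholder.

    import cv2

    # 64x128 window, 16x16 blocks, 8x8 block stride, 8x8 cells, 9 bins.
    hog = cv2.HOGDescriptor((64, 128), (16, 16), (8, 8), (8, 8), 9)

    window = cv2.imread("person_64x128.png")  # a 64x128 crop
    descriptor = hog.compute(window)          # 3,780 values, as derived above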


The entire technique was summarized nicely in a lecture by Dr. Mubarak Shah (Professor, University of Central Florida).



Training Methodology

We construct an SVM classifier using positive images (containing human figures) and negative images (containing no human figures) from the INRIA dataset. All the images (positive and negative) were resized to 128x64 pixels, and a HOG feature descriptor was computed for each. The descriptors were fed into the classifier, which was trained using supervised learning.
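A minimal training sketch, assuming scikit-learn's linear SVM in place of the SVMLight package used in the paper; the directory layout is illustrative, and the HOG setup matches the descriptor block above.

    import glob
    import cv2
    import numpy as np
    from sklearn.svm import LinearSVC

    hog = cv2.HOGDescriptor((64, 128), (16, 16), (8, 8), (8, 8), 9)

    def hog_features(pattern):
        feats = []
        for path in glob.glob(pattern):
            img = cv2.resize(cv2.imread(path), (64, 128))  # (width, height)
            feats.append(hog.compute(img).ravel())
        return np.array(feats)

    X_pos = hog_features("inria/pos/*.png")   # crops containing humans
    X_neg = hog_features("inria/neg/*.png")   # crops containing no humans

    X = np.vstack([X_pos, X_neg])
    y = np.hstack([np.ones(len(X_pos)), np.zeros(len(X_neg))])

    # The paper trains a soft (C = 0.01) linear SVM.
    clf = LinearSVC(C=0.01)
    clf.fit(X, y)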


Choosing the Training Dataset 

The INRIA dataset (webpage link) was constructed to contain 1,800 pedestrian images in diverse environments and lighting conditions, with a large range of poses and backgrounds. The INRIA dataset is much more challenging than the initially used MIT pedestrian dataset.

For training, 1,208 positive images of humans, each of size 128x64, were taken, all cropped from a varied set of photos.


Similarly, 1,218 negative images containing no human figures were taken.


Sliding Window Approach

The image is scanned at all scales and positions. Windows are initially extracted at the lowest scale, i.e. 128x64 pixels, and the scale is then increased each time by a ratio of 1.05. A HOG descriptor is computed for the part of the image inside each detection window and fed into the classifier.
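A sketch of the scan, reusing the clf and hog objects from the training sketch above; the 8-pixel stride is illustrative.

    import cv2

    def detect(image, clf, hog, scale_ratio=1.05, stride=8):
        """Scan the image with a 128x64 window over a scale pyramid."""
        detections = []
        scale = 1.0
        img = image
        while img.shape[0] >= 128 and img.shape[1] >= 64:
            for y in range(0, img.shape[0] - 128 + 1, stride):
                for x in range(0, img.shape[1] - 64 + 1, stride):
                    feat = hog.compute(img[y:y + 128, x:x + 64]).reshape(1, -1)
                    if clf.decision_function(feat) > 0:
                        # Map the window back to original image coordinates.
                        detections.append((int(x * scale), int(y * scale),
                                           int(64 * scale), int(128 * scale)))
            scale *= scale_ratio
            img = cv2.resize(image, (int(image.shape[1] / scale),
                                     int(image.shape[0] / scale)))
        return detections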


Results 

Non-maximal suppression merges the many overlapping windows that fire on the same person into a single detection; a greedy sketch is shown below, followed by some of the results.
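The sketch assumes detections as (x, y, w, h) boxes with classifier scores; the overlap threshold is illustrative.

    import numpy as np

    def nms(dets, scores, iou_thresh=0.5):
        """Greedily keep the highest-scoring boxes, dropping heavy overlaps."""
        boxes = np.array([(x, y, x + w, y + h) for x, y, w, h in dets], float)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        order = np.argsort(scores)[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(i)
            # Intersection of the best box with the remaining boxes.
            x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
            iou = inter / (areas[i] + areas[order[1:]] - inter)
            order = order[1:][iou <= iou_thresh]
        return keep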







Once a suitable dataset is provided, the above algorithm can also be used for detecting objects of interest other than human figures (e.g., cars and motorbikes). The algorithm handles in-class variation while remaining efficient. The HOG descriptor suggested by Dalal and Triggs is today at the frontier of object recognition systems.


Tuesday 11 March 2014

Snakuscules

I recently got familiar with a methodology for contour segmentation used in biomedical image processing. Researchers at the Biomedical Imaging Group of École polytechnique fédérale de Lausanne (EPFL) worked on segmenting approximately circular regions in images using the concept of snakes; the resulting contour is known as a snakuscule.

A snakuscule is a simple active contour that preys upon bright blobs in an image. Here is my implementation:

Snakuscule enveloping a bright blob in an image
Such active contours move under the influence of energy gradients: for every snakuscule, the energy difference between the outer adjoining annulus and the inner disk is calculated, and the contour moves in the direction of decreasing energy. Programmed this way, snakuscules settle on bright blobs in the image.


The energy to be minimized can be written (with f(x, y) the image intensity) as the intensity integral over the outer annulus minus that over the inner disk:

E = ∬_annulus f(x, y) dx dy − ∬_disk f(x, y) dx dy

A snakuscule whose disk covers a bright blob therefore has low energy.
To minimize its energy, and thus lock onto brighter blobs, the snakuscule can move in any of the four directions or grow or shrink its radius. Out of these six possible actions, it selects the one that maximizes the decrease in energy.

We normalize the energy function by dividing the energies of the outer annulus and the inner disk by their respective areas.

Normalized form:

E = (1 / A_annulus) ∬_annulus f(x, y) dx dy − (1 / A_disk) ∬_disk f(x, y) dx dy

where A_annulus and A_disk are the areas of the annulus and the disk respectively.
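A minimal Python sketch of the resulting greedy minimization (the full Matlab implementation is linked at the end); the outer radius of the annulus is taken as √2·r so that the annulus and the disk have equal areas, and the step size and iteration cap are illustrative.

    import numpy as np

    def energy(img, cx, cy, r):
        """Normalized energy: mean intensity of the annulus minus the disk's."""
        yy, xx = np.mgrid[0:img.shape[0], 0:img.shape[1]]
        d2 = (xx - cx) ** 2 + (yy - cy) ** 2
        disk = d2 <= r ** 2
        annulus = (d2 > r ** 2) & (d2 <= 2 * r ** 2)  # equal-area annulus
        return img[annulus].mean() - img[disk].mean()

    def evolve(img, cx, cy, r, step=1, min_r=3, iters=500):
        """Greedily apply the most energy-reducing action until none remains."""
        for _ in range(iters):
            # Six candidate actions: move in four directions, grow, or shrink.
            moves = [(cx + step, cy, r), (cx - step, cy, r),
                     (cx, cy + step, r), (cx, cy - step, r),
                     (cx, cy, r + step), (cx, cy, max(min_r, r - step))]
            best = min(moves, key=lambda m: energy(img, *m))
            if energy(img, *best) >= energy(img, cx, cy, r):
                break  # no action decreases the energy any further
            cx, cy, r = best
        return cx, cy, r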

A more rigorous implementation can be seen in the original paper on snakuscules by Philippe Thévenaz and Michael Unser. 

Matlab codes for the same can be found at: github.com/sanyamgarg93/Snakuscules