Thursday, 22 October 2015

A blink detection technique using a combination of eye detection cascades

During my recent internship at Microsoft Research, I worked on a blink detection system that detects a person's blinks while they work at their computer. A working webcam is of course required for this. For those who are just starting with OpenCV, the documentation on face and eye detection can be found here: OpenCV (C++): Face & Eye Detection

Several initial ideas for detecting a person's blink were tried:

1. We can detect the face using the Viola-Jones method and subtract consecutive frames in the input video stream. This highlights blinks only when the user does not move their head at all. A slight change in head pose can completely ruin the results, so this method is suitable only when no head motion is present. In fact, similar techniques [Method 1, Method 2] have already been proposed for people with severe disabilities such as ALS.

2. Detect both the face and the eyes using the Viola-Jones method. Subtracting consecutive frames of only the eye regions in the video, followed by thresholding, can detect the motion that occurs when a person blinks. However, this also produces an output when the user moves the iris without blinking. Hence, this method was also rejected.

3. We can threshold the eye regions and then count the number of pixels in the segmented image. A high count might indicate an open eye, whereas a low count can signify a closed eye, because the dark iris of an open eye significantly increases the number of thresholded pixels. Such a technique is sensitive to changes in ambient light and to dark shadows.

4. An adaptive threshold based on the cumulative histogram of the eye region always yields a segmented image, irrespective of whether the eye is open or closed, and hence it was not used either.

5. Using pre-learnt cascades
The problem of blink detection can be reduced to the simpler problem of deciding whether the eye is open or closed, using the pre-learnt Haar cascades available in OpenCV. It was crucial to know which cascades detect open eyes and which detect closed eyes. After an extensive literature survey, it was found that haarcascade_lefteye_2splits.xml and haarcascade_righteye_2splits.xml can detect eyes in both open and closed states, while haarcascade_eye_tree_eyeglasses.xml detects eyes only when they are open.

It is advisable to run the eye detector only in the region where the eye may be present. Thus, eye detection always follows face detection and is almost never performed on its own. Separate detectors are initialized for the left and right eyes using their respective cascades. The left eye detector is run on the right half of the detected face, and the right eye detector on the left half. Although face detection is performed on a size-reduced image, eye detection is performed on the full-resolution image because the eye region is small.
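The region selection described above can be sketched as follows. This is only an illustration under stated assumptions: Box is a plain stand-in for cv::Rect so that the snippet compiles without OpenCV, the function names and the scale parameter are hypothetical, and limiting the search to the upper half of the face is an assumption suggested by the variable name topRightOfFace used in the code further down.

```cpp
#include <cassert>
#include <cmath>

// Plain stand-in for cv::Rect (assumption: lets the sketch compile without OpenCV).
struct Box {
    int x, y, width, height;
};

// Map a face box found on a shrunken frame back to full-resolution coordinates.
// `scale` is the hypothetical shrink factor applied before face detection.
Box scaleToFullImage(const Box& face, double scale) {
    assert(scale > 0.0);
    return Box{
        static_cast<int>(std::lround(face.x * scale)),
        static_cast<int>(std::lround(face.y * scale)),
        static_cast<int>(std::lround(face.width * scale)),
        static_cast<int>(std::lround(face.height * scale))};
}

// Search region for the left-eye detector: right half of the face box.
// Only the upper half is kept (assumption based on the name topRightOfFace).
Box leftEyeSearchRegion(const Box& face) {
    const int half = face.width / 2;
    return Box{face.x + half, face.y, face.width - half, face.height / 2};
}

// Search region for the right-eye detector: left half of the face box.
Box rightEyeSearchRegion(const Box& face) {
    return Box{face.x, face.y, face.width / 2, face.height / 2};
}
```

With OpenCV, the resulting box would typically be used to take a sub-image of the full-resolution frame before calling the eye detector.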

Following are some of the eye detection cascades that are available:

Cascade Classifier                    Reliability  Speed    Eyes Found      Glasses
haarcascade_mcs_lefteye.xml           80%          18 msec  Open or closed  No
haarcascade_lefteye_2splits.xml       60%          7 msec   Open or closed  No
haarcascade_eye.xml                   40%          5 msec   Open only       No
haarcascade_eye_tree_eyeglasses.xml   15%          10 msec  Open only       Yes

We begin by scanning the left side of the face with the haarcascade_lefteye_2splits.xml Haar cascade, which outputs an eye location irrespective of whether the eye is open or closed. The same region is then scanned again using haarcascade_eye_tree_eyeglasses.xml, which detects only open eyes. Thus,

a) If the eye is detected using both the detectors, it implies that the eye is open.
b) If the eye is detected by just the initial detector, it implies that the eye is closed.
c) If neither detector finds the eye, the eye was probably not found because of occlusions, dark shadows or specular reflections on eyeglasses.
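The three cases above can be written as a small decision function. This is a minimal sketch, not the code used in the project: the EyeState enum, the classifyEye name and its parameters are hypothetical, and the inputs are simply the number of rectangles each detector returned.

```cpp
#include <cstddef>

// Hypothetical labels for the three outcomes (a), (b) and (c).
enum class EyeState { Open, Closed, NotFound };

// anyStateHits: detections from the cascade that fires for open or closed eyes.
// openOnlyHits: detections from the cascade that fires for open eyes only.
EyeState classifyEye(std::size_t anyStateHits, std::size_t openOnlyHits) {
    if (anyStateHits == 0) {
        // Case (c): the first detector found nothing, so the second result is
        // ignored (assumption: the open-only detector runs only after a first hit).
        return EyeState::NotFound;
    }
    // Case (a) when both detectors fire, case (b) when only the first one does.
    return openOnlyHits > 0 ? EyeState::Open : EyeState::Closed;
}
```

Keeping the decision separate from the detection calls makes it easy to check the logic without a camera or cascade files.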

The following is sample eye detection code, written in C++ using the OpenCV library, that determines whether the eye is open or closed. Blinks can then be detected from changes in the state of the eye.

vector<Rect> eyesRight = storeLeftEyePos(topRightOfFace); //Detect Open or Closed eyes

if (eyesRight.size() > 0)
{            
       // Now look for open eyes only
       vector<Rect> eyesRightNew = storeLeftEyePos_open(topRightOfFace);

       if (eyesRightNew.size() > 0) //Eye is open
       {              
              //..
       }
       else //Eye is closed
       {              
              //..
       }
}

// Detects open or closed eyes in the right half of the face
// (leftEyeDetector is loaded with haarcascade_lefteye_2splits.xml)
vector<Rect> storeLeftEyePos(const Mat& rightFaceImage)
{
       vector<Rect> eyes;
       leftEyeDetector.detectMultiScale(
                                        rightFaceImage, 
                                        eyes,
                                        1.1,
                                        2,
                                        CASCADE_FIND_BIGGEST_OBJECT, 
                                        Size(0, 0)
                                        );

       return eyes;
}

// Detects open eyes only in the right half of the face
// (leftEyeDetector_open is loaded with haarcascade_eye_tree_eyeglasses.xml)
vector<Rect> storeLeftEyePos_open(const Mat& rightFaceImage)
{
       vector<Rect> eyes;
       leftEyeDetector_open.detectMultiScale(
                                             rightFaceImage, 
                                             eyes,
                                             1.1,
                                             2,
                                             CASCADE_FIND_BIGGEST_OBJECT, 
                                             Size(0, 0)
                                             );
      
       return eyes;
}

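// Loading the cascades (done once at start-up, before any detection call)
CascadeClassifier leftEyeDetector;      // detects open or closed eyes
CascadeClassifier leftEyeDetector_open; // detects open eyes only

string leftEyeCascadeFilename = 
"C:\\opencv\\sources\\data\\haarcascades_cuda\\haarcascade_lefteye_2splits.xml";
if (!leftEyeDetector.load(leftEyeCascadeFilename)) // load() returns false on failure
       cerr << "Could not load " << leftEyeCascadeFilename << endl;

string leftEye_open_CascadeFilename =
"C:\\opencv\\sources\\data\\haarcascades_cuda\\haarcascade_eye_tree_eyeglasses.xml";
if (!leftEyeDetector_open.load(leftEye_open_CascadeFilename))
       cerr << "Could not load " << leftEye_open_CascadeFilename << endl;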
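As noted earlier, blinks follow from changes in the eye state over time. One possible way to count them is sketched below; it is an illustration only, with hypothetical names (EyeState, countBlinks), and it assumes that a blink is an open, closed, open sequence and that frames where the eye was not found are simply skipped.

```cpp
#include <vector>

// Hypothetical per-frame result of the two-detector check.
enum class EyeState { Open, Closed, NotFound };

// Counts open -> closed -> open transitions in a sequence of per-frame states.
// Assumption: frames with NotFound are ignored rather than resetting the count.
int countBlinks(const std::vector<EyeState>& frames) {
    int blinks = 0;
    bool seenOpen = false;         // an open eye has been observed at least once
    bool closedAfterOpen = false;  // the eye closed after having been open
    for (EyeState state : frames) {
        if (state == EyeState::NotFound) {
            continue;
        }
        if (state == EyeState::Open) {
            if (closedAfterOpen) {
                ++blinks;  // the eye reopened: one full blink completed
            }
            closedAfterOpen = false;
            seenOpen = true;
        } else if (seenOpen) {
            closedAfterOpen = true;
        }
    }
    return blinks;
}
```

A sequence that starts with closed frames does not count as a blink until an open eye has been seen first, which avoids counting a subject who begins the recording with their eyes shut.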

The methodology was tested on the Talking Face Video, a 200-second video recording of a person engaged in a conversation. The following images demonstrate the effectiveness of the method described above, where a green box indicates that an open eye has been detected, while a red box is drawn when a closed eye is detected.

    


The complete video can be seen here: 



In the future, a method that outputs the percentage of eye closure (PERCLOS) for each eye could yield an even better estimate of the state of the eye.
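As a rough illustration of what such a measure might look like, the sketch below tracks the percentage of recent frames in which the eye was classified as closed. It is a simplification and not part of the system described here: PERCLOS is commonly defined on a continuous measure of eyelid closure, whereas this sketch uses only a binary closed flag per frame, and the PerclosWindow class and its window size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Sliding window over the most recent frames (hypothetical helper class).
class PerclosWindow {
public:
    explicit PerclosWindow(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Record one frame: true if the eye was classified as closed.
    void push(bool eyeClosed) {
        window_.push_back(eyeClosed);
        if (eyeClosed) {
            ++closed_;
        }
        if (window_.size() > capacity_) {
            if (window_.front()) {
                --closed_;  // the oldest frame leaves the window
            }
            window_.pop_front();
        }
    }

    // Percentage of frames in the window with a closed eye (0 if empty).
    double value() const {
        return window_.empty() ? 0.0 : 100.0 * closed_ / window_.size();
    }

private:
    std::size_t capacity_;
    std::size_t closed_ = 0;
    std::deque<bool> window_;
};
```

At a known frame rate, the window capacity could be chosen to cover a fixed time span, for example one minute of video.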