Monday, 18 January 2016

Load Testing Project Oxford's API



Last year at Build 2015, Microsoft announced a set of pre-trained computer vision REST APIs, along with an SDK for .NET, for writing applications that utilize AI. The APIs were introduced to easily make sense of massive multimedia data when dealing with images, videos and voice.

Let's consider an example of an Android app that detects and labels images in your phone gallery based on the friends present in a particular image. To build such an app with client-side face detection and recognition, you would need to write complex computer vision methods to detect faces (probably by implementing the Haar cascade in OpenCV, which by the way only detects frontal faces, or by looking into the flandmark library, which has its own complications), followed by face alignment for face recognition. You would further need to train a classifier for recognizing the faces. This is tiresome for a developer who might not have the time, expertise or infrastructure to write these methods.
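
To give a sense of what just the detection step involves on the client side, here is a minimal sketch of Haar cascade face detection in C#, assuming the OpenCvSharp wrapper and illustrative file names; any OpenCV binding would look similar, and alignment and classifier training would still be on you:

//A client-side detection sketch; OpenCvSharp and the file names are
//illustrative assumptions, not part of Project Oxford
using System;
using OpenCvSharp;

class ClientSideDetection
{
    static void Main()
    {
        //Load the stock frontal-face Haar cascade that ships with OpenCV
        var cascade = new CascadeClassifier("haarcascade_frontalface_default.xml");

        using (var image = Cv2.ImRead("gallery_photo.jpg"))
        using (var gray = new Mat())
        {
            //Haar cascades operate on grayscale images
            Cv2.CvtColor(image, gray, ColorConversionCodes.BGR2GRAY);

            //Returns bounding boxes of detected (frontal only!) faces
            Rect[] faces = cascade.DetectMultiScale(gray, scaleFactor: 1.1, minNeighbors: 3);
            Console.WriteLine("Detected {0} face(s)", faces.Length);
        }
    }
}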


Instead, Microsoft's Project Oxford API reduces face detection, verification, grouping and identification to a single line of code. You simply upload the image and the detection results are returned.

using (Stream imageFileStream = File.OpenRead(imageFilePath))
{
    var faces = await faceServiceClient.DetectAsync(imageFileStream);    
}  
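
Here, faceServiceClient is an instance of the SDK's FaceServiceClient, constructed with your subscription key. A minimal sketch (the key string is a placeholder):

using Microsoft.ProjectOxford.Face;

//"your_subscription_key" is a placeholder; use the key issued to you
//when you register on the Project Oxford portal
private readonly IFaceServiceClient faceServiceClient =
    new FaceServiceClient("your_subscription_key");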

Excellent documentation on how to begin using the APIs in C# and Android can be found on their official website: Face API Documentation

  
The catch?
You need a constant internet connection to upload the images whenever you use the APIs. Further, an Azure subscription is required and you will have to pay if you exceed the transaction limit.

In this blog, I am going to explain how to make multiple calls for face detection and how you can load test the APIs for performance testing. You should first register on the official website using your Live ID, which will allow you up to 30,000 transactions per month, more than sufficient for testing purposes. Before performance testing the APIs, I would advise everyone to code the detection method themselves as explained in the official documentation. The code is really easy! You can also check out my code on github: ProjectOxfordLoadTest

To load test the APIs, I collected images from publicly available datasets such as the BioID database, the Caltech Frontal Face Database and the ORL face database. The following is a simple unit test for the API.

[TestMethod]
public async Task TestFaceDetection()
{
    String filePath = @"D:\Face Detection Databases\CaltechFaces\image.jpg";

    //Detect faces in the selected image
    FaceRectangle[] faceRects = await UploadAndDetectFaces(filePath);

    Assert.IsTrue(faceRects.Length > 0); //All images contain detectable frontal faces
}

private async Task<FaceRectangle[]> UploadAndDetectFaces(string imageFilePath)
{
    try
    {
        using (Stream imageFileStream = File.OpenRead(imageFilePath))
        {
            var faces = await faceServiceClient.DetectAsync(imageFileStream);
            var faceRects = faces.Select(face => face.FaceRectangle);
            return faceRects.ToArray();
        }
    }
    catch (Exception)
    {
        return new FaceRectangle[0];
    }
}

Comprehensive documentation for creating a Web Performance and Load Test Project in Visual Studio is available on MSDN, which can help beginners set up the test environment and add a Load Test.

The load test is created to check how the API performs under stress from parallel calls. The Load Test project (explained above) automatically generates parallel load for testing, and the number of calls can be varied in its properties. I randomly chose images from the database folder so that every parallel call can test a different image, as follows:

[TestMethod]
public async Task TestFaceDetection()
{
    //Extract all image files' location from a given folder
    String searchFolder = @"D:\Face Detection Databases\CaltechFaces";
    var filters = new String[] { "jpg", "jpeg", "png", "gif", "tiff", "bmp" };
    var files = GetFilesFromDirectory(searchFolder, filters, true);
    int numberOfFiles = files.Length;

    //Return a random image location 
    Random rnd = new Random();
    int randomImage = rnd.Next(0, numberOfFiles);
    string filePath = files[randomImage];

    //Detect faces in the selected image
    FaceRectangle[] faceRects = await UploadAndDetectFaces(filePath);
 
    Assert.IsTrue(faceRects.Length > 0); //All images contain detectable frontal faces
}
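
GetFilesFromDirectory is a small helper and not part of the SDK; a minimal sketch of how it can be implemented, assuming the filters are file extensions and the last argument toggles recursive search:

//Requires: using System.Collections.Generic; using System.IO;
private static string[] GetFilesFromDirectory(string searchFolder, string[] filters, bool isRecursive)
{
    var searchOption = isRecursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
    var files = new List<string>();
    foreach (var filter in filters)
    {
        //Collect every file matching the current extension filter
        files.AddRange(Directory.GetFiles(searchFolder, "*." + filter, searchOption));
    }
    return files.ToArray();
}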

Further, the above test method would report performance timings for running the entire TestFaceDetection() method, so we define a timing context around the UploadAndDetectFaces(string) call to accurately measure the time taken by the face detection API alone. A TestContext property is included in your test class as described below:

[TestClass]
public class UnitTest1
{
    public TestContext TestContext
    {
        get{ return context; }
        set{ context = value; }
    }
    private TestContext context;
    
    [TestMethod]
    public async Task TestFaceDetection()
    {
        //MSTest sets the TestContext property automatically before each test runs

        //Extract all image files' location from a given folder
        String searchFolder = @"D:\Face Detection Databases\CaltechFaces";
        var filters = new String[] { "jpg", "jpeg", "png", "gif", "tiff", "bmp" };
        var files = GetFilesFromDirectory(searchFolder, filters, true);
        int numberOfFiles = files.Length;
        
        //Return a random image location 
        Random rnd = new Random();
        int randomImage = rnd.Next(0, numberOfFiles);
        string filePath = files[randomImage];
        
        if (context.Properties.Contains("$LoadTestUserContext")) //Begin timing load test
            context.BeginTimer("MyTimerFaceDetection");
        
        //Detect faces in the selected image
        FaceRectangle[] faceRects = await UploadAndDetectFaces(filePath);
        
        if (context.Properties.Contains("$LoadTestUserContext")) //End timing load test
            context.EndTimer("MyTimerFaceDetection");
        
        Assert.IsTrue(faceRects.Length > 0); //All images contain detectable frontal faces
    }
}

Once you run your load test, you can vary the parallel load in its properties. This can help you evaluate how the API performs under stress and how many tests are performed per second at your current network bandwidth. The time taken to return the detected faces depends on the time taken to upload the image pixels, so the performance figures will depend on your internet bandwidth.

If you have been able to successfully create and run the load test, you will be able to visualize the performance graphs of the API. You can also visit my github repo of the project at: ProjectOxfordLoadTest.

For a constant load of just one user and a test time of 3:00 minutes, a total of 184 calls were made to the server with an average response time of 0.93 seconds. This response time was for images from the ORL database, which are barely 10 KB each. 95% of the calls were responded to within 1.84 seconds!
Performance graph using images from ORL Database (~ 10 KB)
Further, for the images in the BioID database, which are approximately 65 KB each, only 141 calls were made in 3:00 minutes. This yields an average response time of 1.14 seconds, which amounts to 1/1.14 ≈ 0.87 FPS.
Performance graph using images from BioID Database (~ 65 KB)
Thus, while Project Oxford's Face API definitely appears to be a good option if you have to detect faces, head pose and emotion, it is certainly not suitable for real-time processing. The exposed APIs do implement current state-of-the-art methods for identification, tracking and recognition, and would be my first choice if I had to build a computer vision app that can benefit from server-side processing.