Object detection with Turi Create and augmentation using ARKit
Editor - 14 March 2018 - 12 Mins
Over the past few years, the use of Machine Learning to solve complex problems has been increasing. Machine learning (ML) is a field of computer science that gives computer systems the ability to “learn” (i.e. progressively improve performance on a specific task) with data, without being explicitly programmed.
Last year was a good year for the freedom of information, as industry titans Google, Microsoft, Facebook, Amazon, Apple and even Baidu open-sourced their ML frameworks. In this post, let’s explore a framework provided by Apple named Turi Create.
At WWDC 2017, Apple presented machine learning sessions built around the Core ML framework. However, these assumed that developers already had Machine Learning knowledge. To simplify the development of custom Machine Learning models, Apple released Turi Create in December 2017. With Turi Create, you don’t have to be a Machine Learning expert to add recommendations, object detection or activity classification to your iOS app. Turi Create focuses on tasks instead of algorithms, and with it we can tackle a number of scenarios:
You can also work with essential Machine Learning models organized into algorithm-based toolkits:
Moving on, let’s talk a bit about Augmented Reality technology.
AR is a technology that superimposes a computer-generated image on a user’s view of the real world, thus providing a composite view. It is used to enhance natural environments or situations and offer perceptually enriched experiences. The first functional AR systems that provided immersive mixed reality experiences for users were invented in the early 1990s, starting with the Virtual Fixtures system developed at the U.S. Air Force’s Armstrong Labs in 1992. Check out the image below of an AR implementation in a native app, which shows directions to the check-in counter of an airport.
The purpose of this post is to detect an object with the help of a model file generated by Turi Create and then show details related to the object in an AR view. I ran this experiment inside a conference room in my organization. Luckily, the rooms here are named after computer scientists, and the room I used goes by the name ‘Donald Knuth’. So, once the object is detected, the app shows a short profile of the computer scientist (name, photo, description, etc.) in the AR view.
As for the object to identify, there are paintings in the room which are unique in nature. I selected one of them for this experiment:
To summarize, we will create an iOS app that scans the AR view for the presence of the unique object (the painting). Once it is detected, we will show the details of the room in which it is placed: the room name, a description and a picture, all rendered in the AR view.
Sample Images: Turi Create needs sample pictures of the object to generate the Machine Learning model. According to the official documentation of Turi Create, we need 30-200 sample images of the object in different contexts, from a variety of angles and scales, lighting conditions, etc. The more samples we have, the better the detection will be. I took 30 images of the painting from different angles and varying distances using my iPhone 7. Some of the samples are shown below:
Bounding Box details: Each sample image should have the complete painting in view, and we need the position and dimensions of the painting inside the image, that is, its bounding box. I used GIMP to work out the bounding box of the painting in each of the 30 sample images. The following image shows the details:
The bounding box details of all the sample images are required. Doing this by hand is time-consuming, especially with more than 100 images. To reduce the manual work, we can use an annotation tool that lets us provide the bounding box details much faster. I found a list of annotation tools here, and for a quick start you can go for this simple image annotator tool here. It provides the output in CSV format too.
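If the annotation tool exports CSV, the rows can be grouped per image before they are handed to Turi Create. The sketch below is only an illustration: the column names (`filename`, `x`, `y`, `width`, `height`, with `x`/`y` as the top-left corner in pixels), the helper name `load_boxes` and the sample rows are all assumptions, and the real export of your tool may look different.

```python
import csv
import io
from collections import defaultdict


def load_boxes(csv_text):
    """Group bounding boxes by image file name.

    Assumes hypothetical columns: filename, x, y, width, height,
    where (x, y) is the top-left corner of the box in pixels.
    """
    boxes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        boxes[row["filename"]].append({
            "x": int(row["x"]),
            "y": int(row["y"]),
            "width": int(row["width"]),
            "height": int(row["height"]),
        })
    return dict(boxes)


# Made-up sample rows, for illustration only.
SAMPLE_CSV = """filename,x,y,width,height
IMG_0001.jpg,120,80,400,300
IMG_0002.jpg,60,40,500,380
"""
```

Calling `load_boxes(SAMPLE_CSV)` returns a dictionary keyed by file name, which keeps all boxes of one image together even if an image contains more than one annotated object.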
Now, using the bounding box details, we need the following 4 details of the painting in each of the sample images:
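One detail worth spelling out: Turi Create's object detector describes a box by its center point plus width and height, whereas image editors typically report the top-left corner of a selection. A minimal sketch of the conversion, assuming top-left input in pixels; the function name `to_turi_annotation` and the label `painting` are hypothetical:

```python
def to_turi_annotation(left, top, width, height, label="painting"):
    """Convert a top-left pixel box into a Turi Create style annotation.

    Assumes (left, top) is the top-left corner of the box; Turi Create
    expects x and y to be the center of the box instead.
    """
    return {
        "label": label,
        "type": "rectangle",
        "coordinates": {
            "x": left + width / 2.0,   # center x
            "y": top + height / 2.0,   # center y
            "width": width,
            "height": height,
        },
    }
```

For example, a 400 x 300 box whose top-left corner is at (120, 80) has its center at (320, 230), so that is what ends up in `coordinates`.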
Once done, we need to create a file named build.py, which can be created in any text editor. But first, let’s set up Turi Create.
Let’s get started with the setup of Turi Create. Following are the prerequisites:
NOTE: Apple recommends installing Turi Create using virtualenv, but for the sake of trying this tool for the first time, I installed it directly. I would recommend using virtualenv too.
Now let’s start with its usage.
After creating the build.py file with the above code snippet, open Terminal, change to the directory where build.py is present (using the cd command) and execute the below command:
This command starts the building process and exports the Machine Learning model file used by the iOS App for object detection. The key parameters for the creation of the model file are:
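For reference, a build script of this kind could look roughly like the sketch below. It is not the original build.py: the helper names `build_rows` and `train_and_export`, the column names and the `max_iterations` value are assumptions. `turicreate` is imported inside the training function so that the data preparation step can be checked on a machine without it installed.

```python
def build_rows(samples):
    """Turn (image_path, [annotation, ...]) pairs into column lists."""
    paths = [path for path, _ in samples]
    annotations = [anns for _, anns in samples]
    return {"path": paths, "annotations": annotations}


def train_and_export(samples, output="Painting.mlmodel", max_iterations=100):
    """Train an object detector and export it as a Core ML model.

    `samples` is a list of (image_path, annotations) pairs, where each
    annotation is a dict with 'label' and center-based 'coordinates'.
    """
    import turicreate as tc  # requires Turi Create to be installed

    rows = build_rows(samples)
    data = tc.SFrame({
        "image": [tc.Image(p) for p in rows["path"]],
        "annotations": rows["annotations"],
    })
    model = tc.object_detector.create(
        data,
        feature="image",
        annotations="annotations",
        max_iterations=max_iterations,  # assumed value; tune for your data
    )
    model.export_coreml(output)
    return model
```

Keeping the data preparation in its own function makes it easy to verify that every image path lines up with its annotations before starting a training run, which can take a while.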
Let’s start by creating an Xcode project named RoomDetector with default settings. ARKit is available from iOS SDK 11.0 onwards, and apart from camera access no specific permission is required to use it. Drag and drop your Painting.mlmodel file into your project as shown below:
The above project is available on GitHub, and the repository contains all the assets used to develop it.
To use the ML model file for object detection, first import the CoreML and Vision frameworks into your UIViewController and then create a VNCoreMLModel:
VNCoreMLModel is a container for a Core ML model used with Vision requests. The Vision framework provides high-performance image analysis and computer vision techniques to identify faces, detect features, and classify scenes in images and video.
The next step is to create a Core ML request (VNCoreMLRequest), which is provided by the Vision framework. Here you can provide a completion handler, which is called when the request finishes and receives any detections:
Now we need to feed the ML request with input from ARKit’s scene, i.e. its camera view. We capture the camera view in the form of a CVPixelBuffer, an image buffer that holds pixels in main memory, using the below API:
We need to call this API repeatedly at an interval to get images, and use the ML request to evaluate each one for object detection. Remember that the CVPixelBuffer is an in-memory image buffer, so don’t store it globally in an array. As soon as the image in the buffer has been processed for object detection, iOS releases it from memory automatically. Also, choose the interval for calling the above API wisely (>= 1000 msec).
Finally, to execute the request, create a request handler that takes the pixel buffer as a parameter; performing the request on it triggers the completion handler with the detection results:
The detection of an object using Turi Create is amazing. We used 30 images in the dataset, and when we tested it there were cases where detection took some time (> 3 sec). But in 70% of the cases detection was near-instantaneous (< 3 sec). There are a few ways to increase the accuracy:
I made use of ARKit’s scene view (camera view) to show an image and text (the details of the room) right after detection. You can check out the project for more details here and the demo of the app below: