Foreground-Background Separation using Core-ML

Mohit Nihalani · Published in Dev Genius · Jun 27, 2020 · 9 min read


Photo by Ali Shah Lakhani on Unsplash

Recently I got really interested in image segmentation using deep learning. But whenever I learn something related to machine and deep learning, it just stays in a Jupyter notebook; I never do anything with these models and have no experience deploying them. So I decided to create a simple application and looked at several options such as ML-Kit, Flask, and iOS.

Finally, I decided to go with iOS, and there are two reasons for that:

  1. I have never worked on iOS, so by doing an iOS project I would learn something new; also, after reading about Core-ML I got excited and was eager to build something with it.
  2. I am just an Apple fan.

In this article I am not going into detail about image segmentation, which I will leave for some other time; I am sure you already know what image segmentation is, which is why you landed on this page.

But, just for reference: image segmentation is partitioning an image into regions in order to extract the different objects in it. One simple example is shown below. Image segmentation is used in many tasks, from medical imaging and security to self-driving cars, and recently most of us have been using it in video-call applications such as Hangouts, Zoom, or Teams, as they provide a feature for changing the background.

I have kept this project a little simple: rather than working on videos, I will do image segmentation on still images, and the main task is foreground extraction. For reference, this is what we want to achieve with this application.

DeepLab (V3):

For image segmentation, I will be using Google's DeepLabV3 model, which was used to create the 'portrait' mode of the Pixel 2 and Pixel 2 XL smartphones. It is a state-of-the-art deep learning model for semantic image segmentation, where the goal is to assign semantic labels (e.g., person, dog, cat, and so on) to every pixel in the input image.

The main reason to use this model is that we can't use PyTorch or TensorFlow models directly with Core-ML. Instead, Apple provides an already-converted version of DeepLabV3, which makes the work a lot easier. You can also train your own model and convert it using coremltools, but for this project I will stick with the model provided by Apple.

Getting Started

Create a New Project:

Head to Xcode and create a new project. Choose a single-view application as the template, fill in all the information, and make sure to choose "Storyboard" as the user interface.

Setting Up Views

Now let's set up our ViewController with the buttons and a logo. We will have one image view and a UIButton to initiate masking.

Now we need to add functionality to select images. For this, I decided to create two buttons on the navigation bar: one uploads an image from the photo library and the other takes a photo with the camera. Let's add them inside viewDidLoad, and then set up our views and layouts.
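Something along these lines works as a starting point. This is just a sketch: names such as imageView, segmentButton, and the selector methods are placeholders of mine, and the selector methods themselves are implemented in the next snippets.

import UIKit

class ViewController: UIViewController {

    let imageView = UIImageView()
    let segmentButton = UIButton(type: .system)
    var originalImage: UIImage?

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .white

        // Two bar buttons on the right side of the navigation bar:
        // one for the photo library, one for the camera.
        // didTapPhotoLibrary, didTapCamera, didTapSegment and setUpModel()
        // are added in the later snippets of this post.
        navigationItem.rightBarButtonItems = [
            UIBarButtonItem(barButtonSystemItem: .add, target: self,
                            action: #selector(didTapPhotoLibrary)),
            UIBarButtonItem(barButtonSystemItem: .camera, target: self,
                            action: #selector(didTapCamera))
        ]

        setUpViews()
        setUpModel() // wired up in the Core-ML section below
    }

    private func setUpViews() {
        imageView.contentMode = .scaleAspectFit
        imageView.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(imageView)

        segmentButton.setTitle("Remove Background", for: .normal)
        segmentButton.addTarget(self, action: #selector(didTapSegment), for: .touchUpInside)
        segmentButton.translatesAutoresizingMaskIntoConstraints = false
        view.addSubview(segmentButton)

        NSLayoutConstraint.activate([
            imageView.topAnchor.constraint(equalTo: view.safeAreaLayoutGuide.topAnchor),
            imageView.leadingAnchor.constraint(equalTo: view.leadingAnchor),
            imageView.trailingAnchor.constraint(equalTo: view.trailingAnchor),
            imageView.heightAnchor.constraint(equalTo: view.heightAnchor, multiplier: 0.7),
            segmentButton.topAnchor.constraint(equalTo: imageView.bottomAnchor, constant: 20),
            segmentButton.centerXAnchor.constraint(equalTo: view.centerXAnchor)
        ])
    }
}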

In the above code, I have added two UI buttons to the navigation bar; this gives you two buttons on the right side of the navigation bar.

It's important to change the Info.plist file and add properties so that the user is shown an explanation of why we need access to the camera and the photo library. Add some text to "Privacy - Photo Library Usage Description" and "Privacy - Camera Usage Description".

Now we need to create 'selector action' methods for all three buttons. Remember, we have one button each for the photo gallery, the camera, and segmentation, and these methods will be called every time the respective button is pressed.
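A sketch of those selector methods, inside the same ViewController, could look like this (the method names match the placeholders used above, and predict(with:) is implemented in the Core-ML section below):

@objc func didTapPhotoLibrary() {
    showImagePickerController(sourceType: .photoLibrary)
}

@objc func didTapCamera() {
    // The camera is only available on a real device.
    guard UIImagePickerController.isSourceTypeAvailable(.camera) else { return }
    showImagePickerController(sourceType: .camera)
}

@objc func didTapSegment() {
    guard let image = originalImage else { return }
    predict(with: image) // runs the Vision/Core-ML request, implemented later
}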

On pressing the camera and photo-gallery buttons we want a UIImagePickerController to become active; it is a controller that manages the system interfaces for taking pictures, recording movies, and choosing items from the user's media library. Let's create a method showImagePickerController that will activate the controller for us.
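A minimal version of that method could look like this:

func showImagePickerController(sourceType: UIImagePickerController.SourceType) {
    let picker = UIImagePickerController()
    picker.sourceType = sourceType  // .photoLibrary or .camera
    picker.delegate = self          // the delegate methods are added next
    picker.allowsEditing = false
    present(picker, animated: true)
}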

In this method we pass the source type for the image picker, which is either .photoLibrary or .camera. Now we need to create the delegate callback that is invoked when the user has picked a still image or movie, so we can display the selected image in our UI.
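Sketched as an extension on the view controller, the callback could look like this:

extension ViewController: UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey: Any]) {
        if let image = info[.originalImage] as? UIImage {
            // fixOrientation() is the UIImage extension explained at the end of the post.
            let fixed = image.fixOrientation()
            self.originalImage = fixed
            self.imageView.image = fixed
        }
        picker.dismiss(animated: true)
    }

    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        picker.dismiss(animated: true)
    }
}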

This function takes the picked image and assigns it to imageView, and it also does something interesting by invoking the image.fixOrientation() method on the image, which I will explain later. Here self.originalImage is just a simple variable of type UIImage.

Core-ML

Now the fun part begins. Create a Core-ML group and add the DeepLabV3 model file which you downloaded.

Now we need to set up Vision, Apple's framework for handling computer-vision-related work, together with our Core-ML model.

First, create an instance of the Core-ML model and a request property of type VNCoreMLRequest, and then set up the model. If you have been following carefully, we called setUpModel() inside viewDidLoad(); now we will add what it is going to do.
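A sketch of setUpModel(), assuming the class Xcode generates for Apple's model is called DeepLabV3:

import Vision

// Stored as a property on the view controller so it can be reused for every prediction.
var request: VNCoreMLRequest?

func setUpModel() {
    // Wrap the Core-ML model in a Vision model.
    guard let visionModel = try? VNCoreMLModel(for: DeepLabV3().model) else {
        fatalError("Could not load the DeepLabV3 model")
    }
    // visionRequestDidComplete will receive the results once the request runs.
    request = VNCoreMLRequest(model: visionModel, completionHandler: visionRequestDidComplete)
    request?.imageCropAndScaleOption = .scaleFill
}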

In the above code, we first create a Vision model and then pass it to a VNCoreMLRequest, and we use the request's completion handler to specify a method that will receive the results from the model after the request runs.

Our model setup is complete and we are able to select an image, so the only thing left is to start the segmentation. For that we create a VNImageRequestHandler object with the image to be processed.
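For example (the method name predict(with:) is mine):

func predict(with image: UIImage) {
    guard let cgImage = image.cgImage, let request = request else { return }

    // Run the Vision request off the main thread.
    DispatchQueue.global(qos: .userInitiated).async {
        let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
        do {
            // perform() takes an array of Vision requests.
            try handler.perform([request])
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}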

So what's happening here: we use VNImageRequestHandler to run our request, which takes our image, does the preprocessing specified by the Core-ML model, sends the image to the Core-ML model, and obtains the output. The perform([]) method takes an array of Vision requests.

Now the only thing left is to handle the results of the Core-ML model. But before that, we should notice that the output of our Core-ML model is just an array of size 512 by 512. The array contains Int values (0, 7, 15, 9, etc.), and each of these numbers denotes a class. If our Core-ML model recognized a misc/background region in the image, all the pixels which form that background area will be given a value of, say, 0. This is just an array, not an image, so we can't use it directly as a mask on our original image.

Just for reference, the output is of type MLMultiArray, which is Apple's version of a multi-dimensional array.

MLMultiArray -> UIImage

Add a new file MLMultiArrayToUIImage and add the following code to it. Most of the code was taken from CoreMLHelpers, but I made some tweaks to fit my needs.
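The conversion can be sketched roughly like this; it assumes the model output is a 512 x 512 MLMultiArray of class labels, as described below, and the helper names image() and fromByteArray are mine:

import CoreML
import UIKit

extension MLMultiArray {

    // Converts the 512 x 512 segmentation map into a grayscale mask:
    // background (class 0) becomes white (255), everything else black (0).
    func image() -> UIImage? {
        let height = shape[0].intValue
        let width = shape[1].intValue

        var pixels = [UInt8](repeating: 0, count: width * height)
        for y in 0..<height {
            for x in 0..<width {
                let classLabel = self[[NSNumber(value: y), NSNumber(value: x)]].intValue
                pixels[y * width + x] = classLabel == 0 ? 255 : 0
            }
        }
        return UIImage.fromByteArray(pixels, width: width, height: height)
    }
}

extension UIImage {

    // Builds a grayscale UIImage from raw 8-bit pixel data using a CGContext.
    static func fromByteArray(_ bytes: [UInt8], width: Int, height: Int) -> UIImage? {
        var data = bytes
        let cgImage: CGImage? = data.withUnsafeMutableBytes { buffer in
            guard let context = CGContext(data: buffer.baseAddress,
                                          width: width,
                                          height: height,
                                          bitsPerComponent: 8,
                                          bytesPerRow: width,
                                          space: CGColorSpaceCreateDeviceGray(),
                                          bitmapInfo: CGImageAlphaInfo.none.rawValue) else { return nil }
            return context.makeImage()
        }
        guard let image = cgImage else { return nil }
        return UIImage(cgImage: image)
    }
}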

Here we iterate over the MLMultiArray and convert its values to grayscale pixels. I am not interested in semantic segmentation, i.e., classifying the different classes in the image; I am only interested in removing the background from the foreground, so the background, which is class 0, is assigned a 255 (white) pixel and all identified objects are assigned a 0 (black) pixel.

Now we have an array of UInt8 pixels, and we use these pixel values to build a grayscale image. For that we create a CGContext with the pixel data and then create an image from it, which is done by the fromByteArray function in the above code. I have done this for grayscale, but you can also tweak it for RGB.

Resize Mask Image

Before creating a handler for the output of our Core-ML model, there is one more thing to do. Remember, the DeepLab model returns a 512 x 512 mask, but our original image could be a different size, so we can't directly apply the mask to the image. We have two options: either scale down our original image or scale up the mask image, and thankfully there is an easy way to do it. Go ahead and create a file UIImageExtension.swift and add the following code.
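Here is one way such a resize helper might look; I've written it for grayscale images, since our mask is grayscale, and the name resized(to:) is mine:

import UIKit

extension UIImage {

    // Redraws the (grayscale) image into a new context of the requested size.
    func resized(to newSize: CGSize) -> UIImage? {
        guard let cgImage = self.cgImage else { return nil }
        let width = Int(newSize.width)
        let height = Int(newSize.height)

        // A context created without any pixel data; CoreGraphics allocates the buffer.
        guard let context = CGContext(data: nil,
                                      width: width,
                                      height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: 0,
                                      space: CGColorSpaceCreateDeviceGray(),
                                      bitmapInfo: CGImageAlphaInfo.none.rawValue) else { return nil }

        context.interpolationQuality = .high
        context.draw(cgImage, in: CGRect(origin: .zero, size: newSize))

        guard let scaled = context.makeImage() else { return nil }
        return UIImage(cgImage: scaled)
    }
}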

Here again we create a context, but this time without any pixel data; we then draw our mask image inside the CGContext and resize it to the desired shape.

Handling Segmentation Result

Now we need to add the handler for the segmentation results. The Vision request's completion handler indicates whether the request succeeded or resulted in an error. If it succeeded, its results property contains VNCoreMLFeatureValueObservation objects wrapping the output produced by our Core-ML model, and the first observation is our result.
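The completion handler can be sketched like this, using the image() and resized(to:) helpers from above (those helper names are mine):

func visionRequestDidComplete(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNCoreMLFeatureValueObservation],
          let segmentationMap = observations.first?.featureValue.multiArrayValue,
          let original = originalImage else { return }

    // MLMultiArray -> grayscale mask image (the extension written earlier).
    guard let maskImage = segmentationMap.image() else { return }

    // The model returns a 512 x 512 mask, so resize it to the original image's size.
    guard let resizedMask = maskImage.resized(to: original.size) else { return }

    // UI work happens back on the main thread.
    DispatchQueue.main.async {
        self.maskOriginalImage(with: resizedMask)
    }
}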

The result contains an MLFeatureValue object which wraps our output and its value together; we just need to get the multiArrayValue out of it. Then we convert this array to an image as described above and resize the image based on the size of the original image. After resizing, we apply the mask to our original image by calling the maskOriginalImage() function, so let's go ahead and implement that.

Masking Original Image

For masking the image, we first need to create an image mask from our mask image and then apply that image mask to our original image.
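A sketch of maskOriginalImage(), assuming the mask passed in is the resized grayscale mask from the previous step:

func maskOriginalImage(with maskImage: UIImage) {
    guard let original = originalImage?.cgImage,
          let maskCG = maskImage.cgImage,
          let dataProvider = maskCG.dataProvider else { return }

    // Build a CGImage mask from the grayscale mask. For image masks,
    // white (255) areas are hidden and black (0) areas are kept,
    // which matches the way we built the mask earlier.
    guard let mask = CGImage(maskWidth: maskCG.width,
                             height: maskCG.height,
                             bitsPerComponent: maskCG.bitsPerComponent,
                             bitsPerPixel: maskCG.bitsPerPixel,
                             bytesPerRow: maskCG.bytesPerRow,
                             provider: dataProvider,
                             decode: nil,
                             shouldInterpolate: false) else { return }

    // Apply the mask to the original image to keep only the foreground.
    guard let masked = original.masking(mask) else { return }
    imageView.image = UIImage(cgImage: masked)
}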

Finally, we have all our pieces together. But wait: if you remember, I called fixOrientation() on the image when we chose an image from the UIImagePickerController. The reason is that a UIImage loses its orientation information when it is converted into a CGImage. This affects portrait images, which are automatically put into landscape mode; the image gets rotated by 90°, and you can check this by commenting out the line where the fixOrientation() method is called.

To fix this issue, add the following code inside UIImageExtension.swift. You can learn more about how orientation works on iOS.
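One common way to normalize the orientation is to simply redraw the image, for example:

extension UIImage {

    // Redraws the image so its pixel data matches the .up orientation,
    // which avoids the 90-degree rotation after converting to CGImage.
    func fixOrientation() -> UIImage {
        if imageOrientation == .up { return self }

        UIGraphicsBeginImageContextWithOptions(size, false, scale)
        defer { UIGraphicsEndImageContext() }
        draw(in: CGRect(origin: .zero, size: size))
        return UIGraphicsGetImageFromCurrentImageContext() ?? self
    }
}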

So finally we have added all the functionality, and now we can build the code and test it out. Here is a sample output.

We can easily notice that it works great, but not at the edges. The reason is that DeepLab is an image segmentation model, not a saliency-detection model, and for this task we need a deep learning model designed for salient object detection. One model which I recently learned about is BASNet; it works like a charm, and here are some sample image masks.

We can easily notice the difference between the mask images and how good it is at detecting the boundaries of objects. But unfortunately, there is no Core-ML model, and I created these images using Python. There are converter tools to convert a PyTorch model to Core-ML, but I am not a big fan of those as they still have a lot of limitations. I am working on converting BASNet to Core-ML, and once I am successful I will update this blog and share the model, but till then this is what we have.

I hope you enjoyed this and learned something new, and if you did, please leave some claps. 👏👏👏

References

  1. Qin, X., Zhang, Z., Huang, C., Gao, C., Dehghan, M., & Jagersand, M. (2019). BASNet: Boundary-Aware Salient Object Detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  2. Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, & Hartwig Adam (2018). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In ECCV.

