Author: Sai Kambampati; Proofreading: Liberalism, Lision; Finalized: CMB

Apple announced several exciting frameworks and APIs for developers at WWDC 2017, and the most talked-about of these new frameworks was definitely Core ML. The Core ML framework lets developers integrate machine learning models into their applications, and its biggest advantage is that no additional neural network or machine learning knowledge is required to use it. Another nice feature is that developers can convert models they have already trained into Core ML models. For demonstration purposes, this article directly uses a Core ML model available on the Apple developer website. So without further ado, let’s get started with Core ML.

Note: This article requires Xcode 9 Beta to write the code and a device running the iOS 11 beta to test the features implemented here. While Xcode 9 Beta supports both Swift 3.2 and 4.0, all the code in this article is written in Swift 4.0.

What is Core ML

Core ML enables developers to integrate a wide variety of machine learning models into applications. In addition to supporting extensive deep learning with over 30 layer types, it also supports standard models such as tree ensembles, SVMs, and generalized linear models. Core ML is built on top of underlying technologies like Metal and Accelerate, so it seamlessly leverages the CPU and GPU to maximize performance. Machine learning models can run directly on the device, so data can be analyzed without ever leaving the device.

– Official Apple documentation about Core ML

Core ML is a new machine learning framework released with iOS 11 at WWDC this year. With Core ML, developers can integrate machine learning models directly into their applications. So what is machine learning? Simply put, machine learning gives a computer the ability to learn without being explicitly programmed. A trained model is the result of applying a machine learning algorithm to a set of training data.

As application developers, our primary concern is how to apply machine learning models to our applications to build more interesting functionality. Fortunately, the Core ML framework provided by Apple greatly simplifies the process of integrating different machine learning models into applications. This opens up a lot of possibilities for developers to build features such as image recognition, natural language processing, and text prediction.

Now you’re probably wondering if it’s going to be difficult to incorporate this type of AI into your application, and that’s the interesting part, because Core ML is actually very easy to use. In this article, you’ll see that it only takes 10 lines of code to integrate Core ML into your application.

Pretty cool, huh? Here we go!

Demo program overview

The program that this article will implement is quite simple. The app lets users take or select a photo from an album, and machine learning algorithms try to predict the objects in the photo. While the prediction may not be perfect, this will give you an idea of how to apply Core ML to your application.

Getting Started

First, open Xcode 9 Beta and create a new project. Select the Single View App template for the project and make sure the language is set to Swift.

Creating a User Interface

If you don’t want to build the UI from scratch, download the starter project here and jump straight to the Core ML section.

Here we go! Start by opening Main.storyboard and adding some UI elements to the view. Select the view controller in the storyboard, then choose Editor > Embed In > Navigation Controller from the Xcode menu bar. You should see a navigation bar appear above the view. Give it the title Core ML (or whatever you think is appropriate).

Next, add a UIImageView, a UILabel, and two buttons (one for taking a photo and one for opening the photo library) to the view. Place the label at the bottom of the view and stretch it so that both of its ends meet the edges of the view. This completes the interface for the application.

Although this article does not cover Auto Layout, it is highly recommended that you use it. If you choose not to, simply select in the storyboard the type of device you are going to run on.

Implementing the Camera and Photo Library Features

Now that the interface is set up, let’s start implementing the functionality. This section implements the photo library and camera buttons. In ViewController.swift, first make the class conform to the UINavigationControllerDelegate protocol required by UIImagePickerController:

class ViewController: UIViewController, UINavigationControllerDelegate

Next, add two IBOutlets for the UILabel and UIImageView created earlier. For simplicity, this article names the UIImageView and UILabel imageView and classifier, respectively. The code looks like this:

import UIKit
 
class ViewController: UIViewController, UINavigationControllerDelegate {
    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var classifier: UILabel!
    
     override func viewDidLoad() {
        super.viewDidLoad()
    }
    
    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
    }
}

Next, implement the response actions for the two buttons by adding the following code to the ViewController class:

@IBAction func camera(_ sender: Any) {
    if !UIImagePickerController.isSourceTypeAvailable(.camera) {
        return
    }
    
    let cameraPicker = UIImagePickerController()
    cameraPicker.delegate = self
    cameraPicker.sourceType = .camera
    cameraPicker.allowsEditing = false
    
    present(cameraPicker, animated: true)
}

@IBAction func openLibrary(_ sender: Any) {
    let picker = UIImagePickerController()
    picker.allowsEditing = false
    picker.delegate = self
    picker.sourceType = .photoLibrary
    present(picker, animated: true)
}

Each action creates a UIImagePickerController constant and ensures that the user cannot edit the photo (whether it was just taken or selected from the album). It then sets the delegate to self and finally presents the UIImagePickerController to the user.

So far the UIImagePickerControllerDelegate methods have not been added to ViewController.swift, so Xcode reports an error. Here the protocol is adopted in an extension:

extension ViewController: UIImagePickerControllerDelegate {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}

The above code handles the case where the user cancels the photo selection. The code so far looks like this:

import UIKit

class ViewController: UIViewController, UINavigationControllerDelegate {
    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var classifier: UILabel!
    
    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.
    }
    
    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
    
    @IBAction func camera(_ sender: Any) {
        if !UIImagePickerController.isSourceTypeAvailable(.camera) {
            return
        }
        
        let cameraPicker = UIImagePickerController()
        cameraPicker.delegate = self
        cameraPicker.sourceType = .camera
        cameraPicker.allowsEditing = false
        
        present(cameraPicker, animated: true)
    }
    
    @IBAction func openLibrary(_ sender: Any) {
        let picker = UIImagePickerController()
        picker.allowsEditing = false
        picker.delegate = self
        picker.sourceType = .photoLibrary
        present(picker, animated: true)
    }
}

extension ViewController: UIImagePickerControllerDelegate {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
}

Make sure to go back to the storyboard and connect all of the IBOutlets and IBActions.

To access the camera and the photo album, there’s one more thing to do. Open Info.plist and add two entries: Privacy - Camera Usage Description and Privacy - Photo Library Usage Description. This is necessary because, starting with iOS 10, you have to specify why your app needs to access the camera and the photo library.

Now comes the core part of this article. Again, if you don’t want to build the interface from scratch, you can download the starter project here.

Integrating the Core ML Data Model

Now it’s time to integrate the Core ML data model into the application. As mentioned earlier, you also need to provide a trained model for Core ML to work. You can use your own trained models, but this article will use already trained models available on the Apple developer website.

Scroll down to the bottom of the Machine Learning page on Apple’s developer site to see four Core ML models that have been trained.

This tutorial will use the Inception V3 model, but you can also try the other three. After you download the Inception V3 model, add it to your Xcode project and see what information Xcode displays.

Note: Make sure the model’s Target Membership is checked for your app target; otherwise the application will not be able to access the file.

In the screenshot above, you can see that the data model’s type is a neural network classifier. The other information to note is the model evaluation parameters, which describe the model’s inputs and outputs. The model used in this article takes a 299×299 image as input and outputs the most likely category along with the probability for each category.

Another important piece of information in this screenshot is the Model Class, which is automatically generated by the machine learning model (Inceptionv3) and can be used directly in your code. Click the arrow to the right of Inceptionv3 to see the source code for the class.
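To give you a rough idea before we write any code, here is a minimal sketch of how that generated class is used. It assumes the Inceptionv3 model has been added to the project as described above; the method and property it calls (prediction(image:) and classLabel) are the same ones used later in this article, and the function name is my own.

import CoreML
import CoreVideo

// Minimal sketch (not part of the demo app): calling the class that Xcode
// generates from Inceptionv3.mlmodel. prediction(image:) expects a 299x299
// image wrapped in a CVPixelBuffer and returns an output object whose
// classLabel property holds the most likely category.
func bestGuess(for pixelBuffer: CVPixelBuffer) -> String? {
    let model = Inceptionv3()
    guard let output = try? model.prediction(image: pixelBuffer) else {
        return nil
    }
    return output.classLabel
}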

Now add the model to the code. Open ViewController.swift and import the Core ML framework at the top of the file:

import CoreML

Next, declare a model variable for the Inceptionv3 model and initialize it in the viewWillAppear() method:

var model: Inceptionv3!
 
override func viewWillAppear(_ animated: Bool) {
    model = Inceptionv3()
}

I know what you’re thinking right now.

“Why didn’t you initialize this model earlier?”

“What’s the point of defining it in the viewWillAppear method?”

The point, dear friends, is that having the model ready ahead of time makes it much faster when the app tries to recognize objects in an image. (Note: It doesn’t seem to matter much if you initialize the model variable directly at its declaration. You can test this yourself.)
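For completeness, here is a minimal sketch of the alternative hinted at in the note above: initializing the model at its declaration (or lazily) instead of in viewWillAppear(_:). This is just an option to experiment with, not what the rest of the article uses, and the class name below is hypothetical.

import UIKit

// Sketch of the alternative mentioned in the note: create the model when the
// property is declared, or lazily on first use, instead of in viewWillAppear(_:).
class AlternativeViewController: UIViewController, UINavigationControllerDelegate {
    // Initialized as soon as the view controller is created.
    let eagerModel = Inceptionv3()
    // Or created the first time it is accessed.
    lazy var lazyModel = Inceptionv3()
}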

Now back to Inceptionv3.mlmodel: the only input the model accepts is a 299×299 image, so the next step is to figure out how to convert an image to that size.

Converting the Image

In the ViewController.swift extension, update the code to the following. It implements the imagePickerController(_:didFinishPickingMediaWithInfo:) method, which processes the selected image:

extension ViewController: UIImagePickerControllerDelegate {
    func imagePickerControllerDidCancel(_ picker: UIImagePickerController) {
        dismiss(animated: true, completion: nil)
    }
    
    func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String : Any]) {
        picker.dismiss(animated: true)
        classifier.text = "Analyzing Image..."
        guard let image = info["UIImagePickerControllerOriginalImage"] as? UIImage else {
            return
        }
        
        UIGraphicsBeginImageContextWithOptions(CGSize(width: 299, height: 299), true, 2.0)
        image.draw(in: CGRect(x: 0, y: 0, width: 299, height: 299))
        let newImage = UIGraphicsGetImageFromCurrentImageContext()!
        UIGraphicsEndImageContext()
        
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue, kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var pixelBuffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(newImage.size.width), Int(newImage.size.height), kCVPixelFormatType_32ARGB, attrs, &pixelBuffer)
        guard status == kCVReturnSuccess else {
            return
        }
        
        CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
        let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer!)
        
        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
        let context = CGContext(data: pixelData, width: Int(newImage.size.width), height: Int(newImage.size.height), bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer!), space: rgbColorSpace, bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue)
        
        context?.translateBy(x: 0, y: newImage.size.height)
        context?.scaleBy(x: 1.0, y: -1.0)
        
        UIGraphicsPushContext(context!)
        newImage.draw(in: CGRect(x: 0, y: 0, width: newImage.size.width, height: newImage.size.height))
        UIGraphicsPopContext()
        CVPixelBufferUnlockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
        imageView.image = newImage
    }
}

The code in imagePickerController(_:didFinishPickingMediaWithInfo:) breaks down as follows:

  1. Line #7-11: The first few lines of this method pull the selected image out of the info dictionary (using the UIImagePickerControllerOriginalImage key) and dismiss the UIImagePickerController as soon as an image has been picked.
  2. Line #13-16: Because the model used in this article only accepts images of size 299×299, the selected image is redrawn as a square of that size and assigned to the constant newImage.
  3. Line #18-23: Convert newImage into a CVPixelBuffer. For readers unfamiliar with CVPixelBuffer, it is an image buffer that holds pixels in memory; see here for details.
  4. Line #31-32: The preceding lines put the pixels of the image into a device-independent RGB color space and create a CGContext over the pixel data, which makes it easy to render to (or modify) the context when some of its basic properties need to change. That is what these two lines do: translating and scaling the image.
  5. Line #34-38: Finally, push the graphics context onto the current context stack, render the image, pop the context off the stack, and set imageView.image to newImage.

It doesn’t matter if you can’t fully understand the code above right now; it is fairly advanced Core Image code that is beyond the scope of this article. All you need to know is that it converts the selected image into a form the data model accepts. I suggest changing the numbers in the code and observing the results to get a better feel for what it does. A hedged refactoring of the same steps is sketched below.
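If you would rather keep the delegate method short, the conversion above can be factored into a helper. The sketch below simply repackages the same steps under assumptions of my own (the function name and the default size parameter are not from the original tutorial); it is not required by the rest of the article.

import UIKit
import CoreVideo

// Hedged refactoring of the conversion code above: redraw a UIImage at the
// model's expected size and copy it into a CVPixelBuffer. Returns nil if any
// step fails. The steps mirror the delegate method shown earlier.
func makePixelBuffer(from image: UIImage, size: CGSize = CGSize(width: 299, height: 299)) -> CVPixelBuffer? {
    // 1. Redraw the image at the target size.
    UIGraphicsBeginImageContextWithOptions(size, true, 2.0)
    image.draw(in: CGRect(origin: .zero, size: size))
    let resizedImage = UIGraphicsGetImageFromCurrentImageContext()
    UIGraphicsEndImageContext()
    guard let resized = resizedImage else { return nil }
    
    // 2. Create an empty pixel buffer with the same dimensions.
    let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                 kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault, Int(resized.size.width), Int(resized.size.height),
                                     kCVPixelFormatType_32ARGB, attrs, &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }
    
    // 3. Draw the resized image into the buffer's memory.
    CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0)) }
    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                  width: Int(resized.size.width), height: Int(resized.size.height),
                                  bitsPerComponent: 8, bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil }
    context.translateBy(x: 0, y: resized.size.height)
    context.scaleBy(x: 1.0, y: -1.0)
    UIGraphicsPushContext(context)
    resized.draw(in: CGRect(origin: .zero, size: resized.size))
    UIGraphicsPopContext()
    
    return pixelBuffer
}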

Using Core ML

Now let’s get back to Core ML. We use the Inceptionv3 model for object recognition, and with Core ML all it takes is a few more lines of code. Paste the following code snippet after the line imageView.image = newImage.

guard let prediction = try? model.prediction(image: pixelBuffer!) else {
    return
}
 
classifier.text = "I think this is a \(prediction.classLabel)."

That’s it! The generated Inceptionv3 class provides a method called prediction(image:) that predicts which object appears in a given image. Here we pass the resized image to it as the pixelBuffer parameter. Once the String prediction is returned, the classifier label is updated with the name of the recognized object.
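If you also want to show how confident the model is, the generated output exposes a probability for every category alongside classLabel. In the source Xcode generates for Inceptionv3 this dictionary is called classLabelProbs, but verify the name in your own generated class; the helper below is only a hedged sketch of the idea, and its function name is my own.

import Foundation
import CoreML
import CoreVideo

// Hedged sketch: build a label string that includes the model's confidence in
// its top guess. Assumes the generated output exposes a classLabelProbs
// dictionary ([String: Double]) next to classLabel – check the generated source.
func describePrediction(with model: Inceptionv3, pixelBuffer: CVPixelBuffer) -> String? {
    guard let prediction = try? model.prediction(image: pixelBuffer) else {
        return nil
    }
    let confidence = prediction.classLabelProbs[prediction.classLabel] ?? 0
    return String(format: "I think this is a %@ (%.0f%% sure).", prediction.classLabel, confidence * 100)
}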

It’s time to test the app. Build and run it on the simulator or on your iPhone (running the iOS 11 beta), take a photo with the camera or pick one from the album, and the app will tell you what it thinks is in the image.

As you test the app, you may notice that it doesn’t always make accurate predictions. It’s not the code that’s the problem, it’s the trained model.

Conclusion

Hopefully, you now know how to integrate Core ML into your application; this article is just a primer. If you’re interested in converting trained Caffe, Keras, or scikit-learn models into Core ML models, stay tuned for the next tutorial in our Core ML series, where I’ll show you how to convert a model into a Core ML model.

To see the demo application in this article, check out the full project on GitHub.

For more information about the Core ML framework, please refer to the official Core ML documentation. Also check out Apple’s WWDC 2017 video:

If you have anything else to say about Core ML, leave us a comment!

This article is translated by SwiftGG translation team and has been authorized to be translated by the authors. Please visit swift.gg for the latest articles.