Face Detection with Apple’s iOS 11 Vision Framework

Great stuff is coming from Apple this autumn! Among a lot of new APIs there is the Vision Framework which helps with detection of faces, face features, object tracking and others.

In this post we will take a look at how can one put the face detection to work. We will make a simple application that can take a photo (using the camera of from the library) and will draw some lines on the faces it detects to show you the power of Vision.

Select an Image

I will go fast through this so if you are a really beginner and you find this too hard to follow, please check the my previous iOS related post, Building a Travel Photo Sharing iOS App, first, as it has the same photo selection functionality but explained in greater detail.

You will need Xcode 9 beta and a device running iOS 11 beta to test this. Let’s start by creating a new Single View App project named FaceVision:

Open the Main.storyboard and drag a button Take Photo to the center of it. Use the constraints to make it stay there :) Create a takePhoto action for it:

@IBAction func takePhoto(_ sender: UIButton) {
    let picker = UIImagePickerController()
    picker.delegate = self
    let alert = UIAlertController(title: nil, message: nil, preferredStyle: .actionSheet)
    if UIImagePickerController.isSourceTypeAvailable(.camera) {
        alert.addAction(UIAlertAction(title: "Camera", style: .default, handler: {action in
            picker.sourceType = .camera
            self.present(picker, animated: true, completion: nil)
        }))
    }
    alert.addAction(UIAlertAction(title: "Photo Library", style: .default, handler: { action in
        picker.sourceType = .photoLibrary
        // on iPad we are required to present this as a popover
        if UIDevice.current.userInterfaceIdiom == .pad {
            picker.modalPresentationStyle = .popover
            picker.popoverPresentationController?.sourceView = self.view
            picker.popoverPresentationController?.sourceRect = self.takePhotoButton.frame
        }
        self.present(picker, animated: true, completion: nil)
    }))
    alert.addAction(UIAlertAction(title: "Cancel", style: .cancel, handler: nil))
    // on iPad this is a popover
    alert.popoverPresentationController?.sourceView = self.view
    alert.popoverPresentationController?.sourceRect = takePhotoButton.frame
    self.present(alert, animated: true, completion: nil)
}

Here we used an UIImagePickerController to get an image so we have to make our ViewController implement the UIImagePickerControllerDelegate and UINavigationControllerDelegate protocols:

class ViewController: UIViewController, UIImagePickerControllerDelegate, UINavigationControllerDelegate {

We also need an outlet for the button:

@IBOutlet weak var takePhotoButton: UIButton!

And an image var:

var image: UIImage!

We also need to add the following in the Info.plist to be able to access the camera and the photo library:

  • Privacy - Camera Usage Description: Access to the camera is needed in order to be able to take a photo to be analyzed by the app
  • Privacy - Photo Library Usage Description: Access to the photo library is needed in order to be able to choose a photo to be analyzed by the app

After the users chooses an image we will use another view controller to show it and to let the user start the processing or go back to the first screen. Add a new View Controller in the Main.storyboard. In it, add an Image View with an Aspect Fit Content Mode and two buttons like in the image below (don’t forget to use the necessary constraints):

Now, create a new UIViewController class named ImageViewControler.swift and set it to be the class of the new View Controller you just added in the Main.storyboard:

import UIKit

class ImageViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()

        // Do any additional setup after loading the view.
    }
}

Still in the Main.storyboard, create a Present Modally kind segue between the two view controllers with the showImageSegue identifier:

Also add an outlet for the Image View and a new property to hold the image from the user:

@IBOutlet weak var imageView: UIImageView!

var image: UIImage!

Now, back to our initial ViewController class, we need to present the new ImageViewController and set the selected image:

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [String : Any]) {
    dismiss(animated: true, completion: nil)
    image = info[UIImagePickerControllerOriginalImage] as! UIImage
    performSegue(withIdentifier: "showImageSegue", sender: self)
}

override func prepare(for segue: UIStoryboardSegue, sender: Any?) {
    if segue.identifier == "showImageSegue" {
        if let imageViewController = segue.destination as? ImageViewController {
            imageViewController.image = self.image
        }
    }
}

We also need an exit method to be called when we press the Close button from the Image View Controller:

@IBAction func exit(unwindSegue: UIStoryboardSegue) {
    image = nil
}

To make this work, head back to the Main.storyboard and Ctrl+drag from the Close button to the exit icon of the Image View Controller and select the exit method from the popup.

To actually show the selected image to the user we have to set it to the imageView:

override func viewDidLoad() {
    super.viewDidLoad()

    // Do any additional setup after loading the view.
    imageView.image = image
}

If you run the app now you should be able to select a photo either from the camera or from the library and it will be presented to you in the second view controller with the Close and Process! buttons below it.

Detect Face Features

It’s time to get to the fun part, detect the faces and faces features in the image.

Create a new process action for the Process! button with the following content:

@IBAction func process(_ sender: UIButton) {
    var orientation:Int32 = 0

    // detect image orientation, we need it to be accurate for the face detection to work
    switch image.imageOrientation {
    case .up:
        orientation = 1
    case .right:
        orientation = 6
    case .down:
        orientation = 3
    case .left:
        orientation = 8
    default:
        orientation = 1
    }

    // vision
    let faceLandmarksRequest = VNDetectFaceLandmarksRequest(completionHandler: self.handleFaceFeatures)
    let requestHandler = VNImageRequestHandler(cgImage: image.cgImage!, orientation: orientation ,options: [:])
    do {
        try requestHandler.perform([faceLandmarksRequest])
    } catch {
        print(error)
    }
}

After translating the image orientation from UIImageOrientationx values to kCGImagePropertyOrientation values (not sure why Apple didn’t make them the same), the code will start the detection process from the Vision framework. Don’t forget to import Vision to have access to it’s API.

We’ll add now the method that will be called when the Vision’s processing is done:

func handleFaceFeatures(request: VNRequest, errror: Error?) {
    guard let observations = request.results as? [VNFaceObservation] else {
        fatalError("unexpected result type!")
    }

    for face in observations {
        addFaceLandmarksToImage(face)
    }
}

This also calls yet another method that does the actual drawing on the image based on the data received from the detect face landmarks request:

func addFaceLandmarksToImage(_ face: VNFaceObservation) {
    UIGraphicsBeginImageContextWithOptions(image.size, true, 0.0)
    let context = UIGraphicsGetCurrentContext()

    // draw the image
    image.draw(in: CGRect(x: 0, y: 0, width: image.size.width, height: image.size.height))

    context?.translateBy(x: 0, y: image.size.height)
    context?.scaleBy(x: 1.0, y: -1.0)

    // draw the face rect
    let w = face.boundingBox.size.width * image.size.width
    let h = face.boundingBox.size.height * image.size.height
    let x = face.boundingBox.origin.x * image.size.width
    let y = face.boundingBox.origin.y * image.size.height
    let faceRect = CGRect(x: x, y: y, width: w, height: h)
    context?.saveGState()
    context?.setStrokeColor(UIColor.red.cgColor)
    context?.setLineWidth(8.0)
    context?.addRect(faceRect)
    context?.drawPath(using: .stroke)
    context?.restoreGState()

    // face contour
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.faceContour {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // outer lips
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.outerLips {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // inner lips
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.innerLips {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // left eye
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.leftEye {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // right eye
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.rightEye {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // left pupil
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.leftPupil {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // right pupil
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.rightPupil {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // left eyebrow
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.leftEyebrow {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // right eyebrow
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.rightEyebrow {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // nose
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.nose {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.closePath()
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // nose crest
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.noseCrest {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // median line
    context?.saveGState()
    context?.setStrokeColor(UIColor.yellow.cgColor)
    if let landmark = face.landmarks?.medianLine {
        for i in 0...landmark.pointCount - 1 { // last point is 0,0
            let point = landmark.point(at: i)
            if i == 0 {
                context?.move(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            } else {
                context?.addLine(to: CGPoint(x: x + CGFloat(point.x) * w, y: y + CGFloat(point.y) * h))
            }
        }
    }
    context?.setLineWidth(8.0)
    context?.drawPath(using: .stroke)
    context?.saveGState()

    // get the final image
    let finalImage = UIGraphicsGetImageFromCurrentImageContext()

    // end drawing context
    UIGraphicsEndImageContext()

    imageView.image = finalImage
}

As you can see we have quite a lot of features that Vision is able to identify: the face contour, the mouth (both inner and outer lips), the eyes together with the pupils and eyebrows, the nose and the nose crest and, finally, the median line of the faces.

You can now run the app and take some unusual selfies of yourself. Here’s mine:

I hope you enjoyed this, please let me know in the comments how did it go and if there are things that can be improved. Also, some pictures taken with the app wouldn’t hurt at all :)

You can get the code from here: https://github.com/intelligentbee/FaceVision

Thanks!