
Serverless API around Google Cloud Vision with the Serverless Framework

The Serverless Framework hit v1.0 (beta) recently after about a year of development. The framework has matured quickly to help devs build scalable applications without having to maintain servers. It aims to ease deployment via:

  • An easy-to-use CLI tool
  • Customizable deployment via config files
  • Automation of the annoying parts
  • Extensibility via plugins

Although the Serverless Framework does not yet support Google Cloud Functions, it is designed to support a variety of event-driven compute services, including AWS Lambda and (eventually) Google Cloud Functions. If you’re not familiar with serverless computing, I recommend you start with Martin Fowler’s overview.

So why would I use a framework rather than glue a bunch of bash scripts together? Simple. The Serverless Framework takes care of AWS IAM roles, making deployment much less annoying. Also, as we’ll see below, Serverless makes it easy to include Python dependencies along with your Lambda function.

I’ve been eager to build a serverless app. Combining that goal with wanting to make Google Cloud Vision a bit more convenient to work with, I built a serverless API wrapper around Google Cloud Vision using AWS API Gateway and AWS Lambda. I expected there to be some craziness when combining services from both Amazon and Google, but the Serverless Framework ensured there was none. I focused on AWS Lambda in this project but may play with Google’s offering after it matures a bit.

What Does the App Do?

For the impatient, check out the GitHub repository.

Briefly, I created a microservice via API Gateway that accepts an image URL and triggers a Lambda function, which ingests the image from a URL and sends the image to Google Cloud Vision for standard image recognition tasks (e.g., facial detection, OCR, etc.). A JSON response is returned, from which I was able to produce a new image with bounding boxes around the faces detected (my son and me).

[Image: detected faces highlighted with bounding boxes]

Beyond facial detection, Google Cloud Vision supports the following image recognition tasks:

  • LABEL_DETECTION
  • TEXT_DETECTION
  • SAFE_SEARCH_DETECTION
  • FACE_DETECTION
  • LANDMARK_DETECTION
  • LOGO_DETECTION
  • IMAGE_PROPERTIES

How to Get Started?

Above, we described what the project does. Now, let’s go through how to set up the project and deploy it in your own cloud environment.

Google Cloud Vision API

First, let’s go through a few details to set up the Google Cloud Vision API. In order to access the Cloud Vision API, you will need a Google Cloud Platform account. Fortunately, Google provides a free 60-day trial with $300 credit.

Next, you will need to create Google Application Credentials. Specifically, create a Service Account Key by following the instructions given here. After creating the Service Account Key, I downloaded the JSON file containing my application credentials into my app and renamed it cloudvision/google-application-credentials.json.
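As a sanity check, here is one way the Google Python client libraries can pick up that key file. This is just a sketch, not necessarily how the repo wires things up, and it assumes the google-api-python-client and oauth2client packages are installed:

import os
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

# Tell the Google client libraries where the service account key lives
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloudvision/google-application-credentials.json"

# Build a Cloud Vision client from those credentials
credentials = GoogleCredentials.get_application_default()
vision = discovery.build("vision", "v1", credentials=credentials)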

That’s it.

AWS

As I said above, we are mixing cloud providers, which might be weaksauce to some of you. The reason is simple: AWS doesn’t have a spiffy API for image recognition, but its serverless offerings are mature.

You’ll first need an AWS account. Quick disclaimer: it’s not free, but for our purposes, AWS Lambda is pretty cheap.

Next, you need to create a default AWS profile on your local box. To do this, install the aws-cli and then run the following at the command line:

aws configure
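Running it walks you through four prompts. The values below are placeholders; use your own access keys and preferred region:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json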

For more details, Serverless provides an AWS overview along with a video walkthrough on YouTube.

That’s it.

Serverless Framework to Deploy the App on AWS

After your AWS account is ready to go, make sure you have Node.js 4.0+ installed. Then, install the Serverless Framework.

npm install serverless -g

The above command makes the serverless command available at the CLI along with two shortcuts: sls and slss. If you simply type serverless, the CLI prints helpful usage documentation.

If you haven’t already done so, git clone the app with:

git clone git@github.com:ramhiser/serverless-cloud-vision.git
cd serverless-cloud-vision

Here’s one of the best parts about the Serverless Framework: we can install any Python dependencies our app needs into a local folder, and those dependencies will be deployed along with the app. To see this, install the Python dependencies in requirements.txt to the cloudvision/vendored folder:

pip install -t cloudvision/vendored/ -r requirements.txt

NOTE: Homebrew + Mac OS users who encounter the DistutilsOptionError error should see this SO post for a fix.

After installing the Python requirements to the vendored folder, we are ready to deploy our app to AWS. Type the following at the command line to deploy the wrapper API:

serverless deploy

This command does the following:

  • Creates IAM roles on AWS for Lambda and API Gateway (only done once)
  • Zips the Python code and uploads it to S3
  • Creates the AWS Lambda function
  • Creates an API Gateway endpoint that triggers the Lambda function

Serverless takes a bit longer to run this command the first time because it has to create the IAM roles. However, after you have deployed your app once, subsequent deployments of code changes execute much more quickly.

After the serverless deploy command returns successfully, it’ll print a few useful pieces of information, including the API endpoint you’ll need to call your microservice:

Service Information
service: cloudvision
stage: dev
region: us-east-1
endpoints:
  POST - https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image
functions:
  lambda-cloudvision: arn:aws:lambda:us-east-1:1234567890:function:lambda-cloudvision

The endpoint https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image provided by API Gateway is automatically generated by AWS and will differ in your implementation.
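If you lose track of the endpoint later and your version of the framework supports it, you can reprint this Service Information block at any time without redeploying:

serverless info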

Now we have a simple API to apply basic image recognition tasks. For instance, the following curl command sends an image URL of my son and me to the API.

curl -H "Content-Type: application/json" -X POST \
-d '{"image_url": "https://raw.githubusercontent.com/ramhiser/serverless-cloud-vision/master/examples/images/ramhiser-and-son.jpg"}' \
https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image
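If you prefer Python to curl, a minimal sketch using the requests library does the same thing (swap in the endpoint URL from your own deployment):

import requests

# Endpoint printed by `serverless deploy`; yours will differ
ENDPOINT = "https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image"

payload = {
    "image_url": ("https://raw.githubusercontent.com/ramhiser/"
                  "serverless-cloud-vision/master/examples/images/ramhiser-and-son.jpg")
}
response = requests.post(ENDPOINT, json=payload)
print(response.json())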

The response JSON includes a variety of metadata to describe the image and the faces detected:

{
  "responses": [
    {
      "faceAnnotations": [
        {
          "angerLikelihood": "VERY_UNLIKELY",
          "blurredLikelihood": "VERY_UNLIKELY",
          "boundingPoly": {
            "vertices": [
              {
                "x": 512,
                "y": 249
              },
              {
                "x": 637,
                "y": 249
              },
              {
                "x": 637,
                "y": 395
              },
              {
                "x": 512,
                "y": 395
              }
            ]
          },
          "detectionConfidence": 0.98645973,
          ...

This JSON response was used to draw the bounding boxes in the image above. For implementation details, see the examples folder within my repo.
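For the curious, here’s a rough sketch of the idea using Pillow. The actual script lives in the examples folder; the function and variable names below are mine, not the repo’s:

from PIL import Image, ImageDraw

def highlight_faces(image_path, face_annotations, output_path):
    """Draw a polygon around each face's boundingPoly vertices (sketch)."""
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    for face in face_annotations:
        vertices = [(v.get("x", 0), v.get("y", 0))
                    for v in face["boundingPoly"]["vertices"]]
        draw.polygon(vertices, outline="lime")
    image.save(output_path)

# face_annotations comes from response["responses"][0]["faceAnnotations"]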

By default, facial detection is performed, as we can see from the lambda_handler function in cloudvision/handler.py:

def lambda_handler(event, context):
    """AWS Lambda Handler for API Gateway input"""
    # Request body passed through by API Gateway; only image_url is required
    post_args = event.get("body", {})
    image_url = post_args["image_url"]
    # Optional arguments fall back to sensible defaults
    detect_type = post_args.get("detect_type", "FACE_DETECTION")
    max_results = post_args.get("max_results", 4)

    logging.debug("Detecting image from URL: %s" % image_url)
    logging.debug("Image detection type: %s" % detect_type)
    logging.debug("Maximum number of results: %s" % max_results)

    json_return = detect_image(image_url,
                               detect_type,
                               max_results)
    return json_return
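For context, here is a rough sketch of what a detect_image implementation can look like, assuming requests, google-api-python-client, and oauth2client are among the vendored dependencies (the real implementation lives in the cloudvision folder of the repo):

import base64
import requests
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def detect_image(image_url, detect_type="FACE_DETECTION", max_results=4):
    """Fetch an image from a URL and annotate it with Cloud Vision (sketch)."""
    image_bytes = requests.get(image_url).content
    # Credentials picked up as described in the Google Cloud Vision API section
    credentials = GoogleCredentials.get_application_default()
    vision = discovery.build("vision", "v1", credentials=credentials)
    request_body = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("utf-8")},
            "features": [{"type": detect_type, "maxResults": max_results}],
        }]
    }
    return vision.images().annotate(body=request_body).execute()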

The API calls the detect_image function with the image URL and two optional arguments: max_results and detect_type. The max_results argument specifies how many entities (e.g., faces) we wish to find, whereas the detect_type argument indicates the image recognition task we wish to perform. As mentioned above, Google Cloud Vision supports multiple image recognition tasks beyond facial detection. For instance, let’s apply OCR to the logo of my employer, uStudio:

[Image: uStudio logo]

To do this, let’s run the following curl command:

curl -H "Content-Type: application/json" -X POST \
-d '{"image_url": "https://raw.githubusercontent.com/ramhiser/serverless-cloud-vision/master/examples/images/ustudio.jpg", "detect_type": "TEXT_DETECTION"}' \
https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image

The response JSON has a similar form to our facial detection example, but this time a bounding box around the logo is returned along with a description: Ustudio.

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "et",
          "description": "Ustudio\n",
          "boundingPoly": {
            "vertices": [
              {
                "y": 91,
                "x": 176
              },
              {
                "y": 91,
                "x": 1322
              },
              {
                "y": 348,
                "x": 1322
              },
              {
                "y": 348,
                "x": 176
              }
            ]
          }
        },
        ...

Nice! OCR made simple.
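Pulling the recognized text out of that response takes one line once the JSON is parsed. A quick sketch, assuming the curl output above was saved to a hypothetical file named ocr_response.json:

import json

# Load the Cloud Vision response saved from the curl call above
with open("ocr_response.json") as f:
    resp = json.load(f)

# The first textAnnotation holds the full recognized string
print(resp["responses"][0]["textAnnotations"][0]["description"])  # Ustudio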

Gotchas

The Serverless Framework simplified our microservice deployment via API Gateway and Lambda. There are some gotchas, though, that you should be aware of.

First, neither AWS nor the Serverless Framework is aware of your folder structure, so you’ll need to make sure Python knows where your vendored dependencies live, as in this snippet from cloudvision/lib/__init__.py:

import os
import sys

here = os.path.dirname(os.path.realpath(__file__))
# Add the vendored dependencies folder to Python's module search path
sys.path.append(os.path.join(here, "../vendored"))

Second, YAML indentation. Grrrrr. My API Gateway endpoints were not working for some time (no errors!) because I was missing two spaces in my YAML file. Two spaces!!! It was only after someone else encountered the same issue that I saw what to do:

[Image: fixing the YAML indentation]

That’s it! Aside from a couple of gotchas, the Serverless Framework makes it easy to deploy simple microservices.