
Serverless API around Google Cloud Vision with the Serverless Framework

The Serverless Framework hit v1.0 (beta) recently after about a year of development. The framework has matured quickly to help devs build scalable applications without having to maintain servers. It aims to ease deployment via:

  • An easy-to-use CLI tool
  • Customizable deployment via config files
  • Automation of the annoying parts
  • Extensibility via plugins

Although the Serverless Framework does not yet support Google Cloud Functions, it is designed to support a variety of event-driven compute services, including AWS Lambda and (eventually) Google Cloud Functions. If you’re not familiar with serverless computing, I recommend you start with Martin Fowler’s overview.

So why would I use a framework rather than glue a bunch of bash scripts together? Simple. The Serverless Framework takes care of AWS IAM roles, making deployment much less annoying. Also, as we’ll see below, Serverless makes it easy to include Python dependencies along with your Lambda function.

I’ve been eager to build a serverless app. Combining that goal with wanting to make Google Cloud Vision a bit more convenient to work with, I built a serverless API wrapper around Google Cloud Vision using AWS API Gateway and AWS Lambda. I expected there to be some craziness when combining services from both Amazon and Google, but the Serverless Framework ensured there was none. I focused on AWS Lambda in this project but may play with Google’s offering after it matures a bit.

What Does the App Do?

For the impatient, check out the GitHub repository.

Briefly, I created a microservice via API Gateway that accepts an image URL and triggers a Lambda function, which ingests the image from a URL and sends the image to Google Cloud Vision for standard image recognition tasks (e.g., facial detection, OCR, etc.). A JSON response is returned, from which I was able to produce a new image with bounding boxes around the faces detected (my son and me).

[Image: detected faces highlighted with bounding boxes]

Beyond facial detection, Google Cloud Vision supports the following image recognition tasks:

  • LABEL_DETECTION
  • TEXT_DETECTION
  • SAFE_SEARCH_DETECTION
  • FACE_DETECTION
  • LANDMARK_DETECTION
  • LOGO_DETECTION
  • IMAGE_PROPERTIES

How to Get Started?

Above, we described what the project does. Now, let’s go through how to set up the project and deploy it in your own cloud environment.

Google Cloud Vision API

First, let’s go through a few details to set up the Google Cloud Vision API. In order to access the Cloud Vision API, you will need a Google Cloud Platform account. Fortunately, Google provides a free 60-day trial with $300 credit.

Next, you will need to create Google Application Credentials. Specifically, create a Service Account Key by following the instructions given here. After creating the Service Account Key, I downloaded the JSON file containing my application credentials into my app and renamed it cloudvision/google-application-credentials.json.
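As a sanity check, here is one way the Google Python client libraries can pick up that key file. This is just a sketch, not necessarily how the repo wires things up, and it assumes the google-api-python-client and oauth2client packages are installed:

import os
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

# Tell the Google client libraries where the service account key lives
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "cloudvision/google-application-credentials.json"

# Build a Cloud Vision client from those credentials
credentials = GoogleCredentials.get_application_default()
vision = discovery.build("vision", "v1", credentials=credentials)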

That’s it.

AWS

As I said above, we are mixing cloud providers, which might be weaksauce to some of you. The reason is simple: AWS doesn’t have a spiffy API for image recognition, but its serverless offerings are mature.

You’ll first need an AWS account. Quick disclaimer: it’s not free, but for our purposes, AWS Lambda is pretty cheap.

Next, you need to create a default AWS profile on your local box. To do this, install the aws-cli and then run the following at the command line:

aws configure
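Running it walks you through four prompts. The values below are placeholders; use your own access keys and preferred region:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json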

For more details, Serverless provides an AWS overview along with a video walkthrough on YouTube.

That’s it.

Serverless Framework to Deploy the App on AWS

After your AWS account is ready to go, make sure you have Node.js 4.0+ installed. Then, install the Serverless Framework.

npm install serverless -g

The above command makes the serverless command available at the CLI along with two shortcuts: sls and slss. If you simply type serverless, the CLI prints helpful usage documentation.

If you haven’t already done so, git clone the app with:

git clone git@github.com:ramhiser/serverless-cloud-vision.git
cd serverless-cloud-vision

Here’s one of the best parts about the Serverless Framework: we can install any Python dependencies our app needs into a local folder, and those dependencies will be deployed along with the app. To see this, install the Python dependencies in requirements.txt to the cloudvision/vendored folder:

pip install -t cloudvision/vendored/ -r requirements.txt

NOTE: Homebrew + Mac OS users who encounter the DistutilsOptionError error should see this SO post for a fix.

After installing the Python requirements to the vendored folder, we are ready to deploy our app to AWS. Type the following at the command line to deploy the wrapper API:

serverless deploy

This command does the following:

  • Creates IAM roles on AWS for Lambda and API Gateway (only done once)
  • Zips the Python code and uploads it to S3
  • Creates the AWS Lambda function
  • Creates an API Gateway endpoint that triggers the Lambda function

Serverless takes a bit longer to run this command the first time because it has to create the IAM roles. However, after you have deployed your app once, subsequent deployments of code changes execute much more quickly.

After the serverless deploy command returns successfully, it’ll print a few useful pieces of information, including the API endpoint you’ll need to call your microservice:

Service Information
service: cloudvision
stage: dev
region: us-east-1
endpoints:
  POST - https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image
functions:
  lambda-cloudvision: arn:aws:lambda:us-east-1:1234567890:function:lambda-cloudvision

The endpoint https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image provided by API Gateway is automatically generated by AWS and will differ in your implementation.
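If you lose track of the endpoint later and your version of the framework supports it, you can reprint this Service Information block at any time without redeploying:

serverless info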

Now we have a simple API to apply basic image recognition tasks. For instance, the following curl command sends an image URL of my son and me to the API.

curl -H "Content-Type: application/json" -X POST \
-d '{"image_url": "https://raw.githubusercontent.com/ramhiser/serverless-cloud-vision/master/examples/images/ramhiser-and-son.jpg"}' \
https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image
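If you prefer Python to curl, a minimal sketch using the requests library does the same thing (swap in the endpoint URL from your own deployment):

import requests

# Endpoint printed by `serverless deploy`; yours will differ
ENDPOINT = "https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image"

payload = {
    "image_url": ("https://raw.githubusercontent.com/ramhiser/"
                  "serverless-cloud-vision/master/examples/images/ramhiser-and-son.jpg")
}
response = requests.post(ENDPOINT, json=payload)
print(response.json())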

The response JSON includes a variety of metadata to describe the image and the faces detected:

{
  "responses": [
    {
      "faceAnnotations": [
        {
          "angerLikelihood": "VERY_UNLIKELY",
          "blurredLikelihood": "VERY_UNLIKELY",
          "boundingPoly": {
            "vertices": [
              {
                "x": 512,
                "y": 249
              },
              {
                "x": 637,
                "y": 249
              },
              {
                "x": 637,
                "y": 395
              },
              {
                "x": 512,
                "y": 395
              }
            ]
          },
          "detectionConfidence": 0.98645973,
          ...

This JSON response was used to draw the bounding boxes in the image above. For implementation details, see the examples folder within my repo.
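For the curious, here’s a rough sketch of the idea using Pillow. The actual script lives in the examples folder; the function and variable names below are mine, not the repo’s:

from PIL import Image, ImageDraw

def highlight_faces(image_path, face_annotations, output_path):
    """Draw a polygon around each face's boundingPoly vertices (sketch)."""
    image = Image.open(image_path)
    draw = ImageDraw.Draw(image)
    for face in face_annotations:
        vertices = [(v.get("x", 0), v.get("y", 0))
                    for v in face["boundingPoly"]["vertices"]]
        draw.polygon(vertices, outline="lime")
    image.save(output_path)

# face_annotations comes from response["responses"][0]["faceAnnotations"]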

By default, facial detection is performed, as we can see from the lambda_handler function in cloudvision/handler.py:

def lambda_handler(event, context):
    """AWS Lambda Handler for API Gateway input"""
    # Request body passed through by API Gateway; only image_url is required
    post_args = event.get("body", {})
    image_url = post_args["image_url"]
    # Optional arguments fall back to sensible defaults
    detect_type = post_args.get("detect_type", "FACE_DETECTION")
    max_results = post_args.get("max_results", 4)

    logging.debug("Detecting image from URL: %s" % image_url)
    logging.debug("Image detection type: %s" % detect_type)
    logging.debug("Maximum number of results: %s" % max_results)

    json_return = detect_image(image_url,
                               detect_type,
                               max_results)
    return json_return
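For context, here is a rough sketch of what a detect_image implementation can look like, assuming requests, google-api-python-client, and oauth2client are among the vendored dependencies (the real implementation lives in the cloudvision folder of the repo):

import base64
import requests
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

def detect_image(image_url, detect_type="FACE_DETECTION", max_results=4):
    """Fetch an image from a URL and annotate it with Cloud Vision (sketch)."""
    image_bytes = requests.get(image_url).content
    # Credentials picked up as described in the Google Cloud Vision API section
    credentials = GoogleCredentials.get_application_default()
    vision = discovery.build("vision", "v1", credentials=credentials)
    request_body = {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("utf-8")},
            "features": [{"type": detect_type, "maxResults": max_results}],
        }]
    }
    return vision.images().annotate(body=request_body).execute()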

The API calls the detect_image function with the image URL and two optional arguments: max_results and detect_type. The max_results argument specifies how many entities (e.g., faces) we wish to find, whereas the detect_type argument indicates the image recognition task we wish to perform. As mentioned above, Google Cloud Vision supports multiple image recognition tasks beyond facial detection. For instance, let’s apply OCR to the logo of my employer, uStudio:

[Image: uStudio logo]

To do this, let’s run the following curl command:

curl -H "Content-Type: application/json" -X POST \
-d '{"image_url": "https://raw.githubusercontent.com/ramhiser/serverless-cloud-vision/master/examples/images/ustudio.jpg", "detect_type": "TEXT_DETECTION"}' \
https://some-api-gateway.execute-api.us-east-1.amazonaws.com/dev/detect_image

The response JSON has a similar form to our facial detection example, but this time a bounding box around the logo is returned along with a description: Ustudio.

{
  "responses": [
    {
      "textAnnotations": [
        {
          "locale": "et",
          "description": "Ustudio\n",
          "boundingPoly": {
            "vertices": [
              {
                "y": 91,
                "x": 176
              },
              {
                "y": 91,
                "x": 1322
              },
              {
                "y": 348,
                "x": 1322
              },
              {
                "y": 348,
                "x": 176
              }
            ]
          }
        },
        ...

Nice! OCR made simple.
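Pulling the recognized text out of that response takes one line once the JSON is parsed. A quick sketch, assuming the curl output above was saved to a hypothetical file named ocr_response.json:

import json

# Load the Cloud Vision response saved from the curl call above
with open("ocr_response.json") as f:
    resp = json.load(f)

# The first textAnnotation holds the full recognized string
print(resp["responses"][0]["textAnnotations"][0]["description"])  # Ustudio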

Gotchas

The Serverless Framework simplified our microservice deployment via API Gateway and Lambda. There are some gotchas, though, that you should be aware of.

First, neither AWS nor the Serverless Framework is aware of your folder structure, so you’ll need to make sure Python knows where your vendored dependencies live, as in this snippet from cloudvision/lib/__init__.py:

import os
import sys

here = os.path.dirname(os.path.realpath(__file__))
# Add the vendored dependencies folder to Python's module search path
sys.path.append(os.path.join(here, "../vendored"))

Second, YAML indentation. Grrrrr. My API Gateway endpoints were not working for some time (no errors!) because I was missing two spaces in my YAML file. Two spaces!!! It was only after someone else encountered the same issue that I saw what to do:

[Image: fixing the YAML indentation]

That’s it! Aside from a couple of gotchas, the Serverless Framework makes it easy to deploy simple microservices.