The Vision API, part of Google Cloud's Vision AI suite of products, is an advanced image analysis tool that utilizes machine learning models to recognize and understand the contents of images.
It allows developers to integrate powerful image analysis capabilities into their applications without the need for extensive machine learning expertise.
Developers and businesses across diverse industries have integrated Vision AI into their applications to enhance user experiences and offer more sophisticated image-related features. It's been leveraged in e-commerce for product recognition, in healthcare for analyzing medical images, in entertainment for content moderation, and in various other sectors to obtain valuable insights from visual content.
The ease of use, seamless integration through APIs, and the accuracy of image analysis have made the Vision API a go-to solution for businesses seeking advanced image recognition and understanding capabilities.
The Vision API currently offers the following features, each of which is described below: label detection, logo detection, SafeSearch detection, landmark detection, face detection, web detection, and text detection (OCR).
Labels can identify general objects, locations, activities, animal species, products, and more.
A LABEL_DETECTION response includes the detected labels, each with a score, a topicality rating, and an opaque label ID (mid).
If you need targeted custom labels, Cloud AutoML Vision allows you to train a custom machine learning model to classify images.
Labels are returned in English only.
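To make this concrete, here is a minimal sketch of label detection using the Vision API's Python client library (covered in more detail later in this post); the bucket and file name are assumptions, not a real image:

from google.cloud import vision_v1

client = vision_v1.ImageAnnotatorClient()
image = vision_v1.Image()
image.source.image_uri = "gs://my-bucket/dog.jpg"  # hypothetical image location

# Run only label detection on the image.
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.mid, label.description, label.score, label.topicality)

The other single-feature helpers (logo_detection, safe_search_detection, landmark_detection, web_detection, and so on) follow the same pattern.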
Google Cloud's Vision API is trained to recognize a wide range of popular logos across various industries, and is capable of detecting multiple logos within a single image.
A LOGO_DETECTION response includes, for each logo detected, a description (the logo name), a confidence score, and a bounding polygon around the logo.
SafeSearch Detection categorizes content into five categories (adult, spoof, medical, violence, and racy) and returns the likelihood that each is present in a given image.
The likelihood levels indicate the degree to which the Vision API believes the content falls into each category, expressed as one of the following values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
More information on each category can be found in the Vision API's SafeSearch documentation.
Landmark detection allows you to analyze images to identify specific landmarks, such as buildings, natural features, and other recognizable locations.
The Vision API recognizes landmarks and provides information about them, including their name, location, and other relevant details.
Face Detection helps locate faces with bounding polygons, and identifies specific facial "landmarks," such as eyes, ears, nose, mouth, etc., along with their corresponding confidence values.
It returns likelihood ratings for emotions such as joy, sorrow, anger, and surprise, and for image attributes such as under-exposed, blurred, and headwear present.
Likelihood ratings are expressed as 6 different values: UNKNOWN, VERY_UNLIKELY, UNLIKELY, POSSIBLE, LIKELY, or VERY_LIKELY.
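As a small illustration of how these likelihoods surface in the Python client library (the image path is an assumption), each face annotation exposes one likelihood field per attribute:

from google.cloud import vision_v1

client = vision_v1.ImageAnnotatorClient()
image = vision_v1.Image()
image.source.image_uri = "gs://my-bucket/people.jpg"  # hypothetical image location

response = client.face_detection(image=image)
for face in response.face_annotations:
    # Likelihood fields are enums; .name yields values such as VERY_LIKELY.
    print("joy:", face.joy_likelihood.name)
    print("headwear:", face.headwear_likelihood.name)
    print("bounding polygon:", face.bounding_poly.vertices)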
The Vision API's Web Detection feature searches the web for references to an image that are either similar or identical. It returns six types of information: web entities, full matching images, partial matching images, pages with matching images, visually similar images, and a best-guess label for the image.
The Vision API can detect and extract text from images. Two annotation features support optical character recognition (OCR): TEXT_DETECTION, which is optimized for sparse text in images (such as signs), and DOCUMENT_TEXT_DETECTION, which is optimized for dense text and documents.
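As a rough sketch of the difference between the two OCR features with the Python client library (the image path is an assumption):

from google.cloud import vision_v1

client = vision_v1.ImageAnnotatorClient()
image = vision_v1.Image()
image.source.image_uri = "gs://my-bucket/receipt.png"  # hypothetical image location

# TEXT_DETECTION: suited to sparse text, such as text in photos of signs.
response = client.text_detection(image=image)
if response.text_annotations:
    print(response.text_annotations[0].description)  # first entry is the full detected text

# DOCUMENT_TEXT_DETECTION: suited to dense text and documents.
response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)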
The body of your POST request contains a JSON object, containing a single requests list, which itself contains one or more objects of type AnnotateImageRequest:
Each AnnotateImageRequest in that list contains an image object (inline base64-encoded content or an image source URI) and a features list specifying which detection types to run and, optionally, the maximum number of results to return:
{
  "requests":[
    {
      "image":{
        "content":"/9j/7QBEUGhvdG9...image contents...eYxxxzj/Coa6Bax//Z"
      },
      "features":[
        {
          "type":"LABEL_DETECTION",
          "maxResults":1
        }
      ]
    }
  ]
}
The response from the Vision API contains a list of Image Annotation results with a score associated with each entity.
The annotate request receives a JSON response of type AnnotateImageResponse. Although the requests are similar for each feature type, the responses for each feature type can be quite different.
For example, a LABEL_DETECTION request for an image of a dog might return a response like the following:
{
  "responses": [
    {
      "labelAnnotations": [
        {
          "mid": "/m/0bt9lr",
          "description": "dog",
          "score": 0.97346616
        },
        {
          "mid": "/m/09686",
          "description": "vertebrate",
          "score": 0.85700572
        },
        {
          "mid": "/m/01pm38",
          "description": "clumber spaniel",
          "score": 0.84881884
        },
        {
          "mid": "/m/04rky",
          "description": "mammal",
          "score": 0.847575
        },
        {
          "mid": "/m/02wbgd",
          "description": "english cocker spaniel",
          "score": 0.75829375
        }
      ]
    }
  ]
}
You can provide the image in your request in one of three ways: as a base64-encoded image string in the content field, as a Google Cloud Storage URI, or as a publicly accessible web URL (both URI forms are passed via source.image_uri). A short sketch of all three follows.
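A brief sketch of the three options with the Python client library (file names and bucket are assumptions):

from google.cloud import vision_v1

# 1. Raw image bytes in the content field (the REST API expects these base64-encoded;
#    the client library handles the encoding for you).
with open("local_photo.jpg", "rb") as f:  # hypothetical local file
    image_from_bytes = vision_v1.Image(content=f.read())

# 2. A Google Cloud Storage URI.
image_from_gcs = vision_v1.Image()
image_from_gcs.source.image_uri = "gs://my-bucket/photo.jpg"  # hypothetical bucket

# 3. A publicly accessible web URL.
image_from_url = vision_v1.Image()
image_from_url.source.image_uri = "https://example.com/photo.jpg"  # hypothetical URL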
Next, we will look into how to use the Vision API to extract information from an image and store the results in BigQuery using the Python Client library provided by the Vision API.
1. A Google Cloud project with billing enabled is the first step to use the Vision API.
2. Create a BigQuery dataset (a sketch of creating a matching results table follows these setup steps)
export REGION=<region>
export BIGQUERY_DATASET=<bigquery_dataset_name>
bq mk -d --location=${REGION} ${BIGQUERY_DATASET}
3. Enable Vision API
gcloud services enable vision.googleapis.com
4. Set up authentication and access control. For local development you can use Application Default Credentials (for example, by running gcloud auth application-default login); the account running the script needs permission to call the Vision API and to write to your BigQuery dataset.
5. The last setup step is to install the Vision API Python client library, which is easiest to do with pip. Since the script below also writes to BigQuery, install the BigQuery client library at the same time.
pip install google-cloud-vision google-cloud-bigquery
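Note that the setup above creates a dataset but not a table, while the script below inserts rows with columns such as image_data and label_annotations. As a hedged sketch (the table name and schema here are assumptions, chosen to match the record built later), you could create a suitable table with the BigQuery Python client:

from google.cloud import bigquery

client = bigquery.Client()

# One STRING column per feature, storing the serialized annotation JSON.
schema = [
    bigquery.SchemaField("image_data", "STRING"),
    bigquery.SchemaField("label_annotations", "STRING"),
    bigquery.SchemaField("logo_annotations", "STRING"),
    bigquery.SchemaField("text_annotations", "STRING"),
    bigquery.SchemaField("landmark_annotations", "STRING"),
    bigquery.SchemaField("face_annotations", "STRING"),
]
table_id = "my-project.my_dataset.vision_results"  # hypothetical project.dataset.table
client.create_table(bigquery.Table(table_id, schema=schema))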
Now we will write a Python script that detects features in an image using the Cloud Vision API and stores the results in BigQuery.
1. Import the required libraries
In this step, you import necessary libraries to interact with both the Vision API and BigQuery.
google.cloud.vision_v1 is used for the Vision API, and google.cloud.bigquery is used for BigQuery interactions.
import json
from google.cloud import bigquery
from google.cloud import vision_v1
2. The next step is to create a function that takes an image, builds a Vision API image object from it, and sends an annotation request to the Vision API.
This function accepts the path to the input image, which can be a publicly accessible image URL, a Cloud Storage URI, or a path to a local image file.
It uses the ImageAnnotatorClient from the Vision API to create a client. The image is read and converted into a Vision API image object. Features to be detected (e.g. face detection, label detection) can be specified. If not specified, Vision API considers all the features that are supported.
A request is made to the Vision API using the specified features, and it returns an AnnotateImageResponse object with features that are detected from the image.
def annotate(image_path: str):
    # Create a client for the Google Cloud Vision API.
    client = vision_v1.ImageAnnotatorClient()

    # Check if the image_path starts with "http" or "gs:", indicating a remote image rather than a local file.
    if image_path.startswith("http") or image_path.startswith("gs:"):
        # If it's a remote image, create an Image object with an image URI.
        image_src = vision_v1.Image()
        image_src.source.image_uri = image_path
    else:
        # If it's not a remote image, it's assumed to be a local file.
        # Open the image file in binary read ('rb') mode and read its content.
        with open(image_path, "rb") as image_file:
            content = image_file.read()
        # Create an Image object with the image's content.
        image_src = vision_v1.Image(content=content)

    # Create a request to annotate the image using the Vision API.
    request = vision_v1.AnnotateImageRequest(image=image_src)

    # Send the request to the Vision API and receive a response.
    response = client.annotate_image(request)

    # Check if the response contains an error message. If so, raise an exception with the error message.
    if response.error.message:
        raise Exception(
            "{}\nFor more info on error messages, check: "
            "https://cloud.google.com/apis/design/errors".format(response.error.message)
        )

    # Return the response, which contains annotations for the processed image.
    return response
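As a quick check (the image path is an assumption), you can call the function and print a few of the returned label annotations:

response = annotate("gs://my-bucket/dog.jpg")  # hypothetical image location
for label in response.label_annotations:
    print(label.description, label.score)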
3. Once we have the AnnotateImageResponse from the Vision API, the next step is to write the results into the BigQuery table.
Before inserting, you can reshape the annotated response to match your table schema (for example, grouping the annotations by feature and serializing them) and then insert the structured data into the BigQuery table, as in the end-to-end sketch after the function below.
def write_to_bigquery(image_path, annotated_image_response, table_id):
    # annotated_image_response is expected to be a dict keyed by feature
    # (prepared from the AnnotateImageResponse), not the raw protobuf response.
    print("Inserting the record into the table for image ", image_path)
    record = {
        "image_data": image_path,
        "label_annotations": annotated_image_response["label_detection"],
        "logo_annotations": annotated_image_response["logo_detection"],
        "text_annotations": annotated_image_response["text_detection"],
        "landmark_annotations": annotated_image_response["landmark_detection"],
        "face_annotations": annotated_image_response["face_detection"],
    }
    bigquery_client = bigquery.Client()
    errors = bigquery_client.insert_rows_json(table_id, [record])
    if errors:
        raise RuntimeError(f"Failed to insert row into {table_id}: {errors}")
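Putting the two functions together, the sketch below shows one possible way to prepare the response before inserting it. The table ID and image path are assumptions, and the conversion step serializes each group of annotations to a JSON string so that it fits the STRING columns assumed in the table sketch earlier.

import json
from google.cloud import vision_v1

table_id = "my-project.my_dataset.vision_results"  # hypothetical project.dataset.table
image_path = "gs://my-bucket/dog.jpg"              # hypothetical image location

response = annotate(image_path)

# Convert the protobuf response into a plain dict, then group it by feature
# so that it matches the record built in write_to_bigquery.
response_dict = json.loads(vision_v1.AnnotateImageResponse.to_json(response))
annotated_image_response = {
    "label_detection": json.dumps(response_dict.get("labelAnnotations", [])),
    "logo_detection": json.dumps(response_dict.get("logoAnnotations", [])),
    "text_detection": json.dumps(response_dict.get("textAnnotations", [])),
    "landmark_detection": json.dumps(response_dict.get("landmarkAnnotations", [])),
    "face_detection": json.dumps(response_dict.get("faceAnnotations", [])),
}

write_to_bigquery(image_path, annotated_image_response, table_id)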
Thanks for reading! This content was based on a recent Google Cloud Community event, Unlocking visual intelligence: A Google Cloud showcase of Vision AI. You can watch the session recording for additional details on the topic.
If you have any questions, please leave a comment below. Thank you!