
What is the best way to use credentials for API calls from a Databricks notebook?

Hello, I have a Databricks account on Azure, and my goal is to compare image tagging services from GCP and other providers via their APIs, using a Python notebook. I am having trouble with GCP Vision API calls, specifically with credentials: as far as I understand, one necessary step is to set the 'GOOGLE_APPLICATION_CREDENTIALS' environment variable in my Databricks notebook with something like

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/folder1/credentials.json'

where '/folder1/credentials.json' is where my notebook looks for the JSON file with credentials (the notebook is in the same folder, /folder1/notebook_api_test).

I got this path via Workspace -> Copy file path in the Databricks web UI.

But this approach doesn't work; when the cell is executed, I get this error:

DefaultCredentialsError: File /folder1/credentials.json was not found. 
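
A quick diagnostic (a sketch added for illustration, not in the original post) confirms the file is simply not visible at that path from Python:

        import os

        # the path copied from the workspace UI is not a path on the
        # driver's local filesystem, so this prints False
        print(os.path.exists('/folder1/credentials.json'))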

What is the right way to deal with credentials to access the Google Vision API from a Databricks notebook?

1 ACCEPTED SOLUTION

Ok, here is the trick: in my case, the file with the GCP credentials is stored in notebook workspace storage, which is not visible to ordinary Python file operations under the path shown in the workspace UI. So the solution is to read the content of this file and save it to the cluster storage attached to the notebook, which is created with the cluster and erased when the cluster is gone (so this procedure has to be repeated every time the cluster is re-created). According to this doc, we can read the content of the credentials JSON file stored in the notebook workspace with

        with open('/Workspace/folder1/cred.json') as f:  # note that a full path with the /Workspace prefix is needed here
            content = f.read()

and then, according to this doc, we need to save it somewhere else in a new file (with the same name in my case, cred.json), namely on the cluster storage attached to the notebook (which is visible to ordinary os-level file functions), with

        import os

        fd = os.open("cred.json", os.O_RDWR | os.O_CREAT)
        ret = os.write(fd, content.encode())
        # need .encode(), or os.write raises TypeError: a bytes-like object is required, not 'str'
        os.close(fd)
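
Equivalently, since the workspace file is readable under /Workspace with standard Python I/O, a plain high-level copy should also work (a sketch using the same paths as above, not tested in the original thread):

        import shutil

        # copy the key file from workspace storage to the cluster-local
        # working directory attached to the notebook
        shutil.copyfile('/Workspace/folder1/cred.json', 'cred.json')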

Only after that can we set the environment variable required for GCP authentication:

        os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './cred.json'

and then the API calls should work fine, without the DefaultCredentialsError.
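
For reference, a minimal label-detection call with the google-cloud-vision client library (a sketch; the gs:// URI is a hypothetical placeholder):

        from google.cloud import vision

        # the client picks up GOOGLE_APPLICATION_CREDENTIALS set above
        client = vision.ImageAnnotatorClient()

        image = vision.Image()
        image.source.image_uri = 'gs://my-bucket/my-image.jpg'  # hypothetical image

        response = client.label_detection(image=image)
        for label in response.label_annotations:
            print(label.description, label.score)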


2 REPLIES

Hi @lugger1,

Welcome to Google Cloud Community.

You must provide your authentication credentials in order to use Google Cloud services, such as the Vision API, from Databricks.

To properly set up your Google Cloud credentials in your Databricks notebook, follow these steps:

  • In your Google Cloud project, create a service account with the permissions required to use the Vision API.
  • Download the service account's JSON key file.
  • Upload the JSON key file to your Databricks workspace, or to a cloud storage platform like Azure Blob Storage.
  • In your notebook, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path of the JSON key file.

After setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, you should be able to use the Google Cloud Vision API in your notebook without encountering the DefaultCredentialsError.
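
As a quick sanity check, you can verify that the credentials resolve before calling the Vision API, using the google-auth library (a sketch, assuming the key file path was set as above):

        import google.auth

        # raises DefaultCredentialsError if GOOGLE_APPLICATION_CREDENTIALS
        # does not point at a readable service account key file
        credentials, project_id = google.auth.default()
        print('Authenticated for project:', project_id)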

Here is some documentation that might help you:
https://cloud.google.com/databricks
https://cloud.google.com/blog/products/data-analytics/databricks-on-google-cloud
