
Pub/Sub Publish message from Dataproc cluster using Python: ACCESS_TOKEN_SCOPE_INSUFFICIENT

Hello,

I have a problem publishing a Pub/Sub message from a Dataproc cluster. From a Cloud Function it works fine with a service account, but from Dataproc I get this error:

raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes. [reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT"
domain: "googleapis.com"
metadata {
  key: "method"
  value: "google.pubsub.v1.Publisher.Publish"
}
metadata {
  key: "service"
  value: "pubsub.googleapis.com"
}
]

The service account assigned to this cluster is supposed to have the Pub/Sub Publisher role, but the error above still appears.

There is a workaround I have used to get around this issue: using the service account key (.json) file to publish. But I believe that is bad practice, since the secrets (the private key) are exposed and can be read from the code. I also tried Secret Manager, but again there is no access from the cluster; publishing to Pub/Sub fails with the same 403 error.

This is how I currently get the cluster to publish to the Pub/Sub topic:

from google.oauth2 import service_account

service_account_credentials = {"""  hidden for security reasons lol """}

credentials = service_account.Credentials.from_service_account_info(
    service_account_credentials
)

The code to publish:

import logging

from google.cloud import pubsub_v1


class EmailPublisher:

    def __init__(self, project_id: str, topic_id: str, credentials):
        self.publisher = pubsub_v1.PublisherClient(credentials=credentials)
        self.topic_path = self.publisher.topic_path(project_id, topic_id)

    def publish_message(self, message: str):
        data = str(message).encode("utf-8")
        # Extra keyword arguments become custom message attributes.
        future = self.publisher.publish(
            self.topic_path,
            data,
            origin="dataproc-python-pipeline",
            username="gcp",
        )
        logging.info(future.result())
        logging.info("Published messages with custom attributes to %s", self.topic_path)

Is there any solution to make the Dataproc cluster pick up its assigned service account and have permission to access GCP's services?

Thank you,
1 ACCEPTED SOLUTION

No worries. Let me clarify:

1. master_config.yaml is not a specific file in Google Cloud Dataproc; it was only used as an example. In reality, you specify the service account and scopes when you create the cluster, either through the Google Cloud Console, the gcloud command-line tool, or the Dataproc API.

2. To create a cluster with a specific service account and scopes using the gcloud command-line tool, you would use a command like this:

gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --scopes=https://www.googleapis.com/auth/pubsub,https://www.googleapis.com/auth/cloud-platform \
    --service-account=<service-account-email>

In the Python SDK, you would specify the service account and scopes when you create the cluster. Here's an example:

from google.cloud import dataproc_v1

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={
        "api_endpoint": "{}-dataproc.googleapis.com:443".format("us-central1")
    }
)

cluster_data = {
    "project_id": "my-project",
    "cluster_name": "my-cluster",
    "config": {
        "gce_cluster_config": {
            "service_account": "<service-account-email>",
            "service_account_scopes": [
                "https://www.googleapis.com/auth/pubsub",
                "https://www.googleapis.com/auth/cloud-platform",
            ],
        }
    },
}

operation = cluster_client.create_cluster(
    project_id="my-project", region="us-central1", cluster=cluster_data
)
result = operation.result()

3. Once the cluster is created with the correct service account and scopes, Python code running on the cluster should pick up that identity automatically. The Google Cloud Client Libraries use Application Default Credentials, which resolve to the service account associated with the environment.
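For example, once the cluster has the right scopes, a publish call like the following (project and topic names here are hypothetical) should work on the cluster without passing any explicit credentials, because the client falls back to the environment's service account:

from google.cloud import pubsub_v1

# No credentials argument: the client picks up the cluster's
# service account from the environment automatically.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")
future = publisher.publish(topic_path, b"hello from dataproc")
print(future.result())  # message ID on success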

4. If your Dataproc cluster is not able to access Secret Manager, it's likely because the cluster was not created with the necessary scopes. You can add the broad https://www.googleapis.com/auth/cloud-platform scope when you create the cluster to give it access to Secret Manager.
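A minimal sketch of reading a secret from the cluster, assuming a hypothetical secret named my-secret in my-project, a cluster created with the cloud-platform scope, and a service account holding the Secret Manager Secret Accessor role:

from google.cloud import secretmanager

client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/my-secret/versions/latest"
response = client.access_secret_version(name=name)
secret_value = response.payload.data.decode("utf-8")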

5. The --service-account flag in the gcloud command and the service_account field in the Python SDK are not deprecated. They are used to specify the service account that the cluster should use. However, just specifying the service account is not enough: you also need to ensure that the service account has the necessary IAM roles, and that the cluster is created with the necessary scopes, as shown in the command below.
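For example, granting the Pub/Sub Publisher role to the service account could look like this (the project and service account values are placeholders):

gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:<service-account-email>" \
    --role="roles/pubsub.publisher"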

