Hello,
I have a problem publishing a Pub/Sub message from a Dataproc cluster. From a Cloud Function it works well with a service account, but from Dataproc I get this error:
raise exceptions.from_grpc_error(exc) from exc google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes. [reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT" domain: "googleapis.com" metadata { key: "method" value: "google.pubsub.v1.Publisher.Publish" } metadata { key: "service" value: "pubsub.googleapis.com" } ]
import logging

from google.cloud import pubsub_v1
from google.oauth2 import service_account

service_account_credentials = {""" hidden for security reasons lol """}

credentials = service_account.Credentials.from_service_account_info(
    service_account_credentials)


class EmailPublisher:
    def __init__(self, project_id: str, topic_id: str, credentials):
        self.publisher = pubsub_v1.PublisherClient(credentials=credentials)
        self.topic_path = self.publisher.topic_path(project_id, topic_id)

    def publish_message(self, message: str):
        data = str(message).encode("utf-8")
        future = self.publisher.publish(
            self.topic_path, data,
            origin="dataproc-python-pipeline", username="gcp")
        logging.info(future.result())
        logging.info("Published messages with custom attributes to %s", self.topic_path)
No worries. Let me clarify:
1. master_config.yaml is not a specific file in Google Cloud Dataproc; it was only used as an example. In reality, you specify the service account and scopes when you create the cluster, either through the Google Cloud Console, the gcloud command-line tool, or the Dataproc API.
2. To create a cluster with a specific service account and scopes using the gcloud command-line tool, you would use a command like this:
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --scopes=https://www.googleapis.com/auth/pubsub,https://www.googleapis.com/auth/cloud-platform \
    --service-account=<service-account-email>
In the Python SDK, you would specify the service account and scopes when you create the cluster. Here's an example:
from google.cloud import dataproc_v1

cluster_client = dataproc_v1.ClusterControllerClient(client_options={
    'api_endpoint': '{}-dataproc.googleapis.com:443'.format('us-central1')
})

cluster_data = {
    'project_id': 'my-project',
    'cluster_name': 'my-cluster',
    'config': {
        'gce_cluster_config': {
            'service_account': '<service-account-email>',
            'service_account_scopes': [
                'https://www.googleapis.com/auth/pubsub',
                'https://www.googleapis.com/auth/cloud-platform'
            ]
        }
    }
}

operation = cluster_client.create_cluster(
    project_id='my-project', region='us-central1', cluster=cluster_data)
result = operation.result()
3. Once the cluster is created with the correct service account and scopes, Python code running on the cluster will automatically pick up that service account. The Google Cloud Client Libraries use the credentials associated with the environment by default, so you do not need to load a key file explicitly.
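As a minimal sketch, assuming the cluster was created as above and that my-project and my-topic are placeholder names, publishing from inside the cluster then works without any service-account JSON, because the client falls back to the cluster's credentials through Application Default Credentials:
from google.cloud import pubsub_v1

# No explicit credentials: the client picks up the Dataproc VM's service
# account through Application Default Credentials.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

future = publisher.publish(topic_path, b"hello from dataproc",
                           origin="dataproc-python-pipeline")
print(future.result())  # message ID on success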
4. If your Dataproc cluster is not able to access Secret Manager, it's likely because the cluster was not created with the necessary scopes. You can include the https://www.googleapis.com/auth/cloud-platform scope when you create the cluster; that broad scope covers Secret Manager access.
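For reference, here is a minimal sketch of reading a secret from the cluster once that scope is in place; the project and secret names are placeholders:
from google.cloud import secretmanager

# Uses the cluster's service account via Application Default Credentials.
client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/my-secret/versions/latest"
response = client.access_secret_version(name=name)
print(response.payload.data.decode("utf-8"))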
5. The --service-account flag in the gcloud command and the service_account field in the Python SDK are not deprecated. They are used to specify the service account that the cluster should use. However, just specifying the service account is not enough: you also need to ensure that the service account has the necessary IAM roles, and that the cluster is created with the necessary scopes.