Hello,
I have a problem publishing a Pub/Sub message from a Dataproc cluster. From a Cloud Function it works fine with a service account, but from Dataproc I get this error:
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.PermissionDenied: 403 Request had insufficient authentication scopes. [reason: "ACCESS_TOKEN_SCOPE_INSUFFICIENT" domain: "googleapis.com" metadata { key: "method" value: "google.pubsub.v1.Publisher.Publish" } metadata { key: "service" value: "pubsub.googleapis.com" }]
import logging

from google.cloud import pubsub_v1
from google.oauth2 import service_account

# Key JSON redacted
service_account_credentials = {""" hidden for security reasons lol """}

credentials = service_account.Credentials.from_service_account_info(
    service_account_credentials)


class EmailPublisher:
    def __init__(self, project_id: str, topic_id: str, credentials):
        self.publisher = pubsub_v1.PublisherClient(credentials=credentials)
        self.topic_path = self.publisher.topic_path(project_id, topic_id)

    def publish_message(self, message: str):
        data = str(message).encode("utf-8")
        future = self.publisher.publish(
            self.topic_path, data,
            origin="dataproc-python-pipeline", username="gcp")
        logging.info(future.result())
        logging.info("Published messages with custom attributes to %s",
                     self.topic_path)
There are a couple of approaches you can take to give the Dataproc cluster access to the service account, and thereby the permissions it needs to interact with GCP services.
One approach involves using the gcloud command-line tool to generate a service account key file and then attaching this file to the Dataproc cluster. You can create the service account key file by executing the following command:
gcloud iam service-accounts keys create <key-file-path> --iam-account=<service-account-email>
After creating the service account key file, you can attach it to the Dataproc cluster by modifying the master_config.yaml file. In this file, you need to append the following lines to the container_definitions section:
- name: my-service-account
  volumeMounts:
    - mountPath: /var/secrets/google
      name: service-account-key
      subPath: key.json
Here, my-service-account is the name of the service account you created earlier, /var/secrets/google is the path on the Dataproc cluster where the service account key file will be mounted, and key.json is the name of the service account key file.
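If you go this route, here is a minimal sketch of how code on the cluster could pick up the mounted key, assuming it lands at /var/secrets/google/key.json as in the snippet above (the GOOGLE_APPLICATION_CREDENTIALS environment variable is the standard way to point the Google client libraries at a key file):

import os

from google.cloud import pubsub_v1

# Point Application Default Credentials at the mounted key file
# (path taken from the volumeMounts snippet above).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/var/secrets/google/key.json"

# The client now authenticates as the mounted service account,
# with no explicit credentials argument needed.
publisher = pubsub_v1.PublisherClient()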
After modifying the master_config.yaml file, you can create the Dataproc cluster. The cluster will then be able to read the service account key file from the mounted path and gain permissions to interact with GCP services.
Another approach is to utilize the Secret Manager to store the service account key file. In this method, you would create a secret in the Secret Manager and then grant the Dataproc cluster access to this secret.
To create a secret in the Secret Manager, execute the following command:
gcloud secrets create <secret-name> --data-file=<path-to-key-file>
After creating the secret, you need to add the following lines to the master_config.yaml file:
- name: my-secret
  secretRef:
    name: <secret-name>
    key: key.json
In this scenario, my-secret is the name of the secret you created earlier, and key.json is the name of the service account key file stored in the secret.
After modifying the master_config.yaml file, you can create the Dataproc cluster. The cluster will then be able to access the service account key file from Secret Manager and gain permissions to interact with GCP's services.
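As an illustration of the Secret Manager route, here is a minimal sketch of how code running on the cluster could fetch the key from the secret and build credentials with it (the project and secret names are placeholders from this thread; the cluster's own identity still needs the cloud-platform scope and the Secret Manager Secret Accessor role for this call to succeed):

import json

from google.cloud import secretmanager
from google.oauth2 import service_account

# Fetch the stored key file from Secret Manager.
client = secretmanager.SecretManagerServiceClient()
name = "projects/my-project/secrets/<secret-name>/versions/latest"
response = client.access_secret_version(name=name)

# Build credentials from the key's JSON payload.
key_info = json.loads(response.payload.data.decode("utf-8"))
credentials = service_account.Credentials.from_service_account_info(key_info)

The resulting credentials object can then be passed to pubsub_v1.PublisherClient(credentials=credentials), as in the code from the question.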
No worries. Let me clarify:
1. master_config.yaml is not a specific file in Google Cloud Dataproc; it was used as an example. In reality, you would specify the service account and scopes when you create the cluster, either through the Google Cloud Console, the gcloud command-line tool, or the Dataproc API.
2. To create a cluster with a specific service account and scopes using the gcloud command-line tool, you would use a command like this:
gcloud dataproc clusters create my-cluster --region=<region> --scopes=https://www.googleapis.com/auth/pubsub,https://www.googleapis.com/auth/cloud-platform --service-account=<service-account-email>
In the Python SDK, you would specify the service account and scopes when you create the cluster. Here's an example:
from google.cloud import dataproc_v1

# Point the client at the regional Dataproc endpoint.
cluster_client = dataproc_v1.ClusterControllerClient(client_options={
    'api_endpoint': '{}-dataproc.googleapis.com:443'.format('us-central1')
})

cluster_data = {
    'project_id': 'my-project',
    'cluster_name': 'my-cluster',
    'config': {
        'gce_cluster_config': {
            'service_account': '<service-account-email>',
            'service_account_scopes': [
                'https://www.googleapis.com/auth/pubsub',
                'https://www.googleapis.com/auth/cloud-platform'
            ]
        }
    }
}

# create_cluster returns a long-running operation; block until it completes.
operation = cluster_client.create_cluster(
    project_id='my-project', region='us-central1', cluster=cluster_data)
result = operation.result()
3. Once the cluster is created with the correct service account and scopes, your Python code running on the cluster should automatically use that service account. If you're using the Google Cloud Client Libraries, they will automatically pick up the service account associated with the environment (see the sketch after this list).
4. If your Dataproc cluster is not able to access Secret Manager, it's likely because the cluster was not created with the necessary scopes. You can add the broad cloud-platform scope (https://www.googleapis.com/auth/cloud-platform), which covers Secret Manager, when you create the cluster to give it access.
5. The --service-account flag in the gcloud command and the service_account field in the Python SDK are not deprecated; they are used to specify the service account that the cluster should use. However, just specifying the service account is not enough: you also need to ensure that the service account has the necessary IAM roles, and that the cluster is created with the necessary scopes.
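To make item 3 concrete, here is a minimal sketch (the topic and project names are placeholders) of what the publishing code from the question reduces to once the cluster's attached service account is used via Application Default Credentials, i.e. no key file is passed in at all:

import logging

from google.cloud import pubsub_v1

# No explicit credentials: on a cluster created with the right
# service account and scopes, the client authenticates via
# Application Default Credentials automatically.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path('my-project', 'my-topic')

future = publisher.publish(
    topic_path, b'hello', origin='dataproc-python-pipeline')
logging.info(future.result())

For item 5, granting the publishing role to the service account would look something like:
gcloud projects add-iam-policy-binding my-project --member=serviceAccount:<service-account-email> --role=roles/pubsub.publisher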
Thank you so much for your support and patience.