
Dataflow worker cannot access a file on a bucket

I ran into difficulties while trying to move data from Cloud SQL to BigQuery using Dataflow (the JDBC to BigQuery template). The logs show an error saying the worker can't access the Postgres driver in my bucket. The job creates all the necessary temporary directories, but it can't read the driver. I granted the service account every permission I could think of, so I suspect I'm doing something wrong elsewhere.
I would be very grateful if you could tell me what my mistake is.

1 ACCEPTED SOLUTION

Good day @Di2mot,

Welcome to Google Cloud Community!

One possible reason is that you are missing some of the roles required for your Dataflow service account and worker service account, because by default Dataflow uses two service accounts. The Dataflow service account is used for worker instance creation and quota checking, while the worker service account is used by the worker instances to access input and output resources. The worker service account defaults to the Compute Engine default service account and must have the roles/dataflow.admin and roles/dataflow.worker roles in order to run, create, and examine a job. In addition, if your job needs to write to BigQuery, your service account must also have the roles/bigquery.dataEditor role. You can check this documentation to learn more: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions
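For reference, here is a minimal sketch of how you could grant those roles with gcloud. The project ID and service account email below are placeholders, so substitute your own values (the worker service account is usually PROJECT_NUMBER-compute@developer.gserviceaccount.com unless you specified a custom one):

# Hypothetical placeholders; replace with your real project ID and worker service account.
PROJECT_ID=my-project
WORKER_SA=123456789-compute@developer.gserviceaccount.com

# Grant the roles needed to run, create, and examine Dataflow jobs.
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$WORKER_SA" \
    --role="roles/dataflow.admin"
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$WORKER_SA" \
    --role="roles/dataflow.worker"

# Needed only if the job writes to BigQuery.
gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$WORKER_SA" \
    --role="roles/bigquery.dataEditor"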

Additionally, here is a best practice on how to give your Dataflow project access to Cloud Storage: https://cloud.google.com/dataflow/docs/concepts/security-and-permissions#accessing_gcs
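Since your specific error is the worker failing to read the Postgres JDBC driver from your bucket, one thing worth checking is that the worker service account can read objects in that bucket. As a sketch (the bucket name is a placeholder, and WORKER_SA is the same placeholder as above), you could grant it roles/storage.objectViewer on the bucket that holds the driver:

# Hypothetical bucket name; replace with the bucket that holds your JDBC driver JAR.
gcloud storage buckets add-iam-policy-binding gs://my-driver-bucket \
    --member="serviceAccount:$WORKER_SA" \
    --role="roles/storage.objectViewer"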

Hope this helps!
