Hi Folks,
I'm trying to submit a Dataproc serverless job using a service account (which has the necessary permissions) to load multiple CSV files (> 100 GB) from a GCS bucket into a Cloud SQL PostgreSQL instance.
Can you please help me with the command that needs to be submitted? Also, how do I reference the JSON key file associated with the service account for authentication, or is it not necessary to specify it?
gcloud dataproc batches submit spark \
  --region=us \
  --service-account=test@Test.iam.gserviceaccount.com \
  --job-type=spark \
  --python-file=test.py \
  --cluster=cluster_name
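For context, test.py is roughly along the lines below. The bucket path, JDBC URL, table name, and credentials are placeholders rather than the real values, and it assumes the PostgreSQL JDBC driver jar is made available to the batch:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gcs-csv-to-cloudsql")
    .getOrCreate()
)

# Read all CSV files under the bucket prefix (placeholder path).
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("gs://my-bucket/input/*.csv")
)

# Write to the Cloud SQL PostgreSQL instance over JDBC
# (placeholder host, database, table, and credentials).
(
    df.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://10.0.0.3:5432/mydb")
    .option("dbtable", "public.my_table")
    .option("user", "myuser")
    .option("password", "mypassword")
    .option("driver", "org.postgresql.Driver")
    .mode("append")
    .save()
)

spark.stop()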
When I submit this batch job, how does it work in the background? Does a cluster get created and then deleted automatically after the job completes, or does it need to be deleted manually?
Thanks,
Vigneswar Jeyaraj