
How to get job_id inside a pyspark job running on dataproc cluster ?

RC1
Bronze 4
Is there any provision or any way to get the job_id of a PySpark job from inside the PySpark code while it is running on a Dataproc cluster?
 
 
1 REPLY

You can send a custom job_id; see this. But if you are not sending a custom job_id, you have to wait until the job finishes to get the ID from the returned object.
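As a rough sketch, a custom job ID can be set on the job's reference when submitting through the google-cloud-dataproc Python client; the project, region, cluster, bucket, and ID below are placeholders. Because the ID is chosen up front, it can also be passed to the script as an argument so the PySpark code knows its own job ID:

```python
from google.cloud import dataproc_v1

# Placeholder values; replace with your own project, region, cluster and script.
project_id = "my-project"
region = "us-central1"
custom_job_id = "my-custom-job-id-001"

client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

job = {
    # Setting reference.job_id fixes the Dataproc job ID before submission.
    "reference": {"job_id": custom_job_id},
    "placement": {"cluster_name": "my-cluster"},
    "pyspark_job": {
        "main_python_file_uri": "gs://my-bucket/job.py",
        # Pass the same ID into the script so the PySpark code can read it.
        "args": ["--job-id", custom_job_id],
    },
}

submitted = client.submit_job(project_id=project_id, region=region, job=job)
print(submitted.reference.job_id)
```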

The JobId will be available as part of the metadata field in the Operation object that is returned from the Instantiate operation. See this article for how to work with metadata.
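For illustration, a minimal sketch (assuming the dataproc_v1 client; project, region, and template names are placeholders) of reading the per-step job IDs from the workflow Operation's metadata:

```python
from google.cloud import dataproc_v1

region = "us-central1"  # placeholder
client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Instantiating a workflow template returns a long-running Operation.
operation = client.instantiate_workflow_template(
    name=f"projects/my-project/regions/{region}/workflowTemplates/my-template"
)
operation.result()  # wait for the workflow to complete

# The Operation's metadata is a WorkflowMetadata message; each node in the
# workflow graph carries the Dataproc job ID of that step.
metadata = operation.metadata
for node in metadata.graph.nodes:
    print(node.step_id, node.job_id, node.state)
```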

The Airflow operator only polls on the Operation but does not return the final Operation object. You could try adding a return statement to its execute method.
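As a rough, version-dependent sketch of that idea (the hook method and attribute names here are assumptions and may differ between apache-airflow-providers-google releases), a subclass could instantiate the template itself and return the job IDs so Airflow pushes them to XCom:

```python
from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocInstantiateWorkflowTemplateOperator,
)


class InstantiateWorkflowTemplateReturningJobIds(
    DataprocInstantiateWorkflowTemplateOperator
):
    # Instantiate the template, wait for the workflow to finish, then return
    # the per-step job IDs; whatever execute() returns is pushed to XCom.
    def execute(self, context):
        hook = DataprocHook(gcp_conn_id=self.gcp_conn_id)
        operation = hook.instantiate_workflow_template(
            project_id=self.project_id,
            region=self.region,
            template_name=self.template_id,
        )
        operation.result()  # block until the workflow completes
        metadata = operation.metadata  # final WorkflowMetadata
        return {node.step_id: node.job_id for node in metadata.graph.nodes}
```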