
How to activate Dataproc Job metrics

Hi all,

I've been struggling with this for the past two weeks, going through the GCP documentation and looking for a solution online. No luck. I simply do not know how to turn on (activate/enable) Dataproc Job metrics.

When I look in Metrics Explorer, I see that the Job metrics are inactive:

[Screenshot: Metrics Explorer showing the Dataproc Job metrics as inactive]

But the documentation states that both Cluster and Job metrics are enabled by default and free of charge:

[Screenshot: documentation stating that Cluster and Job metrics are enabled by default]

Is there something I am missing? Is there a parameter I need to set when starting my Dataproc cluster in order to activate Job metrics? Has any of you had a similar problem?

Solved
1 ACCEPTED SOLUTION

To be honest, I am still not sure what exactly the reason is.
I did not mention in the original question that I am not running the jobs directly but through ILUM; I assume that is why they do not show up in the GCP console. ILUM also has an open feature request for GCP integration (https://roadmap.ilum.cloud/boards/feature-requests/posts/google-dataproc-integration), which lists these missing features:

  • visibility of the Dataproc jobs on GCP UI
  • managing access with Cloud IAM
  • logging with Stackdriver
  • etc.

I will stop my analysis here. Thanks @DamianS for taking the time to answer!

What you wrote in your answer is also true 🙂 Just not in my case.


9 REPLIES

Hello @dextras , welcome to the Google Cloud Community.

Those metrics will not become active until Dataproc has at least one finished job. So create a job, run it, and you will see the metric become active.
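In case it helps, the quickest way to get a finished job is the SparkPi example that ships on Dataproc images; a minimal sketch, where CLUSTER_NAME, REGION and PROJECT_ID are placeholders for your own values:

```shell
# Submit the bundled SparkPi example to an existing cluster through the
# Dataproc jobs API, so the run is recorded as a Dataproc job.
gcloud dataproc jobs submit spark \
    --cluster=CLUSTER_NAME \
    --region=REGION \
    --project=PROJECT_ID \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
    -- 1000
```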

[Screenshots: Metrics Explorer showing the job metric as active after a sample job finished]

--
cheers,
DamianS

 

Damian,

I am working with Dataproc Serverless. The customer metrics are enabled by default. However, I am not clear on the concept of "active". It seems that I can filter on project ID and batch ID within a limited time frame; otherwise, far fewer metrics are left when "inactive". Are these metrics purged while inactive?

@waynez 
Metrics become "active" when they receive the data they need. For example, if you have Cloud SQL in place, it has metrics for connections, storage size, and replication status. If you have not configured a replica (a second instance for replication), the replication metric will not be active, because there is no data to process. Such metrics are not deleted; they just will not become active until you provide the data they need.

Hi @DamianS  ! Thanks for checking it out.
I do have several clusters already running (with a custom image, running Spark jobs) that have thousands of finished jobs. I still do not see those metrics activated 😞
Did you have to enable them somehow?

@dextras 
Basically, I literally created a sample cluster and, once it was done, created a sample job 😄 Once the job finished, the metric was active.

🤔 So maybe it is related to how the jobs are run: through the Dataproc API/console, or directly with spark-submit.

This post explains something about logs:
https://cloud.google.com/dataproc/docs/guides/dataproc-job-output
It states that jobs run without using the Dataproc API "do not have job ID or drivers." Not about metrics, but still interesting.

I will dig further 🙂
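For reference, the two submission paths in question look roughly like this (cluster, region and zone names are placeholders); only the first goes through the Dataproc job service:

```shell
# Path 1: via the Dataproc jobs API. The run gets a Dataproc job ID, shows
# up in the console, and should feed the Dataproc job metrics.
gcloud dataproc jobs submit spark \
    --cluster=CLUSTER_NAME --region=REGION \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 100

# Path 2: spark-submit directly on the master node. This bypasses the
# Dataproc job service, so there is no Dataproc job ID and (per the doc
# above) no driver output in the console; presumably no job metrics either.
gcloud compute ssh CLUSTER_NAME-m --zone=ZONE -- \
    spark-submit --class org.apache.spark.examples.SparkPi \
    /usr/lib/spark/examples/jars/spark-examples.jar 100
```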

I've used this doc for job creation: https://cloud.google.com/dataproc/docs/guides/submit-job#how_to_submit_a_job

Literally, I didn't change anything 😄


No problem 🙂 Happy to help, somehow 😉