Hey guys! Can I get some clarity here? I'm having problems running Spark jobs on Dataproc Serverless.
Problem: The minimum memory requirement for a Dataproc Serverless workload is 12 GB. That doesn't fit into the regional quota we have and would require us to expand it. 12 GB is overkill for us, and we don't want to expand the quota.
Details: This link mentions the minimum requirements for Dataproc serverless: https://cloud.google.com/dataproc-serverless/docs/concepts/properties
They are as follows: (a) 2 executor nodes (plus 1 driver), (b) 4 cores per node, and (c) 4096 MB of memory per node (memory + memory overhead).
Hence a total of roughly 12 GB (3 × 4096 MB) of compute memory is required. Can we bypass this and run Dataproc Serverless with less compute memory?
If you have some idea of how much data you will be processing, then look at standard Dataproc clusters instead and pick a cluster size that suits you. Schedule the job using a Dataproc workflow template, which will create a cluster, run your job, and then delete the cluster; see the sketch below.
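Here is a minimal sketch of that approach using the `google-cloud-dataproc` Python client. The project ID, region, bucket path, machine types, and worker count are placeholders (not from the original post), so adjust them to whatever fits your quota and data volume:

```python
# Sketch: a Dataproc workflow template with a managed (ephemeral) cluster.
# Instantiating it creates the cluster, runs the job, and deletes the cluster.
from google.cloud import dataproc_v1

project_id = "my-project"   # placeholder
region = "us-central1"      # placeholder

client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)
parent = f"projects/{project_id}/regions/{region}"

template = {
    "id": "small-spark-workflow",
    "placement": {
        "managed_cluster": {
            "cluster_name": "ephemeral-spark",
            "config": {
                # Size the cluster to stay within your quota; tweak machine
                # types and worker count to match the data you process.
                "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
                "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
            },
        }
    },
    "jobs": [
        {
            "step_id": "my-spark-job",
            # Placeholder GCS path to your PySpark script.
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/job.py"},
        }
    ],
}

# Create the template once...
client.create_workflow_template(parent=parent, template=template)

# ...then each instantiation creates the cluster, runs the job, and tears the cluster down.
operation = client.instantiate_workflow_template(
    name=f"{parent}/workflowTemplates/small-spark-workflow"
)
operation.result()  # blocks until the whole workflow finishes
```

You can also schedule the instantiation (e.g. from Cloud Scheduler or Composer) so the cluster only exists while the job is running, which keeps quota usage limited to that window.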