
RAPIDS Accelerator for Apache Spark on Google Cloud Dataproc

I’m exploring NVIDIA’s RAPIDS Accelerator for Apache Spark and running GPU-accelerated DataFrame operations using custom configs.

I’d like to replicate this setup on Google Cloud Dataproc. Any guidance or examples for enabling RAPIDS on Dataproc would be appreciated. My current configuration:

spark.plugins com.nvidia.spark.SQLPlugin
spark.executor.resource.gpu.amount 1
spark.task.resource.gpu.amount 1
spark.rapids.sql.enabled true
spark.rapids.memory.gpu.allocFraction 0.8
spark.rapids.sql.concurrentGpuTasks 2
spark.rapids.sql.explain ENABLE
spark.rapids.sql.incompatibleOps.enabled true


Hi @dineshsinarasse,

Great to hear you're exploring RAPIDS on Dataproc!

To enable the RAPIDS Accelerator for Apache Spark on Google Cloud Dataproc, here’s a quick guide:

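1. Create a cluster with GPUs attached:

A standard Dataproc image works. Attach GPUs with the accelerator flags and let an initialization action install the NVIDIA driver and the RAPIDS Accelerator. For example: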
gcloud dataproc clusters create my-rapids-cluster \
--region=us-central1 \
--image-version=2.1-ubuntu20 \
--optional-components=JUPYTER \
--enable-component-gateway \
--master-machine-type=n1-standard-4 \
--master-accelerator type=nvidia-tesla-t4,count=1 \
--worker-machine-type=n1-standard-4 \
--worker-accelerator type=nvidia-tesla-t4,count=1 \
--initialization-actions=gs://goog-dataproc-initialization-actions-us-central1/spark-rapids/spark-rapids.sh \
--metadata gpu-driver-provider=NVIDIA
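If you script cluster creation, it helps to derive the initialization-action path from the region, because Dataproc serves its public initialization actions from regional buckets named gs://goog-dataproc-initialization-actions-REGION. Below is a minimal POSIX shell sketch, assuming the spark-rapids action from that bucket. The REGION, CLUSTER_NAME, and DRY_RUN variables and the create_cluster function are illustrative names, and by default the script only prints the gcloud command instead of running it:

```shell
#!/bin/sh
# Sketch: keep the cluster region and the initialization-action bucket in sync.
# Assumption: the spark-rapids action published in Dataproc's regional buckets
# (gs://goog-dataproc-initialization-actions-<REGION>).
set -eu

REGION="${REGION:-us-central1}"
CLUSTER_NAME="${CLUSTER_NAME:-my-rapids-cluster}"
INIT_ACTION="gs://goog-dataproc-initialization-actions-${REGION}/spark-rapids/spark-rapids.sh"

create_cluster() {
  # DRY_RUN=1 (the default) only prints the command; set DRY_RUN=0 to run it.
  if [ "${DRY_RUN:-1}" = "1" ]; then run=echo; else run=; fi
  # GPUs are attached to the workers only, where the Spark executors run.
  $run gcloud dataproc clusters create "$CLUSTER_NAME" \
    --region="$REGION" \
    --image-version=2.1-ubuntu20 \
    --optional-components=JUPYTER \
    --enable-component-gateway \
    --master-machine-type=n1-standard-4 \
    --worker-machine-type=n1-standard-4 \
    --worker-accelerator=type=nvidia-tesla-t4,count=1 \
    --initialization-actions="$INIT_ACTION" \
    --metadata=gpu-driver-provider=NVIDIA
}

create_cluster
```

Running it with REGION=europe-west1 prints a command whose region and initialization-action bucket both point at europe-west1. Leaving the master without a GPU is a cost choice in this sketch, not a requirement.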

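2. Configure Spark:

Add your RAPIDS settings to the Spark session, to /etc/spark/conf/spark-defaults.conf on the cluster, or pass them as --properties when creating the cluster or submitting a job. Example: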
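spark.plugins com.nvidia.spark.SQLPlugin
spark.executor.resource.gpu.amount 1
spark.task.resource.gpu.amount 0.25
spark.rapids.sql.enabled true
spark.rapids.memory.gpu.allocFraction 0.8
spark.rapids.sql.concurrentGpuTasks 2
spark.rapids.sql.explain NOT_ON_GPU
spark.rapids.sql.incompatibleOps.enabled true

Two values differ from your original settings. spark.rapids.sql.explain accepts NONE, ALL, or NOT_ON_GPU; ENABLE is not a valid value. spark.task.resource.gpu.amount should be 1 divided by spark.executor.cores (0.25 assumes 4 cores per executor, so adjust it to your executor size). With a value of 1, Spark schedules only one task per GPU, so concurrentGpuTasks 2 has no effect.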
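On Dataproc, the same settings can also be applied cluster-wide at creation time through gcloud dataproc clusters create --properties, where spark-defaults.conf entries take a spark: prefix (for example spark:spark.rapids.sql.enabled=true). Here is a small sketch that converts spark-defaults.conf-style "key value" lines into that comma-separated value. The function name to_dataproc_props is hypothetical. Values that contain commas would need gcloud's alternate-delimiter escaping, which this sketch does not handle:

```shell
#!/bin/sh
# Sketch: convert spark-defaults.conf-style "key value" lines into the value for
# `gcloud dataproc clusters create --properties`, adding Dataproc's "spark:" prefix.
# Assumption: no value contains a comma (gcloud's list separator).
set -eu

to_dataproc_props() {
  # Skip blank lines, comments, and keys without a value; emit "spark:key=value",
  # keeping everything after the key (so multi-word values survive), then join with commas.
  awk 'NF >= 2 && $1 !~ /^#/ {
         key = $1
         sub(/^[ \t]*[^ \t]+[ \t]+/, "")
         print "spark:" key "=" $0
       }' | paste -sd, -
}

to_dataproc_props <<'EOF'
spark.plugins com.nvidia.spark.SQLPlugin
spark.executor.resource.gpu.amount 1
spark.rapids.sql.enabled true
spark.rapids.sql.concurrentGpuTasks 2
EOF
```

With the settings saved in a file (rapids.conf is a placeholder name), the output can be passed as --properties="$(to_dataproc_props < rapids.conf)" when creating the cluster. For gcloud dataproc jobs submit, --properties takes the Spark keys directly, without the spark: prefix.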

3. Install the GPU driver and RAPIDS plugin:

No manual step is needed. The initialization action in step 1 handles both when the cluster is created.