Has anybody used GCP Databricks to process events from Cloud Pub/Sub? I'm wondering whether I need to create a service account and assign it the Pub/Sub Subscriber role in order to subscribe to messages. Any guidance on this would be much appreciated.
Thanks
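In case it helps, here is a rough sketch of that service-account setup with the gcloud CLI. The project, account, and subscription names below are placeholders, not anything from your environment:

```shell
# Create a dedicated service account (all names here are illustrative)
gcloud iam service-accounts create databricks-pubsub \
    --project=my-project \
    --display-name="Databricks Pub/Sub consumer"

# Grant it the Subscriber role on the specific subscription
gcloud pubsub subscriptions add-iam-policy-binding my-subscription \
    --project=my-project \
    --member="serviceAccount:databricks-pubsub@my-project.iam.gserviceaccount.com" \
    --role="roles/pubsub.subscriber"

# Optionally export a key file for the Spark job to authenticate with
gcloud iam service-accounts keys create key.json \
    --iam-account=databricks-pubsub@my-project.iam.gserviceaccount.com
```

Granting the role on the subscription (rather than project-wide) keeps the permissions scoped to just the messages this job needs.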
Thanks for your question, Shiva.
The authentication piece for using Pub/Sub from Databricks deployed on GCP should be similar across all supported GCP products and services. You may find this guide on connecting to BigQuery from GCP Databricks helpful.
As for reading from Pub/Sub in Databricks, a couple of questions for you: which Spark version are you running, and are you using Scala or PySpark?
Hi Tianzi,
Thanks for your response. I'm using Spark 3.1.2 and PySpark.
Does Pub/Sub support Structured Streaming? If not, I'll have to use DStreams.
Thanks for your reply, Shiva! Structured Streaming support for Pub/Sub isn't there yet, but I gave this OSS Spark Pub/Sub connector a try, and it still works if you use my fork and follow the README there. My PySpark job submitted to a Dataproc cluster (image version 1.5, with project access set to allow API access to all or select GCP services) ran successfully. Are you submitting your Spark jobs to Dataproc too?
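Whichever connector you end up with, keep in mind that the Pub/Sub message body arrives as raw bytes (base64-encoded in the JSON wire representation), so your stream-processing code needs a decode step. Here's a minimal, connector-agnostic sketch in plain Python; the field names follow Pub/Sub's JSON message format, but the helper itself is just an illustration:

```python
import base64
import json

def decode_pubsub_message(message):
    """Decode a Pub/Sub message dict as it appears in the JSON wire format.

    The "data" field is base64-encoded; "attributes" are plain strings.
    """
    payload = base64.b64decode(message["data"]).decode("utf-8")
    return {
        "body": json.loads(payload),          # assumes the producer sends JSON
        "attributes": message.get("attributes", {}),
        "messageId": message.get("messageId"),
    }

# Example: a message whose data field carries a JSON event
raw = {
    "data": base64.b64encode(
        json.dumps({"event": "click", "user": 42}).encode("utf-8")
    ).decode("ascii"),
    "attributes": {"source": "web"},
    "messageId": "123",
}
decoded = decode_pubsub_message(raw)
```

In a DStream job you would typically apply a function like this inside a `map` over the received messages before any further transformation.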
Be sure to: