
Avro library in Dataproc

RC1
Bronze 4

I am using a Dataproc cluster with Hadoop 3.2 and Spark 3.1. I have Python code that reads Avro files from GCS, so I used 'spark-avro_2.12-3.1.1.jar', but it gives errors such as "method not found". How do I decide which library version is compatible?

1 REPLY

Avro-based imports are supported for Hive versions 2.3.6 and 3.1.2. A common mistake when using Avro with Dataproc is missing the storage.objects.get permission on the Cloud Storage bucket used for the import.

Check the official documentation on importing Avro into Dataproc.

Additionally, you can follow this tutorial for some extra help.
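
On the compatibility question itself: the spark-avro module is published per Spark release and per Scala binary version, so the usual rule is to match both to what the cluster actually runs. "Method not found" errors often point to a mismatch there, for example a Spark 3.1.x patch level or Scala build different from the jar. Here is a minimal sketch of that rule. The helper name spark_avro_coordinate is made up for illustration, and the versions passed in are assumptions, so check yours with `spark-submit --version` on the cluster (it prints both the Spark and the Scala version).

```python
def spark_avro_coordinate(spark_version: str, scala_version: str) -> str:
    """Build the Maven coordinate of the spark-avro module for a cluster.

    Hypothetical helper, not part of any library. It encodes the naming
    scheme org.apache.spark:spark-avro_<scala binary>:<spark version>.
    """
    # Maven artifacts use the Scala *binary* version (major.minor only),
    # so "2.12.13" becomes "2.12".
    scala_binary = ".".join(scala_version.split(".")[:2])
    return f"org.apache.spark:spark-avro_{scala_binary}:{spark_version}"


# Assumed versions for illustration only; replace with the cluster's own.
coordinate = spark_avro_coordinate("3.1.1", "2.12.13")
print(coordinate)
```

The resulting coordinate can then be passed when submitting the job, for example with `--properties=spark.jars.packages=<coordinate>` on `gcloud dataproc jobs submit pyspark`, and the files read in PySpark with `spark.read.format("avro").load(...)`. Letting Spark resolve the package this way, instead of copying a jar by hand, also avoids ending up with two different spark-avro jars on the classpath.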