I am using a Dataproc cluster with Hadoop 3.2 and Spark 3.1. I have Python code that reads Avro files from GCS. I used 'spark-avro_2.12-3.1.1.jar', but it gives errors such as "method not found". How do I decide which library version is compatible?
Avro-based imports are supported for Hive versions 2.3.6 and 3.1.2. A common mistake when using Avro with Dataproc is not having the storage.objects.get permission on the Cloud Storage bucket used for the import.
Check the official documentation on importing Avro into Dataproc. Additionally, you can follow the tutorial for some extra help.
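Beyond permissions, a frequent cause of "method not found" style errors is a spark-avro jar built against a different Spark or Scala version than the cluster runs: spark-avro releases track Spark releases, so the artifact version should match the cluster's Spark version exactly (and the `_2.12` suffix must match the Scala build). As a rough sketch, assuming a hypothetical helper name, you can derive the right Maven coordinate from the version reported by `spark.version` or `spark-submit --version`:

```python
# spark-avro must match the cluster's Spark and Scala versions exactly,
# otherwise you hit runtime errors such as NoSuchMethodError.
# Hypothetical helper: build the matching Maven coordinate from the
# Spark version the cluster actually reports.
def spark_avro_package(spark_version: str, scala_version: str = "2.12") -> str:
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"

# Check the exact patch version on the cluster first (e.g. via spark.version);
# "3.1.1" here is only an example matching the question's jar.
coord = spark_avro_package("3.1.1")
print(coord)  # org.apache.spark:spark-avro_2.12:3.1.1
```

You would then let Spark resolve the dependency instead of hand-picking a jar, e.g. `spark-submit --packages org.apache.spark:spark-avro_2.12:3.1.1 job.py`, so the matching version is fetched automatically.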