
Read table from BigQuery and convert to specific Avro record

I want to read records as a specific Avro record in my Java API. I have tried to follow this example: StorageSampleWithAvroBQ.

I have also generated an Avro class via this Gradle plugin.

However, I cannot seem to make it work. Whenever I run my Cloud Function locally, I get the following error:

 Caused by: java.lang.ClassCastException: class org.apache.avro.generic.GenericData$Record cannot be cast to class com.tdcx.cwfm.model.CwfmAbs (org.apache.avro.generic.GenericData$Record and com.tdcx.cwfm.model.CwfmAbs are in unnamed module of loader com.google.cloud.functions.invoker.runner.Invoker$FunctionClassLoader @7de26db8).

Kindly guide me in ensuring that I can read and process BigQuery records as a specific class (in this case, CwfmAbs).
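For context, the linked sample decodes each Avro row block with a GenericDatumReader, which materializes rows as GenericData.Record; casting one of those to the generated class is what throws the ClassCastException. The following is a minimal sketch of that decoding pattern (the reader setup is assumed to mirror the sample; only the schema JSON and row bytes from the Storage Read API are taken as inputs):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

public class AvroRowsDecoder {

  // avroSchemaJson: the writer schema returned with the read session;
  // serializedRows: one block of Avro-encoded rows from ReadRowsResponse.
  public static void decode(String avroSchemaJson, byte[] serializedRows) throws Exception {
    Schema writerSchema = new Schema.Parser().parse(avroSchemaJson);

    // A GenericDatumReader always builds GenericData.Record instances ...
    DatumReader<GenericRecord> reader = new GenericDatumReader<>(writerSchema);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(serializedRows, null);

    while (!decoder.isEnd()) {
      GenericRecord row = reader.read(null, decoder);
      // ... so a direct cast here reproduces the ClassCastException:
      // CwfmAbs typed = (CwfmAbs) row;  // fails at runtime
    }
  }
}
```

The usual fix is to construct the reader as `new SpecificDatumReader<CwfmAbs>(writerSchema, CwfmAbs.getClassSchema())` (from `org.apache.avro.specific`), which makes `read` return CwfmAbs instances directly, provided the BigQuery column names and types line up with the generated schema.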

 


Avro Schema

{
  "namespace": "com.tdcx.cwfm.model",
  "type": "record",
  "name": "CwfmAbs",
  "fields": [
    {"name": "BCP", "type": "string"},
    {"name": "Project", "type": "string"},
    {"name": "Site", "type": "string"},
    {"name": "LOB", "type": "string"},
    {"name": "Date", "type": {"type": "int", "logicalType": "date"}},
    {"name": "Emp_ID", "type": "string"},
    {"name": "Name", "type": "string"},
    {"name": "Supervisor", "type": "string"},
    {"name": "Manager", "type": "string"},
    {"name": "Scheduled_Hours_Less_Lunch", "type": "float"},
    {"name": "Absent_Hours_Less_Lunch", "type": "float"},
    {"name": "Late_Hours", "type": "float"},
    ...
  ]
}
1 ACCEPTED SOLUTION

Besides Apache Beam or Dataflow, there are several other options available for reading data from BigQuery and converting it to Avro. Some of these options include:

  1. BigQuery API: You can use the BigQuery API directly to retrieve the data from BigQuery and then perform the conversion to Avro in your own code. The BigQuery API provides client libraries for various programming languages, allowing you to make requests to the API and retrieve the query results.

  2. Google Cloud Storage: Instead of directly converting the data from BigQuery to Avro, you can export the BigQuery table to Google Cloud Storage in a format like CSV or JSON. Then, you can read the exported files from Cloud Storage and convert them to Avro using a library or tool of your choice.

  3. Apache Spark: Apache Spark is a popular distributed data processing framework that supports reading data from BigQuery using its BigQuery Connector. You can use Spark's DataFrame API or SQL interface to read data from BigQuery, perform transformations, and then convert it to Avro. Libraries such as spark-avro (or Avro4s, for Scala) help with Avro serialization.

  4. Python libraries: If you are working with Python, you can leverage libraries like pandas and pyarrow to read data from BigQuery into a pandas DataFrame and then convert it to Avro using the avro-python3 library. This approach gives you flexibility in manipulating the data using pandas before converting it to Avro.
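Option 1 can be sketched as follows with the google-cloud-bigquery Java client: run a query, iterate the result rows, and populate the generated specific record field by field instead of casting. The table name is a placeholder, and only a few of the schema's fields are shown; this assumes the generated CwfmAbs class is on the classpath:

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;
import java.util.ArrayList;
import java.util.List;

public class CwfmAbsLoader {

  public static List<CwfmAbs> load() throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    QueryJobConfiguration query =
        QueryJobConfiguration.of("SELECT * FROM `my_dataset.cwfm_abs`"); // placeholder table

    TableResult result = bigquery.query(query);
    List<CwfmAbs> rows = new ArrayList<>();

    for (FieldValueList row : result.iterateAll()) {
      CwfmAbs rec = new CwfmAbs();
      // Generated specific records extend SpecificRecordBase, which implements
      // GenericRecord, so fields can be set by name; the generated builder's
      // setter names depend on the plugin's naming rules.
      rec.put("BCP", row.get("BCP").getStringValue());
      rec.put("Emp_ID", row.get("Emp_ID").getStringValue());
      rec.put("Late_Hours", (float) row.get("Late_Hours").getDoubleValue());
      // ... remaining fields follow the same pattern
      rows.add(rec);
    }
    return rows;
  }
}
```

Building the record explicitly like this sidesteps the ClassCastException entirely, at the cost of keeping the mapping code in sync with the schema by hand.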

