
Not able to write complex DataFrame to BigQuery

Hi everyone,
I'm trying to write a DataFrame to BigQuery from a Dataproc job. The DataFrame columns have a complex nested structure, for example: body: struct containing text: array containing element: array containing element: struct.

The error is:

Writing dataframe to BQ Table
Failed to write DataFrame to BigQuery due to: An error occurred while calling o259.save.
: com.google.cloud.bigquery.connector.common.BigQueryConnectorException: Failed to write to BigQuery

Caused by: java.lang.IllegalArgumentException: Data type not expected: array<struct<rawtext:string,references:array<struct<end_index:bigint,refids:array<string>,start_index:bigint>>>>
  at com.google.cloud.spark.bigquery.SchemaConverters.toBigQueryType(SchemaConverters.java:447)

I need to write this DataFrame to a BigQuery table, but I get the error above. The column has a complex nested structure (arrays of structs, including an array nested inside another array). Writing works when the column is a plain string, but fails with the complex type.
The table is created by the Dataproc job itself on the first run.
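
For context, here is a minimal sketch of the kind of write that triggers the error. The schema is reconstructed from the error message, and the table name and staging bucket are placeholders, so details may differ from the real job:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (ArrayType, LongType, StringType,
                                   StructField, StructType)

    spark = SparkSession.builder.appName("write-complex-df-to-bq").getOrCreate()

    # Schema reconstructed from the error message: body.text is an array of
    # arrays of structs (rawtext plus a references array of structs).
    schema = StructType([
        StructField("body", StructType([
            StructField("text", ArrayType(ArrayType(StructType([
                StructField("rawtext", StringType()),
                StructField("references", ArrayType(StructType([
                    StructField("end_index", LongType()),
                    StructField("refids", ArrayType(StringType())),
                    StructField("start_index", LongType()),
                ]))),
            ])))),
        ])),
    ])

    df = spark.createDataFrame([], schema)

    # Placeholder table and staging bucket; the real job uses its own values.
    (df.write.format("bigquery")
        .option("table", "my_project.my_dataset.my_table")
        .option("temporaryGcsBucket", "my-staging-bucket")
        .mode("append")
        .save())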

 

ACCEPTED SOLUTION

The error is thrown by the connector's SchemaConverters.toBigQueryType and points at an array<struct<...>> element that sits inside another array. BigQuery does not allow an ARRAY whose elements are themselves ARRAYs, so a column typed array<array<struct<...>>> cannot be mapped directly. This has been a long-standing limitation, but it's worth checking the latest connector release notes and the BigQuery documentation, as newer versions may handle more nested types.

If the limitation persists, consider these alternatives:

  • Flatten the Data: Reshape the DataFrame so that nested arrays of arrays are collapsed (or wrapped in an extra struct) before writing. This may require changes to your data model and downstream queries, but it makes the schema compatible with BigQuery; see the sketch just below this list.
  • Use Alternative Formats: Serialize the nested column (for example, to a JSON string) or export the DataFrame as JSON/Avro and load that into BigQuery, then use BigQuery functions to transform the data into the final schema inside BigQuery; a sketch follows the closing paragraph.
  • Custom ETL Process: Build a custom ETL step that transforms the data into a BigQuery-friendly schema before loading.
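
A minimal sketch of the flattening idea, assuming the schema from the error message and Spark 3.1+ (Column.withField requires 3.1). Note that this drops the outer level of grouping in body.text, so adapt it to your actual data model:

    from pyspark.sql import functions as F

    # Collapse the array<array<struct>> under body.text into a single
    # array<struct>; BigQuery can map that to a REPEATED RECORD field.
    flattened_df = df.withColumn(
        "body",
        F.col("body").withField("text", F.flatten(F.col("body.text")))
    )

    flattened_df.printSchema()  # body.text should now be array<struct<...>>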

These approaches can help you work with complex data structures in BigQuery while adhering to the current capabilities of the BigQuery connector.
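
And a sketch of the alternative-format route: serialize the problematic column to a JSON string so the write succeeds, then reshape it later inside BigQuery (table and bucket names are placeholders):

    from pyspark.sql import functions as F

    # Store the nested struct as a JSON string; BigQuery sees a plain STRING
    # column that can later be parsed with JSON functions in SQL.
    json_df = df.withColumn("body", F.to_json(F.col("body")))

    (json_df.write.format("bigquery")
        .option("table", "my_project.my_dataset.my_table")
        .option("temporaryGcsBucket", "my-staging-bucket")
        .mode("append")
        .save())

On the BigQuery side, functions such as JSON_VALUE or JSON_EXTRACT can then pull fields like rawtext out of the stored string into your final schema.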
