
How to export data from a BigQuery table to Cloud Storage?

When attempting to export data from a BigQuery table to Cloud Storage, I ran into an issue: the error message says the operation cannot be performed on the nested schema field event_params. How can this be resolved?
 
code:
from google.cloud import bigquery

client = bigquery.Client()
# bucket_name, project, dataset_id, and table_id are set earlier in the script.
destination_uri = "gs://{}/{}".format(bucket_name, "shakespeare.json")
dataset_ref = bigquery.DatasetReference(project, dataset_id)
table_ref = dataset_ref.table(table_id)
print(destination_uri, table_ref)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location="us",
)  # API request
extract_job.result()  # Waits for the job to complete.
 
error message:
Traceback (most recent call last):
  File "/home/tangbo508/python-files/test.py", line 25, in <module>
    extract_job.result()  # Waits for job to complete.
  File "/usr/local/lib/python3.9/dist-packages/google/cloud/bigquery/job/base.py", line 922, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/google/api_core/future/polling.py", line 261, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Operation cannot be performed on a nested schema. Field: event_params
2 Replies

The error message you're encountering means the table you're exporting contains a nested field, event_params, while the extract job is using the default CSV output format. CSV cannot represent nested or repeated fields, so the job is rejected; newline-delimited JSON, Avro, and Parquet exports do support them.
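If newline-delimited JSON output is acceptable, the simplest route is to request it explicitly on the extract job. A minimal sketch, reusing table_ref and destination_uri from the question (ExtractJobConfig and DestinationFormat are part of the google-cloud-bigquery client):

from google.cloud import bigquery

client = bigquery.Client()

# Ask for newline-delimited JSON instead of the default CSV.
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
)
extract_job = client.extract_table(
    table_ref,        # same table_ref as in the question
    destination_uri,  # same gs:// URI as in the question
    job_config=job_config,
    location="us",    # must match the source table's location
)
extract_job.result()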

If you instead need a flat structure (for example, for CSV export), you can flatten the nested field with a SQL query before exporting; in standard SQL this is done with UNNEST (the FLATTEN function only exists in legacy SQL). Here's an example of how to modify your code:

from google.cloud import bigquery

client = bigquery.Client()
project = "your_project_id"
bucket_name = "your_bucket_name"
dataset_id = "your_dataset_id"
table_id = "your_table_id"
destination_uri = f"gs://{bucket_name}/shakespeare.json"

# Query the table (add UNNEST here to flatten event_params as needed).
query = f"""
SELECT * FROM `{project}.{dataset_id}.{table_id}`
"""

job_config = bigquery.QueryJobConfig(
    destination=destination_uri, write_disposition="WRITE_TRUNCATE"
)

query_job = client.query(query, location="US", job_config=job_config)
query_job.result()  # Wait for the query job to complete.
print(f"Table data exported to {destination_uri}")



By using a query that flattens the nested schema, you should be able to export the data to Cloud Storage without encountering the "nested schema" error.
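For reference, the flattening query itself might look like the following, assuming a GA4-style export schema where event_params is an array of key/value structs (the param_* column names are made up for this sketch, and any other nested fields would need the same treatment before a CSV export):

query = f"""
SELECT
  t.* EXCEPT (event_params),
  param.key AS param_key,
  param.value.string_value AS param_string_value
FROM `{project}.{dataset_id}.{table_id}` AS t,
  UNNEST(t.event_params) AS param
"""

This produces one output row per event/parameter pair, which a flat format such as CSV can represent.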

That approach won't work: according to the QueryJobConfig documentation, destination must be a Table, a TableReference, or a fully qualified table ID string, not a gs:// URI.
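A minimal sketch of the two-step pattern that does fit the API, with hypothetical project, dataset, bucket, and scratch-table names: first write the query results to a destination table, then extract that table to Cloud Storage.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical names for illustration.
project = "your_project_id"
dataset_id = "your_dataset_id"
bucket_name = "your_bucket_name"

# Step 1: write the query results to a destination *table*.
dest_table = bigquery.TableReference(
    bigquery.DatasetReference(project, dataset_id), "export_scratch"
)
job_config = bigquery.QueryJobConfig(
    destination=dest_table,
    write_disposition="WRITE_TRUNCATE",
)
query = f"SELECT * FROM `{project}.{dataset_id}.your_table_id`"
client.query(query, location="US", job_config=job_config).result()

# Step 2: extract the scratch table to Cloud Storage as newline-delimited
# JSON, which also handles nested fields without any flattening.
extract_job = client.extract_table(
    dest_table,
    f"gs://{bucket_name}/shakespeare.json",
    job_config=bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
    ),
    location="US",
)
extract_job.result()

If flattening isn't actually required, step 2 alone, run against the original table with the JSON destination format, avoids the nested-schema error entirely.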