I want to export BigQuery data to GCS using the bq extract command.
As per the documentation, it supports the Parquet format, but when I try to set --destination_format PARQUET, it throws this error:
--destination_format=PARQUET: value should be one of <CSV|NEWLINE_DELIMITED_JSON|AVRO|SAVED_MODEL>
Does that mean Parquet is not supported by bq extract yet, or do we need to set additional parameters?
Hi @hamzasarwar,
Welcome to Google Cloud Community!
BigQuery currently supports exporting data in Parquet format to Cloud Storage. I tried to replicate the bq extract command and successfully exported a Parquet file. You can refer to the command below:
bq extract --destination_format=PARQUET 'your_dataset.your_table' gs://your-bucket/your-file.parquet
The error you're seeing usually means that your installed bq command-line tool is too old to recognize PARQUET as a destination format, even though the BigQuery service itself supports Parquet exports. Try updating the Google Cloud CLI (for example with gcloud components update, or by reinstalling the SDK), confirm your version with bq version, and then rerun the extract command.
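If updating the CLI isn't convenient, the same export can also be run through the BigQuery Python client library, which accepts PARQUET as a destination format. Here is a minimal sketch, assuming the google-cloud-bigquery package is installed and that your_project, your_dataset, your_table, and your-bucket are placeholders for your own names:

from google.cloud import bigquery

client = bigquery.Client(project="your_project")

# Configure the extract job to write Parquet
job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.PARQUET
)

extract_job = client.extract_table(
    "your_project.your_dataset.your_table",   # source table
    "gs://your-bucket/your-file.parquet",     # destination URI in Cloud Storage
    job_config=job_config,
)
extract_job.result()  # block until the export job finishes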
If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.
I hope the above information is helpful.
Yes, you are correct: if your version of the bq extract command doesn't offer Parquet directly, you need a workaround. The response you provided outlines a valid approach, but let's refine it with some extra considerations and best practices:
Workaround to Export to Parquet
Export to a Supported Format First: As the response suggests, start by exporting your data to an intermediate format like CSV or AVRO using bq extract. AVRO is generally preferred over CSV, especially for large datasets or complex schemas, due to its efficiency and schema evolution capabilities.
bq extract --destination_format=AVRO your_project:your_dataset.your_table gs://your-bucket/your-output.avro
Convert to Parquet: Now, you'll need to convert this intermediate file to Parquet. Here are the conversion options:
PySpark/pandas: Excellent choice! Both PySpark and pandas provide efficient ways to read AVRO/CSV and write Parquet. The pandas version is below, and a PySpark sketch follows it.
import pandas as pd
import pandavro as pdx  # pandas has no built-in AVRO reader; pandavro (pip install pandavro) wraps fastavro
import fsspec           # with gcsfs installed, fsspec and pandas can read and write gs:// paths

# Read the AVRO export (for a CSV export, use pd.read_csv('gs://your-bucket/your-output.csv') instead)
with fsspec.open('gs://your-bucket/your-output.avro', 'rb') as f:
    df = pdx.read_avro(f)

# Write to Parquet (requires pyarrow or fastparquet)
df.to_parquet('gs://your-bucket/your_output.parquet', index=False)
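For larger exports, here is a rough PySpark equivalent of the same conversion. It's a minimal sketch, assuming you run it somewhere with the spark-avro package and the GCS connector available (for example, a Dataproc cluster); the bucket paths are placeholders.

from pyspark.sql import SparkSession

# Assumes spark-avro and the GCS connector are on the classpath, e.g. on Dataproc,
# or via: spark-submit --packages org.apache.spark:spark-avro_2.12:3.5.1 your_script.py
spark = SparkSession.builder.appName("avro-to-parquet").getOrCreate()

# Read the intermediate AVRO export from Cloud Storage
df = spark.read.format("avro").load("gs://your-bucket/your-output.avro")

# Write it out as Parquet (Spark writes a directory of part files, not a single file)
df.write.mode("overwrite").parquet("gs://your-bucket/parquet-output/")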