
Bq export in parquet format

I want to export BigQuery data to GCS using the bq extract command.
According to the documentation, it supports the Parquet format, but when I set --destination_format PARQUET, it throws this error:
--destination_format=PARQUET: value should be one of <CSV|NEWLINE_DELIMITED_JSON|AVRO|SAVED_MODEL>

Does that mean Parquet is not supported by bq extract yet, or do we need to set additional parameters?

1 ACCEPTED SOLUTION

Hi @hamzasarwar,

Welcome to Google Cloud Community!

BigQuery currently supports exporting data in Parquet format to Cloud Storage. I tried to replicate the bq extract command and successfully exported the Parquet file. You can refer to the command below.

 

bq extract --destination_format=PARQUET 'your_dataset.your_table' gs://your-bucket/your-file.parquet

[Screenshots: success export.png, export file.png]

The error you're seeing indicates that your installed bq tool does not recognize PARQUET as a valid value for --destination_format, which may mean the tool is out of date. Here are some suggestions that may help resolve the issue:

  • Google Cloud SDK: Ensure that you're using the latest version of the gcloud components (for example, by running gcloud components update).
  • Permissions and location: Ensure you have the necessary permissions to perform the export, and check the locations of your Cloud Storage bucket and BigQuery dataset. You can refer to this documentation for detailed information.
  • Alternative methods: There are other ways to export BigQuery data to Cloud Storage. You might also want to try the Console instead of the bq command. Please refer to this documentation for complete steps on the Console and other methods.
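
As one of the "other methods" mentioned above, the export can also be run from Python with the google-cloud-bigquery client library. The following is only a minimal sketch, assuming the library is installed and default credentials are configured. build_parquet_uri is a hypothetical helper introduced for illustration, and the table and bucket names are placeholders.

```python
def build_parquet_uri(bucket: str, prefix: str) -> str:
    """Hypothetical helper: build a wildcard URI so a large table
    can be split across several Parquet files."""
    return f"gs://{bucket}/{prefix.strip('/')}-*.parquet"


def export_table_to_parquet(table_id: str, destination_uri: str) -> None:
    """Start an extract job that writes the table as Parquet and wait for it."""
    # Imported lazily: requires `pip install google-cloud-bigquery`.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.PARQUET
    )
    job = client.extract_table(table_id, destination_uri, job_config=job_config)
    job.result()  # Blocks until the job finishes; raises if it failed.


# Example call (placeholder names, not run here):
# export_table_to_parquet(
#     "your_project.your_dataset.your_table",
#     build_parquet_uri("your-bucket", "exports/your-file"),
# )
```

This path does not depend on the installed bq version, so it may help narrow down whether the problem is in the local CLI or in the project itself.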

If the issue persists, I recommend reaching out to Google Cloud Support for further assistance, as they can provide insights into whether this behavior is specific to your project.

I hope the above information is helpful.


REPLIES

If your installed version of bq does not accept PARQUET as a destination format, you can work around it by exporting to a supported format first and converting afterwards. Here is that workaround, with some extra considerations and best practices:

Workaround to Export to Parquet

  1. Export to a Supported Format First: Start by exporting your data to an intermediate format like CSV or AVRO using bq extract. AVRO is generally preferred over CSV, especially for large datasets or complex schemas, because it is a compact binary format that stores the schema with the data.

     
    bq extract --destination_format=AVRO your_project:your_dataset.your_table gs://your-bucket/your-output.avro
    
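  2. Convert to Parquet: Now, you'll need to convert this intermediate file to Parquet. One option:

    • PySpark/pandas: Both PySpark and pandas can write Parquet. pandas reads CSV directly but has no built-in AVRO reader, so the example below loads the AVRO file with the third-party fastavro package (a substitution for illustration; any AVRO reader would do).

       
      import fastavro  # third-party: pip install fastavro
      import gcsfs     # third-party: pip install gcsfs
      import pandas as pd
      # Or use  from pyspark.sql import SparkSession to initialize a SparkSession
      
      # Read the AVRO (for CSV, pd.read_csv('gs://...') works directly)
      fs = gcsfs.GCSFileSystem()
      with fs.open('gs://your-bucket/your-output.avro', 'rb') as f:
          df = pd.DataFrame.from_records(list(fastavro.reader(f)))
      
      # Write to Parquet (requires pyarrow or fastparquet)
      df.to_parquet('gs://your-bucket/your_output.parquet', index=False)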
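
Since the intermediate and final files are easy to mix up, it can help to check which format a file really is before reading it. The following is a small sketch using only the standard library, assuming the file has been copied locally first. It relies on the fact that Parquet files begin and end with the bytes PAR1 and Avro container files begin with Obj followed by byte 0x01. sniff_format is a hypothetical helper name.

```python
import os

PARQUET_MAGIC = b"PAR1"    # present at both the start and the end of a Parquet file
AVRO_MAGIC = b"Obj\x01"    # present at the start of an Avro object container file


def sniff_format(path: str) -> str:
    """Return 'parquet', 'avro', or 'unknown' based on the file's magic bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 8:
            f.seek(-4, os.SEEK_END)
            tail = f.read(4)
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head == AVRO_MAGIC:
        return "avro"
    return "unknown"
```

This only inspects the first and last four bytes, so it confirms the container type but not that the file is complete or readable.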
      
       
       
       
