Error loading 3GB parquet file from GCS to BigQuery

I have a 3GB parquet file in GCS that I am trying to load into a BigQuery table. I see two errors related to the load job in the console:

Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 101% of limit. Top memory consumer(s): input table/file scan: 100%

Error while reading data, error message: Failed to read a column from Parquet file gs://<redacted>data.parquet: row_group_index = 0, column = 6. Exception message: Unknown error: CANCELLED: . Detail: CANCELLED: File: gs://<redacted>/data.parquet

The file is being written from a pyarrow table. I have tried adjusting `row_group_size` down to 10,000 rows, and that has not seemed to help.
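
Roughly, the write looks like this (a minimal sketch with two rows of stand-in data; the real table is built elsewhere and is far larger):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Stand-in table with the same columns/types as the schema below;
# the real data is ~3 GB and comes from other sources.
table = pa.table({
    "observation_id": pa.array(["obs-1", "obs-2"], type=pa.string()),
    "exposure_id": pa.array(["exp-1", "exp-1"], type=pa.string()),
    "mjd": pa.array([60000.1, 60000.2], type=pa.float64()),
    "ra": pa.array([10.5, 10.6], type=pa.float64()),
    "ra_sigma": pa.array([0.01, None], type=pa.float64()),
    "dec": pa.array([-5.1, -5.2], type=pa.float64()),
    "dec_sigma": pa.array([0.01, None], type=pa.float64()),
    "mag": pa.array([21.3, None], type=pa.float64()),
    "mag_sigma": pa.array([0.05, None], type=pa.float64()),
    "observatory_code": pa.array(["I41", "I41"], type=pa.string()),
})

# Write with an explicit, small row group size (the value I tried lowering).
pq.write_table(table, "data.parquet", row_group_size=10_000)
```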

Could this have to do with the single dictionary / enum column? Here is the Parquet schema as reported by pyarrow:

required group field_id=-1 schema {
  required binary field_id=-1 observation_id (String);
  required binary field_id=-1 exposure_id (String);
  required double field_id=-1 mjd;
  required double field_id=-1 ra;
  optional double field_id=-1 ra_sigma;
  required double field_id=-1 dec;
  optional double field_id=-1 dec_sigma;
  optional double field_id=-1 mag;
  optional double field_id=-1 mag_sigma;
  required binary field_id=-1 observatory_code (String);
}
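
One test I could run, if the dictionary column is the suspect: pyarrow's `write_table` takes a `use_dictionary` argument (a bool, or a list of the columns that should be dictionary-encoded), so encoding can be turned off for a test write. A rough sketch, reusing `table` from the snippet above; that `observatory_code` is actually the dictionary column here is my assumption:

```python
import pyarrow.parquet as pq

# `table` is the pyarrow.Table from the earlier snippet.

# Test write with dictionary encoding turned off entirely, to see whether
# the dictionary-encoded column is what BigQuery is choking on.
pq.write_table(table, "data_no_dict.parquet",
               row_group_size=10_000,
               use_dictionary=False)

# Alternatively, keep dictionary encoding only for selected columns by
# listing the columns that should stay dictionary-encoded.
pq.write_table(table, "data_selective_dict.parquet",
               row_group_size=10_000,
               use_dictionary=["observation_id", "exposure_id"])
```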

The docs only state a maximum row size of 50 MB (https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet). There is no way a single row, or even a whole row group of 10,000 rows, comes anywhere near that size.
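
For what it's worth, the actual row-group sizes can be read straight from the file's footer metadata with pyarrow; a sketch, assuming a local copy of the file named data.parquet:

```python
import pyarrow.parquet as pq

# Print the per-row-group sizes recorded in the Parquet footer, to confirm
# nothing is close to the documented 50 MB per-row limit.
meta = pq.ParquetFile("data.parquet").metadata
for i in range(meta.num_row_groups):
    rg = meta.row_group(i)
    print(f"row group {i}: {rg.num_rows} rows, "
          f"{rg.total_byte_size / 1024**2:.1f} MiB uncompressed")
```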

Any guidance would be appreciated.

2 REPLIES

If you believe that your parquet file does not exceed the maximum input limits, it is possible that this is a problem within your project. I would recommend contacting Google and filing a support case so they can look into this internally for your project.

To avoid resourcesExceeded errors when loading Parquet files into BigQuery, follow these guidelines:

  • Keep row sizes to 50 MB or less.
  • If your input data contains more than 100 columns, consider reducing the page size to be smaller than the default page size (1 * 1024 * 1024 bytes). This is especially helpful if you are using significant compression (see the sketch below).
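
Since the file is being written with pyarrow, the page size can be lowered through `write_table`'s `data_page_size` argument; a sketch, not tested against your data, where `table` stands for your pyarrow.Table:

```python
import pyarrow.parquet as pq

# Write with data pages smaller than pyarrow's ~1 MiB default, per the
# guideline above; 256 KiB here is just an example value.
pq.write_table(table, "data.parquet",
               row_group_size=10_000,
               data_page_size=256 * 1024)
```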

Hey, I've got the same issue. Did you manage to resolve this?

Thanks, Al