I have a 3 GB Parquet file in GCS that I am trying to load into a BigQuery table. The console shows two errors for the load job:
```
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 101% of limit. Top memory consumer(s): input table/file scan: 100%

Error while reading data, error message: Failed to read a column from Parquet file gs://<redacted>data.parquet: row_group_index = 0, column = 6. Exception message: Unknown error: CANCELLED: . Detail: CANCELLED: File: gs://<redacted>/data.parquet
```
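For context, there is nothing exotic about the load itself; it is roughly the equivalent of this google-cloud-bigquery sketch (the project, dataset, and table names here are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Plain Parquet load from GCS, no schema overrides or partitioning options.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
)

load_job = client.load_table_from_uri(
    "gs://<redacted>/data.parquet",
    "my_project.my_dataset.observations",  # placeholder table id
    job_config=job_config,
)
load_job.result()  # fails with the errors shown above
```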
The file is written from a pyarrow table. I have tried lowering `row_group_size` to 10,000 rows, but that has not helped.
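The write path is essentially the following (a minimal sketch; the real table is built from ~3 GB of observation data, so the values below are just placeholders):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Schema mirrors the Parquet schema shown below; "required" fields are non-nullable.
schema = pa.schema([
    pa.field("observation_id", pa.string(), nullable=False),
    pa.field("exposure_id", pa.string(), nullable=False),
    pa.field("mjd", pa.float64(), nullable=False),
    pa.field("ra", pa.float64(), nullable=False),
    pa.field("ra_sigma", pa.float64()),
    pa.field("dec", pa.float64(), nullable=False),
    pa.field("dec_sigma", pa.float64()),
    pa.field("mag", pa.float64()),
    pa.field("mag_sigma", pa.float64()),
    pa.field("observatory_code", pa.string(), nullable=False),
])

# Placeholder rows; the real data is built elsewhere.
table = pa.table(
    {
        "observation_id": ["obs-0001", "obs-0002"],
        "exposure_id": ["exp-01", "exp-01"],
        "mjd": [60000.1, 60000.2],
        "ra": [10.5, 11.2],
        "ra_sigma": [0.01, None],
        "dec": [-5.3, -5.4],
        "dec_sigma": [0.01, None],
        "mag": [21.3, None],
        "mag_sigma": [0.05, None],
        "observatory_code": ["I41", "I41"],
    },
    schema=schema,
)

# Lowering row_group_size to 10,000 rows did not change the outcome.
pq.write_table(table, "data.parquet", row_group_size=10_000)
```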
The Parquet schema of the file is:

```
required group field_id=-1 schema {
  required binary field_id=-1 observation_id (String);
  required binary field_id=-1 exposure_id (String);
  required double field_id=-1 mjd;
  required double field_id=-1 ra;
  optional double field_id=-1 ra_sigma;
  required double field_id=-1 dec;
  optional double field_id=-1 dec_sigma;
  optional double field_id=-1 mag;
  optional double field_id=-1 mag_sigma;
  required binary field_id=-1 observatory_code (String);
}
```
The docs only state a maximum row size of 50 MB (https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet). There is no way a single row, or even a 10,000-row row group, is anywhere near that size.
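This is the kind of check I mean when I say the row groups are nowhere near that limit (a sketch run against a local copy of the file):

```python
import pyarrow.parquet as pq

# Inspect per-row-group sizes from the Parquet footer metadata.
md = pq.ParquetFile("data.parquet").metadata
for i in range(md.num_row_groups):
    rg = md.row_group(i)
    # total_byte_size is the uncompressed size of the row group's column data.
    print(f"row group {i}: rows={rg.num_rows}, "
          f"uncompressed={rg.total_byte_size / 1e6:.1f} MB")
```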
Any guidance would be appreciated.