Re: Facing issue while querying s3 iceberg data fr...

Suraj-SPG · 05-20-2024 01:28 AM

Hi,

I see following error when quering iceberg data in s3 from bigquery. Can you please help. Also let me know if there is any workaround possible.

Error while reading data, error message: Only data files are currently supported. Positition and equality delete files for merge on read are not supported. File 's3://XXXXX-XXX-dev-XXXX/iceberg/XXXta/yNXXwl1g/app_id_bucket=8/server_date=20240517/XXXXX_XXXXX_00094_bvgih-5e34be59-98c3fd9345a4.parquet' has content type: 'Position Delete'

Reagrds,

Suraj.

Suraj-SPG

I tried optimize and vaccum command in aws athena but didn't help. I still see delete files in s3 which is causing this. Is there a way that I can continue querying this data from bigquery.

ms4446

BigQuery's Iceberg connector doesn't yet fully support the Iceberg feature of "position deletes" (or "delete files"). These files are part of Iceberg's merge-on-read mechanism, which tracks deleted records to ensure accurate query results.

Here are several strategies you can consider:

Filter Out Delete Files in BigQuery:
- Query Adjustments: If you have control over your BigQuery queries, you might be able to construct queries that explicitly exclude files with the "Position Delete" content type. This would involve examining the file metadata in your _file_metadata pseudo column and filtering based on it.
Pre-Process Iceberg Data:
- Manual Compaction: Consider periodically compacting your Iceberg data outside of BigQuery using tools like Apache Spark. Compaction can merge data files and eliminate position deletes.
- Athena Integration (Experimental): While your Athena attempts didn't resolve the issue, explore whether newer Athena versions or settings might offer better compaction/optimization for Iceberg.
Data Transformation:
- Create Transformed Copy: If feasible, you could create a separate, transformed copy of your Iceberg data without the position deletes. This might involve using a data processing tool to read the Iceberg data, apply the deletes, and write out a new dataset that BigQuery can readily consume.

Example: Filtering in BigQuery (Conceptual)

 

SELECT * 
FROM your_iceberg_table
WHERE _file_metadata['content_type'] != 'Position Delete'

Important Considerations

Data Freshness: Be mindful of how often you need to refresh your data. If you choose to pre-process or transform your data, you'll need a mechanism to keep it in sync with the original Iceberg table.

Facing issue while querying s3 iceberg data from bnigquery