I'm building an ETL using Cloud Composer/Airflow and the GoogleCloudStorageToBigQueryOperator. Whether the file is CSV or Parquet, when I load it with the WRITE_TRUNCATE write disposition, the rows end up in random order. For example:
The raw CSV has 100 rows with a sequential row id such as 1, 2, 4, 5, 6. After loading into BigQuery, the rows come back in a random order, e.g. starting from 50, 22, 15, 59, 5, 1, 7, etc.
Is there a solution for this issue, or a way to avoid it?
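For reference, roughly what my load task looks like. This is a minimal sketch: the bucket, object path, and table names are placeholders, and the exact import path depends on your Airflow version.

```python
from datetime import datetime

from airflow import DAG
# Import path for older Airflow versions; newer providers expose
# this as GCSToBigQueryOperator under airflow.providers.google.
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_example",          # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    load_csv = GoogleCloudStorageToBigQueryOperator(
        task_id="load_csv",
        bucket="my-bucket",                         # placeholder bucket
        source_objects=["raw/data.csv"],            # placeholder object path
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="CSV",
        skip_leading_rows=1,                        # skip the CSV header row
        write_disposition="WRITE_TRUNCATE",         # replaces the table on each run
        autodetect=True,                            # let BigQuery infer the schema
    )
```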
Why are you trying to avoid this?
Sure, I need to avoid this because the data is not ordered properly.
Why do you need to have them ordered "properly" in the table?
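For context behind that question: BigQuery tables have no guaranteed storage order, so rows may come back in any order regardless of how the file was loaded. If the original sequence matters, it is recovered at query time with ORDER BY. A sketch, assuming the sequential row_id column from the CSV survives the load (table name is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT *
    FROM `my_project.my_dataset.my_table`  -- placeholder table name
    ORDER BY row_id                        -- restores the original CSV sequence
"""
# result() blocks until the query finishes, then yields rows in order.
for row in client.query(query).result():
    print(row)
```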