I'm building an ETL using Cloud Composer/Airflow and the GoogleCloudStorageToBigQueryOperator. Whether the file is CSV or Parquet, when I load it with the WRITE_TRUNCATE write disposition, the rows end up in random order. For example:
The raw CSV has 100 rows with a sequential row id such as 1, 2, 4, 5, 6. After loading into BigQuery, the rows come back in a random order, e.g. starting from 50, 22, 15, 59, 5, 1, 7, etc.
Is there a solution for this issue, or a way to avoid it?
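For reference, roughly what my load task looks like. This is a minimal sketch: the bucket, object path, and table names are placeholders, and the exact import path depends on your Airflow version.

```python
from datetime import datetime

from airflow import DAG
# Import path for older Airflow versions; newer providers expose
# this as GCSToBigQueryOperator under airflow.providers.google.
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

with DAG(
    dag_id="gcs_to_bq_example",          # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
) as dag:
    load_csv = GoogleCloudStorageToBigQueryOperator(
        task_id="load_csv",
        bucket="my-bucket",                         # placeholder bucket
        source_objects=["raw/data.csv"],            # placeholder object path
        destination_project_dataset_table="my_project.my_dataset.my_table",
        source_format="CSV",
        skip_leading_rows=1,                        # skip the CSV header row
        write_disposition="WRITE_TRUNCATE",         # replaces the table on each run
        autodetect=True,                            # let BigQuery infer the schema
    )
```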
Why are you trying to avoid this?
Sure, I need to avoid this because the data is not ordered properly.
Why do you need to have them ordered "properly" in the table?
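For context behind that question: BigQuery tables have no guaranteed storage order, so rows may come back in any order regardless of how the file was loaded. If the original sequence matters, it is recovered at query time with ORDER BY. A sketch, assuming the sequential row_id column from the CSV survives the load (table name is a placeholder):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT *
    FROM `my_project.my_dataset.my_table`  -- placeholder table name
    ORDER BY row_id                        -- restores the original CSV sequence
"""
# result() blocks until the query finishes, then yields rows in order.
for row in client.query(query).result():
    print(row)
```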