
Performance improvement of BigQuery result export to S3

Hey guys,

I am using an Informatica mapping to extract data from BigQuery and load it into AWS S3. A typical run has 100M+ records, and it currently takes about 3 hours to load the data into S3.

Is there any way I can optimize this?

We are using the Google BigQuery REST V2 connector, and I have enabled the staging query result option.

 

Solved
1 ACCEPTED SOLUTION

Hey @gmani2023 - noodling on some ideas... Are you still experiencing the lag?

There are a few things you can do to optimize the loading of data from BigQuery to S3:

  • Try a different connector or read mode: the BigQuery REST V2 connector works, but row-by-row REST reads are slow at this volume. If your Informatica version offers a bulk or staging-based read mode (one that exports to Google Cloud Storage first), it is usually much faster for 100M+ rows.
  • Use staging query results: with this option enabled, Informatica first materializes the query result in a BigQuery staging table and then copies it to S3, which generally outperforms streaming rows directly through the mapping.
  • Load in parallel: configure the mapping to write to S3 with multiple threads or partitions, so the transfer is not bottlenecked on a single stream.
  • Tune the staging settings: for very large result sets, check the staging dataset's location and the batch/fetch sizes; keeping staging storage in the same region as the source data avoids slow cross-region copies.
  • Check S3-side throughput: writing to a bucket in a distant region, or without multipart uploads, slows things down; S3 Transfer Acceleration may help for cross-region transfers.
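If the connector path stays slow, one pattern worth benchmarking outside Informatica is BigQuery's EXPORT DATA statement, which writes sharded files straight to Google Cloud Storage; a separate transfer job (e.g. Storage Transfer Service or `aws s3 cp`) then moves the files to S3. A minimal sketch of building such a statement — all project, table, and bucket names below are placeholders, not from your setup:

```python
# Sketch only: "my-project.my_dataset.events" and "gs://my-staging-bucket"
# are hypothetical names — substitute your own.
def build_export_sql(table: str, gcs_uri_prefix: str, fmt: str = "PARQUET") -> str:
    """Build a BigQuery EXPORT DATA statement.

    The '*' wildcard in the URI lets BigQuery shard the output into many
    files written in parallel, which is much faster than a single-stream
    export for 100M+ rows.
    """
    return (
        "EXPORT DATA OPTIONS(\n"
        f"  uri='{gcs_uri_prefix}/part-*.parquet',\n"
        f"  format='{fmt}',\n"
        "  overwrite=true,\n"
        "  compression='SNAPPY')\n"
        f"AS SELECT * FROM `{table}`;"
    )

print(build_export_sql("my-project.my_dataset.events",
                       "gs://my-staging-bucket/export"))
```

You would run the generated SQL through the `bq` CLI or a BigQuery client library; the wildcard URI is what enables the parallel, sharded output.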

Let us know if any of these ideas seem feasible! 
