
Apache Hudi or Dataflow

Which solution is better for batch workloads, Apache Hudi or Dataflow, given a daily volume of 20-25 GB? My preference is Dataflow, as it's a serverless service and supports both batch and streaming workloads. But for complex joins or significant data crunching, is Apache Hudi the better option?


Dataflow will be able to handle the volume you describe. It can also unify your streaming and batch workloads, which makes it easy to migrate and reuse code from batch to streaming.
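To put the volume in perspective, here is a rough back-of-envelope calculation. The one-hour batch window is my assumption, not something from your question, and `required_throughput_mb_per_s` is just a helper name I made up for the illustration.

```python
# Back-of-envelope: average throughput needed to finish a daily batch
# within a given time window. Uses decimal units (1 GB = 1000 MB).
def required_throughput_mb_per_s(daily_gb: float, window_hours: float) -> float:
    """Average MB/s needed to process daily_gb within window_hours."""
    return daily_gb * 1000 / (window_hours * 3600)


# Volumes from the question; the window lengths are assumed for illustration.
for daily_gb in (20, 25):
    for window_hours in (1, 24):
        rate = required_throughput_mb_per_s(daily_gb, window_hours)
        print(f"{daily_gb} GB in {window_hours} h -> {rate:.2f} MB/s")
```

Even if the whole 25 GB had to be processed within one hour, that works out to roughly 7 MB/s on average, so raw volume alone is unlikely to decide between the tools. The complexity of the joins and transformations will probably matter more.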

Hudi is a table format rather than a processing engine, so when using it you will need to choose a separate tool for the data processing itself. You should also check other Google Cloud Platform tools such as Cloud Data Fusion and Dataproc.

What are your sources? Is the data going to be transformed and loaded into BigQuery?

If you want to go cloud-native, then Dataflow could be an option. Databricks on Google Cloud with Delta Lake is another option you could consider. But a lot will depend on which sources you are extracting data from.