Which solution is better for batch workloads, Apache Hudi or Dataflow, considering a daily volume of 20-25 GB? My preference is Dataflow since it's serverless and supports both batch and streaming workloads. But for complex joins or heavy data crunching, would Apache Hudi be the better option?
Dataflow can comfortably handle the volume you describe. It unifies streaming and batch workloads under a single programming model (Apache Beam), which makes it easy to migrate and reuse code between batch and streaming pipelines.
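To illustrate the code-reuse point: in the Beam model behind Dataflow, the same transform logic runs against either a bounded (batch) or unbounded (streaming) source. Here is a minimal pure-Python sketch of that idea; the record format and source names are made up for illustration and this is not the actual Beam API:

```python
# Sketch of the batch/streaming code-reuse idea behind Beam/Dataflow.
# The transform is written once and applied to either a bounded (batch)
# or an unbounded (streaming) input. Names are illustrative, not Beam API.
from typing import Dict, Iterable, Iterator


def parse_and_filter(records: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Shared transform: parse 'user,amount' lines, keep amounts over 100."""
    for line in records:
        user, amount_str = line.split(",")
        amount = int(amount_str)
        if amount > 100:
            yield {"user": user, "amount": amount}


# Batch: a bounded, in-memory source (stand-in for a daily file dump).
batch_source = ["alice,250", "bob,50", "carol,300"]
batch_out = list(parse_and_filter(batch_source))

# Streaming: a generator stands in for a Pub/Sub-style unbounded feed.
def event_stream() -> Iterator[str]:
    yield "dave,120"
    yield "erin,80"


stream_out = list(parse_and_filter(event_stream()))
print(batch_out)   # [{'user': 'alice', 'amount': 250}, {'user': 'carol', 'amount': 300}]
print(stream_out)  # [{'user': 'dave', 'amount': 120}]
```

In a real Beam pipeline the same `ParDo`/transform code is kept and only the input (e.g. a file read vs. a Pub/Sub subscription) changes, which is what makes the batch-to-streaming migration cheap.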
With Hudi, keep in mind that it is a table/storage format rather than a processing engine, so you would still need to choose a separate tool (such as Spark on Dataproc) to do the actual processing. You should also look at other Google Cloud Platform tools like Cloud Data Fusion and Dataproc.
What are your data sources? Is the data going to be transformed and loaded into BigQuery?
If you want to go cloud-native, then Dataflow could be an option. Databricks + Delta Lake on GCP is another option you could consider. But a lot will depend on which sources you are extracting data from.