Hello all,
I have files that reside in a GCS folder. I need to check whether a certain number of files have been uploaded and not yet processed (based on a BigQuery table). Once those files are uploaded, I need to copy them immediately to another GCS bucket and then write a new row to a BigQuery table. What is the best approach for this use case? My initial thought was to use Pub/Sub, Cloud Functions, or a combination of both. Could I use Dataflow for this use case? Python is the preferred language.
What do you think?
Hi Tzachi_Israel,
Welcome to the Google Cloud Community!
It seems you're looking to react to file uploads in Google Cloud Storage, copy new files to another bucket, and track their status in BigQuery. Both Pub/Sub + Cloud Functions and Dataflow can handle this. Here's a breakdown of the two options, along with a recommendation:
Pub/Sub + Cloud Functions Approach:
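Configure Cloud Storage notifications (or an Eventarc trigger) on the source bucket so each newly uploaded object invokes a Cloud Function. The function checks the BigQuery tracking table, copies the object to the destination bucket, and records it. Here's a minimal Python sketch, assuming a 2nd-gen Cloud Function triggered on object finalize; the bucket names, the `processed_files` table, and its `file_name` column are placeholders for illustration:

```python
# Minimal sketch of a 2nd-gen Cloud Function (Python) triggered by a
# Cloud Storage object-finalize event. Bucket and table names are
# illustrative assumptions, not fixed names.
import functions_framework
from google.cloud import bigquery, storage

DEST_BUCKET = "my-destination-bucket"                      # assumed
TRACKING_TABLE = "my_project.my_dataset.processed_files"   # assumed

bq = bigquery.Client()
gcs = storage.Client()

@functions_framework.cloud_event
def on_file_uploaded(cloud_event):
    data = cloud_event.data
    bucket_name = data["bucket"]
    blob_name = data["name"]

    # Skip files already recorded as processed in the BigQuery table.
    job = bq.query(
        f"SELECT COUNT(*) AS n FROM `{TRACKING_TABLE}` "
        "WHERE file_name = @file_name",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("file_name", "STRING", blob_name)
            ]
        ),
    )
    if next(iter(job.result())).n > 0:
        return  # already processed

    # Copy the object to the destination bucket.
    src_bucket = gcs.bucket(bucket_name)
    src_bucket.copy_blob(
        src_bucket.blob(blob_name), gcs.bucket(DEST_BUCKET), blob_name
    )

    # Record the file in the tracking table via a streaming insert.
    errors = bq.insert_rows_json(TRACKING_TABLE, [{"file_name": blob_name}])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

You'd deploy this with `gcloud functions deploy`, wiring the trigger to the `google.cloud.storage.object.v1.finalized` event on the source bucket, so the function fires as soon as each upload completes.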
Dataflow Approach:
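With Dataflow, you'd run a streaming Apache Beam pipeline that reads Cloud Storage notification messages from a Pub/Sub subscription, copies each file, and appends a row to BigQuery. Below is a minimal sketch using the same placeholder names; a production pipeline would also look up the tracking table (for example, via a side input) to skip files that were already processed:

```python
# Minimal sketch of a streaming Apache Beam pipeline (Python) that reads
# GCS upload notifications from Pub/Sub, copies each file, and appends a
# row to BigQuery. Subscription, bucket, and table names are assumptions.
import json

import apache_beam as beam
from apache_beam.io.gcp.gcsio import GcsIO
from apache_beam.options.pipeline_options import PipelineOptions

DEST_BUCKET = "my-destination-bucket"                            # assumed
SUBSCRIPTION = "projects/my-project/subscriptions/gcs-uploads"   # assumed
TABLE = "my_project:my_dataset.processed_files"                  # assumed

class CopyFile(beam.DoFn):
    def process(self, message):
        # Pub/Sub notification payload is the JSON object metadata.
        event = json.loads(message.decode("utf-8"))
        src = f"gs://{event['bucket']}/{event['name']}"
        dst = f"gs://{DEST_BUCKET}/{event['name']}"
        GcsIO().copy(src, dst)  # copy the object to the destination bucket
        yield {"file_name": event["name"]}

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadNotifications" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "CopyAndRecord" >> beam.ParDo(CopyFile())
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            TABLE,
            schema="file_name:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```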
Both Pub/Sub + Cloud Functions and Dataflow can achieve your goal, but the best choice depends on the scale and complexity of your system. I recommend starting with Pub/Sub + Cloud Functions for simplicity and ease of use, especially if you're processing a moderate number of files. If you expect the volume of files to grow or need more complex processing, Dataflow could be a more scalable solution.
I hope the above information is helpful.