Hello super engineers, I hope you are well!
I have recently set up a scheduled upload of CSVs from a SaaS tool to a Google Cloud Storage bucket.
I would like to combine all the CSVs into one (because they share the same schema) and have them in BigQuery. Is that possible?
Stay awesome 🏄‍♂️
To avoid re-importing old data, your Cloud Function should be linked to a trigger so that it only runs when a new file arrives. The Object finalized
event serves this purpose: it fires each time a new object is created (or an existing one is overwritten) in the bucket, as described in the documentation:
For a function to use a Cloud Storage trigger, it must be implemented as an event-driven function.
You can specify a Cloud Storage trigger when you deploy a function. See Deploy a Cloud Function for general instructions on how to deploy a function, and see below for additional information specific to configuring Cloud Storage triggers during deployment.
If you are deploying with the gcloud CLI, the --trigger-bucket flag registers the function for the Cloud Storage Object finalized event by default:

gcloud functions deploy import2BigQuery \
    --trigger-bucket=csvBucket
You could follow this Google Cloud Community tutorial on using a Cloud Function to import your data into BigQuery.
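As a rough sketch, an event-driven function in Python (1st gen background function) could look like the one below. The target table `my_dataset.combined_csvs` is a placeholder you would replace with your own, and the example assumes each CSV has a header row:

```python
# main.py — minimal sketch of a Cloud Function triggered by Object finalized.
# Assumption: "my_dataset.combined_csvs" is a placeholder table name, and all
# CSVs in the bucket share the same schema.
from google.cloud import bigquery

client = bigquery.Client()

def import2BigQuery(event, context):
    """Triggered by google.storage.object.finalize; loads the new CSV into BigQuery."""
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # assumes each CSV ships with a header row
        autodetect=True,       # infer the shared schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append, don't overwrite
    )

    load_job = client.load_table_from_uri(
        uri, "my_dataset.combined_csvs", job_config=job_config
    )
    load_job.result()  # block until the load finishes; raises on failure
    print(f"Loaded {uri} into my_dataset.combined_csvs")
```

Because the function only receives the single object that fired the event, files that were already loaded are never read again, which is what prevents duplicating old data.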
Once your CSVs are available in the Cloud Storage bucket, you can load them into BigQuery and append them all to a single table, since a load job can take every file in the bucket that shares the schema you want to work with. The client libraries that drive this are available in many programming languages.
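For the CSVs that were already sitting in the bucket before the trigger existed, a one-off load with a wildcard URI would fold them into the same table. Again, the bucket and table names below are placeholders from the examples above:

```python
# backfill.py — one-off sketch to load every existing CSV in the bucket into
# the same placeholder table, assuming they all share one schema.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# The * wildcard lets a single load job read every matching object in the bucket.
load_job = client.load_table_from_uri(
    "gs://csvBucket/*.csv", "my_dataset.combined_csvs", job_config=job_config
)
load_job.result()
print(f"Backfilled {load_job.output_rows} rows")
```

Run the backfill once before (or right after) deploying the trigger, so the table starts with the historical files and the function keeps it current from then on.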