Schedule a job from Cloud Storage to BigQuery

Hi, what are the steps to schedule a job that merges datasets from Cloud Storage into BigQuery?

What do you mean by merging? My guess is that when a new file (object) arrives in Google Cloud Storage, you want logic to run that ingests the newly arrived file into a BigQuery table. Is that correct? If so, you have several options:

1) You can schedule a LOAD DATA job that runs periodically.

2) You can trigger a Cloud Function or Cloud Workflow from an event. The event could be the creation of a new object in Cloud Storage. When the event occurs, it triggers the Cloud Function, which then loads the data right away. The benefit is that a file that arrives early is ingested immediately, and a file that arrives late is not missed, as it would be if the scheduled job had already fired.

3) You could create a BigQuery external table that uses a wildcard character in the file name and reads the data on demand from the objects in Cloud Storage. In that case you don't actually load the data into BigQuery; you only reference the external files.
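
To make options 1 and 3 a little more concrete, here is a rough sketch of the SQL each one might use, held in small Python helpers so the names are easy to swap. The bucket path and the table names are made up for illustration, and 'JSON' here means newline-delimited JSON files; adjust the format to match your data.

```python
# Sketch only. The bucket, prefix, and table names below are hypothetical
# placeholders, not values from this thread.
SOURCE_URIS = "gs://my-bucket/incoming/*.json"


def load_data_sql(table: str, uris: str) -> str:
    """Option 1: a LOAD DATA statement that could be run on a schedule.

    LOAD DATA INTO appends to the table; the schema is auto-detected
    because no column list is given.
    """
    return (
        f"LOAD DATA INTO {table}\n"
        f"FROM FILES (format = 'JSON', uris = ['{uris}']);"
    )


def external_table_sql(table: str, uris: str) -> str:
    """Option 3: an external table over the same files; nothing is loaded."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE {table}\n"
        f"OPTIONS (format = 'JSON', uris = ['{uris}']);"
    )


if __name__ == "__main__":
    print(load_data_sql("chat_data.transcripts", SOURCE_URIS))
    print(external_table_sql("chat_data.transcripts_ext", SOURCE_URIS))
```

One thing to watch with option 1: because LOAD DATA INTO appends, re-running it over the same wildcard would load files that were already ingested, so you would probably want to move or partition processed files by path.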

There may be additional options, but let's start with this set. Review each of these brief summaries and see whether any of them needs elaboration. If so, post back with as much detail as possible about what you want to achieve and what you have examined so far.

Can you please elaborate further on the second option? I have a Google Cloud Storage bucket that holds chat transcripts in JSON format. How do I create a load job for that?
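
A minimal sketch of the event-driven route (option 2) for this case might look like the following. It assumes the transcripts are newline-delimited JSON, that the function is triggered by the Cloud Storage "object finalized" event, and that the `google-cloud-bigquery` package is available in the function's environment. The table ID is a made-up placeholder, and `gcs_uri_from_event` and `load_transcript` are just illustrative names.

```python
from typing import Optional

# Hypothetical destination table; replace with your own project.dataset.table.
TABLE_ID = "my-project.chat_data.transcripts"


def gcs_uri_from_event(data: dict) -> Optional[str]:
    """Build the gs:// URI for the new object, or None if it is not a .json file."""
    bucket = data.get("bucket", "")
    name = data.get("name", "")
    if not bucket or not name.endswith(".json"):
        return None
    return f"gs://{bucket}/{name}"


def load_transcript(cloud_event):
    """Entry point: load the newly arrived file into BigQuery.

    In a deployed 2nd gen function this would normally be decorated with
    @functions_framework.cloud_event; it is left undecorated here so the
    sketch has no import-time dependencies.
    """
    # Imported lazily; requires the google-cloud-bigquery package.
    from google.cloud import bigquery

    uri = gcs_uri_from_event(cloud_event.data)
    if uri is None:
        return  # ignore objects that are not JSON files

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # assumption: let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
    load_job.result()  # wait for the job to finish; raises if the load failed
```

If each transcript is a single pretty-printed JSON document rather than one JSON object per line, the load job will not accept it as is, and the function would need to rewrite the file into newline-delimited form first.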