Greetings, my name is Farizi, I am a data engineer, currently, I have tried the new feature from pub/sub which is to write directly to bigquery, but with no success, my current flow is :
So far, Datastream Avro files are already stored into GCS, and I have defined the schema manually for the pub/sub topic itself, and I have created bigquery table too. Still, the data itself is not available in the table, even though I have chosen to use the topic schema inside the subscription.
Is there any documentation to use this new feature if we want to use datastream together? please answer ASAP because this feature is excellent and can replace the dataflow ability to stream into big query.
Solved! Go to Solution.
Hi Farizi,
I might be missing something from your description, but Pub/Sub notifications from GCS only carry the path to the file in GCS, not the actual payload. To get the data into BQ, you need a Dataflow job to process the data from GCS and load it into BQ. There's a detailed guide to accomplish this, available here: https://cloud.google.com/datastream/docs/implementing-datastream-dataflow-analytics
Etai
Hi Farizi,
I might be missing something from your description, but Pub/Sub notifications from GCS only carry the path to the file in GCS, not the actual payload. To get the data into BQ, you need a Dataflow job to process the data from GCS and load it into BQ. There's a detailed guide to accomplish this, available here: https://cloud.google.com/datastream/docs/implementing-datastream-dataflow-analytics
Etai
ah that's a shame, I thought this feature can replace dataflow, but thanks for the quick reply!
Hi etaim, I've already followed the tutorial that you give to me, and here is the new flow:
The data is successfully loaded in the bigquery dataset, but all of the data types have been converted into a string in the bigquery column, note that the datastream output is avro files in GCS, is there any step that I have missed to keep the bigquery schema the same as the source?
Hi Farizi, doesn't sound like you missed any step, but maybe something wasn't done exactly right? Off the top of my head I can't think of a reason why this would happen. If you can, I suggest opening a support ticket so that we can properly investigate.
Etai