Real-time streaming of an analytics table in BQ to Pub/Sub

Hi, 

I'm currently working on a POC. We are using Google Analytics for our website and capturing the traffic in GA4. We set up a connection to stream the real-time data from GA4 to BQ. Now we would like to stream the same data from BQ to Pub/Sub in real time.

I tried enabling CDC events on the BQ analytics table to publish an event to a Pub/Sub topic, and created a Cloud Function with an event trigger on the analytics table for AppendRows. Nothing worked.

@ms4446

The idea is to stream the data from GA4 --> BQ table --> Pub/Sub in real time. I need assistance to accomplish this. @ms4446

3 REPLIES

While BigQuery can be the target of CDC processing (changes in an upstream database propagated to BigQuery), BigQuery is not (yet) a great streaming source of CDC records. I am not familiar with any streaming "change detection" currently available in BigQuery. About the only thing I can immediately think of would be a scheduled Cloud Function that executes the BigQuery APPENDS() table-valued function to determine what has been appended/inserted into the table since the last invocation. The results from this can then be used to post new messages to Pub/Sub.
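For illustration, a polling job along those lines might look roughly like the sketch below. The table and topic names are the ones from this thread; the fixed 5-minute look-back window, the jq dependency, and running it as a plain shell script rather than inside a Cloud Function are just assumptions for the sketch.

# Poll the GA4 export table for rows appended in the last 5 minutes using the
# APPENDS() change-history table-valued function, then publish each row to
# Pub/Sub as a JSON message.
# Assumptions: jq is installed, and the 5-minute window matches how often the
# script is scheduled.
TABLE='gcpfpai.analytics_406317890.events_intraday_20240102'
TOPIC='projects/gcpfpai/topics/ReadRealTimev4'

bq --format=json query --use_legacy_sql=false --max_rows=1000 \
"SELECT TO_JSON_STRING(t) AS payload
 FROM APPENDS(TABLE \`${TABLE}\`,
              TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE),
              NULL) AS t" \
| jq -r '.[].payload' \
| while read -r row; do
    # One Pub/Sub message per newly appended row
    gcloud pubsub topics publish "${TOPIC}" --message="${row}"
  done

A real implementation would persist the timestamp of the last successful poll (for example in a small state table) and pass it as the start of the APPENDS() window, so appended rows are neither missed nor published twice.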

In your original post, you said

I tried enabling CDC events on the BQ analytics table to publish an event to a Pub/Sub topic, and created a Cloud Function with an event trigger on the analytics table for AppendRows. Nothing worked

Can you elaborate on what you tried and where you read about it?

Hi @kolban

Approach 1:

I tried creating a Cloud Function triggered on BigQuery with the following Eventarc trigger config:

EventProvider: BigQuery

Event: google.cloud.bigquery.storage.v1.BigQueryWrite.AppendRows

Resource: Specific resource --> gcpfpai.analytics_406317890.events_intraday_20240102

I used the above configuration, following the documentation, to trigger my Cloud Function, where I'm writing the code to fetch the newly appended rows.
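For reference, the CLI equivalent of that trigger configuration would be roughly the gcloud command below. The trigger name, region, service account, and the Cloud Run service backing the gen2 function are placeholders, and the serviceName/methodName filters are my assumption of what the console's "BigQuery" provider selection maps to.

# Rough CLI equivalent of the Eventarc trigger used in Approach 1.
# Placeholders: trigger name, region, service account, and the Cloud Run
# service that backs the gen2 Cloud Function. The serviceName/methodName
# filters are assumed from the console selection above.
gcloud eventarc triggers create bq-appendrows-trigger \
  --location=us-central1 \
  --destination-run-service=fetch-appended-rows \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.audit.log.v1.written" \
  --event-filters="serviceName=bigquery.googleapis.com" \
  --event-filters="methodName=google.cloud.bigquery.storage.v1.BigQueryWrite.AppendRows" \
  --service-account="SERVICE_ACCOUNT_EMAIL"

My understanding is that a trigger like this can only fire when a matching audit log entry is actually written, and I have not been able to confirm that such entries show up in Cloud Logging for the GA4 export writes.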

Approach 2: I executed the following bash commands to automatically refresh the data in a materialized view built from my analytics table, which in turn should publish an event to the Pub/Sub topic. Still, this didn't work as expected.

Create Materialized View & Enable CDC Events
# Set project ID and GA4 property ID
project_id="xxxxx"
ga4_property_id="YYYYY"


# Define the CDC stream topic
stream_topic="projects/gcpfpai/topics/ReadRealTimev4"

# Enable BigQuery as the sink for the CDC stream
bq update --destination_dataset="analytics_406317890" \
--destination_table="events_intraday_20240102_view" \
--write_disposition=WRITE_APPEND \
--materialized_view=MATERIALIZED_VIEW_FULL \
"projects/gcpfpai/topics/ReadRealTimev4"

# Create the materialized view over the analytics table (auto-refreshed every minute)
bq query --use_legacy_sql=false \
'CREATE OR REPLACE MATERIALIZED VIEW `gcpfpai.analytics_406317890.events_intraday_20240102_view`
OPTIONS(enable_refresh = true, refresh_interval_minutes = 1) AS
SELECT * FROM `gcpfpai.analytics_406317890.events_intraday_20240102`'

Hey @chittemp 

I’m working on a similar problem and haven’t found a solution for it yet. Were you able to get this working?

The reason I have to use BigQuery as the source is that GA4 only exports to BigQuery.

I have integrated my Shopify site with GA4 and set up the streaming data export from GA4 to BigQuery. I want to make use of this streaming data for a downstream application.

Any other suggestions would be really helpful!