Hi,
Am currently working on a POC. We are leveraging Google analytics for our website and capturing the traffic in GA4. We setup a connection to stream the real time data from GA4 to BQ. Now we would like to stream the same data from BQ to Pub/Sub in real time.
I tried enabling the CDC events over the BQ Analytics table to publish an event to pub/sub topic and created a cloud function with event trigger on analytics table with append rows. Nothing is worked.
@ms4446
The idea is to stream the data from GA4 ---> BQ Table --> Pub/Sub in real time. Need assistance to accomplish this @ms4446
While BigQuery can be the target of CDC processing (changes in an up-stream database propagated to BigQuery), BigQuery is not (yet) a great streaming source of CDC records. I am not familiar with any streaming "change detection" currently available in BigQuery. About the only thing I can immediately think of would be a scheduled Cloud Function that executes the BigQuery APPENDS() Table Valued Function to determine what has been appended/inserted into the table since the last invocation. The results from this can then be used to post new messages to Pub/Sub.
In your original post, you said
I tried enabling the CDC events over the BQ Analytics table to publish an event to pub/sub topic and created a cloud function with event trigger on analytics table with append rows. Nothing is worked
Can you elaborate on what you tried and where you read about it?
Hi @kolban,
Approach 1:
I tried creating a cloud function triggered on BigQuery with following eventarc trigger config
EventProvider: BigQuery
Event: google.cloud.bigquery.storage.v1.BigQueryWrite.AppendRows
Resource:Specific Resource --> gcpfpai.analytics_406317890.events_intraday_20240102
I've used the above configuration by referring to documentation to trigger my cloud function where am writing the code to fetch the newly appended rows.
Approach 2: I execute the following bash command to automatically refresh the data in materialized from my analytics table which in turn publish an event to pub/sub topic still this didn't worked as expected
Create Materialize View & Enable CDC Events
# Set project ID and GA4 property ID
project_id="xxxxx"
ga4_property_id="YYYYY"
# Define the CDC stream topic
stream_topic="projects/gcpfpai/topics/ReadRealTimev4"
# Enable BigQuery as the sink for the CDC stream
bq update --destination_dataset="analytics_406317890" \
--destination_table="events_intraday_20240102_view" \
--write_disposition=WRITE_APPEND \
--materialized_view=MATERIALIZED_VIEW_FULL \
"projects/gcpfpai/topics/ReadRealTimev4"
# Start streaming data to the separate table
bq query --use_legacy_sql=false \
'CREATE OR REPLACE MATERIALIZED VIEW `gcpfpai.analytics_406317890.events_intraday_20240102_view`
OPTIONS(enable_refresh = true, refresh_interval_minutes = 1) AS
SELECT * FROM `gcpfpai.analytics_406317890.events_intraday_20240102`'
Hey @chittemp
I’m working on a similar problem and haven’t found a solution for it yet. Were you able to get this working?
The reason I have to use BigQuery as source is because GA4 only exports to bigquery.
I have integrated my Shopify site with GA4 and setup streaming data export from GA4 to bigquery. I wanna make use of this streaming data for a downstream application.
Any other suggestions would be really helpful!
User | Count |
---|---|
4 | |
4 | |
2 | |
1 | |
1 |