Hi,
I am currently working on a POC. We use Google Analytics for our website and capture the traffic in GA4. We set up a connection to stream real-time data from GA4 to BigQuery. Now we would like to stream the same data from BigQuery to Pub/Sub in real time.
I tried enabling CDC events on the BigQuery analytics table to publish an event to a Pub/Sub topic, and I also created a Cloud Function with an event trigger on the analytics table for appended rows. Neither worked.
@ms4446
The idea is to stream the data from GA4 --> BQ table --> Pub/Sub in real time. I need assistance to accomplish this. @ms4446
Hi Praneeth,
I finally implemented the following approach, given GCP's limitations for real-time streaming:
1. Created a Cloud Scheduler job that sends a message to a Pub/Sub topic every minute.
2. A Cloud Function is triggered when a new event is published to the Pub/Sub topic.
3. In the Cloud Function, I run a query on the GA4 streaming table to read the rows whose timestamps fall within that minute.
4. The identified rows are streamed to the downstream application.
Please let me know if this helps; I really appreciate the suggestions.
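The steps above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the table name is a placeholder, it assumes the standard GA4 export column `event_timestamp` (microseconds since epoch), and the BigQuery/Pub/Sub client calls are only indicated in comments so the windowing logic stands on its own.

```python
from datetime import datetime, timedelta, timezone


def previous_minute_window(now: datetime) -> tuple:
    """Return the [start, end) bounds of the minute before `now`."""
    end = now.replace(second=0, microsecond=0)
    start = end - timedelta(minutes=1)
    return start, end


def build_query(table: str, start: datetime, end: datetime) -> str:
    """Build the query selecting GA4 rows whose event_timestamp
    (microseconds since the epoch) falls inside the window."""
    start_us = int(start.timestamp() * 1_000_000)
    end_us = int(end.timestamp() * 1_000_000)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE event_timestamp >= {start_us} AND event_timestamp < {end_us}"
    )


# Inside the Pub/Sub-triggered Cloud Function one would then run, e.g.:
#   start, end = previous_minute_window(datetime.now(timezone.utc))
#   rows = bigquery.Client().query(build_query(TABLE, start, end)).result()
#   for row in rows:
#       publisher.publish(topic_path, json.dumps(dict(row), default=str).encode())
```

Note that a fixed one-minute lookback can miss late-arriving rows, so a real deployment may want a small overlap plus deduplication downstream.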
While BigQuery can be the target of CDC processing (changes in an upstream database propagated to BigQuery), BigQuery is not (yet) a great streaming source of CDC records. I am not familiar with any streaming "change detection" currently available in BigQuery. About the only thing I can immediately think of would be a scheduled Cloud Function that executes the BigQuery APPENDS() table-valued function to determine what has been appended/inserted into the table since the last invocation. The results can then be used to post new messages to Pub/Sub.
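A hedged sketch of that APPENDS() idea: each scheduled run remembers the timestamp of its previous run (the "watermark") and asks APPENDS() for everything appended since then. The table name is a placeholder, and in a real deployment the watermark would need to be persisted somewhere durable (e.g. Firestore or GCS), since Cloud Function instances are ephemeral.

```python
from datetime import datetime
from typing import Optional


def appends_query(table: str, since: Optional[datetime]) -> str:
    """Build a query over BigQuery's APPENDS table-valued function.

    APPENDS(TABLE t, start_timestamp, end_timestamp) returns rows
    appended within the interval; a NULL start means "from the earliest
    time-travel boundary" and a NULL end means "up to now".
    """
    lower = f"TIMESTAMP '{since.isoformat()}'" if since else "NULL"
    return f"SELECT * FROM APPENDS(TABLE `{table}`, {lower}, NULL)"


# Each scheduled invocation would then do, roughly:
#   rows = bigquery.Client().query(appends_query(TABLE, watermark)).result()
#   ... publish rows to Pub/Sub, then advance the watermark ...
```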
In your original post, you said
I tried enabling CDC events on the BigQuery analytics table to publish an event to a Pub/Sub topic, and I also created a Cloud Function with an event trigger on the analytics table for appended rows. Neither worked.
Can you elaborate on what you tried and where you read about it?
Hi @kolban,
Approach 1:
I tried creating a Cloud Function triggered on BigQuery with the following Eventarc trigger config:
EventProvider: BigQuery
Event: google.cloud.bigquery.storage.v1.BigQueryWrite.AppendRows
Resource: Specific resource --> gcpfpai.analytics_406317890.events_intraday_20240102
I used the above configuration, following the documentation, to trigger my Cloud Function, in which I wrote code to fetch the newly appended rows.
Approach 2: I executed the following bash commands to automatically refresh a materialized view over my analytics table, which in turn should publish an event to a Pub/Sub topic. This also didn't work as expected.
Create Materialized View & Enable CDC Events
# Set project ID and GA4 property ID
project_id="xxxxx"
ga4_property_id="YYYYY"
# Define the CDC stream topic
stream_topic="projects/gcpfpai/topics/ReadRealTimev4"
# Enable BigQuery as the sink for the CDC stream
bq update --destination_dataset="analytics_406317890" \
--destination_table="events_intraday_20240102_view" \
--write_disposition=WRITE_APPEND \
--materialized_view=MATERIALIZED_VIEW_FULL \
"projects/gcpfpai/topics/ReadRealTimev4"
# Start streaming data to the separate table
bq query --use_legacy_sql=false \
'CREATE OR REPLACE MATERIALIZED VIEW `gcpfpai.analytics_406317890.events_intraday_20240102_view`
OPTIONS(enable_refresh = true, refresh_interval_minutes = 1) AS
SELECT * FROM `gcpfpai.analytics_406317890.events_intraday_20240102`'
Hey @chittemp
I’m working on a similar problem and haven’t found a solution for it yet. Were you able to get this working?
The reason I have to use BigQuery as the source is that GA4 only exports to BigQuery.
I have integrated my Shopify site with GA4 and set up the streaming data export from GA4 to BigQuery. I want to make use of this streaming data for a downstream application.
Any other suggestions would be really helpful!
Hello, we had the exact same problem.
We solved it by using BigQuery's APPENDS function, a Cloud Run job, and Cloud Scheduler to send the data to Pub/Sub once a minute.
Hopefully you've already figured it out, but if you have any more questions, feel free to ask.