Hi, I am working on a dashboard that is based on BigQuery
there are several conversions I am sending to BigQuery and I am trying to get the session scoped source and medium like analytics provided in the UI, unfortunately when observing the records of my conversion events I see no presence for the session scoped utm's so I am left with null records
is there a solution to this scenraio or a workaround, currently I am measuring things based on first user scoped utm's because I have no other option and it raises a lot of questions when comparing to GA4 UI
thanks in advance 🙂
Solved! Go to Solution.
Dealing with Discrepancies in Direct/None Traffic and Ecommerce Metrics in GA4 BigQuery Queries
Understanding the Challenges:
Strategies to Address These Issues:
1. Understanding Direct/None Traffic
2. Refining Query for Direct/None Traffic
CASE WHEN traffic_source.source IS NULL OR traffic_source.source = '' THEN 'direct' ELSE traffic_source.source END AS source,
CASE WHEN traffic_source.medium IS NULL OR traffic_source.medium = '' THEN '(none)' ELSE traffic_source.medium END AS medium
3. Handling ecommerce.purchase_revenue
4. Query Optimization for Additional Metrics
5. Testing and Validation
6. Advanced Debugging
Step 1: Construct a new table in BigQuery that captures the session-scoped source and medium for each session:
SELECT session_id,
FIRST_VALUE(traffic_source) OVER (PARTITION BY session_id ORDER BY event_timestamp ASC) AS session_scoped_source,
FIRST_VALUE(traffic_medium) OVER (PARTITION BY session_id ORDER BY event_timestamp ASC) AS session_scoped_medium
FROM `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`
Step 2: Merge the conversion event table with the newly created session-scoped source and medium table using the session_id column.
Step 3: Incorporate the session_scoped_source and session_scoped_medium columns in your dashboard to evaluate the efficacy of your conversions based on session-scoped source and medium.
For instance:
SELECT event_name,
COUNT(*) AS conversion_count,
session_scoped_source,
session_scoped_medium
FROM `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`
JOIN `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_SESSION_SCOPED_SOURCE_MEDIUM_TABLE_ID]`
ON `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_EVENT_TABLE_ID]`.session_id = `[YOUR_PROJECT_ID].[YOUR_DATASET_ID].[YOUR_SESSION_SCOPED_SOURCE_MEDIUM_TABLE_ID]`.session_id
GROUP BY event_name,
session_scoped_source,
session_scoped_medium
event_name | conversion_count | session_scoped_source | session_scoped_medium |
---|---|---|---|
purchase | 100 | organic search | |
purchase | 50 | social media | |
signup | 200 | direct | (direct) / (none) |
Always test the queries on a subset of your data initially to ensure accuracy and optimal performance.
Hi, thank you very much, I will definitely try this workaround
I tried it and I did manage to generate a report but I still got a lot of difference when comparing my data set to the exploration report in GA4, any suggestions?
There are a few possible reasons why you might be seeing differences between your BigQuery report and the exploration report in GA4:
To troubleshoot the differences between your two reports, you can try the following:
Here are some additional tips for comparing your BigQuery report to the exploration report in GA4:
hey,
I am facing a similar issue and I tried many different workarounds to fetch the required results, however it does not work properly. The main issues is that it does not gets direct/ none data correctly and additionally if you add an extra metric such as ecommerce.purchase_revenue the results also get all mixed up. Is there a way to fix these?
Dealing with Discrepancies in Direct/None Traffic and Ecommerce Metrics in GA4 BigQuery Queries
Understanding the Challenges:
Strategies to Address These Issues:
1. Understanding Direct/None Traffic
2. Refining Query for Direct/None Traffic
CASE WHEN traffic_source.source IS NULL OR traffic_source.source = '' THEN 'direct' ELSE traffic_source.source END AS source,
CASE WHEN traffic_source.medium IS NULL OR traffic_source.medium = '' THEN '(none)' ELSE traffic_source.medium END AS medium
3. Handling ecommerce.purchase_revenue
4. Query Optimization for Additional Metrics
5. Testing and Validation
6. Advanced Debugging
@ms4446 Hi, I have a question about "2. Refining Query for Direct/None Traffic".
In my understanding, "traffic_source" does not provide event-scoped traffics, but it provides user-scoped traffics, which provides where the user first came from on the website.
Instead, if I am right, we should use "collected_traffic_source", which is a newly added record on June 2023, or values in parameters like "campaign", "source" or "medium" in event_params record before June 2023 to fetch event-scoped traffics. This is supported in other articles like below.
https://mu-kong.medium.com/get-session-campagin-from-ga4-exported-bigquery-data-fcb568afc44
https://tanelytics.com/ga4-bigquery-session-traffic_source/
I am concerned if my understanding is wrong, so please let me know the correct way to get session-scoped traffics when you finds out.
Thanks.
Your understanding is correct, and your approach aligns with the evolving nature of GA4's data schema and how session-scoped traffic sources are captured in BigQuery.
Initial Limitations:
traffic_source
fields). This shows the first source a user arrived from.Updates and Enhancements:
collected_traffic_source
field. This provides a more accurate way to understand traffic sources for individual sessions or events.Using collected_traffic_source
Event-Scoped Traffic Before the Update:
event_params
to extract event-scoped traffic source information. This involved parsing parameters like campaign, source, or medium. It required more work, but was the main method before collected_traffic_source
.Your Approach
collected_traffic_source
field.event_params
.Example Queries:
Using collected_traffic_source
:
SELECT
event_name,
event_timestamp,
collected_traffic_source.source,
collected_traffic_source.medium,
collected_traffic_source.campaign
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'
Extracting event-scoped traffic from event_params
:
SELECT
event_name,
event_timestamp,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign
FROM `your_project.your_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD'
Your approach of using collected_traffic_source
for session-scoped traffic analysis and event_params
(for data before the update) is accurate and recommended for detailed traffic source analysis in GA4 BigQuery exports.
I am relieved to hear that my approach is accurate and is a recommended way. Thank you for your quick and detailed answer.
Hi @ms4446 ,
I have tried to count the sessions using
collected_traffic_source.manual_source,
collected_traffic_source.manual_medium,
COUNT(DISTINCT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) sessions_count
on my dataset for event period after June 2023 but the result somehow still have huge discrepancy with the sessions source/medium in GA4 Interface. I have tried to use traffic_source.source and traffic_source.medium but the result still far different especially for Google / Organic and Direct . What is the best practice to get the nearest result as in GA4?
Hello,
Is there a way to get the ecommerce.purchase_revenue and ecommerce.purchase_revenue_in_usd values for non-null source, mediums like google, facebook etc?
Tried using both collected_traffic_source and event_params methods above but all total_revenue and revenue_usd seems to be tagged to null source, medium.
Sample of simple sql below.
SELECT
((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source')) AS source,
((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium')) AS medium,
SUM(ecommerce.purchase_revenue) AS revenue,
SUM(ecommerce.purchase_revenue_in_usd) AS revenue_usd
FROM
`datasource`
GROUP BY
source,
medium
Results look like example below
owsourcemediumrevenuerevenue_usd
1 | null | null | 1000 | 10 |
2 | google | cpc | null | null |
3 | adroll | Email | null | null |
4 | Instagram | display | null | null |
5 | google | organic | null | null |
Thank you in advance!
j
The issue you're encountering, where ecommerce.purchase_revenue
and ecommerce.purchase_revenue_in_usd
are primarily associated with null sources and mediums, highlights a challenge in accurately attributing revenue to specific traffic sources within your GA4 BigQuery data. Here's why this happens:
Refined Approach for Attribution
To get better attribution, focus on session-level or user-level data when analyzing purchase events. Here's what to do:
collected_traffic_source
(Post-June 2023 Data): This field gives more accurate session-level attribution for data collected after June 2023.Example Query
Here's a query demonstrating how to attribute revenue to sessions based on traffic source (assuming you can link purchase events to sessions):
WITH SessionTraffic AS (
SELECT
session_id,
MAX(IF(key = 'source', value.string_value, NULL)) AS source,
MAX(IF(key = 'medium', value.string_value, NULL)) AS medium
FROM `datasource`,
UNNEST(event_params)
WHERE event_name = 'session_start'
GROUP BY session_id ),
PurchaseRevenue AS (
SELECT
session_id,
SUM(ecommerce.purchase_revenue) AS revenue,
SUM(ecommerce.purchase_revenue_in_usd) AS revenue_usd
FROM `datasource`
WHERE event_name = 'purchase'
GROUP BY session_id )
SELECT
st.source,
st.medium,
SUM(pr.revenue) AS revenue,
SUM(pr.revenue_usd) AS revenue_usd
FROM SessionTraffic st
JOIN PurchaseRevenue pr
ON st.session_id = pr.session_id
GROUP BY st.source, st.medium
This is amazing.
Tested it out and while a bulk are still attributed to null (around 60-70%), I can see other source/mediums showing non-zero/null values for revenue now.
Thank you so much!
I tried all the alternatives above and the fields still contain incorrect data in relation to the GA4 interface, especially Google CPC which does not appear, does anyone know how to solve this?