Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Assistance Needed with GA4 BigQuery Session Count by Hotel

 

I hope this message finds you well. I am currently facing a challenge with Google Analytics 4 data in BigQuery and thought the community might be able to provide some guidance or direct me to someone who can assist.

I am trying to calculate a total session count by hotel using the following SQL statement:

 

sql
Copy code
COUNT(DISTINCT CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id'))) AS Total_sessions

 

However, I'm encountering an issue where the session counts are being inflated due to how I am using the UNNEST function in the FROM clause of my query, specifically:

 

sql
Copy code
FROM `XXXXX-XXXXXXXXX-xxxxx.xxxxxxxxx_xxxxxxxxx.xxxxx_*`, UNNEST(items)

This issue arises when I attempt to extract item_name while also counting sessions, which results in inflated numbers due to the unnesting. Despite attempting various solutions, including CTEs and subqueries, I haven't been able to resolve this issue. The methods I've tried all stem from the GA4 BigQuery schema and not the GA4 UI, and unfortunately, advice from various online communities has not yet yielded a solution.

Given the complexity of this issue, I am open to engaging an expert on a consultancy basis. If you have the bandwidth to assist or can recommend someone who specializes in such queries, I would greatly appreciate it. Please let me know your thoughts or if you need any further information to understand the problem better.

btw,,,The table is already Unnested

 

KeithConroy_0-1714405605590.png

 

1 1 191
1 REPLY 1

Hi @KeithConroy I understand how dealing with inflated session counts can be a challenge when working with GA4 data in BigQuery. This typically occurs because unnesting the items array duplicates rows for each item within a session, which distorts aggregate metrics like session counts.

Here’s how you can resolve this issue:

Solution: Aggregate Sessions Before Unnesting

Step 1: Calculate Sessions Without Unnesting

To avoid duplication caused by the items field, first calculate the total session count at the session level. For instance:

 

 
WITH sessions AS (
SELECT user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'hotel_id') AS hotel_id
FROM `XXXXX-XXXXXXXXX-xxxxx.xxxxxxxxx_xxxxxxxxx.xxxxx_*`
)
SELECT hotel_id,
COUNT(DISTINCT CONCAT(user_pseudo_id, session_id)) AS total_sessions
FROM sessions
GROUP BY hotel_id;

Step 2: Join with Unnested Items

Once the session counts are accurate, you can join this result with the unnested items data if you need deeper item-level insights:

 

 
WITH sessions AS (
SELECT user_pseudo_id,
(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'hotel_id') AS hotel_id
FROM `XXXXX-XXXXXXXXX-xxxxx.xxxxxxxxx_xxxxxxxxx.xxxxx_*`
)
SELECT s.hotel_id,
COUNT(DISTINCT CONCAT(s.user_pseudo_id, s.session_id)) AS total_sessions,
i.item_name,
i.item_revenue_in_usd
FROM sessions s
JOIN `XXXXX-XXXXXXXXX-xxxxx.xxxxxxxxx_xxxxxxxxx.xxxxx_*`, UNNEST(items) AS i
ON CONCAT(s.user_pseudo_id, s.session_id) = CONCAT(i.user_pseudo_id, i.session_id)
GROUP BY s.hotel_id, i.item_name, i.item_revenue_in_usd;

General Recommendations

  • Avoid Direct Unnesting for Aggregations: Always calculate key metrics like session counts before unnesting arrays to prevent duplication.
  • Test with Smaller Datasets: Use a limited date range to validate your query and optimize performance before scaling up.

Alternative Approach

If the process feels too complex, you might want to explore Windsor.ai as a more streamlined solution. Windsor.ai simplifies GA4 data integration with BigQuery, allowing you to aggregate and analyze metrics without complex queries.

Hope this helps!