Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Dataform Incremental Errors

I created an incremental Dataform code that when I run as not incremental, it runs so smooth and perfect but when I run it as incremental, I get errors. 

Please see code below and what could be the cause of the error

config {
  type: "incremental",
  schema: "dataform_production",
  name: "ga_adobe_combined",
  tags: ["visits_lk"],
  uniqueKey: ["pk"],
  bigquery: {
    partitionBy: "visit_date",
    clusterBy: ["pk"]
  }
}

with adobe_ as (
  SELECT
  GENERATE_UUID() AS pk,
  "AA" as data_source,
  date as visit_date,
  country,
  site_reconfigured as site,
  platform,
  operating_system AS device_operating_system,
  adobe.mobile_device_type AS device_category,
  sum(page_views) as pageviews,
  sum(visits) as visits,
  sum(content_starts) AS video_views,
  sum(content_time_spent) AS total_time_spent
FROM ${ref('visits_adobe_clean')} adobe
WHERE adobe.date < '2021-06-01'
group by 1,2,3,4,5,6,7,8
)
,

ega AS (
SELECT
  GENERATE_UUID() AS pk,
  "GA" as data_source,
  date(visit_start) as visit_date,
  country,
   site,
  platform,
  device_operating_system,
  device_category,
  sum(pageview_cnt) as pageviews,
  count(distinct visit_id) as visits,
  sum(video_view_cnt) as video_views,
  sum(visit_length_sec) AS total_time_spent
FROM ${ref('visits_ga')} visits
WHERE visits.visit_start > '2021-06-01'
${when(incremental(), `AND DATE(visit_start) > COALESCE((SELECT MAX(DATE(visit_start)) FROM ${self()}), DATE('2021-01-01'))`)}
group by 1,2,3,4,5,6,7,8)

SELECT * FROM adobe_
UNION ALL
SELECT * FROM ega

whenever I run as incremental, I get the error below, I expected incremental runs to use lesser computation, so why am I having this error?
reason:"invalidQuery" location:"query" message:"Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 160% of limit.\nTop memory consumer(s):\n JOIN operations: 100%\n. at [cfc-2017-site-login.dataform_production.ga_adobe_combined_procedure:4:15]": invalid argument

Please help @ms4446 

Solved Solved
0 1 606
1 ACCEPTED SOLUTION

Hi @francisatoyebi ,

The error you are getting indicates that the query is using too much memory and is being rejected by BigQuery. This could be due to several reasons:

  • The two subqueries you are combining with UNION ALL are producing large result sets.
  • The aggregation functions and grouping in your query are causing BigQuery to do a lot of work.

To reduce the memory usage of your query, you can try the following:

  • Reduce the size of the result sets by filtering out rows that you don't need. You are already filtering by date, but additional filtering might be possible depending on your requirements.
  • Consider breaking down the query into smaller parts or using intermediate tables to handle the complexity.
  • Review the grouping and aggregation logic to ensure that it's optimized for your specific use case.

In your specific code, there is no join operation, so suggestions related to join conditions are not applicable. Also, replacing the sum() function with the count() function would likely produce incorrect results, so that suggestion may not be appropriate for this query.

The error message is pointing to a memory issue, so careful analysis and optimization of the query will be needed to resolve it.

View solution in original post

1 REPLY 1

Hi @francisatoyebi ,

The error you are getting indicates that the query is using too much memory and is being rejected by BigQuery. This could be due to several reasons:

  • The two subqueries you are combining with UNION ALL are producing large result sets.
  • The aggregation functions and grouping in your query are causing BigQuery to do a lot of work.

To reduce the memory usage of your query, you can try the following:

  • Reduce the size of the result sets by filtering out rows that you don't need. You are already filtering by date, but additional filtering might be possible depending on your requirements.
  • Consider breaking down the query into smaller parts or using intermediate tables to handle the complexity.
  • Review the grouping and aggregation logic to ensure that it's optimized for your specific use case.

In your specific code, there is no join operation, so suggestions related to join conditions are not applicable. Also, replacing the sum() function with the count() function would likely produce incorrect results, so that suggestion may not be appropriate for this query.

The error message is pointing to a memory issue, so careful analysis and optimization of the query will be needed to resolve it.