Announcements
This site is in read only until July 22 as we migrate to a new platform; refer to this community post for more details.
Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Google Dataform when using pre_operations runs query twice => 2х cost?

Hi!

I have a rather simple dataform query
After i manually run there are 2 queries (not ONE!!!) in INFORMATION_SCHEMA.JOBS which cost me money both. WHY? It doubles cost on data preparation all over the project. 


test.sqlx:


config {

    type: "table",
}

pre_operations {
  DECLARE
    destination_date DEFAULT ( SELECT
    MAX(PARSE_DATE('%Y%m%d', partition_id)) AS latest_partition_date
    FROM
    ${schema()}.INFORMATION_SCHEMA.PARTITIONS
    WHERE
    table_name = 'k_prod_daily'
    and partition_id !='__NULL__')
}

  SELECT
    'k_prod_daily' as table_name,
    SUM(sold_money) as sum_value,
    CURRENT_DATE() as execution_date
  FROM ${ref("k_prod_daily")}
  WHERE date_time > DATE_SUB(destination_date, INTERVAL 7 DAY)



Log sample in JSON:
 
[{
  "Billed_GB": "0.332398592",
  "creation_time": "2024-03-25 19:24:19.320000 UTC",
  "project_id": "bionic-rock-416819",
  "project_number": "842766588736",
  "user_email": "service-842766588736@gcp-sa-dataform.iam.gserviceaccount.com",
  "job_id": "dataform-gcp-71cda78f-63ad-4cd0-9671-1cec4f73104e",
  "job_type": "QUERY",
  "statement_type": "SCRIPT",
  "priority": "INTERACTIVE",
  "start_time": "2024-03-25 19:24:19.360000 UTC",
  "end_time": "2024-03-25 19:24:23.585000 UTC",
  "query": "    BEGIN\n  CREATE SCHEMA IF NOT EXISTS `bionic-rock-416819.kzx` OPTIONS(location\u003d\"US\");\nEXCEPTION WHEN ERROR THEN\n  IF NOT CONTAINS_SUBSTR(@@error.message, \"already exists: dataset\") AND\n    NOT CONTAINS_SUBSTR(@@error.message, \"too many dataset metadata update operations\") AND\n    NOT CONTAINS_SUBSTR(@@error.message, \"User does not have bigquery.datasets.create permission\")\n  THEN\n    RAISE USING MESSAGE \u003d @@error.message;\n  END IF;\nEND;\n    BEGIN\n      DECLARE dataform_table_type DEFAULT (\n  SELECT ANY_VALUE(table_type)\n  FROM `bionic-rock-416819.kzx.INFORMATION_SCHEMA.TABLES`\n  WHERE table_name \u003d \u0027test_2\u0027\n);\n          IF dataform_table_type IS NOT NULL THEN\n      IF dataform_table_type \u003d \u0027VIEW\u0027 THEN DROP VIEW IF EXISTS `bionic-rock-416819.kzx.test_2`;\nELSEIF dataform_table_type \u003d \u0027MATERIALIZED VIEW\u0027 THEN DROP MATERIALIZED VIEW IF EXISTS `bionic-rock-416819.kzx.test_2`;\nEND IF;\n    END IF;\n      BEGIN\n        DECLARE\n    destination_date DEFAULT ( SELECT\n    MAX(PARSE_DATE(\u0027%Y%m%d\u0027, partition_id)) AS latest_partition_date\n    FROM\n    kzx.INFORMATION_SCHEMA.PARTITIONS\n    WHERE\n    table_name \u003d \u0027k_prod_daily\u0027\n    and partition_id !\u003d\u0027__NULL__\u0027);\n          CREATE OR REPLACE TABLE `bionic-rock-416819.kzx.test_2`\n  \n  \n  OPTIONS()\n  AS (\n    \n\n\n\n\n\n\n\nSELECT \n  \u0027k_prod_daily\u0027 as table_name,\n  SUM(sold_money) as sum_value,\n  CURRENT_DATE() as execution_date\nFROM `bionic-rock-416819.kzx.k_prod_daily`\nWHERE date_time \u003e DATE_SUB(destination_date, INTERVAL 7 DAY)\n  );\n        \n      END;\n    END;",
  "state": "DONE",
  "reservation_id": null,
  "total_bytes_processed": "332396936",
  "total_slot_ms": "2697",
  "error_result": null,
  "cache_hit": null,
  "destination_table": null,
  "referenced_tables": [],
  "labels": [],
  "timeline": [],
  "job_stages": [],
  "total_bytes_billed": "332398592",
  "transaction_id": null,
  "parent_job_id": null,
  "session_info": null,
  "dml_statistics": null,
  "total_modified_partitions": null,
  "bi_engine_statistics": null,
  "query_info": {
    "resource_warning": null,
    "optimization_details": null,
    "query_hashes": {
      "normalized_literals": null
    },
    "performance_insights": null
  },
  "transferred_bytes": null,
  "materialized_view_statistics": null,
  "job_creation_reason": {
    "code": "REQUESTED"
  }
}, {
  "Billed_GB": "0.311427072",
  "creation_time": "2024-03-25 19:24:21.528000 UTC",
  "project_id": "bionic-rock-416819",
  "project_number": "842766588736",
  "user_email": "service-842766588736@gcp-sa-dataform.iam.gserviceaccount.com",
  "job_id": "script_job_706f5d38d8c2a5461e05e1b2ab383130_3",
  "job_type": "QUERY",
  "statement_type": "CREATE_TABLE_AS_SELECT",
  "priority": "INTERACTIVE",
  "start_time": "2024-03-25 19:24:21.779000 UTC",
  "end_time": "2024-03-25 19:24:23.244000 UTC",
  "query": "CREATE OR REPLACE TABLE `bionic-rock-416819.kzx.test_2`\n  \n  \n  OPTIONS()\n  AS (\n    \n\n\n\n\n\n\n\nSELECT \n  \u0027k_prod_daily\u0027 as table_name,\n  SUM(sold_money) as sum_value,\n  CURRENT_DATE() as execution_date\nFROM `bionic-rock-416819.kzx.k_prod_daily`\nWHERE date_time \u003e DATE_SUB(destination_date, INTERVAL 7 DAY)\n  )",
  "state": "DONE",
  "reservation_id": null,
  "total_bytes_processed": "311425416",
  "total_slot_ms": "2368",
  "error_result": null,
  "cache_hit": "false",
  "destination_table": {
    "project_id": "bionic-rock-416819",
    "dataset_id": "kzx",
    "table_id": "test_2"
  },
  "referenced_tables": [{
    "project_id": "bionic-rock-416819",
    "dataset_id": "kzx",
    "table_id": "k_prod_daily"
  }],
  "labels": [],
  "timeline": [{
    "elapsed_ms": "718",
    "total_slot_ms": "2368",
    "pending_units": "0",
    "completed_units": "63",
    "active_units": "0",
    "estimated_runnable_units": null
  }, {
    "elapsed_ms": "669",
    "total_slot_ms": "2368",
    "pending_units": "0",
    "completed_units": "63",
    "active_units": null,
    "estimated_runnable_units": "0"
  }],
  "job_stages": [{
    "name": "S00: Input",
    "id": "0",
    "start_ms": "1711394662005",
    "end_ms": "1711394662188",
    "input_stages": [],
    "wait_ratio_avg": "0.52212389380530977",
    "wait_ms_avg": "118",
    "wait_ratio_max": "0.69026548672566368",
    "wait_ms_max": "156",
    "read_ratio_avg": "0.026548672566371681",
    "read_ms_avg": "6",
    "read_ratio_max": "0.092920353982300891",
    "read_ms_max": "21",
    "compute_ratio_avg": "0.12389380530973451",
    "compute_ms_avg": "28",
    "compute_ratio_max": "0.21238938053097345",
    "compute_ms_max": "48",
    "write_ratio_avg": "0.0044247787610619468",
    "write_ms_avg": "1",
    "write_ratio_max": "0.097345132743362831",
    "write_ms_max": "22",
    "shuffle_output_bytes": "542",
    "shuffle_output_bytes_spilled": "0",
    "records_read": "19502937",
    "records_written": "62",
    "parallel_inputs": "62",
    "completed_parallel_inputs": "62",
    "status": "COMPLETE",
    "steps": [{
      "kind": "READ",
      "substeps": ["$2:sold_money, $1:date_time", "FROM bionic-rock-416819.kzx.k_prod_daily", "WHERE greater($1, 19799)"]
    }, {
      "kind": "AGGREGATE",
      "substeps": ["$30 :\u003d SUM($2)"]
    }, {
      "kind": "WRITE",
      "substeps": ["$30", "TO __stage00_output"]
    }],
    "slot_ms": "2187",
    "compute_mode": "BIGQUERY"
  }, {
    "name": "S01: Output",
    "id": "1",
    "start_ms": "1711394662222",
    "end_ms": "1711394662412",
    "input_stages": ["0"],
    "wait_ratio_avg": "1.0",
    "wait_ms_avg": "226",
    "wait_ratio_max": "1.0",
    "wait_ms_max": "226",
    "read_ratio_avg": "0.0",
    "read_ms_avg": "0",
    "read_ratio_max": "0.0",
    "read_ms_max": "0",
    "compute_ratio_avg": "0.048672566371681415",
    "compute_ms_avg": "11",
    "compute_ratio_max": "0.048672566371681415",
    "compute_ms_max": "11",
    "write_ratio_avg": "0.79203539823008851",
    "write_ms_avg": "179",
    "write_ratio_max": "0.79203539823008851",
    "write_ms_max": "179",
    "shuffle_output_bytes": "0",
    "shuffle_output_bytes_spilled": "0",
    "records_read": "62",
    "records_written": "1",
    "parallel_inputs": "1",
    "completed_parallel_inputs": "1",
    "status": "COMPLETE",
    "steps": [{
      "kind": "READ",
      "substeps": ["$30", "FROM __stage00_output"]
    }, {
      "kind": "COMPUTE",
      "substeps": ["$10 :\u003d \u0027k_prod_daily\u0027", "$11 :\u003d 19807"]
    }, {
      "kind": "AGGREGATE",
      "substeps": ["$20 :\u003d SUM($30)"]
    }, {
      "kind": "WRITE",
      "substeps": ["$10, $11, $20", "TO __stage01_output"]
    }],
    "slot_ms": "180",
    "compute_mode": "BIGQUERY"
  }],
  "total_bytes_billed": "311427072",
  "transaction_id": null,
  "parent_job_id": "dataform-gcp-71cda78f-63ad-4cd0-9671-1cec4f73104e",
  "session_info": null,
  "dml_statistics": null,
  "total_modified_partitions": null,
  "bi_engine_statistics": null,
  "query_info": {
    "resource_warning": null,
    "optimization_details": null,
    "query_hashes": {
      "normalized_literals": "5324a1afed09e105210f4c343261135515daba9de9f051d4e1176f769c3445a4"
    },
    "performance_insights": null
  },
  "transferred_bytes": "0",
  "materialized_view_statistics": null,
  "job_creation_reason": {
    "code": "REQUESTED"
  }
}]


0 2 2,179
2 REPLIES 2

For this inquiry I would suggest to file a support case for investigation of your project and api usage, as they have a better visibility for these resources. Specially if there is a billing concerned. 

https://cloud.google.com/contact

Its difficult to read the INFORMATION_SCHEMA.JOBS in the post.  Maybe edit it and format it as code or provide a text file attachment or link.  However, a guess ... if we look here, we find the following:

When you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, exclude the SCRIPT statement type, otherwise some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of this job.

I notice in what I can see in your output that there does indeed appear to be a multi-statement with statement_type = "SCRIPT".  Could it be that this is a "rollup" of the other items contained within the script?