[Analytic Block] Partitioned Date Filters in BigQuery

BigQuery is a scanning database, which means it scans the entire table for the columns referenced in the query.

Because BigQuery elastically scales up compute power as needed, queries never really get slow, but they can get expensive if you scan really big tables.

To cut down on cost (both computational and financial), BigQuery lets you partition large tables by date so that instead of scanning the entire table, a query can scan just a day or multiple days worth of data. To do this, BigQuery introduces a pseudo-column _PARTITIONTIME and when this column is used in a WHERE clause, BigQuery limits the amount of data scanned.

_PARTITIONTIME can be used in any part of the query and will just return the DATE of the partition as a type TIMESTAMP. (It will not return the timestamp of the individual row, since _PARTITIONTIME is always a date).

Making use of _PARTITIONTIME

So, _PARTITIONTIME is a powerful tool for preventing unnecessarily expensive queries. The problem is, you’d normally have to teach everyone querying BigQuery to remember to use partitions in every query where it’s applicable.

LookML solves that problem. Instead of teaching each person, you can teach the pattern to Looker once and never worry about it again.

Someone querying a partitioned table really doesn’t care about partitioning, they want it all to work seamlessly, setting date filters in the background for them. Here’s a Looker Block to make that happen.

Modeling in LookML

Let’s assume we are querying a partitioned table called logs that has events. The code below will present a single field group called Event Date and make sure there is always a filter on _PARTITIONTIME to prevent accidentally over-expensive queries.

explore: logs {
  # Make sure there is always a filter on event_date, event_week, event_month or event_year
  # Default to the last complete day of data
  conditionally_filter: {
    filters: {
      field: logs.event_date
      value: "1 days ago for 1 day"
    }
  }

view: logs {
  # Combine the partition date filters and the time filters into a single field group.
  dimension_group: event {
    type: time
    timeframes: [date,week,month,year]
    sql: _PARTITIONTIME ;;
    # NOTE: for manually partitioned files use code below
    # sql: TIMESTAMP(PARSE_DATE('%Y%m%d', REGEXP_EXTRACT(_TABLE_SUFFIX,r'\d\d\d\d\d\d\d\d'))) ;;
  }

  # define a different set of dimensions, but have them shown together in the Explorer
  dimension_group: _event {
    label: "Event"
    type: time
    timeframes: [time,minute10, minute5, minute30, hour,hour_of_day,raw]
    sql: ${TABLE}.event_time;;
    datatype: epoch
  }

  ...
}

NOTE: BigQuery Manually Partitioned Files (Table Wildcards).

BigQuery also supports manually partitioned tables. Tables are selected using a wildcard (*) in the table name of the from statement and _TABLE_SUFFIXin much the same way _PARTITION_TIME is used. _TABLE_SUFFIX returns the string that matched the table wild card. Change the sql: for the event dimension_group to make this pattern work with manually partitioned files…

Read more about BigQuery table Wildcards
https://cloud.google.com/bigquery/docs/querying-wildcard-tables

9 14 3,024
14 REPLIES 14
Top Labels in this Space
Top Solution Authors