Is it possible to make a sharded table in BigQuery with partitioning?

Hello!

I have a case where I need to store a huge amount of data and partition it by minute. However, BigQuery supports only 4,000 partitions per table, and I need to keep some history for this table. Is it possible to make a sharded table and partition each shard by minute? For example, could I have tables table_name_20231213 and table_name_20231212, each partitioned by minute internally?

And also it's strange that you don't have a label for BigQuery...

Hi @Stanislav,

You're correct that BigQuery limits partitioned tables to a maximum of 4,000 partitions. To manage large datasets that exceed this limit, you can use a combination of sharding and partitioning, but with some important considerations:

Solution:

  1. Sharding by Date:

    • Create separate tables for each day with a naming pattern like table_name_YYYYMMDD. This approach effectively creates daily shards, allowing you to store data beyond the 4,000 partition limit of a single table.
  2. Partitioning Limitations:

    • While you can shard tables by date, BigQuery does not support partitioning by minute. The finest granularity for time-based partitioning is by hour, so each daily shard could be partitioned hourly (24 partitions per shard) but not by minute.
    • Instead of partitioning by minute within each shard, you can store a minute-level timestamp column in your table schema and use it in query filters. Filtering alone won't prune data the way true partitioning does, although clustering the table on that column can reduce the amount of data scanned.
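
To make the arithmetic and the naming scheme concrete, here is a minimal Python sketch. It assumes the 4,000-partition limit quoted in this thread. The helpers `days_covered`, `shard_table_name`, and `minute_bucket` are hypothetical names for illustration, not part of any BigQuery API.

```python
from datetime import date, datetime

PARTITION_LIMIT = 4000     # per-table limit quoted in this thread (assumption)
MINUTES_PER_DAY = 24 * 60  # 1440 one-minute partitions per day


def days_covered(partitions_per_day: int, limit: int = PARTITION_LIMIT) -> float:
    """Days of history that fit in one table at a given partition rate."""
    return limit / partitions_per_day


def shard_table_name(prefix: str, day: date) -> str:
    """Hypothetical helper: build a daily shard name such as table_name_20231213."""
    return f"{prefix}_{day:%Y%m%d}"


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the minute so it can be stored as a filterable column."""
    return ts.replace(second=0, microsecond=0)
```

With one partition per minute, `days_covered(MINUTES_PER_DAY)` comes to under three days of history in a single table, which is why daily shards are suggested for longer retention.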

Example:

  • table_name_20231213 and table_name_20231212 are sharded by date but not partitioned by minute. They can include minute-level data in their schema for query purposes.
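
As a rough illustration of filtering by minute within one daily shard, the sketch below builds the SQL text for a single one-minute window. The dataset name, the `event_ts` column, and the `minute_query` helper are assumptions made for this example, and timestamps are assumed to be in UTC. In real code, query parameters would be preferable to string formatting.

```python
from datetime import datetime, timedelta


def minute_query(dataset: str, prefix: str, moment: datetime,
                 ts_column: str = "event_ts") -> str:
    """Build SQL selecting one minute of rows from the matching daily shard.

    `ts_column` is a hypothetical TIMESTAMP column holding the event time.
    """
    start = moment.replace(second=0, microsecond=0)  # start of the minute
    end = start + timedelta(minutes=1)               # exclusive upper bound
    table = f"{dataset}.{prefix}_{start:%Y%m%d}"     # e.g. my_dataset.table_name_20231213
    return (
        "SELECT *\n"
        f"FROM `{table}`\n"
        f"WHERE {ts_column} >= TIMESTAMP('{start:%Y-%m-%d %H:%M:%S}')\n"
        f"  AND {ts_column} < TIMESTAMP('{end:%Y-%m-%d %H:%M:%S}')"
    )


if __name__ == "__main__":
    print(minute_query("my_dataset", "table_name", datetime(2023, 12, 13, 10, 15, 30)))
```

The half-open range (`>=` start, `<` end) keeps adjacent minutes from overlapping.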

Benefits and Considerations:

  • Sharding by Date:
    • Helps overcome the 4,000 partition limit.
    • Facilitates long-term data retention across days.
  • Querying High-Granularity Data:
    • While minute-level partitioning is not available, including minute details in the schema allows for effective filtering within each day.
    • This requires careful query design for efficiency.

Additional Points:

  • Managing a large number of sharded tables can be complex. Automation tools and scripts are useful for managing table creation, data ingestion, and querying.
  • To query across date-sharded tables, you typically use wildcard tables. A wildcard query reads the matching shards, but it does not convert them into a single partitioned table.
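
A wildcard query over a range of daily shards could be assembled roughly as follows. This is a sketch under assumptions: the dataset name and the `wildcard_query` helper are hypothetical, while `_TABLE_SUFFIX` is the pseudo-column BigQuery exposes for wildcard tables.

```python
from datetime import date


def wildcard_query(dataset: str, prefix: str, first_day: date, last_day: date) -> str:
    """Build SQL that reads every daily shard between two dates (inclusive)."""
    if first_day > last_day:
        raise ValueError("first_day must not be after last_day")
    return (
        "SELECT *\n"
        f"FROM `{dataset}.{prefix}_*`\n"  # the wildcard matches the YYYYMMDD suffix
        f"WHERE _TABLE_SUFFIX BETWEEN '{first_day:%Y%m%d}' AND '{last_day:%Y%m%d}'"
    )


if __name__ == "__main__":
    print(wildcard_query("my_dataset", "table_name", date(2023, 12, 12), date(2023, 12, 13)))
```

Restricting `_TABLE_SUFFIX` limits the scan to the shards in the requested date range instead of every table that matches the prefix.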