Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

Writing a Dynamic Union Query in BigQuery Using a List of Tables

Hello everyone,

I am looking for a solution to create a dynamic union query in BigQuery using a list of tables. I came across this helpful approach on Stack Overflow, which involves using JavaScript's const to store the list and then using map to build the query.

However, my situation is a bit different. Instead of using a JavaScript const to store the list, I would like to store the list of tables directly in a BigQuery table. The goal is that whenever a new table is inserted into this list table, the union query should automatically incorporate it without any manual intervention.

Could anyone please guide me on how to achieve this dynamic union query in BigQuery using a list of tables stored in another BigQuery table?

Thank you for your assistance!

3 2 7,344
2 REPLIES 2

To achieve a dynamic union query in BigQuery using a list of tables stored in another BigQuery table, you can create a stored procedure in BigQuery that reads the list of tables from your table and dynamically constructs and executes the SQL query.

Here's a high-level overview of the steps:

  1. Create a table that stores the list of tables you want to union. This table could have a single column, say table_name, that stores the fully qualified table name (i.e., project.dataset.table).

  2. Create a stored procedure that reads the list of tables from the table created in step 1, constructs the SQL query, and then executes it.

Here's an example of how you could define such a stored procedure:

CREATE OR REPLACE PROCEDURE `project.dataset.dynamic_union`(OUT sql STRING)
BEGIN
DECLARE tables ARRAY<STRING>;
DECLARE i INT64 DEFAULT 0;
DECLARE full_query STRING DEFAULT "";

-- Get the list of tables from your table
SET tables = (SELECT ARRAY_AGG(table_name) FROM `project.dataset.table_list`);

-- Loop through the list of tables and construct the SQL query
WHILE i < ARRAY_LENGTH(tables) DO
SET full_query = CONCAT(full_query, "SELECT * FROM ", tables[OFFSET(i)]);
IF i < ARRAY_LENGTH(tables) - 1 THEN
SET full_query = CONCAT(full_query, " UNION ALL ");
END IF;
SET i = i + 1;
END WHILE;

SET sql = full_query;
END;

In this stored procedure, project.dataset.table_list is the table that stores the list of tables you want to union. Replace it with your actual table.

To use this stored procedure, you would call it like this:

DECLARE sql STRING;
CALL `project.dataset.dynamic_union`(sql);
EXECUTE IMMEDIATE sql;

This will execute the dynamically constructed SQL query.

@ms4446 , so I'm looking to do this in Dataform on a daily basis and my understanding is you can use the "operations" type in your config that you would then call that? I'm curious if this would have any issues with lets say 20k tables and potentially 100m rows of data as long as the tables had the same schema? I would also assume it can be done incrementally in Dataform as well? I've tried finding all these answers and maybe I'm just not finding the right results. 

Thanks,

Aaron