Hi, I have a table with many columns: A, B, C,...
How can I split my table to smaller tables which have same value of A.
I don't want to loop over the table to export smaller tables base on each A values, because it will increase the cost.
Thank you.
Here's how you can split a large table in BigQuery into smaller tables based on the values of column A, along with explanations and considerations to optimize this process.
There are a couple of efficient ways to do this in BigQuery without the need for explicit looping:
1. Direct Creation with Queries
This is the most straightforward approach if you know the distinct values of column A in advance:
-- Create 'table_A_value1' based on the value 'value1' in column A
CREATE OR REPLACE TABLE `project_id.dataset_id.table_A_value1`
AS
SELECT *
FROM `project_id.dataset_id.original_table`
WHERE A = 'value1';
-- Create 'table_A_value2'
CREATE OR REPLACE TABLE `project_id.dataset_id.table_A_value2`
AS
SELECT *
FROM `project_id.dataset_id.original_table`
WHERE A = 'value2';
-- ... and so on
project_id
, dataset_id
, original_table
, and the values ('value1', 'value2', etc.) with your actual project, dataset, table, and values from column A.2. Dynamic SQL for Unknown A Values
Use this method if you don't know all the distinct values in column A beforehand:
DECLARE query_string STRING;
DECLARE values ARRAY<STRING>;
-- Get distinct values from column A
SET values = (
SELECT ARRAY_AGG(DISTINCT A)
FROM `project_id.dataset_id.original_table`
);
-- Build dynamic SQL queries
FOR value IN (SELECT * FROM UNNEST(values)) DO
SET query_string = CONCAT(
"CREATE OR REPLACE TABLE `project_id.dataset_id.table_A_", value, "` AS
SELECT * FROM `project_id.dataset_id.original_table` WHERE A = '", value, "';"
);
EXECUTE IMMEDIATE query_string;
END FOR;
Important Considerations
Thanks for the reply, it helped me with a dynamic query I was writing at BigQuery, just one quick fix.
Using value in the dynamic query_string, generated this error:
Unable to coerce type STRUCT<f0_ STRING> to expected type STRING Signature CONCAT<BYTES, [BYTES, ...])
It looks like iterating the un-nested values doesn't give us a string but a struct, so I fixed it using value[0] instead of just value.