Hi All,
At my company we are working on moving our production data from MySQL to BigQuery; however, it's in a multi-tenant structure, so each customer has their own dataset (schema) and the tables are all the same unless a customer has add-on features that don't come standard. Once in BigQuery, we want to leverage Dataform to create new tables for ALL customers from the common tables in each dataset. Things like the Account table or Transactions table will all be common. How can we loop through each dataset to combine these tables into one? Does anyone have documentation or has anyone faced something similar? Any help is appreciated!
Thanks,
Aaron
Hi @aaron_harkins,
Welcome to Google Cloud Community!
You're absolutely right: Dataform is a fantastic tool for this scenario. It provides a powerful way to define and manage your BigQuery data pipelines.
Here's how you can leverage it to create consolidated tables across all your customer datasets in BigQuery:
You can use a scripting approach (for example, a FOR loop in standard SQL) or a UNION ALL operation to merge the data:

SELECT column1, column2, ...
FROM table1
UNION ALL
SELECT column1, column2, ...
FROM table2;

Additionally, for large datasets you might need to consider partitioning, clustering, or other optimization techniques in BigQuery to improve query performance. For more information you can also check the Dataform and BigQuery documentation.
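If it helps to picture the FOR-loop route, here is a minimal BigQuery scripting sketch. The project name my-project, the region-us location, the customer_ dataset prefix, and the reporting.account_all target table are all placeholder assumptions to adapt, and it presumes every tenant's Account table shares the same schema:

-- Assumes `my-project.reporting.account_all` already exists (e.g., created once
-- from a single tenant's Account table plus a tenant column) and is emptied first.
FOR ds IN (
  SELECT schema_name
  FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA
  WHERE schema_name LIKE 'customer_%'
) DO
  EXECUTE IMMEDIATE FORMAT("""
    INSERT INTO `my-project.reporting.account_all`
    SELECT '%s' AS tenant, * FROM `my-project.%s.Account`
  """, ds.schema_name, ds.schema_name);
END FOR;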
I hope the above information is helpful.
Hi @dawnberdan, sorry, I've been caught up and I missed this... OK cool, I understand the concept of UNION; however, when we have 10,000+ customers and are adding more each day, the FOR loop is what I guess I was thinking of, and I don't have much experience (well, not in a while) with that. Curious if you might have an example? Either way, thanks for getting back to me!
Hi @aaron_harkins,
You’re right! When working with an extensive customer base and needing to perform repetitive actions on each customer, utilizing a for loop in programming can be incredibly useful. The exact implementation may differ depending on your chosen programming language.
Here’s an example:
1. If your customer data is stored in a list or dictionary, a for loop is a suitable choice:
# Assuming customer data is stored in a list of dictionaries
customers = [
    {'id': 1, 'name': 'John Doe', 'email': 'johndoe@example.com'},
    # ... other customers
]

for customer in customers:
    # Perform action on the customer
    print(f"Sending email to {customer['name']} at {customer['email']}")
2. For extremely large datasets, consider batch processing to improve performance:
# Assuming a list of customer IDs
customer_ids = [1, 2, 3, ...]
batch_size = 1000

for i in range(0, len(customer_ids), batch_size):
    batch = customer_ids[i:i+batch_size]
    # Process the batch of customers
    print(f"Processing batch {i} to {i+batch_size}")
In the first example:
- customers is the list that holds all of your customers (your actual data should go in this list).
- The for loop iterates through each element; during each iteration, the customer variable stores the current customer's record (a dictionary here, or whatever object your implementation uses).
- Inside the loop you perform your action on customer, such as sending emails, updating databases, calculating statistics, etc. It is important to replace the print statement with your actual logic for processing each customer.
- The for loop will iterate through all customers and handle the processing efficiently.

I hope the above information is helpful.
Hi @aaron_harkins, migrating from a multi-tenant MySQL structure to BigQuery while utilizing Dataform is a great choice. Consolidating common tables across multiple datasets will simplify your analysis and reporting workflows. Below is a step-by-step guide to help you achieve this efficiently:
Step 1: Identify Datasets and Tables
Step 2: Automate Table Consolidation Using Dataform
a. Create a Parameterized SQL Template
In Dataform, you can automate table merging by writing SQL-based transformations. Start by using BigQuery’s INFORMATION_SCHEMA to dynamically fetch datasets and tables. For example, the following query lists all datasets with a specific naming pattern:
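A query along these lines should do it (the region-us qualifier and the customer_ prefix are placeholder assumptions for your actual location and naming convention):

SELECT schema_name
FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA
WHERE schema_name LIKE 'customer_%'
ORDER BY schema_name;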
b. Dynamically Combine Tables
To merge tables like Account across datasets, you can write a SQL script in Dataform:
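As a rough sketch, a definitions/account_all.sqlx file could look like the following; the reporting schema, the my-project project ID, and the tenant dataset names are illustrative only:

config {
  type: "table",
  schema: "reporting",
  name: "account_all"
}

SELECT 'customer_001' AS tenant, * FROM `my-project.customer_001.Account`
UNION ALL
SELECT 'customer_002' AS tenant, * FROM `my-project.customer_002.Account`
UNION ALL
SELECT 'customer_003' AS tenant, * FROM `my-project.customer_003.Account`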
You can automate this process further by building dynamic UNION queries using JavaScript or SQLX in Dataform.
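One way to avoid hand-writing thousands of UNION branches is to keep the tenant list in JavaScript and let Dataform generate the SQL at compile time. This is only a sketch: the tenant array is hypothetical (in practice it could live in an includes file or be regenerated from INFORMATION_SCHEMA), and the table references assume the default project:

config { type: "table", schema: "reporting", name: "account_all" }

js {
  // Hypothetical tenant list; keep it in includes/ or regenerate it periodically.
  const tenants = ["customer_001", "customer_002", "customer_003"];
}

${tenants.map(t => "SELECT '" + t + "' AS tenant, * FROM " + t + ".Account").join("\nUNION ALL\n")}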
Step 3: Automate Using BigQuery Scripting
If you prefer scripting, BigQuery allows you to loop through datasets programmatically and generate queries dynamically. Here’s an example script:
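Something along these lines should work as a starting point; as before, the project, region, dataset prefix, and target table names are assumptions to adapt:

DECLARE union_sql STRING;

-- Build one "SELECT ... UNION ALL SELECT ..." statement across all tenant datasets.
SET union_sql = (
  SELECT STRING_AGG(
    FORMAT("SELECT '%s' AS tenant, * FROM `my-project.%s.Account`", schema_name, schema_name),
    "\nUNION ALL\n"
  )
  FROM `region-us`.INFORMATION_SCHEMA.SCHEMATA
  WHERE schema_name LIKE 'customer_%'
);

-- Materialize the consolidated table in a single statement.
EXECUTE IMMEDIATE FORMAT(
  "CREATE OR REPLACE TABLE `my-project.reporting.account_all` AS %s", union_sql
);

With a very large number of tenant datasets, the generated statement can become long, so keep BigQuery's maximum query length in mind and fall back to processing datasets in batches if you hit it.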
You can schedule this script to run regularly in BigQuery to keep your consolidated tables updated.
Step 4: Explore Third-Party Tools for Streamlined Automation
If manually handling scripting and SQL transformations feels overwhelming, tools like Windsor.ai can make the process simpler. While Windsor.ai primarily focuses on data integration, its automation features can help streamline parts of this workflow.
Final thoughts
Hope this can be helpful!
Thanks @Snoshone07. Curious if you've ever run into query limits while doing this? We have a ton of data and I keep getting errors... my thought process was to split up the dynamic union a certain number of times and see if that gets past it. Not sure what else to do.