Hi Team,
I have source table which contains the change stream, I'm trying to replication the table in BigQuery using Dataform. For that I want to read the records from change stream table and insert or update records in the destination table.
When I tried creating two output files one containing insert and other update publish function I get the error
Duplicate action name detected. Names within a schema must be unique across tables, declarations, assertions, and operations
Insert.js ->
publish("car_table", {type: "incremental"})
.query(ctx => {
`SELECT src.id, src.brand, src.count FROM ${ctx.ref("car_table_changelog")} AS src WHERE src._metadata_spanner_mod_type="INSERT"`
});
Update.js ->
publish("car_table", {type: "incremental"})
.query(ctx => `SELECT src.id, src.brand, src.count FROM ${ctx.ref("car_table_changelog")} AS src WHERE src._metadata_spanner_mod_type="UPDATE"`);
How can I achieve it? Can I combine both the SQL statements into one publish function?
Solved! Go to Solution.
Below are some resources and example projects that can help you understand and work with Dataform:
Dataform's official documentation and guides: This is your best starting point as it provides comprehensive details, tutorials, and best practices.
Dataform's GitHub repository: This contains sample projects, integrations, and more. It's a useful resource to see real-world setups and understand potential advanced configurations.
Example Projects:
Dataform Web Tracking Example: This is a sample project that transforms raw Google Analytics and database logs into a set of tables suitable for analysis.
Dataform E-commerce Example: This project takes raw e-commerce data and turns it into a tidy set of tables.
Community Resources:
Dataform's Community Slack: This is a great place to ask questions, share experiences, and learn from other Dataform users.
Dataform Blog: Contains a variety of articles, from best practices to in-depth explanations of specific features.
A sample script to run the MERGE statement in BigQuery can be written in either SQL or a programming language such as JavaScript.
Here is a sample SQL script:
MERGE INTO car_table AS target
USING car_table_changelog AS source
ON target.id = source.id
WHEN MATCHED AND source._metadata_spanner_mod_type = 'UPDATE' THEN
UPDATE SET
target.brand = source.brand,
target.count = source.count
WHEN NOT MATCHED AND source._metadata_spanner_mod_type = 'INSERT' THEN
INSERT (id, brand, count)
VALUES (source.id, source.brand, source.count);
This script can be run in the BigQuery console or using the BigQuery API.
Here is a sample JavaScript script to run the MERGE statement in BigQuery:
const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();
const query = `
MERGE INTO car_table AS target
USING car_table_changelog AS source
ON target.id = source.id
WHEN MATCHED AND source._metadata_spanner_mod_type = 'UPDATE' THEN
UPDATE SET
target.brand = source.brand,
target.count = source.count
WHEN NOT MATCHED AND source._metadata_spanner_mod_type = 'INSERT' THEN
INSERT (id, brand, count)
VALUES (source.id, source.brand, source.count);
`;
async function runQuery() {
const job = await bigquery.createQueryJob({
query,
location: 'US',
});
const [result] = await job.getQueryResults();
console.log(result);
}
runQuery();
This script can be run using Node.js.
Which type of script you choose will depend on your specific needs and preferences.