Creating a relationship between tables

cwalker7 · 04-24-2023 03:06 PM

Hi guys! I am new to BigQuery, so sorry if I this is a little confusing. I have two data sources that I am trying to create some sort of relationship, but not sure how to do so. I have all of our sales in table that is connected to a Google Sheets. I have a .csv that has all of our mail data, which has 10+ more metrics than our sale data, and I have uploaded the Mail data through Cloud Storage. Both of these data sources are in their own tables. Each row of data in Sales has a matching response code in the mail data, because the customer provides us their code when they contact us and that code is in our Sale data. I am wanting to create a new table that returns only the rows of data from our Mail that a matching/ duplicate code in our Sales. How could I do this through BigQuery?

Also, if I wanted to append our Mail data each week as we send new mail out. How could I do this without making new tables each time but just adding to the data already in there?

Aris_O

Hi @cwalker7 ,

Welcome to Google Cloud Community.

You can use a JOIN operation in BigQuery to construct a new table that only delivers the data rows from your Mail table that match a code in your Sales table. This example search should help you find what you're looking for:

SELECT *
FROM Mail
JOIN Sales
ON Mail.response_code = Sales.response_code

Only the rows of data from Mail that have a matching code in Sales will be returned by this query, which joins the Mail and Sales columns on the response_code field.

You can use the BigQuery Data Transfer Service or the Cloud Storage Transfer Service to automatically upload your new data to a specific location in Cloud Storage, and then create a scheduled query or use a Dataflow job to load the new data into your existing Mail table in BigQuery to append your Mail data each week without creating new tables. To manually upload and append new data to your current table, you can use use the BigQuery online UI or the bq command-line tool.

For instance, you might append a fresh CSV file containing information from your weekly mail to your current Mail table using the command-line command as follows:

bq load --source_format=CSV --skip_leading_rows=1 mydataset.Mail gs://my-bucket/weekly_mail_data.csv

This command will add the data from the Cloud Storage CSV file to any already present data in the Mail table in your BigQuery dataset. Please take note that you must substitute mydataset for your BigQuery dataset, "Mail" for your Mail table name, and "my-bucket/weekly_mail_data.csv" for the location of your new CSV file in Cloud Storage.

cwalker7

I was told to do an Exists as well. Here is my formula but I am getting this "Syntax error: Expected end of input but got identifier "Week" at [2:45]"

Select *

FROM `inline-data-384219.mail_dataset.`Mail Week 29-49`

WHERE EXISTS (

SELECT*

FROM 'mail_dataset.`Sales Table'

WHERE `Mail Week 29-49`.Campaign Code'=`Sales Table`.string_field_12'

)