
How to transform a table in us-central1 to a table in the US data location

pnr

Hello,

I have a table in us-central1 which is referenced in Dataform. I want to apply some transformations to it and write the results to a table in the US location. How can I achieve that?

I have configured the defaultLocation to us-central1 in dataform.json:

"defaultLocation": "us-central1"

In my_table.sqlx:

config {
  type: "incremental",
  schema: "my_schema",
  protected: true,
  location: "US",
  uniqueKey: ["KeyId"]
}

However, I was unable to access my_table.

Can anyone advise on how to achieve this use case? Thanks.

Solved
1 ACCEPTED SOLUTION


View solution in original post

6 REPLIES

The most efficient way to transform a table in us-central1 to a table in another US location in BigQuery is to copy the table directly within BigQuery. To do this, follow these steps:

  1. In the BigQuery console, navigate to the project that contains the table in us-central1.
  2. Click the dataset that contains the table.
  3. Click the table.
  4. Click the Copy button.
  5. In the Destination dataset field, select or create the dataset in the destination US location where you want the table to reside.
  6. Click the Copy button.
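
If you prefer the CLI, a minimal sketch of the equivalent copy (project, dataset, and table names are placeholders):

# Hypothetical CLI equivalent of the console steps above. Note that
# copy jobs can be rejected when the source and destination locations
# differ, as discussed later in this thread.
bq cp my_project:central1_dataset.my_table my_project:us_dataset.my_table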

Once the table has been copied, you can apply the desired transformations to the new table. For example, you can use the UPDATE or DELETE statements to update or delete rows, or you can use the CREATE VIEW statement to create a view based on the new table.
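
For instance, minimal sketches of such transformations (all table, view, and column names are placeholders):

-- Hypothetical UPDATE; the status column is assumed for illustration.
UPDATE `my_project.us_dataset.my_table`
SET status = 'processed'
WHERE status IS NULL;

-- Hypothetical view over the copied table.
CREATE OR REPLACE VIEW `my_project.us_dataset.my_table_view` AS
SELECT
  KeyId,
  column1 * 2 AS doubled_column
FROM
  `my_project.us_dataset.my_table`;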

To access the new table in Dataform:

After applying the desired transformations to the new table, you can access it in Dataform by specifying the location of the table in the config block. For instance, in your Dataform script, you can use the following configuration:

 

config {
  type: "incremental",
  schema: "my_schema",
  protected: true,
  location: "us-east1",
  uniqueKey: ["KeyId"]
}

Use code with caution, especially when modifying data, to avoid unintended data changes or loss.

Additional considerations:

  • Efficiency: Exporting to Cloud Storage and then importing back into BigQuery can be time-consuming and expensive, especially for large datasets. Directly copying the table within BigQuery is a more efficient solution.

  • Cost: Both exporting data to Cloud Storage and importing it back into BigQuery have associated costs. It's essential to be aware of this, especially if dealing with large datasets.

  • Data Consistency: The process involves multiple steps, and if not done correctly, there's a risk of data inconsistency or data loss.

  • Location of Cloud Storage Bucket: If you choose to export the table to Cloud Storage as an intermediary step, the Cloud Storage bucket should ideally be in the us-central1 region to minimize egress costs. Similarly, when copying to another bucket for the destination US location, that bucket should ideally be in the target region.
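
If you do take the Cloud Storage route, a minimal sketch of that path, following the bucket-placement advice above (all project, dataset, and bucket names are placeholders):

# Hypothetical export/import path.
# 1. Export from us-central1 to a bucket in the same region.
bq extract --destination_format=PARQUET \
  my_project:central1_dataset.my_table \
  "gs://my-uscentral1-bucket/my_table/part-*.parquet"

# 2. Copy the files to a bucket in the US multi-region.
gsutil -m cp \
  "gs://my-uscentral1-bucket/my_table/part-*.parquet" \
  gs://my-us-bucket/my_table/

# 3. Load into the destination dataset in the US multi-region.
bq load --source_format=PARQUET \
  my_project:us_dataset.my_table \
  "gs://my-us-bucket/my_table/part-*.parquet"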

Thanks for the reply. But this might not help my use case: my table in us-central1 is a change stream, and I have to continuously apply transformations to it and write to a table in the US multi-region.

My ideal solution would be to read the table in us-central1 and write to a table in the US multi-region. Can I achieve this in BigQuery Dataform?

I understand your use case better now. If you're dealing with a change stream and need continuous transformations, you'll want a solution that can handle real-time or near-real-time data processing.

In BigQuery and Dataform, you can't directly query tables across regions. However, you can set up a process that allows you to continuously transform and move data from one region to another. Here's a suggested approach:

  1. Data Transfer:

    • Use a method suitable for your needs (e.g., the BigQuery Data Transfer Service, custom scripts, etc.) to move data from the us-central1 region to the US multi-region location; one option is sketched at the end of this reply.
  2. Dataform Script:

    • In your Dataform script, reference the table in the US multi-region location and apply the desired transformations. Here's an example:

-- Reference the table in the US multi-region location and apply transformations.
SELECT
  column1,
  column2,
  column3 * 2 AS new_column
FROM
  `project_id.dataset_name.table_name`

  3. Dataform Configuration:

    • Ensure your Dataform project is set up to connect to the BigQuery project and dataset in the US multi-region location.
  4. Running the Transformations:

    • Use the Dataform web interface or CLI to run your scripts and apply the transformations. To run the script, use the command:

dataform run

By following this approach, you can ensure that your change stream data in us-central1 is continuously transformed and written to the US multi-region location in BigQuery.
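
For the data-transfer step above, one concrete option is a scheduled dataset copy via the BigQuery Data Transfer Service. A minimal sketch, assuming the target dataset already exists in the US multi-region (all project and dataset names are placeholders):

# Hypothetical scheduled dataset copy using the Data Transfer Service.
# The target dataset must already exist in the destination location.
bq mk --transfer_config \
  --project_id=my_project \
  --data_source=cross_region_copy \
  --target_dataset=us_dataset \
  --display_name="Copy central1_dataset to US" \
  --params='{"source_project_id":"my_project","source_dataset_id":"central1_dataset"}'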

Hi @ms4446, I'm unable to find a solution to move data across regions in BigQuery.

I understand your frustration. Moving data across regions in BigQuery is not a straightforward task, and there isn't a one-size-fits-all solution.

The best approach for you will depend on specific factors like the volume of data you're handling, how often you need to move it, and your budget constraints.

Here are a few options to consider:

  1. BigQuery Data Transfer Service:

    • It's important to note that while BigQuery Data Transfer Service is designed to automate data movement from various SaaS applications to BigQuery, it doesn't inherently support cross-region data transfers within BigQuery itself. Therefore, relying on it for this specific use case might not be ideal.
  2. Third-party services:

    • Services like Fivetran, Alooma, and Stitch are popular for data integration. While they can facilitate data movement into BigQuery, their primary focus might not be on cross-region transfers within BigQuery. It's essential to review the specific capabilities of each service to ensure they meet your needs.
  3. Custom script:

    • For those with unique requirements not met by existing services, crafting a custom script is a viable option; a sketch follows this list. However, be prepared for the technical challenges and ongoing maintenance that come with this approach.
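
As a sketch of the custom-script option, a recurring wrapper around the bq CLI along these lines could be scheduled with cron or Cloud Scheduler (all names are placeholders; whether a US multi-region dataset can load directly from a us-central1 bucket should be verified, otherwise copy the files to a US bucket first, as noted earlier in this thread):

#!/bin/bash
# Hypothetical recurring wrapper; schedule it for near-real-time refreshes.
set -euo pipefail

# Export a snapshot from the us-central1 dataset to a colocated bucket.
bq extract --destination_format=PARQUET \
  my_project:central1_dataset.my_table \
  "gs://my-uscentral1-bucket/my_table/part-*.parquet"

# Reload into the US multi-region dataset, overwriting the prior copy.
bq load --replace --source_format=PARQUET \
  my_project:us_dataset.my_table \
  "gs://my-uscentral1-bucket/my_table/part-*.parquet"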

Additional Tips:

  • Data Format: BigQuery supports various data formats, such as CSV, JSON, and Parquet. Opt for a format that's both storage-efficient and conducive to processing.
  • Data Compression: Compressing your data can not only speed up the transfer process but also reduce associated costs (see the sketch after these tips).
  • Staging Dataset: Consider using a staging dataset as an interim storage area before moving data to its final destination. This can enhance performance and minimize errors, but be aware of potential additional costs.
  • Testing: Always test your data transfer process in a non-production environment first to ensure its accuracy and reliability.
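
For the compression tip, the export step shown earlier can be compressed with a single flag; a sketch (all names are placeholders):

# Hypothetical variation of the earlier export, compressing the files
# to cut transfer size.
bq extract --destination_format=PARQUET --compression=SNAPPY \
  my_project:central1_dataset.my_table \
  "gs://my-uscentral1-bucket/my_table/part-*.parquet"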

pnr

I found issue 1 and issue 2; it looks like this is not supported.