
Dataform - Bigquery Omni AWS problem

Hi, I'm currently working on the multi-cloud integration in my company.

To achieve this, we started connecting data from AWS to GCP through BigQuery Omni. For data governance reasons, and to get the best control over the process, we started using Dataform.

I'm trying to configure an external table using operations, but when we try to execute it, I get the following error.

jaime_parra_0-1741117832500.png

We granted sufficient roles to the service agent, but it keeps failing.

jaime_parra_1-1741117948906.png

The AWS region we are configuring is "aws-us-east-1". Also, when we copy the query generated by Dataform and execute it in the BigQuery console, it works well.

Best regards,

 


Hi @jaime_parra,

Welcome to Google Cloud Community!

The error you're encountering with Dataform and BigQuery Omni accessing data in AWS probably comes down to insufficient permissions. BigQuery Omni requires correctly configured IAM roles in both GCP and AWS to allow the Dataform service account to access the data.

Review the following considerations, as they may explain why you are getting a failed status.

  • Test the BigQuery Omni Connection. Execute a simple query against the external table using the BigQuery console directly. This isolates the problem; if it fails, the issue is with the connection itself; if it succeeds, the problem lies within Dataform's configuration or its interaction with BigQuery.
  • Review your AWS IAM Policy. Carefully examine the AWS IAM policy. Ensure the policy allows the necessary actions (such as s3:GetObject) on the exact S3 bucket path being accessed. Limit the permissions to the strict minimum required.
  • Check your Dataform logs. Dataform should generate detailed logs, which may contain more specific error messages or hints about the cause of the failure.
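
For the AWS IAM policy check, a minimal sketch of a read-only policy may serve as a reference point. It assumes a hypothetical bucket name (`my-omni-bucket`) and prefix (`campanas`), and it builds the policy document in Python so it can be printed and compared against the policy attached to the role used by the BigQuery Omni connection. The two actions shown (`s3:ListBucket` and `s3:GetObject`) are the usual read permissions; your setup may need a different set.

```python
import json


def build_read_only_s3_policy(bucket, prefix=""):
    """Build a read-only S3 policy document limited to one bucket path.

    `bucket` and `prefix` are placeholders for illustration; substitute
    the bucket and path that the external table actually reads.
    """
    clean_prefix = prefix.strip("/")
    # Object ARNs cover everything under the optional prefix.
    object_path = f"{clean_prefix}/*" if clean_prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself, not to objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{object_path}"],
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, used only for illustration.
    policy = build_read_only_s3_policy("my-omni-bucket", "campanas")
    print(json.dumps(policy, indent=2))
```

Comparing the printed document with the policy in the AWS console makes it easier to spot a resource ARN that points at the wrong path.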

Ensure your Dataform configuration correctly points to your BigQuery Omni connection and that the connection itself is properly established.


For a deeper understanding, I suggest contacting Google Cloud support for assistance.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hi @NorieRam 

We completed all the permission validations and found the following:

1. If we run a process where the region is US, us-east1, the materialization of views or tables works without incident.

2. If we try to create an external table or view in AWS, it fails with the error shown in the image, even though we define the aws-us-east-1 region in the configuration file.

3. The BigQuery Omni permissions are correct, because users (including me) can read the sources and create them from the console without problems.

Our conclusion is that the failure is in the Dataform service itself, and it occurs when we run an execution inside the service.

Finally, on the permissions side, we even granted the Dataform service agent the Owner role on the project, and it still failed. To reiterate, this only happens when creating an external table in the Omni AWS dataset; in all other cases it works normally.

Thank you and I look forward to any feedback you can give me.

@jaime_parra I suggest filing an issue request for this so that our Engineering Team can investigate further. Before filing, please take note of what to expect when opening an issue. For future updates, I recommend monitoring the issue tracker.

Hi @NorieRam 

Thank you for your response and recommendations. We have conducted several tests and identified an interesting behavior that led us to pinpoint an issue with scheduled executions in Dataform.

Context and Initial Configuration

We have configured Dataform to work with BigQuery Omni on AWS (aws-us-east-1), setting the following parameters in workflow_settings.yaml:

 

.yaml
defaultProject: augusta-bavv-dev-activo
defaultLocation: aws-us-east-1
defaultDataset: dataform
defaultAssertionDataset: dataform_assertions
dataformCoreVersion: 3.0.0

Additionally, when executing queries from Workspace, external tables in BigQuery Omni are successfully created, and we were able to deploy some views by adjusting the YAML configuration with location=aws-us-east-1.

Here’s an example of a view that we tested and confirmed to be working correctly:

 

prueba.sqlx

config {
    type: "operations"
}

CREATE OR REPLACE VIEW
  `aws_omni_view.prueba` AS
SELECT
  REGEXP_EXTRACT(_FILE_NAME, r'/([^/]+)/[^/]+\.parquet$') AS subfolder,
  PARSE_DATE('%Y%m%d', REGEXP_EXTRACT(_FILE_NAME, r'\d{8}')) AS partition_date
FROM
  `aws_omni_campanas.prueba`
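
To illustrate what the two expressions in this view extract, here is a small Python sketch that applies the same patterns to a made-up object path. The path is hypothetical, and Python's `re` module stands in for BigQuery's regex engine, which should behave the same way for patterns this simple.

```python
import re
from datetime import datetime

# Same patterns as in the view above.
SUBFOLDER_PATTERN = r"/([^/]+)/[^/]+\.parquet$"
DATE_PATTERN = r"\d{8}"


def extract_fields(file_name):
    """Mimic the view: return (subfolder, partition_date) for one file path.

    Either value is None when its pattern does not match, mirroring the
    NULL that REGEXP_EXTRACT returns in that case.
    """
    subfolder_match = re.search(SUBFOLDER_PATTERN, file_name)
    date_match = re.search(DATE_PATTERN, file_name)
    subfolder = subfolder_match.group(1) if subfolder_match else None
    partition_date = (
        datetime.strptime(date_match.group(0), "%Y%m%d").date()
        if date_match
        else None
    )
    return subfolder, partition_date


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    path = "s3://my-omni-bucket/campanas/prueba/data_20250310.parquet"
    print(extract_fields(path))
```

The subfolder is the directory that directly contains the Parquet file, and the date is the first run of eight digits anywhere in the path.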

 



 

camilo_medina_0-1741639949883.png

camilo_medina_1-1741640360115.png

We also confirmed that executions from the BigQuery Console run correctly and that data is created in the expected region (aws-us-east-1).

Furthermore, when reviewing BigQuery Job History, we observed that manual executions from Workspace are indeed running in the configured region (aws-us-east-1), indicating that the YAML settings are being applied correctly within the Workspace environment.


Identified Issue

Even though executions from Workspace behave as expected, when we schedule a Dataform execution, we receive an error.

Root Causes Identified

1️⃣ The workflow_settings.yaml file only affects the Workspace but does not control the actual execution of scheduled workflows.

  • While queries run correctly in aws-us-east-1 from the Workspace, when scheduled, Dataform follows the configuration in release_config.

2️⃣ The release_config was set to us-east1 with target: bigquery.

  • This caused Dataform to execute workflows on GCP instead of AWS Omni.

3️⃣ There was no properly configured release_config for aws-us-east-1.

  • Since no specific release_config existed for AWS Omni, executions defaulted to us-east1 on GCP.

📄 Evidence from Logs:
By analyzing the execution logs, we found that the releaseConfigId is "omni" and it is executing in "location": "us-east1", confirming that Dataform jobs are being executed in the wrong region.

 

json
{
  "insertId": "geq081ch6v",
  "jsonPayload": {
    "@type": "type.googleapis.com/google.cloud.dataform.logging.v1.WorkflowInvocationCompletionLogEntry",
    "releaseConfigId": "omni",
    "workflowInvocationId": "1741637475-a4f89d69-7cf4-476b-82cd-b5c7a2ebd314",
    "terminalState": "FAILED",
    "workflowConfigId": "naruto"
  },
  "resource": {
    "type": "dataform.googleapis.com/Repository",
    "labels": {
      "location": "us-east1",  <--- ⚠️ EXECUTING ON GCP INSTEAD OF AWS Omni
      "resource_container": "626537586202",
      "repository_id": "augusta-bavv-bigquery-omni-aws"
    }
  },
  "timestamp": "2025-03-10T20:11:15.529842342Z",
  "severity": "ERROR",
  "logName": "projects/augusta-bavv-dev-activo/logs/dataform.googleapis.com%2Fworkflow_invocation_completion",
  "receiveTimestamp": "2025-03-10T20:11:16.071909743Z"
}
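
The check we did by hand on this log entry can be sketched in Python. This is only an illustration: it assumes the log entries have been exported as JSON with the same shape as the entry above, and the expected location is whatever value the caller passes in. The sample entry is a trimmed copy of the one shown.

```python
def find_location_mismatches(entries, expected_location):
    """Return failed workflow invocations whose resource location label
    differs from `expected_location`."""
    mismatches = []
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        labels = entry.get("resource", {}).get("labels", {})
        failed = payload.get("terminalState") == "FAILED"
        if failed and labels.get("location") != expected_location:
            mismatches.append(
                {
                    "workflowInvocationId": payload.get("workflowInvocationId"),
                    "releaseConfigId": payload.get("releaseConfigId"),
                    "location": labels.get("location"),
                }
            )
    return mismatches


if __name__ == "__main__":
    # Trimmed copy of the log entry above.
    sample = {
        "jsonPayload": {
            "releaseConfigId": "omni",
            "workflowInvocationId": "1741637475-a4f89d69-7cf4-476b-82cd-b5c7a2ebd314",
            "terminalState": "FAILED",
        },
        "resource": {"labels": {"location": "us-east1"}},
    }
    for mismatch in find_location_mismatches([sample], "aws-us-east-1"):
        print(mismatch)
```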

🔧 Next Steps

To resolve this issue, we believe that:

A release_config must be created and properly configured for aws-us-east-1 so that scheduled executions in Dataform use the correct region instead of defaulting to GCP.
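
As a rough sketch of what such a release configuration might contain, the Python below only builds the request for the Dataform REST API's releaseConfigs.create method and does not send it. It assumes that the codeCompilationConfig overrides (defaultLocation and the others) should mirror our workflow_settings.yaml. The release config ID `omni-aws` and the `main` branch are hypothetical placeholders, and we have not confirmed that this override fixes the scheduled executions.

```python
import json

# Base URL of the Dataform REST API (v1beta1).
DATAFORM_API = "https://dataform.googleapis.com/v1beta1"


def build_release_config_request(project, repo_location, repository,
                                 release_config_id, git_commitish,
                                 default_location):
    """Return (url, body) for a releaseConfigs.create call. Nothing is sent."""
    parent = (
        f"projects/{project}/locations/{repo_location}"
        f"/repositories/{repository}"
    )
    url = (
        f"{DATAFORM_API}/{parent}/releaseConfigs"
        f"?releaseConfigId={release_config_id}"
    )
    body = {
        "gitCommitish": git_commitish,
        "codeCompilationConfig": {
            # Values mirror workflow_settings.yaml from this thread.
            "defaultDatabase": project,
            "defaultSchema": "dataform",
            "assertionSchema": "dataform_assertions",
            # Assumption: this override sets the BigQuery location.
            "defaultLocation": default_location,
        },
    }
    return url, body


if __name__ == "__main__":
    url, body = build_release_config_request(
        project="augusta-bavv-dev-activo",
        repo_location="us-east1",  # where the Dataform repository lives
        repository="augusta-bavv-bigquery-omni-aws",
        release_config_id="omni-aws",  # hypothetical ID
        git_commitish="main",  # assumed branch name
        default_location="aws-us-east-1",
    )
    print(url)
    print(json.dumps(body, indent=2))
```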

Could you provide guidance on the best approach to configure the release_config in this scenario to ensure that scheduled executions respect the AWS Omni configuration?

We appreciate any additional recommendations.