I am trying to transfer a public dataset into our project using a scheduled query with a service account. In my project (the destination project) I have granted this service account the BigQuery Admin role. I have also made sure the BigQuery Data Transfer service agent has the Service Account Token Creator role and the bigquerydatatransfer.serviceAgent role.
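For context, the IAM setup behind the depends_on in the config below looks roughly like this. This is a sketch: the resource names, the account_id, and the project_number module output are my assumptions, though the service agent email format (service-PROJECT_NUMBER@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com) is the standard one for the BigQuery Data Transfer Service.

# Service account that the scheduled query runs as.
resource "google_service_account" "bigquery_data_transfer_sa" {
  project    = module.gcp_project.project_id
  account_id = "bq-data-transfer"
}

# BigQuery Admin on the destination project for that service account.
resource "google_project_iam_member" "bq_data_transfer_permissions" {
  project = module.gcp_project.project_id
  role    = "roles/bigquery.admin"
  member  = "serviceAccount:${google_service_account.bigquery_data_transfer_sa.email}"
}

# Let the Data Transfer service agent mint tokens for the service account.
resource "google_service_account_iam_member" "dts_token_creator" {
  service_account_id = google_service_account.bigquery_data_transfer_sa.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}

# Service agent role on the project.
resource "google_project_iam_member" "dts_service_agent" {
  project = module.gcp_project.project_id
  role    = "roles/bigquerydatatransfer.serviceAgent"
  member  = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}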
Using Terraform, I have specified the following data transfer config:
resource "google_bigquery_data_transfer_config" "indications_query_config" {
depends_on = [google_project_iam_member.bq_data_transfer_permissions]
project = module.gcp_project.project_id
display_name = "Open Targets Platform Data Transfer - Indications"
location = "us-west1"
data_source_id = "scheduled_query"
schedule = "14 of month 02:00"
destination_dataset_id = module.bigquery.bigquery_dataset.dataset_id
service_account_name = google_service_account.bigquery_data_transfer_sa.email
params = {
destination_table_name_template = "indication"
write_disposition = "WRITE_APPEND"
query = "SELECT * FROM `bigquery-public-data.open_targets_platform.indication`"
}
}
When this transfer runs, I get the following error message:
Error code 7 : Failed to start BigQuery job. Error: Access Denied: Table bigquery-public-data:open_targets_platform.indication: User does not have permission to query table bigquery-public-data:open_targets_platform.indication, or perhaps it does not exist. User: SA_NAME@PROJECT_ID.iam.gserviceaccount.com.
To rule out a permissions problem with the service account itself, I created a service account key for it, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of that key, and ran the following Python code:
from google.cloud import bigquery
import google.auth

SCOPES = ['https://www.googleapis.com/auth/cloud-platform']

# Resolve the service account credentials from GOOGLE_APPLICATION_CREDENTIALS.
CREDENTIALS, GCP_PROJECT = google.auth.default(scopes=SCOPES)
client = bigquery.Client(credentials=CREDENTIALS, project=GCP_PROJECT)

# Perform a query. (Note the leading space before LIMIT; without it the two
# string literals concatenate into invalid SQL.)
QUERY = (
    'SELECT * FROM `bigquery-public-data.open_targets_platform.indication`'
    ' LIMIT 100')
query_job = client.query(QUERY)  # API request
rows = query_job.result()        # Waits for query to finish

for row in rows:
    print(row.id)
When I do this, I am able to query the public dataset, and I can use the debugger to confirm that the BigQuery client is indeed using the service account's credentials. Any idea what's going wrong here?
TL;DR - The fix:
1. Give the service agent the BigQuery Data Viewer role in the project.
2. Subscribe to the public dataset to create a linked dataset in your project.
3. Create (or reuse) a destination dataset in the same location as the linked dataset (regional vs. multi-regional matters here).
4. Create the scheduled query against the linked dataset.
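For anyone wiring this fix up in Terraform, a minimal sketch might look like the following. It assumes you have already subscribed to the public listing so that a linked dataset exists in your project (here assumed to be named open_targets_platform, in the US multi-region); the dataset name, location, and project_number output are my assumptions.

# Destination dataset co-located with the linked dataset (assumed US multi-region).
resource "google_bigquery_dataset" "indications" {
  project    = module.gcp_project.project_id
  dataset_id = "indications"
  location   = "US"
}

# BigQuery Data Viewer for the service agent, per step 1 above.
resource "google_project_iam_member" "dts_agent_data_viewer" {
  project = module.gcp_project.project_id
  role    = "roles/bigquery.dataViewer"
  member  = "serviceAccount:service-${module.gcp_project.project_number}@gcp-sa-bigquerydatatransfer.iam.gserviceaccount.com"
}

resource "google_bigquery_data_transfer_config" "indications_query_config" {
  project        = module.gcp_project.project_id
  display_name   = "Open Targets Platform Data Transfer - Indications"
  location       = "US" # must match the destination dataset's location
  data_source_id = "scheduled_query"
  schedule       = "14 of month 02:00"

  destination_dataset_id = google_bigquery_dataset.indications.dataset_id
  service_account_name   = google_service_account.bigquery_data_transfer_sa.email

  params = {
    destination_table_name_template = "indication"
    write_disposition               = "WRITE_APPEND"
    # Query the linked dataset in this project, not bigquery-public-data directly.
    query = "SELECT * FROM `${module.gcp_project.project_id}.open_targets_platform.indication`"
  }
}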