
Airflow (Composer) connection to private BigQuery dataset

I am trying to have my Composer Airflow DAG operate on a private dataset I created in BigQuery. Composer is supposed to have default connections for BigQuery:

https://cloud.google.com/composer/docs/how-to/managing/connections says "By default, Cloud Composer configures the following Airflow connections for Google Cloud Platform: bigquery_default, google_cloud_default, google_cloud_datastore_default, google_cloud_storage_default. You can use these connections from your DAGs by using the default connection ID"

But these connections seem to work only for public datasets. On my private dataset I'm getting a 401 error saying a login is required. Does anyone know how to fix this?

Thanks,
Carl


You have to grant the default service account (the Compute Engine service account used by Composer) access to BigQuery in order for your DAG to query it.
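A minimal sketch of granting that access with the gcloud CLI — PROJECT_ID and PROJECT_NUMBER are placeholders for your own values, and the roles shown (dataViewer plus jobUser) are one reasonable choice for read-only queries:

```shell
# Let Composer's default Compute Engine service account read BigQuery data
# in the project that owns the private dataset.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"

# jobUser is also needed so the account can run query jobs.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"
```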

Alternatively, you can create a separate service account, generate a key for it, and grant that account read access to the BigQuery project. Use the key to create an Airflow connection, and pass that connection ID as a parameter to your Airflow operator.
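Roughly, the steps look like this (a sketch; the account name "bq-reader", PROJECT_ID, and the key path are placeholders):

```shell
# Create a dedicated service account for BigQuery reads.
gcloud iam service-accounts create bq-reader \
    --display-name="BigQuery reader for Composer"

# Grant it read access in the project that owns the dataset.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:bq-reader@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"

# Generate a JSON key for the account.
gcloud iam service-accounts keys create /path/to/key.json \
    --iam-account=bq-reader@PROJECT_ID.iam.gserviceaccount.com
```

Then create a Google Cloud connection in the Airflow UI (Admin > Connections) that points at the key file, and reference it from the operator, e.g. `bigquery_conn_id='my_bq_conn'` in older Airflow releases or `gcp_conn_id='my_bq_conn'` in newer ones.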

I found the problem: In my case, I needed to add the location to my operator. So first, check the dataset information if you are not sure of its location. Then pass it as a parameter to your operator. For example, my dataset was in us-west1 and I was using an operator that looked like this:

    # In Composer's Airflow 1.10.x the operator lives in contrib; in newer
    # releases it is in airflow.providers.google.cloud.operators.bigquery.
    from airflow.contrib.operators.bigquery_check_operator import BigQueryCheckOperator

    check1 = BigQueryCheckOperator(
        task_id='check_my_event_data_exists',
        sql="""
            select count(*) > 0
            from my_project.my_dataset.event
        """,
        use_legacy_sql=False,
        location="us-west1")   # THIS WAS THE FIX IN MY CASE