I am trying to have my Composer Airflow DAG operate on a private dataset I created in BigQuery. Composer is supposed to have default connections for BigQuery:
https://cloud.google.com/composer/docs/how-to/managing/connections says "By default, Cloud Composer configures the following Airflow connections for Google Cloud Platform: bigquery_default, google_cloud_default, google_cloud_datastore_default, google_cloud_storage_default. You can use these connections from your DAGs by using the default connection ID"
But these connections seem to work only for public datasets. On my private dataset I'm getting a 401 error saying a login is required. Does anyone know how to fix this?
Thanks,
Carl
You have to grant the default service account (the Compute Engine service account that Composer runs under) access to BigQuery before the default connections can read your private dataset.
Alternatively, you can create a separate service account, generate a key for it, and grant that account read access to the data in the BigQuery project. Then use the key to create an Airflow connection, and pass that connection ID as a parameter to your Airflow operator, as in the sketch below.
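A minimal sketch of the second approach, assuming you have already created a connection named "my_bq_conn" in the Airflow UI (Admin -> Connections), of type Google Cloud, with the service account key JSON pasted into it. The connection ID, project, dataset, and table names are placeholders; the exact parameter name may vary by Airflow version (older contrib operators use bigquery_conn_id instead of gcp_conn_id):

    # Sketch: point a BigQuery operator at a custom connection instead of
    # the default one. Assumes the google provider package is installed
    # (it ships with Composer) and "my_bq_conn" exists in Airflow.
    from airflow.providers.google.cloud.operators.bigquery import BigQueryCheckOperator

    check_private = BigQueryCheckOperator(
        task_id="check_private_dataset",
        sql="select count(*) > 0 from my_project.my_dataset.my_table",
        use_legacy_sql=False,
        gcp_conn_id="my_bq_conn",  # custom connection backed by your service account key
    )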
I found the problem: in my case, I needed to add the location to my operator. So first, check the dataset information if you are not sure of the location, then add it as a parameter in your operator. For example, my dataset was in us-west1 and I was using an operator that looked like this:
check1 = BigQueryCheckOperator(
    task_id="check_my_event_data_exists",
    sql="""
        select count(*) > 0
        from my_project.my_dataset.event
    """,
    use_legacy_sql=False,
    location="us-west1",  # THIS WAS THE FIX IN MY CASE
)
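If you're not sure where your dataset lives, you can read it off the dataset details panel in the BigQuery console, or look it up programmatically. Here's a minimal sketch using the google-cloud-bigquery client library (typically available in Composer environments); the project and dataset names are placeholders:

    # Sketch: look up a dataset's location with the BigQuery client.
    from google.cloud import bigquery

    client = bigquery.Client()
    dataset = client.get_dataset("my_project.my_dataset")
    print(dataset.location)  # e.g. "us-west1"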