Error connecting big query from Zeppelin

The query runs normally for the public dataset, but returns an error for the private dataset.

 
When using private datasets

 

%bigquery
#standardSQL
select idn
  from privit_table

 

-> output : SQL Execution returned an error!

(However, this query is not a query problem because it outputs normal results when running on a big query.)

 

When using public datasets

 

%bigquery
#standardSQL
SELECT departure_airport,count(case when departure_delay>0 then 1 else 0 end) as no_of_delays 
FROM `bigquery-samples.airline_ontime_data.flights`
group by departure_airport 
order by 2 desc 
limit 10

 

-> output : The query results are successfully output.

 

I'm currently running zeppelin-daemon on ubuntu and connecting from a remote server, and I set the environment variable for key.json through export GOOGLE_APPLICATION_CREDENTIALS.

I also specified the project ID.
 

The https://zeppelin.apache.org/docs/latest/storage/storage.html#notebook-storage-in-gcs link written as a reference in the official DOCS document is not currently working, so I have no reference.

Why does it work well on public datasets and only on private datasets get SQL Execution returned an error!? There was someone in StackOverflow who was in the same situation, but the replies didn't help. Please help me!

 

 

 

 

 

 

 

 

3 2 64
2 REPLIES 2

Hi @sseozy ,

Since queries against public datasets are working, your core setup is likely sound. The trouble seems to be specifically related to the permissions or configuration needed for private datasets.

Troubleshooting Steps:

Authentication & Authorization

  • Service Account Permissions: In the Google Cloud Console, go to the specific private dataset, click "SHARE DATASET," and ensure your service account has the "BigQuery Data Viewer" role (or higher).
  • Environment Variable Check: On your Ubuntu server, run the following command:
     
    echo $GOOGLE_APPLICATION_CREDENTIALS 
    
    Ensure the path to your key.json file is correct. Remember to set it in the appropriate context if Zeppelin runs as a service.
  • Zeppelin Configuration: Review the BigQuery interpreter settings in Zeppelin. Verify the project ID and that the authentication mechanism (key file or other) is configured correctly.

Network Considerations

  • Firewall: Double-check that your firewall isn't blocking Google API connections. Outbound connections on HTTPS (port 443) should be allowed.
  • Dataset Region: If your private dataset is in a different region than public ones you've used, you may need to explicitly specify the region in Zeppelin's configuration.

Diagnostic Tests:

  • Command-line Validation: On your Ubuntu server, run:
     
    gcloud auth activate-service-account --key-file=path/to/your/key.json 
    gcloud projects list  
    bq query --use_legacy_sql=false 'SELECT * FROM `your-project.your-dataset.private_table` LIMIT 10' 
    
    If these commands work, it points towards a Zeppelin configuration issue.
  • Simplified Test: Create a new Zeppelin notebook with only a basic query on your private dataset to rule out query-specific complexities.

Logging for Insights

  • Enable Detailed Logging: Look into enabling debug-level logging in Zeppelin. Search the logs for specific error messages related to authentication, access, or BigQuery.
  • BigQuery Audit Logs: In the Google Cloud Console, examine the BigQuery audit logs for any failed access attempts from your service account.

Additional Tips

  • Community Support: The Zeppelin community forums and mailing lists are excellent resources if you get stuck.
  • Updates: Consider updating Zeppelin and related dependencies to stay current.

 

Command-line Validation works very successfully. We also checked that the BigQuery project charges bytes when running bigquery on zeppelin. However, only SQLExecution returned an error! is still displayed on the web.

I was sure it wasn't a big query issue with your response. I'll take advantage of the Zeppelin community. Thank you!