Re: Simba biquery driver not supporting different ...

akashburnwal · 11-03-2023 02:57 AM

Hi Folks,

We are using Simba BigQuery driver for one of our integration. Based on that we have two projects one which is the billing project and one where actually the dataset resides. I tried various options as per the doc https://storage.googleapis.com/simba-bq-release/jdbc/Simba%20Google%20BigQuery%20JDBC%20Connector%20... and tried to use QueryProperties and within that dataset_project_id=<dataset_proejct> however it didn't work.

Requesting veterans and experts to share suggestions how it can be defined for the driver.
Reference doc : https://developer.lumapps.com/datalake (Section : Connecting to the Data lake)

ms4446

The Simba BigQuery JDBC driver has some known limitations when attempting to query datasets in a project other than the billing project. While there is no official solution to this issue, several workarounds:

Modify the Connection String: Some users have reported success by including the project ID of the dataset within the connection string. An example modification might look like this:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=desired_project_id;BillingProjectId=billing_pro...;...

However, this approach may not work for everyone and should be tested thoroughly.

Driver Version Adjustment: There have been instances where users resolved issues by either updating to a newer version of the Simba driver or reverting to an older one. It's crucial to review the release notes for each version of the driver to understand the impact of any changes.

Utilize Alternative Client Libraries: For those who can work outside of JDBC, client libraries such as the Python client library for BigQuery offer more flexibility and do not share the same limitations. Here's a brief example of how to use it:

Driver Version Adjustment: There have been instances where users resolved issues by either updating to a newer version of the Simba driver or reverting to an older one. It's crucial to review the release notes for each version of the driver to understand the impact of any changes.

Utilize Alternative Client Libraries: For those who can work outside of JDBC, client libraries such as the Python client library for BigQuery offer more flexibility and do not share the same limitations. Here's a brief example of how to use it:

from google.cloud import bigquery

client = bigquery.Client(project='desired_project_id')

query = "SELECT * FROM dataset.table"

results = client.query(query).result()

for row in results:
    print(row)

More details can be found in the official Python client library documentation.

Explore Third-Party Connectors: Connectors provided by third-party tools may offer alternative methods to connect to BigQuery datasets in different projects. Be aware that these connectors also come with their own set of limitations and should be evaluated for compatibility with your specific requirements.

It's important to note that these workarounds are not officially supported solutions and may not be suitable for all scenarios. They should be tested in a non-production environment to avoid any unintended consequences.

Lastly, if you decide to implement any of these workarounds, ensure that you have a comprehensive backup and recovery strategy in place, and that you understand the potential implications for data integrity and billing.

dmedora

The billing project can be set in Simba JDBC using the ProjectId property. You can specify additional data projects using AdditionalProjects.

akashburnwal

Thanks @dmedora , however this doesn't seem to be working for some reason. I have below parameters here. Can you please provide any further suggestions which i can try

The billing Project : It is the project on which the compute power is billed by google.
The project hosting the data : It corresponds to your production environment where your data is located.
The dataset hosting the data : external.

As per the suggestion tried to define the parameters in below options. Is there anything else that can be tried?
DefaultDataset : External
ProjectId : <Billing Project>
AdditionalProjects : <Dataset Project>

Also, i tried to define AdditionalProjects under QueryProperties : dataset_project_id=<Dataset Project>

dmedora

Hi @akashburnwal , could you share more on what you mean by it's not working? Are there errors? What behaviour do you see?

Kyyyyyyy

Sorry to necro an old subject. Our company recently started migrating to GC and has rolled out simba for odbc.

We still traditionally use Access/Excel/Toad for adhoc queries and minor tools. Using the simba driver, if you set the billing as the default catalog, the tables in the additional projects aren't as user friendly to use.

E.g. you go to link a table in Access and it won't show any tables in the data project because it only looks for tables in the billing project dataset.

If you set the dataset to the actual data project, it will see the tables, but you can't run anything because the billing project is not specified.

I was hoping to instead specify the billing project via the Query properties window but would appreciate advice on feasibility and the type of language to use within (ex: SOURCE=GoogleBigQuery.Database([BillingProject="billing project"]);

haxorcize

@dmedora worked perfectly for me, thanks for sharing the AdditionalProjects parameter.
I set the ProjectId to the billing project id, and AdditionalProjects helped me get to the project holding the data sets I needed.

j_panijel

Hi @dmedora

Is this the correct syntax ? The target project is not appearing with the corresponding dataset/tables in the catalog & in MS Access when connecting to the simba driver.

eric_varnell

Was anyone able to resolve this issue specifically for Microsoft Access? We are having the same problem as others have described with tables not appearing in MS Access when they are in a different project.

j_panijel

Hi, no I never found a work-around, we had to link the simba driver to the billing project, and created a dataset in that project for that specific use.

Simba biquery driver not supporting different project other than billing project