
How to connect Presto to DataProc Metastore

 

I can't connect Presto to DataProc Metastore.

The image version I used is 2.0.51-debian10. I also tried 2.0-debian10, since that version was used in this topic.

I created a DataProc cluster configured to use the metastore, as you can see in this screenshot.

[Screenshot: davorceman_0-1669737023969.png]

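But when I connect to it to run a SQL query, I get the error below about connecting to the Hive metastore. It seems to be trying to connect to a metastore on the cluster's own master node (dev-with-presto-m:9083), i.e. effectively localhost:9083, instead of DataProc Metastore.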

 

 

davorceman@dev-with-presto-m:~$ presto --catalog hive
presto> show schemas;

Query 20221129_155202_00002_4aevi, FAILED, 3 nodes
Splits: 36 total, 0 done (0.00%)
9.63 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20221129_155202_00002_4aevi failed: Failed connecting to Hive metastore: [dev-with-presto-m:9083]

 

 

If I change /usr/lib/presto/etc/catalog/hive.properties by adding my DataProc Metastore URL, then after restarting the presto service I'm able to query DataProc Metastore.
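For anyone who wants to reproduce that manual workaround, here is a minimal sketch. It assumes the property key is hive.metastore.uri (the key used in the --properties attempt later in this thread). set_metastore_uri is a hypothetical helper name, and the restart command in the trailing comment is an assumption about how the service is managed on the node.

```shell
#!/bin/sh
# Hypothetical helper: point a Presto Hive catalog file at a given metastore URI.
# Assumes the catalog uses the key hive.metastore.uri.
set_metastore_uri() {
  file=$1
  uri=$2
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # Copy every line except an existing hive.metastore.uri entry
  # (grep exits non-zero when nothing is left, which is fine here).
  grep -v '^hive\.metastore\.uri=' "$file" > "$tmp" || true
  # Append the new URI so exactly one entry remains.
  printf 'hive.metastore.uri=%s\n' "$uri" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (run as root on the node where the file lives; service name is an assumption):
#   set_metastore_uri /usr/lib/presto/etc/catalog/hive.properties thrift://10.167.64.19:9083
#   systemctl restart presto
```

This is still the hardcoded approach the question is trying to avoid; it only makes the manual edit repeatable.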

Is this the usual way to set up Presto on DataProc to use DataProc Metastore, or is there a better one?

I found this guide
https://cloud.google.com/dataproc/docs/concepts/components/presto

I tried to set presto-catalog with --parameters, but without success.

The documentation says that a prodhive.properties file will be created under $PRESTO_HOME/etc/catalog/. I'm assuming that $PRESTO_HOME = /usr/lib/presto, but the file is not there. I also looked under /etc/presto, and prodhive.properties was not there either.

Is there a more convenient way to connect Presto to DataProc Metastore, without hardcoding the URL in init scripts?
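A quick way to check several candidate directories at once is sketched below. find_catalog_file is a hypothetical helper, and /etc/presto/catalog is only a guess at another location; the only catalog path confirmed in this post is /usr/lib/presto/etc/catalog.

```shell
#!/bin/sh
# Hypothetical helper: print the first directory that contains the named
# catalog file, or return non-zero if none of them does.
find_catalog_file() {
  name=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example (the second directory is an assumption, not a documented path):
#   find_catalog_file prodhive.properties \
#     /usr/lib/presto/etc/catalog /etc/presto/catalog \
#     || echo "prodhive.properties not found"
```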

1 ACCEPTED SOLUTION

What I would suggest is creating an issue in GCP's public issue tracker for your request, as this might be a bug. Please keep in mind that once you create an issue, it still needs to be analyzed and considered by the product team, and a definite ETA is not guaranteed.


3 REPLIES 3

Can you provide the actual command you used to create the cluster with Presto, and also the command in which you attempted to include --parameters?

My first try was with Terraform. After that I used this command, since I was not sure whether Terraform parses the properties parameter correctly.

The command is almost the same as the one from the URL in the first post; I just added networking parameters, since I don't have a default VPC.

Here it is:

 

gcloud beta dataproc clusters create dev-with-presto  \
 --project=my-project-dev-123456 \
 --subnet europe-west1-private \
 --tags dev-with-presto \
 --region=europe-west1 \
 --num-workers=2 \
 --scopes=cloud-platform \
 --optional-components=PRESTO \
 --image-version=2.0.51-debian10 \
 --enable-component-gateway \
 --properties=presto-catalog:prodhive.connector.name=hive,presto-catalog:prodhive.hive.metastore.uri=thrift://10.167.64.19:9083

 

 

I also tried with and without this argument:

 

--dataproc-metastore=projects/${PROJECT_ID}/locations/europe-west1/services/demo-service

 

Besides using the CLI, I also tried many ways of deploying it with Terraform using the google-beta provider.
