I can't connect Presto to DataProc Metastore.
I used image version 2.0.51-debian10 and also tried 2.0-debian10, since that version was used in this topic.
I created a DataProc cluster configured to use the Metastore, as you can see in this screenshot.
But when I connect to Presto to run a SQL query, I get this error about connecting to the Hive metastore. It seems to be trying to connect to localhost:9083:
davorceman@dev-with-presto-m:~$ presto --catalog hive
presto> show schemas;
Query 20221129_155202_00002_4aevi, FAILED, 3 nodes
Splits: 36 total, 0 done (0.00%)
9.63 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20221129_155202_00002_4aevi failed: Failed connecting to Hive metastore: [dev-with-presto-m:9083]
If I edit
/usr/lib/presto/etc/catalog/hive.properties
and add my DataProc Metastore URL, I'm able to query DataProc Metastore after restarting the presto service.
Is this the usual way to set up Presto on DataProc to use DataProc Metastore?
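For reference, the manual change amounts to something like the sketch below. The helper name set_metastore_uri is made up for illustration, the thrift address is the one from my cluster, and the systemd unit name presto is an assumption that may differ on other images.

```shell
#!/usr/bin/env bash
# set_metastore_uri FILE URI
# Hypothetical helper: rewrites the hive.metastore.uri line in a Presto
# catalog file, or appends the line if the file does not have one yet.
set_metastore_uri() {
  local file="$1" uri="$2"
  if grep -q '^hive\.metastore\.uri=' "$file"; then
    # "|" is used as the sed delimiter because the URI contains slashes.
    sed -i "s|^hive\.metastore\.uri=.*|hive.metastore.uri=${uri}|" "$file"
  else
    printf 'hive.metastore.uri=%s\n' "$uri" >> "$file"
  fi
}

# Intended use, as root on the master node:
#   set_metastore_uri /usr/lib/presto/etc/catalog/hive.properties thrift://10.167.64.19:9083
#   systemctl restart presto   # unit name is an assumption
```

This works, but it has to be repeated on every cluster (or moved into an init script), which is exactly what I would like to avoid.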
I found this guide
https://cloud.google.com/dataproc/docs/concepts/components/presto
I tried to set the presto-catalog properties with --properties, but without success.
The documentation says that a prodhive.properties file will be created under $PRESTO_HOME/etc/catalog/. I'm assuming that $PRESTO_HOME is /usr/lib/presto, but prodhive.properties was not there, and I also looked under /etc/presto without finding it.
Is there a more convenient way to connect Presto to DataProc Metastore, without hardcoding it in init scripts?
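A quick way to check whether the catalog file was generated anywhere is sketched below. The function name find_catalog_file is hypothetical, and the two directories are simply the ones I guessed at above, not confirmed install locations.

```shell
#!/usr/bin/env bash
# find_catalog_file NAME DIR...
# Hypothetical helper: prints every regular file called NAME under the given
# directories. Returns a non-zero status when nothing is found. Directories
# that do not exist are skipped silently.
find_catalog_file() {
  local name="$1"; shift
  local hits
  hits=$(find "$@" -type f -name "$name" 2>/dev/null)
  [ -n "$hits" ] && printf '%s\n' "$hits"
}

# Intended use on the master node:
#   find_catalog_file prodhive.properties /usr/lib/presto /etc/presto \
#     || echo "prodhive.properties not found"
```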
Can you provide the actual command you used to create the cluster with Presto, and also the command in which you tried to include --properties?
My first attempt was with Terraform. After that I used this command, since I was not sure whether Terraform parses the properties parameter correctly.
The command is almost the same as the one from the URL in the first post. I just added networking parameters, since I don't have a default VPC.
Here it is:
gcloud beta dataproc clusters create dev-with-presto \
--project=my-project-dev-123456 \
--subnet europe-west1-private \
--tags dev-with-presto \
--region=europe-west1 \
--num-workers=2 \
--scopes=cloud-platform \
--optional-components=PRESTO \
--image-version=2.0.51-debian10 \
--enable-component-gateway \
--properties=presto-catalog:prodhive.connector.name=hive,presto-catalog:prodhive.hive.metastore.uri=thrift://10.167.64.19:9083
I also tried with and without this argument:
--dataproc-metastore=projects/${PROJECT_ID}/locations/europe-west1/services/demo-service
Besides the CLI, I also tried many ways of deploying it with Terraform using the google-beta provider.
What I could suggest is to create an issue in GCP's public issue tracker for your request, as this might be a bug. Please keep in mind that once you create an issue, it still needs to be analyzed and considered by the product team, and a definite ETA is not guaranteed.