Hi everyone,
I'm trying to enable Trino as an optional component in Dataproc, as described at https://cloud.google.com/dataproc/docs/tutorials/trino-dataproc, but it's simply not working.
The Web UI does not open; the browser tells me "The page isn’t redirecting properly".
When I connect via SSH and start the Trino CLI, I get an error saying it cannot connect to port 8080.
$ trino --catalog hive --schema default
trino:default> Jun 13, 2023 3:05:13 PM com.google.common.cache.LocalCache$Segment$1 run
WARNING: Exception thrown during refresh
java.util.concurrent.ExecutionException: java.io.UncheckedIOException: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8080
at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:588)
at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:547)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:113)
at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:240)
at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2317)
at com.google.common.cache.LocalCache$Segment$1.run(LocalCache.java:2297)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:31)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1270)
at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:761)
at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.addListener(AbstractFuture.java:136)
at com.google.common.cache.LocalCache$Segment.loadAsync(LocalCache.java:2292)
at com.google.common.cache.LocalCache$Segment.refresh(LocalCache.java:2364)
at com.google.common.cache.LocalCache.refresh(LocalCache.java:4138)
at com.google.common.cache.LocalCache$LocalLoadingCache.refresh(LocalCache.java:4969)
at io.trino.cli.TableNameCompleter.lambda$populateCache$0(TableNameCompleter.java:105)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.UncheckedIOException: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8080
at io.trino.client.JsonResponse.execute(JsonResponse.java:148)
at io.trino.client.StatementClientV1.<init>(StatementClientV1.java:113)
at io.trino.client.StatementClientFactory.newStatementClient(StatementClientFactory.java:24)
at io.trino.cli.QueryRunner.startInternalQuery(QueryRunner.java:159)
at io.trino.cli.QueryRunner.startInternalQuery(QueryRunner.java:150)
at io.trino.cli.TableNameCompleter.queryMetadata(TableNameCompleter.java:86)
at io.trino.cli.TableNameCompleter.listFunctions(TableNameCompleter.java:80)
at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:169)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3533)
at com.google.common.cache.LocalCache$Segment.loadAsync(LocalCache.java:2291)
... 7 more
Caused by: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8080
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:265)
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183)
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224)
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108)
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88)
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169)
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142)
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117)
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229)
at okhttp3.RealCall.execute(RealCall.java:81)
at io.trino.client.JsonResponse.execute(JsonResponse.java:130)
... 16 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:412)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:255)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:237)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.base/java.net.Socket.connect(Socket.java:609)
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130)
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263)
... 35 more
Any ideas?
Regards
Christian
Here are a few steps you might want to consider:
Check the Trino service status: You should make sure that the Trino service is up and running. You can check this by connecting to the master node of your Dataproc cluster via SSH and running a command like sudo service trino status or sudo systemctl status trino, depending on your setup.
Check firewall rules: If the Trino service is running but you're still unable to connect, it's possible that firewall rules are blocking access to port 8080. You can view and manage firewall rules in the VPC network in the Google Cloud Console. Ensure there's a rule that allows incoming connections on port 8080.
Check the logs: Trino's logs can provide valuable information about why the service is not starting or not accepting connections. The location of the logs may vary, but it's usually the /var/log/trino directory.
Revisit the setup instructions: It's also worth revisiting the setup instructions to ensure that you haven't missed any steps or made any mistakes during the setup process.
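For the "Check the logs" step, a minimal Python sketch can pull only the WARN and ERROR entries out of the server log so they are not buried under routine INFO lines. It assumes the log lives at /var/log/trino/server.log (the path used later in this thread) and that each entry starts with a timestamp followed by the level, as in the server.log excerpt in this thread; the function name problem_lines is hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Assumption: log path taken from this thread; your image may log elsewhere.
DEFAULT_LOG = Path("/var/log/trino/server.log")

# Assumption: each entry starts with "<timestamp> <LEVEL> ...", as in the
# server.log excerpt in this thread (fields separated by tabs or spaces).
LEVEL_RE = re.compile(r"^\S+\s+(WARN|ERROR)\s")


def problem_lines(log_path: Path = DEFAULT_LOG, limit: int = 20) -> List[str]:
    """Return the last `limit` WARN/ERROR entries from a Trino server log.

    Continuation lines (e.g. stack traces) are skipped because they do not
    start with a timestamp followed by a level.
    """
    hits: List[str] = []
    with open(log_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if LEVEL_RE.match(line):
                hits.append(line.rstrip("\n"))
    # Guard against limit <= 0, where hits[-0:] would return everything.
    return hits[-limit:] if limit > 0 else []
```

On the master node, printing each item of problem_lines() gives a quick summary; pass a different Path if your logs are elsewhere.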
The Trino service is running fine:
$ systemctl status trino
● trino.service - Trino DB
Loaded: loaded (/lib/systemd/system/trino.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-06-13 15:32:27 UTC; 3min 43s ago
Process: 517 ExecStart=/usr/lib/trino/bin/launcher.py start (code=exited, status=0/SUCCESS)
Main PID: 611 (trino-server)
Tasks: 149 (limit: 19184)
Memory: 1.2G
CGroup: /system.slice/trino.service
└─611 java -cp /usr/lib/trino/lib/* -server -Xmn512m -XX:+UseConcMarkSweepGC -XX:+ExplicitGCInvokesConcurrent -XX:ReservedCodeCacheSize=150M -XX:+ExplicitGCInvokesConcurrent -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts -X>
Jun 13 15:32:26 trino-cluster-m systemd[1]: Starting Trino DB...
Jun 13 15:32:27 trino-cluster-m launcher.py[517]: Started as 611
Jun 13 15:32:27 trino-cluster-m systemd[1]: Started Trino DB.
And it's also logging some stuff:
~$ tail /var/log/trino/server.log
2023-06-13T15:35:59.140Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5297ms: 504.65MB -> 505.11MB
2023-06-13T15:36:06.456Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5308ms: 505.32MB -> 506.73MB
2023-06-13T15:36:13.773Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2009ms, stopped 5308ms: 507.77MB -> 508.57MB
2023-06-13T15:36:21.066Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5286ms: 509.15MB -> 509.96MB
2023-06-13T15:36:28.346Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5271ms: 511.09MB -> 512.44MB
2023-06-13T15:36:35.595Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2007ms, stopped 5243ms: 512.63MB -> 514.32MB
2023-06-13T15:36:42.857Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5254ms: 515.01MB -> 516.90MB
2023-06-13T15:36:50.190Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2007ms, stopped 5325ms: 517.26MB -> 518.31MB
2023-06-13T15:36:57.521Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5323ms: 518.53MB -> 519.64MB
2023-06-13T15:37:04.844Z INFO Service Thread io.airlift.stats.JmxGcMonitor Major GC: application 2008ms, stopped 5316ms: 520.23MB -> 522.09MB
I'm not sure why I should edit any firewall rules if I'm using the Component Gateway. Shouldn't it let me access the Web UI, as I can with Jupyter?
Regards Christian
The Component Gateway handles web traffic for accessing application Web UIs, but it does not handle other traffic, such as the direct client connections the Trino CLI makes to the Trino server. If you're using the Trino CLI on a machine outside your Google Cloud cluster, you'll need to ensure that the appropriate network ports (e.g., 8080 for Trino) are open and accessible. This is where firewall rules might come into play.
However, if you're running the Trino CLI from a machine within the same network as your Trino server (i.e., within the same Google Cloud project and network), then firewall rules may not be the issue, unless you have internal firewall rules that are blocking traffic.
It's also important to note that the Trino WebUI and the Trino CLI are used for different purposes. The WebUI is primarily for monitoring, while the CLI is used for executing SQL queries against your data. So being able to access the WebUI does not necessarily mean you will be able to connect via the CLI, and vice versa. If you're trying to execute queries, you'll need to get the CLI working, or use a JDBC/ODBC driver to connect from an application like a SQL client or a BI tool.
Also, make sure that Trino server is actually running on the master node of the Dataproc cluster and is configured to allow connections on the appropriate port (8080 by default). You can SSH into the master node and check the Trino server status and logs for more information.
OK, that's all clear to me.
I want to use the Component Gateway to see the UI, to make sure Trino is running.
I've only used the Trino CLI from the master node, connected via SSH from Cloud Shell, as described in the tutorial. There's no need for external access at the moment.
The Trino service was active on the master node, as shown above, and I was connected to this master node via SSH.
But the problem still persists... the Component Gateway says "The page isn’t redirecting properly" and the Trino CLI complains that it cannot reach localhost.
"...and is configured to allow connections on the appropriate port (8080 by default)"
How should I ensure that? Connections from localhost should always be allowed, and I assume that enabling the optional component would configure it that way, or not?
Kind regards,
Christian
You're correct that if Trino was properly installed and configured as an optional component in your Dataproc cluster, it should be set up to allow connections on the appropriate port (8080 by default). And connections from localhost should indeed always be allowed.
However, there could be several reasons why you're still experiencing issues. Here are some steps you can take to troubleshoot:
Check Trino's configuration files: Trino's configuration files are typically located in a directory like /etc/trino. In particular, you'll want to check the config.properties file in the etc directory. This file should contain a line like http-server.http.port=8080, indicating that the server is set up to listen on port 8080.
Check Trino's logs: You can find Trino's logs in a directory like /var/log/trino. Looking at the most recent logs could provide clues about what's going wrong.
Try connecting to Trino's Web UI directly: From the master node, you can try using a text-based web browser like lynx or w3m to connect to http://localhost:8080. If this works, it indicates that Trino's Web UI is indeed running and accessible from the master node, and the issue might be with the Component Gateway.
Check the Component Gateway's logs and configuration: The Component Gateway should have its own logs that might provide more information about why it's not redirecting properly. You can also check its configuration to ensure it's set up to handle Trino's WebUI.
Check network connectivity on the master node: You can use tools like ping, traceroute, and netstat to check the network connectivity on the master node. In particular, netstat -tuln will list all listening TCP and UDP ports, which should include port 8080 for Trino.
Try restarting the Trino service: If all else fails, you can try restarting the Trino service by running a command like sudo service trino restart. Sometimes, a simple restart can resolve issues.
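To check the port from the configuration step without reading the file by eye, here is a minimal Python sketch that reads http-server.http.port from a config.properties file. The default path /etc/trino/conf/config.properties is an assumption (the text above only says "a directory like /etc/trino"), the function names are hypothetical, and the parser handles only plain key=value or key:value lines, not the full Java properties syntax.

```python
from pathlib import Path
from typing import Dict, Optional

# Hypothetical default path; the thread only says "a directory like /etc/trino".
CONFIG = Path("/etc/trino/conf/config.properties")


def read_properties(path: Path) -> Dict[str, str]:
    """Parse simple key=value (or key:value) lines of a .properties file.

    Skips blank lines and '#'/'!' comments. Line continuations and escape
    sequences from the full Java properties format are not handled.
    """
    props: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split at whichever separator, '=' or ':', appears first.
        positions = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not positions:
            continue
        sep = min(positions)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


def http_port(path: Path = CONFIG) -> Optional[int]:
    """Return http-server.http.port as an int, or None if it is not set."""
    value = read_properties(path).get("http-server.http.port")
    return int(value) if value is not None else None
```

Call http_port(Path("...")) with the actual location of config.properties on your node; a None result means the property is not set in that file.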
christian_pfarr@trino-cluster-m:~$ wget http://localhost:8060 -O -
--2023-06-14 09:16:21-- http://localhost:8060/
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8060... connected.
HTTP request sent, awaiting response... 303 See Other
Location: http://localhost:8060/ui/ [following]
--2023-06-14 09:16:21-- http://localhost:8060/ui/
Reusing existing connection to [localhost]:8060.
HTTP request sent, awaiting response... 200 OK
Length: 1821 (1.8K) [text/html]
Saving to: ‘STDOUT’
- 0%[ ] 0 --.-KB/s <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="description" content="Cluster Overview - Trino">
<title>Cluster Overview - Trino</title>
Could it be that the port configuration must be adjusted manually?
I know I could change this via some property, but I'm not sure how to change the Trino port so that all services can reach it.
I've created the cluster with this command:
gcloud beta dataproc clusters create trino-cluster --project=xxx --region=us-central1 --num-workers=2 --scopes=cloud-platform --optional-components=TRINO --image-version=2.1 --enable-component-gateway
Kind regards,
Christian
From the information you've provided, it seems that your Trino service is indeed running and properly configured. The log file does not seem to indicate any problems. However, the port the Trino CLI is trying (8080) and the port in your configuration file (8060) do not match. You should try connecting to the Web UI through the port mentioned in your configuration file, which is 8060.
So, the command should be:
$ wget http://localhost:8060 -O -
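The same check can be scripted. This minimal Python sketch requests the root URL and, like wget, follows the redirect to /ui/; it ignores any HTTP proxy environment settings so that localhost is contacted directly. The function name probe_web_ui is hypothetical.

```python
import urllib.request
from typing import Tuple


def probe_web_ui(port: int, host: str = "localhost",
                 timeout: float = 5.0) -> Tuple[int, str]:
    """GET http://host:port/ and return (status, final URL) after redirects.

    Raises urllib.error.URLError if the connection fails, e.g. when
    nothing is listening on the port (connection refused).
    """
    # An empty ProxyHandler ignores http_proxy settings so localhost is hit directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # The default redirect handler follows the 303 to /ui/, as wget did.
    with opener.open(f"http://{host}:{port}/", timeout=timeout) as resp:
        return resp.status, resp.geturl()
```

If the Web UI behaves as in the wget output above, probe_web_ui(8060) should return status 200 with a final URL ending in /ui/, while probe_web_ui(8080) should raise URLError because nothing is listening there.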
If that doesn't work, there could be a network issue or the service might not be correctly binding to the port. Here are some steps you can take:
Check if the port is listening: Use the command netstat -tuln | grep 8060 to check if the port is open and listening for connections.
Check if firewall rules are blocking the connection: Use the command sudo ufw status to check the status of your firewall. If the firewall is active, ensure that it allows connections to port 8060.
Check network connectivity: If you're trying to connect to the Web UI from a different machine, ensure that the two machines can reach each other over the network. You can use tools like ping to check this.
Check Trino server logs for any errors: Look for any error messages in the Trino server logs that might indicate a problem with starting the server or binding to the port.
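As an alternative to netstat, a short Python sketch can test whether anything accepts TCP connections on a given port. socket.create_connection tries every address the host name resolves to (for localhost, typically ::1 and 127.0.0.1). The helper names is_listening and check_ports are hypothetical.

```python
import socket
from typing import Dict, Iterable


def is_listening(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError (nothing listening) and timeouts both land here.
        return False


def check_ports(ports: Iterable[int], host: str = "localhost") -> Dict[int, bool]:
    """Map each port to whether something accepts connections on it."""
    return {port: is_listening(port, host) for port in ports}
```

Given the wget and CLI results in this thread, check_ports([8060, 8080]) on the master node would be expected to report 8060 as True and 8080 as False.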
Hmm... this is exactly what I did...
And as mentioned above, port 8060 looks fine.
I think the problem is that it should run on 8080. At least you wrote in one of the answers that it should run on 8080.
Or, the other possibility: the Trino CLI is misconfigured, because it tries port 8080.
Is it possible that there is something wrong with the default setup?
Is it possible that the Component Gateway uses 8080 (like the Trino CLI), which of course doesn't match the default configuration? (Please keep in mind that I have not changed any property.)
To ask again: do you know how I can change this port via --properties when creating the cluster?
It seems like a mismatch in the configuration of the Component Gateway and the Trino CLI might be causing the issue.
When creating the cluster, you should be able to specify custom properties using the --properties flag, followed by the property and value that you want to change.
The Component Gateway typically uses port 8080, and if Trino CLI is trying to connect to this port as well, it could lead to conflicts if your cluster is set up to use port 8060 instead.
If you need to change the port that Component Gateway uses, you should be able to do so with a command similar to the following:
gcloud dataproc clusters create cluster-name --properties=componentgateway.port=8080
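You would replace "cluster-name" with the name of your cluster and "8080" with the port number you want to use. Note that Dataproc cluster properties are written in prefix:property=value form, so check the exact name against the Dataproc cluster properties documentation before relying on this one.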
If you need to change the port that Trino CLI uses, you would typically do so in the configuration file for Trino CLI. This file is usually located in the etc directory of your Trino installation and is named config.properties. You would change the http-server.http.port property to the port number you want to use.
Before you make these changes, you should check to see if there are any other services running on the port you want to use, as this could also cause conflicts.
"If you need to change the port that Trino CLI uses, you would typically do so in the configuration file for Trino CLI. This file is usually located in the etc directory of your Trino installation and is named config.properties. You would change the http-server.http.port property to the port number you want to use."
I think this is my problem here... I don't want to change anything explicitly... I just need to know what's wrong with the standard settings. I'm totally fine with the standard, but it's simply not working.
Can you confirm that Trino, in the standard setup, uses port 8060, as I see in the configuration files?
If it should not use this port, do you know why it's using it, even though I have not changed anything?
If it should use 8060, can you confirm that the Component Gateway uses this port as well?
Additionally, do you know why the Trino CLI is not using 8060?
Kind regards,
Christian
To clarify: when Trino is enabled as an optional component in Google Cloud Dataproc, it is configured to use port 8060 by default. This applies to the Trino server and Web UI on the cluster's first master node. If Kerberos is enabled, port 7778 is used instead.
For more details: https://cloud.google.com/dataproc/docs/concepts/components/trino
The Trino CLI tries to connect to port 8080 because, when no --server is given, it defaults to localhost:8080, which matches Trino's own default port in a standalone setup. As we've seen, when Trino is integrated into Google Cloud Dataproc as an optional component, it's configured to use a different port (8060, or 7778 with Kerberos).
The Component Gateway is a proxy for services and does not enable direct access to node:port interfaces, such as localhost:8060. It's used to access a specific subset of services automatically.
For more details: https://cloud.google.com/dataproc/docs/concepts/accessing/dataproc-gateways
To use the Trino CLI with the correct port, specify the server and port when launching the CLI, like so: trino --server http://<your-server>:8060. This should allow the Trino CLI to connect to the Trino server on the correct port.
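If you launch the CLI from scripts, a minimal Python sketch like the following builds the command with an explicit --server, so the CLI does not fall back to localhost:8080. It assumes the non-Kerberos default port 8060 from the Dataproc documentation linked above, the trino binary on PATH as in the original question, and the hive catalog and default schema used there. The function name is hypothetical, and a Kerberos-enabled cluster (port 7778) likely needs additional client settings that are not covered here.

```python
import shlex
from typing import List, Optional

# Assumption: Dataproc's default Trino port without Kerberos, per the docs above.
DATAPROC_TRINO_PORT = 8060


def trino_cli_command(host: str = "localhost",
                      port: int = DATAPROC_TRINO_PORT,
                      catalog: Optional[str] = "hive",
                      schema: Optional[str] = "default") -> List[str]:
    """Build the argv for the Trino CLI with an explicit --server URL."""
    cmd = ["trino", "--server", f"http://{host}:{port}"]
    if catalog:
        cmd += ["--catalog", catalog]
    if schema:
        cmd += ["--schema", schema]
    return cmd


print(shlex.join(trino_cli_command()))
# prints: trino --server http://localhost:8060 --catalog hive --schema default
```

On the master node, passing the list to subprocess.run(trino_cli_command()) would start the same interactive session as typing the printed command.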