Hello Guys,
I'm experiencing an issue with SSH access between two GCP VMs. When I connect repeatedly from one GCP VM to another over the external IP within a short period (for example, from an Ansible playbook; the target VM was created with Terraform), some SSH connections time out. The issue persists regardless of network conditions, and the target VM has very low resource utilization. Out of approximately 100 SSH attempts, about 2 to 3 result in a timeout.
Has anyone else encountered similar problems when accessing a GCP VM from another GCP VM via external IP, especially when the target VM’s resources are barely being used? If so, how did you resolve it? Any insights or suggestions to prevent these frequent SSH timeouts would be greatly appreciated.
I test with this command:
for i in {1..100}; do ssh -o IPQoS=none -vvv root@<external ip> 'echo Hello, world!'; echo ${i}; done
On the failing attempts, the output gets stuck at:
OpenSSH_8.9p1 Ubuntu-3ubuntu0.10, OpenSSL 3.0.2 15 Mar 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug3: /etc/ssh/ssh_config line 19: Including file /etc/ssh/ssh_config.d/50-cloudimg-settings.conf depth 0
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-cloudimg-settings.conf
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname <external ip> is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug3: ssh_connect_direct: entering
debug1: Connecting to <external ip> [<external ip>] port 22.
Thanks in advance for your help!
Dear TestMan,
An SSH issue like this can stem from many factors. Based on your description, the connection between the source and destination VMs is intermittent. The troubleshooting steps I would take are:
1. Create a debug VM in the same subnet as the destination VM, and allow SSH from it to the target VM. Try to reproduce the issue over SSH via the internal IP, and see whether you get the same result.
2. Run a network test with mtr from both the debug VM (using the internal IP) and the original source VM (using the external IP). For example: mtr --tcp --port 22 IP-ADDRESS -zb
3. If mtr shows no packet loss, the problem is likely on the SSH side. If mtr does show packet loss, either something is blocking network access (VPC firewall, OS-level firewall, etc.) or the network behind your VM's external IP address is having a problem. You can also start a web server on port 80 and test whether that port has the same issue.
4. To determine whether the network behind the external IP address is having a problem, check the GCP bulletin or https://status.cloud.google.com/?hl=JA, or try cycling the IP (getting a new external IP) or changing the zone of your VM.
5. In my experience, cycling the external IP address works in some disruption cases. Try cycling to an IP in a different IP range.
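To compare the internal and external paths in steps 1 to 3 with numbers rather than by eye, something like the sketch below might help. It is only a rough illustration: count_failures and tcp_probe are hypothetical helper names I made up, the 5-second limit is an arbitrary choice, and tcp_probe assumes bash and the coreutils timeout command are available on the source VM.

```shell
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures ATTEMPTS COMMAND [ARGS...]
count_failures() {
  local attempts="$1"
  shift
  local failures=0
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Any non-zero exit status (timeout, refused, auth error) counts as a failure.
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures"
}

# Hypothetical helper: TCP-only connect test, no SSH handshake involved.
# Opens a socket via bash's /dev/tcp and gives up after 5 seconds (assumed limit).
tcp_probe() {
  local host="$1"
  local port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port"
}

# Example runs (replace the placeholders with your own addresses):
#   count_failures 100 tcp_probe <internal ip> 22
#   count_failures 100 tcp_probe <external ip> 22
#   count_failures 100 ssh -o ConnectTimeout=5 -o BatchMode=yes root@<external ip> true
```

If the plain TCP probe fails at roughly the same rate as the full SSH command, and only on the external IP, that would point toward the network path rather than sshd itself.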
Regards,
Izza