Output of lsb_release -a on the GCP Compute Engine VM:
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal
This seems to be a network issue between the Redis cluster (deployed on our site's on-prem worker node) and the Redis client (on a GCP VM connected to our site via Cloud VPN). Specifically, the Redis clients (redis-py and redis-cli) become unresponsive when running the info stats / info memory commands (which produce large TCP packets) against the Redis cluster deployed on our site's node.
As shown in the screenshot above, when the Redis cluster is deployed within GCP Compute Engine VMs (i.e. in the same place as the Redis client), the info stats / info memory commands work without any issues. (Note the 1,491 bytes for the info memory response.)
However, when the Redis cluster is deployed on our site's worker node, the Redis client hangs indefinitely on the same commands, and the packet dump of the server's response shows [TCP Previous segment not captured] and Response: [fragment] [fragment] (shown in the screenshot above).
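For illustration, here is a minimal sketch of the kind of client call that gets stuck (the host and port below are placeholders, not our real endpoint); setting socket_timeout in redis-py turns the otherwise indefinite hang into a visible exception, while small commands such as PING still succeed in both deployments:

import redis

# Placeholder endpoint; the real on-prem cluster address differs.
client = redis.Redis(host="redis.example.internal", port=6379, socket_timeout=5)

try:
    # The same call that hangs against the on-prem cluster over Cloud VPN.
    stats = client.info("memory")
    print("info memory returned", len(stats), "fields")
except redis.exceptions.TimeoutError:
    print("info memory timed out - this is where the client normally hangs")

# Small responses come back fine in both deployments.
print("ping:", client.ping())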
After reading https://cloud.google.com/vpc/docs/mtu#handling_of_packets_that_exceed_mtu (screenshot below), I first thought it could be a problem with the MTU on the Google VPC, because the documentation mentions that IP fragmentation is not supported for TCP.
My line of thinking was roughly: because of the VPN, the traffic passes through more network layers -> each packet carries more header bytes -> the large response exceeds the effective MTU -> so the client hangs. To be honest, I am not even sure whether I am on the right track. Valid MTUs are already specified (https://cloud.google.com/vpc/docs/mtu#valid_mtus), but just to see if it changed anything, I also tried setting the VPC MTU to jumbo frames (8,896 bytes), which did not help at all.
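One way to sanity-check the MTU theory would be to probe the path MTU from the GCP VM toward the on-prem Redis node with don't-fragment pings. This is only a rough sketch (the target IP is a placeholder), using the Linux iputils ping flags available on Ubuntu 20.04:

import subprocess

# Placeholder address of the on-prem Redis node reached over Cloud VPN.
TARGET = "10.0.0.10"

# 1472 bytes of ICMP payload + 28 bytes of IP/ICMP headers = a 1500-byte packet.
for payload in (1200, 1372, 1472, 8000):
    # Linux iputils ping: -M do prohibits fragmentation, -s sets the payload size.
    result = subprocess.run(
        ["ping", "-c", "3", "-M", "do", "-s", str(payload), TARGET],
        capture_output=True,
        text=True,
    )
    status = "ok" if result.returncode == 0 else "dropped / fragmentation needed"
    print(f"payload {payload:>5} bytes: {status}")

If the larger probes fail while the small ones pass, something along the VPN path is presumably clamping packets below the VPC MTU.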
It is worth noting that all other info sections and any other commands work just fine against the same Redis cluster.
Therefore, the issue at hand seems to be network related, but I am unable to troubleshoot it further.
I would appreciate any help!
Hi @imageschool
Based on the screenshots you have shared, there is no issue on the GCP end: the end-to-end connection is being acknowledged, so this should be the intended behavior. As for changing the VPC MTU to jumbo frames (8,896 bytes), I believe this can only take effect for clients on the same VPC network, or on a peered VPC connected via VPC Network Peering, as stated in the documentation.
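If you want to double-check which MTU is actually applied, something like the sketch below (the network name is a placeholder) reads the configured value back from the VPC network:

import json
import subprocess

# Placeholder network name; replace it with the VPC network the client VM uses.
NETWORK = "my-vpc-network"

# Read back the MTU currently configured on the VPC network.
out = subprocess.run(
    ["gcloud", "compute", "networks", "describe", NETWORK, "--format=json"],
    capture_output=True,
    text=True,
    check=True,
)
mtu = json.loads(out.stdout).get("mtu")
print(f"Configured MTU on {NETWORK}: {mtu}")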
This needs further investigation on your project to check the configurations as well as the logs in your GCP console. I recommend contacting GCP support and creating a support ticket.