Intermittent connection issues with Cloud DNS and load balancer

We've been encountering intermittent issues with our app (URL Removed by Staff). The problem seems to stem from the IP address associated with our load balancer. While this issue initially surfaced about 6-7 months ago, its frequency has recently escalated significantly.

[Attachment: image (3).png — screenshot of the error]


Hi @akashfp ,

It seems the error indicates a problem with your app's connection to the frontend, which is hosted at IP address 35.227.216.49 on port 443 (the standard HTTPS port). The error also shows that the connection originates from 172.16.0.56 on an ephemeral (randomly assigned) source port, 49304.

It would be helpful if you could check the logs of both the backend and the load balancer, as they contain information that can help diagnose the issue. You can refer to this documentation on how to view the load balancer and backend logs.
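As a sketch of what that query could look like (a hedged example, assuming an external HTTP(S) load balancer; PROJECT_ID and the time window are placeholders):

```shell
# Hedged sketch: pull recent load balancer log entries with error statuses.
# PROJECT_ID is a placeholder; widen --freshness/--limit as needed.
gcloud logging read \
  'resource.type="http_load_balancer" AND httpRequest.status>=500' \
  --project=PROJECT_ID \
  --freshness=1h \
  --limit=50 \
  --format='value(timestamp,httpRequest.status,jsonPayload.statusDetails)'
```

The `statusDetails` field is often the most useful part for intermittent failures, since it says where the request was dropped (for example `failed_to_connect_to_backend` or `backend_connection_closed_before_data_sent_to_client`).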

I would also recommend checking the load balancer configuration, especially whether the health checks are passing correctly.
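For example (a hedged sketch; BACKEND_SERVICE and HEALTH_CHECK are placeholder names, and the backend service is assumed to be global):

```shell
# Hedged sketch: verify that backends are currently passing health checks.
# BACKEND_SERVICE is a placeholder; use --region=REGION for regional services.
gcloud compute backend-services get-health BACKEND_SERVICE --global

# Inspect the health check configuration itself (interval, timeout, thresholds).
gcloud compute health-checks describe HEALTH_CHECK
```

Flapping health checks (e.g. a check timeout close to the backend's actual response time) can produce exactly this kind of intermittent failure.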


@akashfp wrote:

While this issue initially surfaced about 6-7 months ago, its frequency has recently escalated significantly.


In this case, the backend servers may be overloaded, or there could be issues with the application code running on them.

Again, the logs from the load balancer and backend will help us pinpoint the cause of the connection error. For now, what I have provided are general steps and areas you can check to isolate the issue.

Hi @Marvin_Lucero , thanks for the reply.
We have checked the logs on both the load balancer (resource.type="http_load_balancer") and the servers (k8s pods), but were not able to find any logs related to these issues. These requests also fail intermittently for random APIs; here are some examples (we track these API errors from our frontend in BigQuery):

[Attachment: akashfp_0-1713953713643.png — examples of the API errors]

 



Without full access to the logs and network info (I can ping both the IPv4 and IPv6 addresses of front.page from my network), I do have a question. It seems you have a backend on a private network and your frontend on a public network. If so, why are you trying to connect the two over the public network? Why not stay private?

That being said, check the firewalls and routes on the backend networks and make sure they can get out. Also, make sure those log entries are not all coming from a single node; if they are, kill it and bring up another one, as it may have gotten stuck.
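A quick sketch of that check (NETWORK_NAME is a placeholder for the backend VPC network):

```shell
# Hedged sketch: review firewall rules and routes on the backend network.
# NETWORK_NAME is a placeholder; the ~ filter matches the network URL substring.
gcloud compute firewall-rules list --filter='network ~ NETWORK_NAME'
gcloud compute routes list --filter='network ~ NETWORK_NAME'
```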

Hi @IveGotIt , actually our frontend is an Android app, and the logs above are sent by the app to BigQuery via Firebase Crashlytics, so they are not coming from our backend infra.
On your second point, I've checked the firewall rules and Cloud Armor policies and tested them from my own IP; the errors they produce are different from this issue, so they seem to be working fine.
Moreover, these problems are only intermittent; after a few retries the requests succeed on the devices, so I don't think the problem lies in any routing on the backend networks.
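As a rough illustration of the retry behavior we rely on (a generic shell sketch with a placeholder URL, not our actual Android client code):

```shell
# Generic retry-with-backoff sketch (illustrative only, not the real client).
retry() {
  local attempts=$1; shift
  local delay=1 i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0          # success: stop retrying
    (( i < attempts )) && sleep "$delay"
    delay=$((delay * 2))      # exponential backoff between attempts
  done
  return 1                    # all attempts failed
}

# Example (placeholder URL): probe an endpoint up to 5 times.
# retry 5 curl --silent --fail --connect-timeout 5 https://example.com/api/health
```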