Re: Cloud Run mount error: The NFS server may not ...

maxxer · 11-29-2024 07:42 AM

Hi.

I've a Cloud Run service where I mount two volumes from a VM implementing an NFS server. It's very simple and straightforward, no fancy config.

The VM and the CR service are on the same subnet, and there's a firewall rule allowing TCP+UDP/2049 from CR to the VM. The volumes seem to be mounted correctly: ss -tuna | grep :2049 on the server shows open connections from several IPs.

Also, when I start a new revision with an invalid mountpoint, it fails to start, so I assume the configuration and the connection are correct.

I ran a test from a VM in the same subnet as the NFS server and the Cloud Run instance: the NFS shares can be mounted, and files can be listed, created and deleted.
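
For anyone repeating this kind of test, here is a minimal sketch of a TCP reachability check that can be run from a test VM. The function name check_port and the NFS_HOST variable are made up for illustration, and the 3-second timeout is an arbitrary choice. It only covers TCP, and a success only shows that the machine running it is allowed through the firewall, not that every other client is.

```shell
#!/usr/bin/env bash
# Sketch: report whether a TCP port accepts connections.
# check_port and NFS_HOST are illustrative names, not part of any tool.

check_port() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection;
  # timeout caps the wait at 3 seconds if packets are silently dropped.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Only probe when NFS_HOST is set, e.g. NFS_HOST=<server IP> ./check.sh
if [ -n "${NFS_HOST:-}" ]; then
  for port in 2049 111; do   # 2049 = NFS, 111 = RPC portmapper
    echo "${NFS_HOST}:${port} $(check_port "$NFS_HOST" "$port")"
  done
fi
```

A "closed" result covers both a refused connection and a timeout, so it does not distinguish a stopped service from a firewall drop.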

Unfortunately, when the CR application (PHP Symfony app) attempts to access a mounted path it crashes and I get the following error in the logs:

textPayload: "terminated: Application failed to start: container 1: failed to mount volume (type: nfs, name: datastore-staging): The NFS server may not be reachable. Check your VPC connectivity and firewall settings."

I'm struggling to debug the issue, and I've run out of ideas about what could be wrong in the setup. Any hint on what could be wrong, or on how to debug it, is welcome.

thanks

kensan

Hi @maxxer ,

Welcome to Google Cloud Community!

Here are some possible causes and troubleshooting steps that may help you debug the issue:

Verify that your Cloud Run IAM roles have the correct permissions to access the VPC resources.
You may also need to open TCP and UDP port 111, which the RPC portmapper uses on Linux.
You may also check this link for a guide on implementing NFS on GCP.

If the issue still persists and you need further assistance, you can file a ticket with our Google Cloud Support.

I hope the information above is helpful.

maxxer

Hi Kensan, thank you for your reply.

The NFS shares are actually mounted, and I can access them from within the app. But occasionally I get the NFS mount error in the logs, and I cannot figure out where it comes from. As I said, the NFS works and it's correctly mounted.

My guess is that the NFS server is not able to handle the client load, but the application is very low traffic and shouldn't really overload the NFS server. The server has a load average around 0.10 and about 30% RAM usage. I also increased the number of NFS threads from the default 8 to 24, but nothing changed.

maxxer

I did some debugging on the NFS side: I increased the grace time, reduced the lease time and increased the number of threads, but the issue persists.

I've set the maximum number of CR instances to 1, to rule out overload or concurrency issues that could somehow affect the NFS server, but nothing changed.

Roughly every 3-4 minutes I get the error in the Cloud Run logs. It seems the error is thrown when CR spins up a new container or recycles an existing one.

As I said, running containers have no problems mounting and accessing the NFS paths.

maxxer

I solved it with the help of this thread: I modified the firewall rule to use a subnet instead of a network tag as the source.
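
As a rough sketch of what such a rule could look like with gcloud: every name and range below (rule name, network, subnet CIDR, target tag) is a placeholder I made up, not a value from my setup. The function only prints the command so it can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Sketch: firewall rule that matches NFS clients by subnet range
# instead of by network tag. All values below are hypothetical.
RULE_NAME="allow-nfs-from-cloud-run"
NETWORK="my-vpc"
SUBNET_RANGE="10.8.0.0/24"   # CIDR of the subnet Cloud Run uses
NFS_TAG="nfs-server"         # tag on the NFS VM (the rule's target)

build_rule_cmd() {
  # The source is given with --source-ranges (a CIDR), not --source-tags.
  echo gcloud compute firewall-rules create "$RULE_NAME" \
    --network="$NETWORK" \
    --direction=INGRESS \
    --allow=tcp:2049,udp:2049 \
    --source-ranges="$SUBNET_RANGE" \
    --target-tags="$NFS_TAG"
}

# Print the command for review; run it manually once the values are right.
build_rule_cmd
```

The ports match the TCP+UDP/2049 rule described in the first post; only the way the source is selected changes.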

Cloud Run mount error: The NFS server may not be reachable. Check your VPC connectivity and firewall