Useful commands for "Too many open files" errors on an MP due to connections

If you are seeing "Too many open files" errors in your Message Processor (MP) logs, or if you believe one or more of your Message Processors has more connections (or requests) from routers than it should, the following commands may be helpful.

Use lsof on the Message Processor to count the number of connections to each IP address:

# Use "ps -ef | grep java" or similar, to get the PID of your MP process
# And replace $MP_PID below with the PID value:
# -n keeps lsof from resolving addresses to hostnames, so the counts are per IP
lsof -a -i -n -p "$MP_PID" | awk '{print $9}' | awk -F'[>:]' '{print $3}' | sort | uniq -c | sort -nr
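
The one-liner above also counts rows that have no remote peer (for example LISTEN sockets), which show up as a blank entry. As a minimal sketch, the same count can be wrapped in a small function that keeps only rows containing "->". The function name count_remote_hosts is hypothetical (it is not part of lsof or Apigee), and the sketch assumes the NAME column is the ninth field and that addresses are IPv4:

```shell
# Hypothetical helper: reads `lsof -i` output on stdin and prints the number
# of connections per remote host, busiest first.
# Assumes the NAME column (local->remote) is field 9 and addresses are IPv4.
count_remote_hosts() {
  awk '$9 ~ /->/ {
         split($9, ends, "->")         # ends[1] = local side, ends[2] = remote side
         sub(/:[^:]*$/, "", ends[2])   # strip the trailing :port
         print ends[2]
       }' | sort | uniq -c | sort -nr
}
```

It could then be used as: lsof -a -i -n -P -p "$MP_PID" | count_remote_hosts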

Use hmstatus on any router that might have too many connections to the MP, to check connectivity between routers and MPs:

# From the terminal of one or more of the routers, run:
curl "http://0:9080/hmstatus?format=csv" > hmstatus_out.csv

# If the file is large, list the 10 MPs that appear most often, with their counts:
sed 's/8998_/8998\n/g' hmstatus_out.csv | sort | uniq -c | sort -nr | head -n 10

List all MP servers and check for removed MP UUIDs that might still be registered with the problematic MP's IP address (use the steps here to properly remove any that are still listed):

# Replace adminEmail, adminPword and MS_IP with the appropriate values:
curl -u adminEmail:adminPword "http://MS_IP:8080/v1/servers?pod=gateway"

Depending on the exact nature of the problem, resolving the issue may require router restarts, removing bad MP UUIDs, or adjusting file descriptor limits (as described here).
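
Before adjusting file descriptor limits, it can help to see how close the MP process actually is to its limit. The following is a minimal sketch for Linux hosts only, since it reads /proc. The function name fd_usage and its optional second argument are hypothetical, and reading another user's /proc entries usually requires running as that user or as root:

```shell
# Hypothetical helper: prints the number of open file descriptors for a PID
# next to the soft "Max open files" limit of that process.
# Usage: fd_usage PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
fd_usage() (
  pid=$1
  proc_root=${2:-/proc}
  # Each entry under fd/ is one open descriptor
  open=$(ls "$proc_root/$pid/fd" | wc -l | tr -d ' ')
  # In the limits file, field 4 of the "Max open files" row is the soft limit
  soft=$(awk '/^Max open files/ {print $4}' "$proc_root/$pid/limits")
  echo "open=$open soft_limit=$soft"
)
```

For example, fd_usage "$MP_PID" prints a single line such as open=N soft_limit=M. If the open count is close to the soft limit, the process is likely to hit "Too many open files".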

Last update: 02-24-2021 08:25 AM