One common use case I have seen at several customers is that they want their SecOps or Operations folks to be able to monitor that MSV Actors are up and available.
To prevent false alarms, a script was created to "ping" the actors. The script can be scheduled as a cron job or run manually as part of a playbook. Feel free to take a look at the Python below and use it as needed. For simplicity, I did not include the code to create the HTTPS session to the Director.
import json
import sys
from datetime import datetime, timedelta, timezone

# NOTE: `session` (an authenticated HTTPS session to the Director) and
# `director_ip` are assumed to be set up elsewhere; that code is not shown.

# Get the list of non-protected actors
def setup_data_fields():
    global nodeData, evalData
    response = session.get(f'https://{director_ip}/topology/nodes.json')
    if response.status_code != 200:
        print('Unable to get node information from director.')
        sys.exit(-1)
    nodeData = json.loads(response.text)

# Get the list of protected actors
def setup_data_fields2():
    global nodeData, evalData
    response = session.get(f'https://{director_ip}/topology/protected.json')
    if response.status_code != 200:
        print('Unable to get node information from director.')
        sys.exit(-1)
    nodeData = json.loads(response.text)

# Just what it says - "ping" the actor
def refreshTheActor(myID=0):
    response = session.get(f'https://{director_ip}/topology/nodes/{myID}/pull_info')
    print(response)
    # As written, any status other than 422 is treated as a failure;
    # adjust this check if your Director responds differently.
    if response.status_code != 422:
        print('Unable to refresh the actor.')
        sys.exit(-1)

# Ping actors that have not checked in within the last 5 minutes.
# last_comms ends in "Z", so it is assumed to be UTC; compare against UTC time.
currentRefreshTime = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(minutes=5)
date_format = "%Y-%m-%dT%H:%M:%S.%fZ"

setup_data_fields()
for myNode in nodeData['registered']:
    idNumber = myNode['id']
    print(idNumber)
    # Last check-in is older than the 5-minute threshold
    if datetime.strptime(myNode['last_comms'], date_format) < currentRefreshTime:
        # add your ALERT HERE
        refreshTheActor(idNumber)

setup_data_fields2()
for myNode in nodeData['protected_nodes']:
    idNumber = myNode['id']
    print(idNumber)
    # Last check-in is older than the 5-minute threshold
    if datetime.strptime(myNode['last_comms'], date_format) < currentRefreshTime:
        # add your ALERT HERE
        refreshTheActor(idNumber)
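The "has not checked in within the last 5 minutes" test used in the loops above can also be pulled into a small helper so it is easy to try on its own. This is only a sketch: it assumes `last_comms` is a UTC timestamp in the format shown above, and the names `is_stale`, `now`, and `max_age` are hypothetical, not part of the Director API.

```python
from datetime import datetime, timedelta, timezone

# Same timestamp layout as the last_comms field used in the script
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

def is_stale(last_comms, now=None, max_age=timedelta(minutes=5)):
    """Return True if the last check-in is older than max_age."""
    if now is None:
        # Naive UTC "now", so it compares cleanly with the parsed timestamp
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    last_seen = datetime.strptime(last_comms, DATE_FORMAT)
    return now - last_seen > max_age
```

With a helper like this, the condition in each loop would become `if is_stale(myNode['last_comms']):`.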
Hi @TomAtGoogle ,
I am not sure I understand the difference between this code and what can be achieved with the "Operational Status" page in the Director.
When the "Communications Test" returns "Unable to communicate with Actor", does that mean that the same communications test you entered in your script has failed?
Thank you,
Paolo
Hi @TomAtGoogle , more details about my previous question.
I sometimes experience a strange situation like the following:
I get an alert from the Director that one or more actors are not communicating with it.
As soon as I check, the actor shows Last Comm "A minute ago". See for example the screenshot.
The last check of the communication test is at 06:18 UTC. I took the screenshot at 8:19 CEST (6:19 UTC), just a minute after the communication test. The last comms shows "A minute ago". The message passed on by the Director is ambiguous, to say the least.
So, I'm interested in understanding if the test you suggest with your script is more reliable than the one executed by the Director.
Thank you,
Paolo
Hi Paolo,
My script simply forces a refresh between the Actor and the Director. There are times when an actor doesn't check in as scheduled or the Director doesn't register the check in. This "ping" just forces a sync between the two of them.
That being said, operational status checks only happen once a day unless the default configuration is changed. If it fails at 6:18 am, it will stay failed until the next day even if the actor comes back online.
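To illustrate why a failed status can be out of date, here is a small sketch that compares the time of the failed communications test with the actor's last check-in. The function name and arguments are hypothetical; in practice the two values would come from the Operational Status page and the `last_comms` field.

```python
from datetime import datetime

def failure_is_outdated(test_failed_at, last_comms):
    """Return True if the actor checked in at or after the failed test."""
    return last_comms >= test_failed_at

# Example values only: test failed at 06:18 UTC, actor seen shortly after
test_failed_at = datetime(2024, 1, 1, 6, 18, 0)
last_comms = datetime(2024, 1, 1, 6, 18, 30)
outdated = failure_is_outdated(test_failed_at, last_comms)
```

If the result is true, the actor has communicated since the test failed, so the status shown is likely just waiting for the next scheduled check.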
Ok @TomAtGoogle , thanks for the explanation.