MSV Operational Status and Monitoring

This post briefly covers the operational readiness tests in MSV, a feature that is sometimes overlooked.

Several situations can degrade an MSV deployment, for example:

  1. Time skew: The MSV actor host relies on its local clock to match the actions it executes against the events returned from an integration. Over time the actor's clock can drift from the director's time, so event timestamps may no longer line up with the host time.
  2. Integrations failing: This can happen, for example, when a service account expires or the target integration platform is down. Event detection through that integration then fails, which impacts the action results.
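To make the time skew problem concrete, here is a minimal Python sketch of why a drifting actor clock breaks time-based matching. This is not MSV's actual matching algorithm. The function names, the 10-second slack, and the example timestamps are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical matching slack, chosen for illustration only (not an MSV setting).
SLACK = timedelta(seconds=10)


def event_matches_action(action_start, action_end, event_time, slack=SLACK):
    """True if the event timestamp falls inside the action window, plus slack."""
    return action_start - slack <= event_time <= action_end + slack


def as_seen_by_actor(true_time, actor_skew):
    """Timestamp the actor would record if its clock is off by actor_skew."""
    return true_time + actor_skew


# An action that really ran for 5 seconds, and an event the integration
# reported 3 seconds into it (made-up values).
start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
end = start + timedelta(seconds=5)
event = start + timedelta(seconds=3)

for skew in (timedelta(0), timedelta(minutes=2)):
    matched = event_matches_action(
        as_seen_by_actor(start, skew), as_seen_by_actor(end, skew), event
    )
    print(f"actor skew {skew}: event matched = {matched}")
```

With no skew the event falls inside the action window. With the actor clock two minutes ahead, the same event lands outside the window, so the action would look undetected even though the integration did see it.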

MSV has built-in features to detect, and optionally alert on, performance-impacting issues like these. To use them, follow these steps:

  1. Activate the tests at https://<MSV URL>/settings/operational_status . Several tests are available, including the Director/Remote Integration Test and the Actor Time Skew Test, which address the issues discussed above.
    AbdElHafez_0-1735593633957.png
    Also make sure to add an Email/Syslog/Webhook notification on the same panel so that the system owners are alerted to at least the failures.
    AbdElHafez_10-1735595724789.png
    These tests run periodically and populate a status dashboard at https://<MSV URL>/operational_status .

    AbdElHafez_1-1735594911692.png

    AbdElHafez_3-1735594967077.png

    AbdElHafez_4-1735595024029.png

    AbdElHafez_7-1735595058620.png
    AbdElHafez_2-1735594963843.png

  2. Run or schedule the Operational Readiness Actions: These are a set of benign actions that attempt basic activities such as executing "whoami" or downloading txt files via TCP. These actions are listed per actor at https://<MSV URL>/operational_readiness
    AbdElHafez_8-1735595237226.png
  3. Check for any errors or time skew problems at https://<MSV URL>/settings/system . This panel indicates whether there are any time skews that can be fixed, and also shows the current actor version.
    AbdElHafez_9-1735595580912.png
  4. Check the actors' status at https://<MSV URL>/topology/nodes for any offline or non-communicating actors.
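If you review these pages regularly, a small helper can print the full set of links for a given deployment. The sketch below only builds URLs from the paths mentioned in the steps above; it does not call any MSV API. The base URL `https://msv.example.com` and the dictionary labels are placeholders.

```python
# Page paths taken from the steps above; the keys are descriptive labels only.
PAGES = {
    "test settings and notifications": "/settings/operational_status",
    "status dashboard": "/operational_status",
    "operational readiness actions": "/operational_readiness",
    "system settings (time skew, actor version)": "/settings/system",
    "actor topology": "/topology/nodes",
}


def checklist_urls(base_url):
    """Return a label -> full URL mapping for the pages to review."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base URL
    return {label: base + path for label, path in PAGES.items()}


if __name__ == "__main__":
    # Placeholder hostname; replace with your own MSV URL.
    for label, url in checklist_urls("https://msv.example.com").items():
        print(f"{label}: {url}")
```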