Overview

Updated to agent 1.6.2

SphereShield Agent is a Windows service that monitors other AGAT services and restarts them if needed.
Monitored services can be:

  • Sip Filter (AgatSipFilter)

  • Bastion (for LAC, Teams Protector, Webex Protector filters)

  • Authentication Extender

  • CASB Adapter (AgatSphereShieldCasbAdapter)

  • Content Manager (AgatContentManagerService)

  • Maintenance Service (AgatSphereShieldMaintenanceService)

  • ADSync Adapter (AgatSphereShieldADSyncAdapter)

...

  • checking in Windows service management whether the monitored service is running, and starting it if it is not

  • checking in the DB whether the monitored service is alive, using the Service Management mechanism - relevant for all services except Authentication Extender

  • for Bastion only: sending a health check request to the Bastion and its filters. If the Bastion and filters are not healthy, the agent tries to restart the Bastion service.

  • from 1.6.2, for ADSync only: if enabled, checking whether the service log contains the alive message
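
The log-based alive check can be pictured with a minimal Python sketch. It is not the agent's actual implementation: the function name `alive_since` and the byte-offset bookkeeping are assumptions made for illustration, and only the `[IS ALIVE]` marker comes from this page. The idea is to read only what was appended to the log since the previous check and look for the marker there.

```python
ALIVE_MARKER = b"[IS ALIVE]"  # alive message named on this page


def alive_since(log_path, offset):
    """Check the part of the log written after `offset` (hypothetical helper).

    Returns (found, new_offset). Pass new_offset to the next call so that
    each check only sees lines appended since the previous one.
    """
    with open(log_path, "rb") as log:
        log.seek(0, 2)          # move to the end to learn the current size
        size = log.tell()
        if size < offset:       # file is shorter than before: assume rotation
            offset = 0
        log.seek(offset)
        chunk = log.read()
    return ALIVE_MARKER in chunk, offset + len(chunk)
```

A caller would keep the returned offset between checks and treat a `False` result as a failed alive check. A marker that is only partly written at the moment of reading would be missed by this sketch.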

...

  • [DB mode] write the agent alive time to the monitored service's row in the service management table

  • check whether the monitored service is running and start it if it is not

  • [DB mode] check in the service management table whether the monitored service is alive

  • if log alive is enabled, check whether the alive message ([IS ALIVE]) has appeared in the service log since the last check

  • [Bastion] check whether the Bastion and filters are OK:

    • Bastion healthcheck procedure:

      • for forward proxy:
        request https://[BastionHealthcheckHost]/healthcheck with proxy BastionIP
        for example https://test.skypeshield.com/teams_protection/healthcheck with proxy 127.0.0.1

      • for reverse proxy:
        request https://[BastionIp]/skypeshieldhealth with host header BastionHealthcheckHost
        for example https://127.0.0.1/skypeshieldhealth with host header test.skypeshield.com

      • if an HTTP 200 status code is received (within BastionMaxHealthcheckLatencyMilliseconds, if that setting is not 0) - the Bastion and filters are OK (no restart is done)

      • if any other HTTP status or an error/exception is received - except statuses 404 (Not Found), 403 (Forbidden) and 401 (Unauthorized) - the agent tries to restart the Bastion service after 3 consecutive failures, checked every 10 seconds - only if it is already in production mode.

    • if the healthcheck result is not OK and the agent is in production mode (it has received 5 consecutive OK results) - the healthcheck is considered not passed

    • otherwise, if the healthcheck result is OK or the agent is not in production mode (it has not received 5 consecutive OK results) - the healthcheck is considered passed

    • the agent goes into production mode (restart on error) only after receiving an OK result 5 times in a row, which indicates correct operation; this avoids restarts caused by misconfiguration during installation.

  • if the alive check or the Bastion healthcheck did not pass - restart the monitored service

  • if starting the service failed X times (X = ServiceMonitorNumberOfAttemptsBeforeRestart) - kill the monitored service
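
The Bastion healthcheck rules above can be summarized in a short Python sketch. It is illustrative only and does not reproduce the agent's code: the names `healthcheck_request`, `classify` and `BastionHealthGate` are hypothetical. It also assumes that statuses 401, 403 and 404 leave the counters unchanged, and that the agent stays in production mode once it has entered it; this page does not state either point.

```python
from dataclasses import dataclass

OK_RESULTS_FOR_PRODUCTION = 5       # from this page: 5 consecutive OK results
FAILURES_BEFORE_RESTART = 3         # from this page: 3 consecutive failures
IGNORED_STATUSES = {401, 403, 404}  # from this page: never trigger a restart


def healthcheck_request(proxy_mode, bastion_ip, healthcheck_host):
    """Describe the request to send for 'forward' or 'reverse' proxy mode."""
    if proxy_mode == "forward":
        return {"url": f"https://{healthcheck_host}/healthcheck",
                "proxy": bastion_ip, "headers": {}}
    if proxy_mode == "reverse":
        return {"url": f"https://{bastion_ip}/skypeshieldhealth",
                "proxy": None, "headers": {"Host": healthcheck_host}}
    raise ValueError(f"unknown proxy mode: {proxy_mode}")


def classify(status, elapsed_ms, max_latency_ms=0):
    """Return 'ok', 'ignored' or 'failed'; status is None on error/exception."""
    if status in IGNORED_STATUSES:
        return "ignored"
    # A latency limit of 0 means the response time is not checked.
    if status == 200 and (max_latency_ms == 0 or elapsed_ms <= max_latency_ms):
        return "ok"
    return "failed"


@dataclass
class BastionHealthGate:
    consecutive_ok: int = 0
    consecutive_failed: int = 0
    production_mode: bool = False

    def record(self, outcome):
        """Record one outcome; return True when a restart should be tried."""
        if outcome == "ignored":    # assumption: counters stay unchanged
            return False
        if outcome == "ok":
            self.consecutive_ok += 1
            self.consecutive_failed = 0
            if self.consecutive_ok >= OK_RESULTS_FOR_PRODUCTION:
                self.production_mode = True
            return False
        self.consecutive_ok = 0
        self.consecutive_failed += 1
        if self.production_mode and self.consecutive_failed >= FAILURES_BEFORE_RESTART:
            self.consecutive_failed = 0
            return True
        return False
```

Before production mode is reached, failures are counted but never lead to a restart, which mirrors the protection against a misconfigured installation described above.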

...

  • RESTART - restart the monitored service

  • START - start the monitored service

  • STOP - stop the monitored service

  • RESTART_AGENT - restart the agent itself

  • START_TRBL - start troubleshooting process

  • FINISH_TRBL - finish the troubleshooting process

  • CRITICAL - the service entered a critical state
    When SipFilter writes "Critical State" to the DB, the agent does the following:

    1. Shut down SipFilter

    2. Send an email about it to the admin

    3. Write an entry to the event viewer
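
As a rough illustration of how the commands above map to agent actions, the following Python sketch returns an ordered list of steps per command. The step names (`stop_service`, `email_admin` and so on) and the function `plan_actions` are hypothetical; only the command names and the order of the CRITICAL steps come from this page. Treating RESTART as a stop followed by a start is an assumption.

```python
# Ordered, hypothetical step names for each command listed on this page.
COMMAND_STEPS = {
    "RESTART": ["stop_service", "start_service"],  # assumption: stop, then start
    "START": ["start_service"],
    "STOP": ["stop_service"],
    "RESTART_AGENT": ["restart_agent"],
    "START_TRBL": ["start_troubleshooting"],
    "FINISH_TRBL": ["finish_troubleshooting"],
    # Order taken from the CRITICAL steps above.
    "CRITICAL": ["stop_service", "email_admin", "write_event_log"],
}


def plan_actions(command):
    """Return the ordered steps for a command, or raise on an unknown one."""
    try:
        return list(COMMAND_STEPS[command])
    except KeyError:
        raise ValueError(f"unknown command: {command!r}") from None
```

Rejecting unknown commands explicitly, rather than ignoring them, keeps a typo in a command name from passing silently.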

Troubleshooting Processing

...