Overview
Updated to agent 1.6.2
SphereShield Agent is a Windows service that monitors other AGAT services and restarts them if needed.
Monitored services can be:
Sip Filter (AgatSipFilter)
Bastion (for LAC, Teams Protector, Webex Protector filters)
Authentication Extender
CASB Adapter (AgatSphereShieldCasbAdapter)
Content Manager (AgatContentManagerService)
Maintenance Service (AgatSphereShieldMaintenanceService)
ADSync Adapter (AgatSphereShieldADSyncAdapter)
...
Checking whether the monitored service is running in Windows Service Management, and starting it if it is not running
Checking in the DB whether the monitored service is alive, using the Service Management mechanism (relevant for all services except Authentication Extender)
Only for Bastion: sending a health check request to the Bastion and its filters. If the Bastion and filters are not healthy, the agent tries to restart the Bastion service.
From 1.6.2, only for ADSync: if enabled, checking whether the service log contains the alive message
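The checks above, together with the log alive check (the `[IS ALIVE]` message described in the agent cycle), can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the service keys, the helper names `checks_for` and `log_alive_since`, and the assumption that the service log is an append-only file read from a remembered byte offset are all hypothetical. Only the per-service rules and the `[IS ALIVE]` marker come from this page.

```python
from pathlib import Path

ALIVE_MARKER = "[IS ALIVE]"  # alive message written by the service (from this page)

# Hypothetical keys; the agent's real configuration names may differ.
AUTH_EXTENDER = "AuthenticationExtender"
BASTION = "Bastion"
ADSYNC = "AgatSphereShieldADSyncAdapter"


def checks_for(service: str, log_alive_enabled: bool = False) -> list:
    """Return the names of the checks that apply to one monitored service."""
    checks = ["windows_service_running"]  # applies to every monitored service
    if service != AUTH_EXTENDER:
        checks.append("db_alive")  # Service Management alive check
    if service == BASTION:
        checks.append("bastion_healthcheck")
    if service == ADSYNC and log_alive_enabled:
        checks.append("log_alive")  # from agent 1.6.2
    return checks


def log_alive_since(log_path: Path, offset: int) -> tuple:
    """Scan the log from `offset` (in bytes) to the end of the file.

    Returns (alive, new_offset): alive is True if the alive marker appears in
    the part of the log written since the last check. Assumes (hypothetically)
    that the log is only appended to, not rotated, between checks.
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        new_data = f.read()
    alive = ALIVE_MARKER.encode("utf-8") in new_data
    return alive, offset + len(new_data)
```

Keeping the byte offset between calls is one simple way to implement "since the last check" without re-reading the whole log each cycle.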
...
[DB mode] Write the agent alive time to the monitored service's row in the service management table
Check whether the monitored service is running, and start it if it is not
[DB mode] Check in the service management table whether the monitored service is alive
If the log alive check is enabled, check whether the alive message ([IS ALIVE]) has appeared in the service log since the last check
[Bastion] Check whether the Bastion and filters are OK:
Bastion healthcheck procedure:
for forward proxy:
request https://[BastionHealthcheckHost]/healthcheck with proxy BastionIP
for example https://test.skypeshield.com/teams_protection/healthcheck with proxy 127.0.0.1
for reverse proxy:
request https://[BastionIp]/skypeshieldhealth with host header BastionHealthcheckHost
for example https://127.0.0.1/skypeshieldhealth with host header test.skypeshield.com
if an HTTP 200 status code is received (within BastionMaxHealthcheckLatencyMilliseconds, if that setting is not 0), the Bastion and filters are OK and no restart is done
if any other HTTP status, or an error/exception, is received (except 404 (Not Found), 403 (Forbidden) and 401 (Unauthorized)), the agent will try to restart the Bastion service after 3 consecutive failures, at 10-second intervals, but only if it is already in production mode
if the healthcheck result is not OK and the agent is in production mode (it has received 5 consecutive OK results), the healthcheck is considered not passed
otherwise, if the healthcheck result is OK or the agent is not in production mode (it has not yet received 5 consecutive OK results), the healthcheck is considered passed
the agent enters production mode (restart on error) only after receiving 5 consecutive OK results, which confirms correct operation and avoids restarts caused by a misconfigured installation
If the alive check or the Bastion healthcheck does not pass, restart the monitored service
If starting the service fails X times (X = ServiceMonitorNumberOfAttemptsBeforeRestart), kill the monitored service
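The Bastion healthcheck decision and the restart/kill rules above can be modelled as a small state machine. The Python sketch below illustrates the rules under stated assumptions and is not the agent's implementation: `build_request`, `classify`, `HealthcheckTracker` and `should_kill` are hypothetical names; a 200 received after BastionMaxHealthcheckLatencyMilliseconds is assumed to count as a failure; 401/403/404 are assumed to leave the counters unchanged; the failure counter is assumed to reset once a restart is requested; and the 10-second check interval is left to the caller. `build_request` only prepares the URL, Host header and proxy; it sends nothing.

```python
from dataclasses import dataclass
from typing import Optional

IGNORED_STATUSES = {401, 403, 404}  # never trigger a restart
OK_RESULTS_FOR_PRODUCTION = 5       # consecutive OK results before production mode
FAILURES_BEFORE_RESTART = 3         # consecutive failures before a restart


def build_request(reverse_proxy: bool, bastion_ip: str, healthcheck_host: str) -> dict:
    """Return the URL, headers and proxy for one healthcheck request (not sent)."""
    if reverse_proxy:
        # Reverse proxy: call the Bastion directly, name the site in the Host header.
        return {"url": f"https://{bastion_ip}/skypeshieldhealth",
                "headers": {"Host": healthcheck_host}, "proxy": None}
    # Forward proxy: call the healthcheck host, using the Bastion as the proxy.
    return {"url": f"https://{healthcheck_host}/healthcheck",
            "headers": {}, "proxy": bastion_ip}


def classify(status: Optional[int], latency_ms: float, max_latency_ms: int) -> str:
    """Map one response to 'ok', 'ignored' or 'fail'; status=None means error/exception."""
    if status == 200 and (max_latency_ms == 0 or latency_ms <= max_latency_ms):
        return "ok"
    if status in IGNORED_STATUSES:
        return "ignored"
    return "fail"  # other status, error/exception, or (assumed) a too-slow 200


@dataclass
class HealthcheckTracker:
    consecutive_ok: int = 0
    consecutive_fail: int = 0
    production: bool = False

    def record(self, result: str) -> bool:
        """Record one result; return True if the healthcheck is considered passed."""
        if result == "ok":
            self.consecutive_ok += 1
            self.consecutive_fail = 0
            if self.consecutive_ok >= OK_RESULTS_FOR_PRODUCTION:
                self.production = True  # stays on once reached
            return True
        if result == "ignored":
            return True  # assumed: counters unchanged
        self.consecutive_ok = 0
        self.consecutive_fail += 1
        if self.production and self.consecutive_fail >= FAILURES_BEFORE_RESTART:
            self.consecutive_fail = 0  # assumed reset once a restart is requested
            return False  # not passed: restart the Bastion service
        return True  # not in production mode, or not enough failures yet


def should_kill(failed_starts: int, max_attempts: int) -> bool:
    """True once starting the service has failed max_attempts times
    (max_attempts = ServiceMonitorNumberOfAttemptsBeforeRestart)."""
    return failed_starts >= max_attempts
```

Note that, until the tracker has seen 5 consecutive OK results, every failure still counts as "passed"; this mirrors the rule that a misconfigured installation should not cause restart loops.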
...
RESTART - restart the monitored service
START - start the monitored service
STOP - stop the monitored service
RESTART_AGENT - restart the agent itself
START_TRBL - start troubleshooting process
FINISH_TRBL - finish the troubleshooting process
CRITICAL - service entered critical state
When SipFilter writes "Critical State" to the DB, the agent does the following:
Shut down SipFilter
Send an email about it to the admin
Write an entry to the Event Viewer
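The commands listed above, together with the critical-state handling, can be sketched as a dispatch table. This Python sketch is an illustration under assumptions: how commands reach the agent is not described here, and the `actions` callables that would start/stop the service, send the email and write to the Event Viewer are hypothetical placeholders injected by the caller. RESTART is assumed to be a stop followed by a start. Only the command names and the three critical-state steps come from this page.

```python
from enum import Enum
from typing import Callable, Dict


class Command(Enum):
    RESTART = "RESTART"
    START = "START"
    STOP = "STOP"
    RESTART_AGENT = "RESTART_AGENT"
    START_TRBL = "START_TRBL"
    FINISH_TRBL = "FINISH_TRBL"
    CRITICAL = "CRITICAL"


def handle_critical(actions: Dict[str, Callable[[], None]]) -> None:
    """Critical state: shut down SipFilter, email the admin, write to the Event Viewer."""
    actions["stop_service"]()
    actions["send_admin_email"]()
    actions["write_event_log"]()


def dispatch(raw: str, actions: Dict[str, Callable[[], None]]) -> None:
    """Run the handler for one command string; raises ValueError for unknown commands."""
    command = Command(raw)  # ValueError if raw is not a known command
    # Lambdas keep lookups lazy, so only the actions a command needs must exist.
    handlers = {
        # Assumed: restart = stop, then start.
        Command.RESTART: lambda: (actions["stop_service"](), actions["start_service"]()),
        Command.START: lambda: actions["start_service"](),
        Command.STOP: lambda: actions["stop_service"](),
        Command.RESTART_AGENT: lambda: actions["restart_agent"](),
        Command.START_TRBL: lambda: actions["start_troubleshooting"](),
        Command.FINISH_TRBL: lambda: actions["finish_troubleshooting"](),
        Command.CRITICAL: lambda: handle_critical(actions),
    }
    handlers[command]()
```

Injecting the actions keeps the dispatch logic testable without a real Windows service, mail server or event log.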
Troubleshooting Processing
...