SphereShield Service Agent - How it works

Overview

Updated to agent 1.6.2

SphereShield Agent is a Windows service that monitors another AGAT service and restarts it if needed.
Monitored services can be:

Sip Filter (AgatSipFilter)
Bastion (for LAC, Teams Protector, Webex Protector filters)
Authentication Extender
CASB Adapter (AgatSphereShieldCasbAdapter)
Content Manager (AgatContentManagerService)
Maintenance Service (AgatSphereShieldMaintenanceService)
ADSync Adapter (AgatSphereShieldADSyncAdapter)

A monitoring cycle consists of the following checks:

Checking in Windows service management whether the monitored service is running, and starting it if it is not.
Checking in the DB whether the monitored service is alive, using the Service Management mechanism. Relevant for all services except Authentication Extender.
Only for Bastion: sending a health check request to the Bastion and its filters. If the Bastion and filters are not healthy, the agent tries to restart the Bastion service.
From 1.6.2, only for ADSync: if enabled, checking whether the service log contains the alive message.
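
The log-based alive check in the last item can be sketched as follows. This is a minimal Python sketch, not the agent's actual code: the function name is hypothetical, and the assumption that every log line starts with a YYYY-MM-DD HH:MM:SS timestamp is ours. Only the [IS ALIVE] marker comes from this page.

```python
from datetime import datetime, timedelta

ALIVE_MARKER = "[IS ALIVE]"  # marker text described on this page
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log timestamp layout


def alive_message_seen(log_lines, now, window_minutes):
    """Return True if a line containing the alive marker is timestamped
    within the last `window_minutes` minutes before `now`."""
    cutoff = now - timedelta(minutes=window_minutes)
    for line in log_lines:
        if ALIVE_MARKER not in line:
            continue
        try:
            # Assumption: the first 19 characters hold the timestamp.
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if cutoff <= stamp <= now:
            return True
    return False
```

If the function returns False, the agent would treat the alive check as not passed for that cycle.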

Service name: AgatSphereShieldServiceAgent[CustomerName]
Service display name: AGAT SphereShield Service Agent [Customer Name]

The agent is installed and configured by the installer.

To install the service manually (run as administrator):

> AgatSphereShieldServiceAgent.exe install

To uninstall the service (run as administrator):

> AgatSphereShieldServiceAgent.exe remove

Configuration

Configuration added in Version 1.6.2

The agent is configured through the AgatSphereShieldServiceAgent.config file. It writes to a log file (default at C:\Agat\Logs\ServiceAgent\CustomerName]) and to the Event Log with the source "AGAT SphereShield Service Agent".

Logging Configuration

CustomerName - The customer name for this agent. Multiple agents for different customers can be installed on the same machine, and each must use a different customer name.
LogFileFullName - The path to the agent log file. Replace AGAT in the path with the customer name; the installer does this automatically.
LogFileMaxSize - The maximum size of the log file. When it is reached, the agent clears the log and starts a new one.
LogFileLevel - The severity level of the logs written by the agent. Possible values: off, fatal, error, warn, debug, info, all, alert, critical
EventLogLevel - The severity level of the logs sent to the Event Viewer.

DB Connection Configuration

DBRequired - Set to false to run the agent without a DB. This mode does not support portal UI operations (Service Management operations such as remote restart) and is designed mainly for monitoring Authentication Extender.
ConnectionString - Required when DBRequired is set to true. Replace the values of SQLSERVER, DataBaseName, username and password.
Key/IV - AES encryption key and IV, required when DBRequired is set to true.

Monitored service Configuration

ServiceName - Name of the service that the agent monitors. Possible values: AgatSipFilter, Bastion, AgatSphereShieldCasbAdapter[CustomerName], AgatContentManagerService
CheckServiceAliveInLog - true/false. Enables monitoring of the service log file for the alive message: the agent checks whether the service log contains the text “[IS ALIVE]” within the period set in the next setting. If the message is not found, the agent attempts to restart the service.
For now, this feature is supported only with AdSync version 1.2.0.2
CheckServiceAliveInLogMinutes - How often, in minutes, the agent checks the service log for the alive message; the message must have been written within this period.
For now, this feature is supported only with AdSync version 1.2.0.2
ServiceRestartTimeoutSeconds - How long the agent waits for a restart to complete. If the service does not start in that time, the agent creates an event in the Event Log so that manual action can be taken.
ServiceMonitorFrequencySeconds - How often monitoring runs, in seconds.
Note: A restart occurs only after ServiceMonitorNumberOfAttemptsBeforeRestart consecutive failures, so the cycle time should be configured accordingly.
If ConnectionString is set, this setting is ignored and the value is read from the DB.
ServiceMonitorNumberOfAttemptsBeforeRestart - Number of consecutive failed checks before the service is restarted.
If ConnectionString is set, this setting is ignored and the value is read from the DB.
MinutesToWaitBetweenRestarts - How many minutes to wait after a restart before the next restart, to avoid continuous restarts when restarting does not help.
This does not affect how often the checks run (ServiceMonitorFrequencySeconds).
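
The interaction between ServiceMonitorNumberOfAttemptsBeforeRestart and MinutesToWaitBetweenRestarts can be illustrated with a small Python sketch. The class and method names are hypothetical, and two details are assumptions rather than documented behaviour: that the failure counter resets after a restart, and that failures keep being counted during the waiting period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestartPolicy:
    attempts_before_restart: int     # ServiceMonitorNumberOfAttemptsBeforeRestart
    minutes_between_restarts: float  # MinutesToWaitBetweenRestarts
    consecutive_failures: int = 0
    last_restart_minute: Optional[float] = None

    def record_check(self, healthy: bool, now_minute: float) -> bool:
        """Record one monitoring cycle; return True if a restart is due."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.attempts_before_restart:
            return False
        in_cooldown = (
            self.last_restart_minute is not None
            and now_minute - self.last_restart_minute < self.minutes_between_restarts
        )
        if in_cooldown:
            return False  # checks continue, but no restart yet
        self.last_restart_minute = now_minute
        self.consecutive_failures = 0  # assumption: counter resets on restart
        return True
```

For example, with ServiceMonitorFrequencySeconds set to 60 and ServiceMonitorNumberOfAttemptsBeforeRestart set to 3, a restart becomes due on the third consecutive failed check, that is, within about three 60-second cycles of the service failing.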

Ethical Wall load Configuration - SIP Filter only

MonitorEthicalWallLoad - Ethical Wall load monitoring. Relevant for SIP Filter only.
MonitorEthicalWallLoadFrequencyMinutes - How often Ethical Wall load monitoring runs, in minutes.

Bastion healthcheck Configuration - Bastion only

BastionForwardProxy - Set to true if the Bastion runs as a forward proxy, false if it runs as a reverse proxy.
BastionIp - Bastion IP for the healthcheck request. If the agent is installed on the Bastion machine, use the localhost address. Make sure to use a port that the Bastion listens on (and that is used by the required channel).
Note: The default port is 443 for reverse proxy and 80 for forward proxy.
If a port other than the default is used, add :<portnumber> to the end of the IP.
BastionHealthcheckHost - The host to which the health check request is sent.
BastionMaxHealthcheckLatencyMilliseconds - Maximum latency allowed for the health check response. Set to 0 to disable the latency check.
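
The default-port rule for BastionIp can be sketched in Python as below. The function name is hypothetical and the parsing is an assumption about how a value such as 10.0.0.5:8443 would be interpreted; it handles IPv4 addresses and host names only, not IPv6 literals.

```python
def split_bastion_address(bastion_ip: str, forward_proxy: bool):
    """Split a BastionIp value into (host, port), applying the default port
    from this page (80 for forward proxy, 443 for reverse proxy) when the
    value carries no :<portnumber> suffix."""
    default_port = 80 if forward_proxy else 443
    host, separator, port_text = bastion_ip.strip().partition(":")
    if not separator:
        return host, default_port
    return host, int(port_text)
```

For instance, split_bastion_address("127.0.0.1", forward_proxy=False) yields the localhost address with port 443.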

Bastion & LAC Filter troubleshooting

TroubleshootingOutputFolder - Folder for the output of the troubleshooting procedure; it will include an archive of log files.
TroubleshootingSplitIntoVolumes - Set to true to split the troubleshooting archive into volumes, which is useful for email attachments.
TroubleshootingSplitVolumeSize - Size of each split volume of the troubleshooting archive, in MB.
TroubleshootingDaysRange - Number of most recent days to include in the troubleshooting archive.

Email notifications to admin Configuration

Settings for admin notification when the agent detects an issue.

EmailIssues - The types of issues for which an email is sent. Possible values: all, dbConnectionFailure, bastionDbConnectionFailure, restartFailure, restartSuccess.
Multiple values can be set, separated by commas. Leave empty to disable emailing entirely.
Note that for any non-empty value, SMTP must be configured in the DB (in DB mode) or in the following settings.
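
As an illustration of how an EmailIssues value could be interpreted, here is a minimal Python sketch. The function name is hypothetical, and treating the values as case-sensitive and rejecting unknown values are assumptions, not documented agent behaviour.

```python
KNOWN_ISSUES = {
    "dbConnectionFailure",
    "bastionDbConnectionFailure",
    "restartFailure",
    "restartSuccess",
}


def parse_email_issues(value: str) -> set:
    """Turn an EmailIssues setting into the set of issue types to email about.
    An empty value returns an empty set, meaning emailing is disabled."""
    items = {item.strip() for item in value.split(",") if item.strip()}
    if "all" in items:
        return set(KNOWN_ISSUES)
    unknown = items - KNOWN_ISSUES
    if unknown:
        # Assumption: unknown values are reported rather than silently ignored.
        raise ValueError(f"unknown issue types: {sorted(unknown)}")
    return items
```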

If ConnectionString is set, the following SMTP settings do not need to be set, as they are read from the DB.

SMTP Hostname: SMTP server address.
SMTP Port: The port the SMTP server is listening on.
SMTP Account Name: Sender address for the agent.
SMTP Account Password: If SMTP requires authentication, this is the password for the sender account.
SMTP Requires SSL: Change to True if the SMTP server requires TLS/SSL.
SMTP Requires Authentication: Change to True if the SMTP server requires authentication.
SMTP Mail Recipient: Administrator email address that receives notifications from the agent. Multiple addresses can be given, separated by , or ;
SMTP_Sending_Frequency - How often a mail notification is repeated, counted in monitoring cycles.
This value depends on the "Service Monitoring Frequency (seconds)" value in Admin Portal (ServiceMonitorFrequencySeconds setting).
For example, if ServiceMonitorFrequencySeconds is set to 60 seconds and SMTP_Sending_Frequency is set to 10, the agent sends a mail when the issue is detected and then an additional mail every 10 minutes (60 x 10 = 600 sec = 10 min).
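
The arithmetic in the example above can be written out as a short Python sketch. Both function names are hypothetical, and counting failed cycles from 0 at the moment of detection is an assumption made for illustration; SMTP_Sending_Frequency is assumed to be a positive integer.

```python
def repeat_mail_interval_seconds(monitor_frequency_seconds: int,
                                 smtp_sending_frequency: int) -> int:
    """Seconds between repeated notification mails while an issue persists."""
    return monitor_frequency_seconds * smtp_sending_frequency


def should_send_mail(failed_cycles: int, smtp_sending_frequency: int) -> bool:
    """failed_cycles counts monitoring cycles since the issue was first seen
    (0 is the cycle in which it was detected)."""
    return failed_cycles % smtp_sending_frequency == 0
```

With the values from the example, repeat_mail_interval_seconds(60, 10) gives 600 seconds, and a mail is due at cycle 0, cycle 10, cycle 20 and so on.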

Email notifications to support Configuration

Settings for support notification when the agent detects an issue.

SupportEmailIssues - The types of issues for which an email is sent. Possible values: all, dbConnectionFailure, bastionDbConnectionFailure, restartFailure, restartSuccess.
Multiple values can be set, separated by commas. Leave empty to disable emailing entirely.
Note that for any non-empty value, SMTP must be configured in the following settings.

The SMTP settings for support team notification are the same as the SMTP settings for admin, with a Support prefix. Note that these settings are set only in the config file and not in the DB.

Monitoring Processing in detail

The agent runs the monitoring cycle every configured number of seconds (default 60) and does the following:

[DB mode] Writes the agent alive time to the monitored service's row in the Service Management table.
Checks whether the monitored service is running and starts it if not.
[DB mode] Checks in the Service Management table whether the monitored service is alive.
If the log alive check is enabled, checks whether the alive message ([IS ALIVE]) has appeared in the service log since the last check.
[Bastion] Checks whether the Bastion and filters are OK:
- Bastion healthcheck procedure:
  - for forward proxy:
    request https://[BastionHealthcheckHost]/healthcheck with proxy BastionIp
    for example https://test.skypeshield.com/teams_protection/healthcheck with proxy 127.0.0.1
  - for reverse proxy:
    request https://[BastionIp]/skypeshieldhealth with host header BastionHealthcheckHost
    for example https://127.0.0.1/skypeshieldhealth with host header test.skypeshield.com
  - if an HTTP 200 status code is received (within BastionMaxHealthcheckLatencyMilliseconds, if that is not set to 0), the Bastion and filters are OK and no restart is done
  - if any other HTTP status or an error/exception is received, except statuses 404 (Not Found), 403 (Forbidden) and 401 (Unauthorized), the agent retries every 10 seconds and tries to restart the Bastion service after 3 consecutive failures, only if it is already in production mode
- if the healthcheck result is not OK and the agent is in production mode (it has received 5 consecutive OK results), the healthcheck is considered not passed
- otherwise, if the healthcheck result is OK or the agent is not in production mode (it has not received 5 consecutive OK results), the healthcheck is considered passed
- the agent goes into production mode (restart on error) only after receiving a good result 5 times, which indicates correct operation; this avoids restarts caused by misconfiguration at install
If the alive check or the Bastion healthcheck is not passed, restarts the monitored service.
If the service fails to start X times (X = ServiceMonitorNumberOfAttemptsBeforeRestart), kills the monitored service.
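
The production-mode rule for the Bastion healthcheck can be sketched as a small state machine. This Python sketch uses hypothetical names and leaves out the HTTP request itself and the 10-second retry. It also makes assumptions this page does not confirm: statuses 401, 403 and 404 neither count toward the 5 OK results nor reset them, a failure resets the OK count, and the agent stays in production mode once it has entered it.

```python
IGNORED_STATUSES = {401, 403, 404}  # not treated as failures per this page
OK_RESULTS_FOR_PRODUCTION = 5


def classify_status(status, latency_ms=0, max_latency_ms=0):
    """Map an HTTP status (None for an error/exception) to a result label."""
    if status == 200:
        too_slow = max_latency_ms > 0 and latency_ms > max_latency_ms
        return "failure" if too_slow else "ok"
    if status in IGNORED_STATUSES:
        return "ignored"
    return "failure"


class HealthcheckGate:
    def __init__(self):
        self.ok_streak = 0
        self.production_mode = False

    def passed(self, status, latency_ms=0, max_latency_ms=0):
        """Return True if the healthcheck counts as passed for this cycle."""
        result = classify_status(status, latency_ms, max_latency_ms)
        if result == "ok":
            self.ok_streak += 1
            if self.ok_streak >= OK_RESULTS_FOR_PRODUCTION:
                self.production_mode = True
            return True
        if result == "ignored":
            return True  # assumption: streak is left unchanged
        self.ok_streak = 0
        # A failure only counts once the agent is in production mode.
        return not self.production_mode
```

Before the fifth OK result, passed() returns True even for a failing status, which mirrors the rule that a misconfigured installation should not trigger restarts.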

Service Management Processing

https://agatsoftware.atlassian.net/wiki/spaces/SKYP/pages/2599157770/SERVICE+MANAGEMENT - table schema

The agent receives commands through the database table Service_Management and performs the requested command on the monitored service. Commands are sent by the Admin Portal from the Service Management page; troubleshooting commands are sent from the Troubleshooting wizard. The agent listens to the table and starts performing a command when the Operation field changes.

Available commands are:

RESTART - restart the monitored service
START - start the monitored service
STOP - stop the monitored service
RESTART_AGENT - restart the agent itself
START_TRBL - start the troubleshooting process
FINISH_TRBL - finish the troubleshooting process
CRITICAL - the service entered a critical state
When SipFilter writes "Critical State" to the DB, the agent does the following:
1. Shuts down SipFilter
2. Sends a mail about it to the admin
3. Writes an entry to the Event Viewer
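
The command handling above can be pictured as a dispatch on the Operation value. The following Python sketch is illustrative only: the function name and the action descriptions are hypothetical, and it returns a plan of actions instead of touching any real service or database.

```python
from enum import Enum


class Operation(Enum):
    RESTART = "RESTART"
    START = "START"
    STOP = "STOP"
    RESTART_AGENT = "RESTART_AGENT"
    START_TRBL = "START_TRBL"
    FINISH_TRBL = "FINISH_TRBL"
    CRITICAL = "CRITICAL"


# Ordered actions per command, following the list on this page.
ACTION_PLANS = {
    Operation.RESTART: ["restart monitored service"],
    Operation.START: ["start monitored service"],
    Operation.STOP: ["stop monitored service"],
    Operation.RESTART_AGENT: ["restart agent"],
    Operation.START_TRBL: ["start troubleshooting process"],
    Operation.FINISH_TRBL: ["finish troubleshooting process"],
    Operation.CRITICAL: [
        "shut down SipFilter",
        "send mail to admin",
        "write event log entry",
    ],
}


def plan_actions(operation_value: str) -> list:
    """Return the ordered actions for one value of the Operation field.
    Raises ValueError for a value that is not a known command."""
    operation = Operation(operation_value.strip())
    return list(ACTION_PLANS[operation])
```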

Troubleshooting Processing

Troubleshooting is available only for Bastion with LAC filter.
More details here: https://agatsoftware.atlassian.net/wiki/spaces/SKYP/pages/1126367233
