Each container should be assigned a separate GPU.
A GPU intended for use with the LLM may be used for embedding during a pre-rollout phase to speed up ingestion of a company's initial dataset.
For higher resilience, each Gateway container can be deployed to a separate host.
All Gateway containers must share a single instance of the SQL database and a single instance of the vector database.
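The layout above can be sketched as a small planning helper. This is a minimal illustration under stated assumptions, not the product's configuration format: the `plan_gateways` function, the `GatewayPlan` record, and the `SQL_URL` / `VECTOR_DB_URL` variable names are hypothetical. The sketch pins each Gateway container to one GPU through the standard `CUDA_VISIBLE_DEVICES` environment variable, rejects a plan that assigns the same GPU twice, and points every container at the same shared SQL and vector database.

```python
from dataclasses import dataclass, field


@dataclass
class GatewayPlan:
    """Hypothetical description of one Gateway container."""
    name: str
    host: str
    gpu_index: int
    env: dict = field(default_factory=dict)


def plan_gateways(hosts_gpus, sql_url, vector_url):
    """Build one plan per (host, gpu_index) pair.

    Assumptions (hypothetical): each Gateway container gets exactly one GPU,
    and all containers share one SQL database and one vector database.
    """
    seen = set()
    plans = []
    for i, (host, gpu) in enumerate(hosts_gpus):
        # A GPU may serve only one Gateway container.
        if (host, gpu) in seen:
            raise ValueError(f"GPU {gpu} on {host} is already assigned")
        seen.add((host, gpu))
        env = {
            # Restrict the container's CUDA runtime to its assigned GPU.
            "CUDA_VISIBLE_DEVICES": str(gpu),
            # Every container points at the same shared databases.
            "SQL_URL": sql_url,
            "VECTOR_DB_URL": vector_url,
        }
        plans.append(GatewayPlan(f"gateway-{i}", host, gpu, env))
    return plans


if __name__ == "__main__":
    # Two Gateway containers on separate hosts for resilience.
    for p in plan_gateways(
        [("host-a", 0), ("host-b", 0)],
        "mssql://sql.internal/gateway",
        "postgresql://vectordb.internal/vectors",
    ):
        print(p.name, p.host, p.env["CUDA_VISIBLE_DEVICES"])
```

Placing each container on its own host means two containers can both use GPU index 0 without conflict, because the duplicate check is keyed on the (host, GPU) pair rather than the GPU index alone.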

SQL

...

Vector DB Servers

Highly available SQL Server and Postgres VectorDB deployments are supported.

...