...
Each container should be assigned a separate GPU.
A GPU intended for the LLM may be used for embedding during a pre-rollout phase to speed up ingestion of a company's initial dataset.
For higher resiliency, each Gateway container can be deployed to a separate host.
A single SQL database and a single vector database must be shared by all Gateway containers.
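As a rough illustration of the one-GPU-per-container guideline above, the following sketch builds a placement plan and refuses to proceed when there are fewer GPUs than containers. The container names ("llm", "embedding") and both helper functions are hypothetical and not part of the product. The sketch also assumes NVIDIA GPUs, where the CUDA_VISIBLE_DEVICES environment variable limits which devices a process can see.

```python
from typing import Dict, List


def assign_gpus(containers: List[str], gpu_ids: List[int]) -> Dict[str, int]:
    """Give every container its own GPU; fail if that is not possible.

    Hypothetical helper: names and GPU indices are supplied by the caller.
    """
    if len(set(containers)) != len(containers):
        raise ValueError("container names must be unique")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("GPU ids must be unique")
    if len(containers) > len(gpu_ids):
        raise ValueError(
            f"{len(containers)} containers need {len(containers)} GPUs, "
            f"but only {len(gpu_ids)} are available"
        )
    # zip pairs the first container with the first GPU, and so on,
    # so no GPU index is handed out twice.
    return dict(zip(containers, gpu_ids))


def gpu_env(gpu_id: int) -> Dict[str, str]:
    """Environment that restricts a CUDA process to a single GPU (assumes NVIDIA)."""
    return {"CUDA_VISIBLE_DEVICES": str(gpu_id)}


# Example: two containers on a host with two GPUs.
plan = assign_gpus(["llm", "embedding"], [0, 1])
for name, gpu in plan.items():
    print(name, gpu_env(gpu))
```

During a pre-rollout ingestion phase, the same helper could be called with the embedding container mapped to the GPU that will later serve the LLM, then re-run with the final mapping before rollout.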
SQL
...
+ Vector DB Servers
Highly available SQL Server and Postgres VectorDB deployments are supported.
...