AWS Private Cloud Instance System Requirements
Number of Required Servers
POC Requirements
Purpose | Option 1 (GPU embedding) | Option 2 (CPU embedding)
---|---|---
Gateway (Linux) | t3a.xlarge | t3a.2xlarge
GPU Embedder (Linux) | g4dn.xlarge | None. Embedding uses CPU of Gateway
Local LLM (Linux) | g6.xlarge | g6.xlarge
Dashboard / Ingestor | t3a.large | t3a.large
1x Linux Instance (Gateway)
100 GB SSD disk
1x Linux GPU Instance (LLM)
g6.xlarge - 60 GB SSD disk
1x Windows Instance (Dashboard / Ingestor)
t3a.large - 80 GB SSD disk
1x Microsoft SQL Server Express. May be co-located.
SSL certificate (optional)
Customer may provide an SSL certificate to secure access to the dashboard website.
Network Access
Ensure that the BGPT dashboard server has network access to any privately hosted services containing data that needs to be ingested, e.g. an internally hosted Confluence Data Center instance.
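As an illustration, the sketch below checks that privately hosted services are reachable from the dashboard server. The hostnames and ports are placeholders for the customer's own environment, not actual product endpoints.

```python
import socket

# Hypothetical internal services the ingestor needs to reach;
# replace with the customer's actual hosts and ports.
TARGETS = [
    ("confluence.internal.example.com", 443),  # e.g. Confluence Data Center
    ("fileshare.internal.example.com", 445),
]

def reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in TARGETS:
    status = "OK" if reachable(host, port) else "UNREACHABLE"
    print(f"{host}:{port} -> {status}")
```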
Production
2+ GPU Linux Instances (1 for Gateway + AI model, 1 or more for extra AI model capacity)
2+ Windows Instances (Dashboard, Ingestor, Database); more depending on capacity and HA requirements.
1x Application Load balancer
1x Microsoft SQL Server (Express/Standard). May be co-located.
SSL certificate (optional)
Customer may provide an SSL certificate to secure access to the dashboard website.
Embedding
Embedding is the process of converting data ingested into the system into a format that can be efficiently stored and retrieved to produce answers to questions.
Embedding can be performed using a CPU or GPU. It is a one-time process for each piece of data during ingestion. If the bulk of the customer's data is ingested during product onboarding, it may be optimal to use a GPU for embedding at the beginning and then have further embedding take place on a CPU.
If significant amounts of data need to be embedded on a continual basis, a GPU for embedding may be desirable in addition to the GPU required for the LLM.
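For illustration only, the sketch below shows how embedding typically works and how the same model can run on either GPU or CPU. The model name is a generic example and not necessarily the model used by the product.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers
import torch

# Generic example model; the product's actual embedding model may differ.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

# Each ingested document is split into chunks and converted once into
# fixed-size vectors that are stored for later retrieval.
chunks = [
    "First chunk of an ingested document...",
    "Second chunk of the same document...",
]
vectors = model.encode(chunks, batch_size=32, show_progress_bar=False)
print(device, vectors.shape)  # e.g. (2, 384) for this example model
```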
Here are some benchmarks for comparison.
Document | GPU time - NVIDIA T4 (16 GB), g4dn.xlarge, one task at a time | CPU time - AMD EPYC 7571 2.5 GHz, t3a.2xlarge, two tasks at a time
---|---|---
JSON call transcript (text document), 103,000 words | 19 seconds | 587 seconds
Microsoft Word document with graphics, 13,000 words / 85,000 characters | 18 seconds | 805 seconds
Large text-only Microsoft Word document, 630,000 words / 3,273,000 characters | 20 min 26 sec (1,226 seconds) | 
1,000 Word/PDF documents, average size 500 KB | | 
Machine Types
EC2 Linux Instances
For LLM + Gateway Services
NVIDIA graphics card required for the LLM, optional for Gateway Services
E.g. g6.xlarge
$0.803/hour (Llama 3) at ~15.8 tokens/sec; about $578 for a full month of 24x7 operation ($0.803 x 24 x 30), around $670 after tax
1x required
EC2 Windows Instances
For Dashboard, Database and Ingestor services.
E.g. m5a.large
Load Balancer
EC2 Application Load Balancer
Required if using more than one server for each component.
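As a hedged example, the boto3 sketch below creates an Application Load Balancer. The region, name, subnet IDs and security group ID are placeholders, and the target groups and listeners for the dashboard servers would still need to be configured separately.

```python
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")  # region is an example

# Placeholder subnet and security group IDs for the customer's VPC.
response = elbv2.create_load_balancer(
    Name="bgpt-dashboard-alb",                       # hypothetical name
    Subnets=["subnet-aaaa1111", "subnet-bbbb2222"],  # placeholders
    SecurityGroups=["sg-cccc3333"],                  # placeholder
    Scheme="internal",
    Type="application",
    IpAddressType="ipv4",
)
print(response["LoadBalancers"][0]["DNSName"])
```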
Minimum Required Permissions for AGAT
During Deployment
Continuous RDP / SSH / all-ports access to the servers from AGAT's office static IP address
Optional - IAM access to deploy instances and set up the security configuration (an example access rule is sketched below)
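For example, the deployment-time access rule can be expressed as a security group ingress rule restricted to AGAT's static IP. In the boto3 sketch below, the region, security group ID and IP address are placeholders.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an example

AGAT_OFFICE_IP = "203.0.113.10/32"        # placeholder for AGAT's static office IP
SECURITY_GROUP = "sg-0123456789abcdef0"   # placeholder security group ID

# Allow SSH (22) and RDP (3389) only from the AGAT office IP during deployment.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": AGAT_OFFICE_IP, "Description": "AGAT deployment access"}],
        }
        for port in (22, 3389)
    ],
)
```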
After Deployment
Temporary access for support from AGAT's office static IP address