Business GPT - AWS Private Cloud Instance System Requirements

Number of Required Servers

POC Requirements

| Purpose | Option 1 (GPU embedding) | Option 2 (CPU embedding) |
| --- | --- | --- |
| Gateway (Linux) | t3a.xlarge | t3a.2xlarge |
| GPU Embedder (Linux) | g4dn.xlarge | None; embedding uses the Gateway CPU |
| LLM (Linux) | g6.xlarge | g6.xlarge |
| Dashboard | t3a.large | t3a.large |

1x Linux Instance (Gateway): 100 GB SSD disk

1x Linux GPU Instance (LLM): g6.xlarge, 60 GB SSD disk

1x Windows Instance (Dashboard/Ingestor): t3a.large, 80 GB SSD disk

1x Microsoft SQL Server Express. May be co-located.

SSL certificate (optional)

Customer may provide an SSL certificate to secure access to the dashboard website.

Network Access

Ensure that the BGPT dashboard server has network access to any privately hosted services containing data that needs to be ingested, e.g. a Confluence instance hosted in the customer's data center.


Production

2+ GPU Linux Instances (1 for Gateway + AI model, 1 or more for extra AI model capacity)

2+ Windows Instances (Dashboard, Ingestor, Database). More depending on capacity + HA requirements.

1x Application Load balancer

1x Microsoft SQL Server (Express/Standard). May be co-located.

SSL certificate (optional)

Customer may provide an SSL certificate to secure access to the dashboard website.

Embedding

Embedding is the process of converting data ingested into the system into a format that can be efficiently stored and retrieved to produce answers to questions.

Embedding can be performed on a CPU or a GPU. It is a one-time process for each piece of data, carried out during ingestion. If the bulk of the customer's data is ingested during product onboarding, it may be optimal to use a GPU for the initial embedding and then perform further embedding on a CPU.

If significant amounts of data need to be embedded on a continual basis, a dedicated embedding GPU may be desirable in addition to the GPU required for the LLM.
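As a toy illustration of this flow, the sketch below embeds documents once at ingestion time and retrieves the best match for a question. The hash-based vectors are a stand-in for the real neural embedding model (which the product runs on a CPU or GPU); everything here is illustrative, not the actual BGPT implementation.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedder: buckets words into a fixed-size vector.
    The real system uses a neural model on CPU or GPU; this only
    illustrates the text -> vector -> store -> retrieve flow."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# Ingestion: embed each document once and store the vector.
store = {doc: toy_embed(doc) for doc in [
    "quarterly sales report for the finance team",
    "confluence page describing vpn setup",
]}

# Query time: embed the question, return the closest stored document.
def retrieve(question: str) -> str:
    q = toy_embed(question)
    return max(store, key=lambda d: cosine(q, store[d]))
```

Because each document is embedded exactly once at ingestion, the embedding hardware matters most during bulk onboarding; retrieval only needs to embed the short question.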

Here are some benchmarks for comparison.

| Document | GPU time (Nvidia T4 16 GB, g4dn.xlarge, one task at a time) | CPU time (AMD EPYC 7571 2.5 GHz, t3a.2xlarge, two tasks at a time) |
| --- | --- | --- |
| JSON call transcript (text document): 103,000 words, 767,000 characters | 19 seconds | 587 seconds |
| Microsoft Word document with graphics: 13,000 words, 85,000 characters | 18 seconds | 805 seconds |
| Large text-only Microsoft Word document: 630,000 words, 3,273,000 characters | 20 min 26 sec (1,226 seconds) | n/a |
| 1,000 documents (Word/PDF, average size 500 KB) | n/a | n/a |
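The relative throughput of the two options follows directly from the first benchmark row; a quick sanity check of the numbers:

```python
# First benchmark row: a 767,000-character JSON transcript,
# embedded in 19 s on the T4 GPU vs 587 s on the EPYC CPU.
chars = 767_000
gpu_seconds = 19
cpu_seconds = 587

gpu_throughput = chars / gpu_seconds  # characters embedded per second on GPU
cpu_throughput = chars / cpu_seconds  # characters embedded per second on CPU
speedup = cpu_seconds / gpu_seconds   # GPU is roughly 31x faster on this file
```

The roughly 30x gap is why a temporary embedding GPU during bulk onboarding, followed by CPU-only embedding for ongoing ingestion, can be the most economical split.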

Machine Types

EC2 Linux Instances

For LLM + Gateway Services

An Nvidia GPU is required for the LLM and optional for the Gateway services.

e.g., g6.xlarge

$0.803/hour (Llama 3), 15.8 tokens/sec; around $670 (after tax) for 24/7 operation over a 30-day month.

1x required
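The quoted monthly figure can be sanity-checked; the gap between the raw on-demand total and the ~$670 is the tax the text mentions:

```python
hourly_rate = 0.803        # g6.xlarge on-demand price per hour (from the text)
hours_per_month = 24 * 30  # 24/7 over a 30-day month

pre_tax_monthly = hourly_rate * hours_per_month  # 578.16 before tax
# The quoted ~$670 is this amount plus tax (roughly 16% on top).
```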


EC2 Windows Instances

For Dashboard, Database and Ingestor services.

e.g., m5a.large

Load Balancer

EC2 Application Load Balancer

Required if using more than one server for each component.


Minimum Required Permissions for AGAT

During Deployment

Continuous RDP/SSH/all-port access to the servers from the AGAT offices' static IP address

Optional - IAM access to deploy instances and set up security configuration

After Deployment

Temporary access for support from the AGAT offices' static IP address


Deployment Process - Optionally managed by AGAT

  1. Open network access from the BGPT servers to any locally hosted resources, e.g. Confluence.

  2. Create Dashboard and Gateway instances

  3. Create Databases

  4. Dashboard

    1. Install Dashboard services

  5. Gateway

    1. Deploy docker containers using “Docker Compose” script.

    2. Configure DB connection string

    3. Configure LLM VPC IP

  6. Deploy AGAT LLM Amazon Image

  7. Configure Load Balancers to provide external access to Dashboard

  8. Configure Dashboard to access company data stores (Google Drive, Confluence, etc.)