AWS Private Cloud Instance System Requirements

Number of Required Servers

POC Requirements

Purpose              | Option 1 (GPU embedding) | Option 2 (CPU embedding)
-------------------- | ------------------------ | -----------------------------------
Gateway (Linux)      | t3a.xlarge               | t3a.2xlarge
GPU Embedder (Linux) | g4dn.xlarge              | None. Embedding uses CPU of Gateway
Local LLM (Linux)    | g6.xlarge                | g6.xlarge
Dashboard / Ingestor | t3a.large                | t3a.large

1x Linux Instance (Gateway): 100 GB SSD disk

1x Linux GPU Instance (LLM): g6.xlarge, 60 GB SSD disk

1x Windows Instance (Dashboard/Ingestor): t3a.large, 80 GB SSD disk

1x Microsoft SQL Server Express. May be co-located.

SSL certificate (optional)

Customer may provide an SSL certificate to secure access to the dashboard website.

Network Access

Ensure that the BGPT dashboard server has network access to any privately hosted services that contain data to be ingested, e.g. a Confluence Data Center instance.
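
The network access requirement can be verified from the dashboard server before ingestion is configured. The following is a minimal sketch, assuming the private services are reachable over TCP; the function names `can_reach` and `check_targets` and the host name in the usage note are hypothetical and not part of the product.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_targets(targets):
    """Map each (host, port) pair to its reachability result."""
    return {(host, port): can_reach(host, port) for host, port in targets}
```

For example, calling `check_targets([("confluence.internal.example.com", 443)])` on the dashboard server returns a dictionary showing whether that placeholder host accepts connections on port 443. A successful TCP connection only shows that the port is open; it does not confirm that credentials or application-level access are in place.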

 

 

Production

2+ GPU Linux Instances (1 for Gateway + AI model, 1 or more for extra AI model capacity)

2+ Windows Instances (Dashboard, Ingestor, Database). More depending on capacity + HA requirements.

1x Application Load Balancer

1x Microsoft SQL Server (Express/Standard). May be co-located.

SSL certificate (optional)

Customer may provide an SSL certificate to secure access to the dashboard website.

Embedding

Embedding is the process of converting data ingested into the system into a format that can be efficiently stored and retrieved to produce answers to questions.

Embedding can be performed using a CPU or a GPU. It is a one-time process for each piece of data during ingestion. If the bulk of the customer's data is ingested during product onboarding, it may be optimal to use a GPU for embedding at the beginning and then let further embedding take place on a CPU.

If significant amounts of data need to be embedded on a continual basis, a GPU for embedding may be desirable in addition to the GPU required for the LLM.

Here are some benchmarks for comparison.

GPU: Nvidia T4 (16 GB), g4dn.xlarge, one task at a time
CPU: AMD EPYC 7571 2.5 GHz, t3a.2xlarge, two tasks at a time

Document                                        | Words   | Characters | GPU time                               | CPU time
----------------------------------------------- | ------- | ---------- | -------------------------------------- | -----------
JSON call transcript (text document)            | 103,000 | 767,000    | 19 seconds                             | 587 seconds
Microsoft Word document with graphics           | 13,000  | 85,000     | 18 seconds                             | 805 seconds
Large text-only Microsoft Word document         | 630,000 | 3,273,000  | 20 minutes, 26 seconds (1,226 seconds) |
1,000 documents (Word/PDF, average size 500 KB) |         |            |                                        |
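
To make the comparison easier to read, the two rows that report both a GPU and a CPU time can be turned into a speed-up ratio and a characters-per-second figure. This is a rough sketch using only the measured values above; the helper names are illustrative, and because the CPU runs processed two tasks at a time while the GPU runs processed one, the ratio compares the reported times rather than identical workloads.

```python
# Measured values copied from the benchmark figures above.
# Each entry: (document, characters, gpu_seconds, cpu_seconds)
BENCHMARKS = [
    ("JSON call transcript", 767_000, 19, 587),
    ("Word document with graphics", 85_000, 18, 805),
]


def speedup(gpu_seconds: float, cpu_seconds: float) -> float:
    """How many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


def chars_per_second(characters: int, seconds: float) -> float:
    """Embedding throughput in characters per second."""
    return characters / seconds


for name, chars, gpu_s, cpu_s in BENCHMARKS:
    print(
        f"{name}: GPU {chars_per_second(chars, gpu_s):,.0f} chars/s, "
        f"CPU {chars_per_second(chars, cpu_s):,.0f} chars/s, "
        f"speed-up {speedup(gpu_s, cpu_s):.1f}x"
    )
```

The throughput differs considerably between the two documents, so these ratios are better treated as an indication of the order of magnitude than as a basis for extrapolating to other document types.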

 

 

Machine Types

EC2 Linux Instances

For LLM + Gateway Services

Nvidia graphics card required for the LLM, optional for Gateway Services

E.g. g6.xlarge

$0.803 / hour (Llama 3 at 15.8 tokens/sec). Around $670 per month after tax when running 24x7 for 30 days.

1x required
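
The monthly figure follows from the hourly rate. The sketch below reproduces that arithmetic from the numbers quoted above; the 500-token answer length is a hypothetical value chosen only to illustrate what 15.8 tokens/sec means in practice, and the tax uplift is derived from the quoted total rather than taken from any price list.

```python
HOURLY_RATE_USD = 0.803    # g6.xlarge hourly rate quoted above
HOURS_PER_MONTH = 24 * 30  # running 24x7 for a 30-day month
TOKENS_PER_SECOND = 15.8   # Llama 3 throughput quoted above
QUOTED_MONTHLY_USD = 670   # approximate after-tax figure quoted above


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Pre-tax cost of running one instance continuously for a month."""
    return hourly_rate * hours


def generation_seconds(tokens: int, rate: float = TOKENS_PER_SECOND) -> float:
    """Time to generate the given number of tokens at the given rate."""
    return tokens / rate


pre_tax = monthly_cost(HOURLY_RATE_USD)
print(f"Pre-tax monthly cost: ${pre_tax:.2f}")
print(f"Implied tax uplift: {QUOTED_MONTHLY_USD / pre_tax - 1:.1%}")
# 500 tokens is a hypothetical answer length, not a measured value.
print(f"500-token answer: {generation_seconds(500):.1f} s")
```

The pre-tax cost works out to $578.16 per month (0.803 x 720 hours), so the quoted figure of around $670 corresponds to an uplift of roughly 16% for tax.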

 

EC2 Windows Instances

For Dashboard, Database and Ingestor services.

E.g. m5a.large

Load Balancer

EC2 Application Load Balancer

Required if using more than one server for each component.

 

Minimum Required Permissions for AGAT

During Deployment

Continuous RDP/SSH access (all ports) to the servers from the AGAT offices' static IP address

Optional - IAM access to deploy instances and set up security configuration

After Deployment

Temporary access for support from the AGAT offices' static IP address
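
One way to scope this access, assuming it is controlled through EC2 security groups, is to allow inbound SSH and RDP only from the single office address. The sketch below builds the rule definitions as plain dictionaries; the address `203.0.113.10/32` is a placeholder from a documentation-only range, and the helper name `ingress_rule` is illustrative. The real static IP address must be supplied by AGAT.

```python
import ipaddress

# Placeholder from the 203.0.113.0/24 documentation range, not a real office IP.
OFFICE_CIDR = "203.0.113.10/32"


def ingress_rule(port: int, cidr: str, description: str) -> dict:
    """Build one inbound TCP rule limited to a single source CIDR."""
    ipaddress.ip_network(cidr)  # raises ValueError if the CIDR is malformed
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": cidr, "Description": description}],
    }


# Rules needed while deployment is in progress.
DEPLOYMENT_RULES = [
    ingress_rule(22, OFFICE_CIDR, "SSH from vendor office"),
    ingress_rule(3389, OFFICE_CIDR, "RDP from vendor office"),
]
```

A list in this shape can be passed as `IpPermissions` to boto3's `authorize_security_group_ingress` during deployment and to `revoke_security_group_ingress` afterwards, with the rules re-added temporarily whenever support access is needed.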