Number of Required Servers

POC Requirements

Purpose	Option 1 (GPU embedding)	Option 2 (CPU embedding)
Gateway (Linux)	t3a.xlarge	t3a.2xlarge
GPU Embedder (Linux)	g4dn.xlarge	None (embedding uses the Gateway's CPU)
LLM (Linux)	g6.xlarge	g6.xlarge
Dashboard	t3a.large	t3a.large

1x Linux Instance (Gateway)

t3a.2xlarge - 100 GB SSD disk

1x Linux GPU Instance (LLM)

...

The customer may provide an SSL certificate to secure access to the dashboard website.

Embedding

Embedding is the process of converting ingested data into a format that can be stored and retrieved efficiently to answer questions.

Embedding can be performed on a CPU or a GPU. It is a one-time process for each piece of data, performed during ingestion. If the bulk of the customer's data is ingested during product onboarding, it may be optimal to use a GPU for the initial embedding and then switch to a CPU for subsequent embedding.

If significant amounts of data need to be embedded on a continual basis, a dedicated GPU for embedding may be desirable in addition to the GPU required for the LLM.

Here are some benchmarks for comparison.

Document	GPU time (NVIDIA T4 16 GB, g4dn.xlarge, one task at a time)	CPU time (AMD EPYC 7571 2.5 GHz, t3a.2xlarge, two tasks at a time)
JSON call transcript (text document; 767,000 characters; 103,000 words)	19 seconds	587 seconds
Microsoft Word document with graphics (85,000 characters; 13,000 words)	18 seconds	805 seconds
Large text-only Microsoft Word document (3,273,000 characters; 630,000 words)	1,226 seconds (20 min 26 s)
1,000 Word/PDF documents (average size 500 KB)
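
To make the comparison easier to read, the two rows that have both a GPU and a CPU timing can be turned into speedup and throughput figures. The following is a minimal sketch in Python; the function and variable names are illustrative only and are not part of the product. Note that the CPU runs processed two tasks at a time while the GPU runs processed one, so the ratio is an indication of wall-clock difference rather than a strict like-for-like measurement.

```python
# Rows copied from the benchmark table: (document, characters, GPU seconds, CPU seconds).
# Only rows with both timings are included; the other rows have no CPU figure.
BENCHMARKS = [
    ("JSON call transcript", 767_000, 19, 587),
    ("Word document with graphics", 85_000, 18, 805),
]


def speedup(gpu_seconds, cpu_seconds):
    """Return how many times longer the CPU run took than the GPU run."""
    return cpu_seconds / gpu_seconds


def chars_per_second(characters, seconds):
    """Return embedding throughput in characters per second."""
    return characters / seconds


def summarize(rows):
    """Build one summary dict per benchmark row."""
    summary = []
    for name, chars, gpu_s, cpu_s in rows:
        summary.append({
            "document": name,
            "speedup": speedup(gpu_s, cpu_s),
            "gpu_chars_per_s": chars_per_second(chars, gpu_s),
            "cpu_chars_per_s": chars_per_second(chars, cpu_s),
        })
    return summary


if __name__ == "__main__":
    for row in summarize(BENCHMARKS):
        print(
            f"{row['document']}: GPU about {row['speedup']:.1f}x faster "
            f"({row['gpu_chars_per_s']:.0f} vs {row['cpu_chars_per_s']:.0f} chars/s)"
        )
```

On these two rows the GPU run is roughly 31x and 45x faster than the CPU run. The GPU times are almost identical (19 and 18 seconds) despite very different document sizes, which may indicate that fixed per-document overhead dominates for small inputs; the table alone does not confirm this.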

Machine Types

EC2 Linux Instances

...

E.g. g6.xlarge

$0.803 / hour (Llama 3), 15.8 tokens/sec. Around $670 per month (after tax) when running 24x7 for 30 days.
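
The monthly figure can be checked with simple arithmetic. The sketch below, in Python, uses only the hourly rate and the approximate after-tax total quoted above; the helper names are illustrative, and the tax uplift is derived from those two numbers rather than taken from any published tax rate.

```python
HOURLY_RATE_USD = 0.803            # g6.xlarge hourly rate quoted above
QUOTED_MONTHLY_AFTER_TAX_USD = 670  # approximate after-tax figure quoted above


def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Return the pre-tax cost of running one instance continuously."""
    return hourly_rate * hours_per_day * days


def implied_uplift(pre_tax, after_tax):
    """Return the fractional increase from the pre-tax to the after-tax amount."""
    return after_tax / pre_tax - 1


if __name__ == "__main__":
    pre_tax = monthly_cost(HOURLY_RATE_USD)
    uplift = implied_uplift(pre_tax, QUOTED_MONTHLY_AFTER_TAX_USD)
    print(f"Pre-tax monthly cost: ${pre_tax:.2f}")
    print(f"Implied tax uplift:   {uplift:.1%}")
```

Running 24 hours a day for 30 days is 720 hours, which gives about $578 before tax; the quoted figure of around $670 therefore implies an uplift of roughly 16%. Actual charges depend on region, pricing model and applicable taxes.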

1x required

EC2 Windows Instances

...
