Sample Hardware Spec
Single Unit
Component | Small | Medium | Large | Storage Optimised | Speed Optimised |
---|---|---|---|---|---|
CPU | 12 cores (i7/Xeon) | 16 cores (i7/Xeon) | 32 cores (i7/Xeon) | 64 cores (i7/Xeon) | 20 cores (i7/Xeon) |
Memory | 48 GB DDR4+ | 64 GB DDR4+ | 128 GB DDR4+ | 256 GB DDR4+ | 128 GB DDR4+ |
NVMe SSD | 256 GB | 500 GB | 4 TB | 8 TB (Source Files not stored) | 4 TB |
Consumer grade GPU | 1 x RTX 4090 24 GB + 1 x RTX 4070 Ti 12 GB | 4 x RTX 4090 24 GB + 1 x RTX 4070 Ti 12 GB | 8 x RTX 4090 24 GB + 2 x RTX 4070 Ti 12 GB | 10 x RTX 4090 24 GB | 1 x H100 NVL |
Data Center grade GPU | 2 x L4 24 GB | 1 x H100 NVL | 2 x H100 NVL | 2 x H100 NVL | 3 x H100 NVL |
Supported LLM Options (Consumer Grade GPU) | 1x Llama 3.1 8B<br>1 embedding GPU | 4x Llama 3.1 8B OR 1x Llama 3.3 70B (20k tokens)<br>1 embedding GPU | 8x Llama 3.1 8B OR 2x Llama 3.3 70B (20k tokens) OR 1x Llama 3.3 70B (128k tokens), or a combination of the above<br>2 embedding GPUs | 8x Llama 3.1 8B OR 2x Llama 3.3 70B (20k tokens) OR 1x Llama 3.3 70B (128k tokens), or a combination of the above<br>GPUs allocated between embedding and LLM usage as required | 1x Llama 3.1 70B (20k tokens); single-query speed is 2x that of an RTX 4090<br>2 embedding GPUs |
Number of Users Supported | 150 | 800 (8B) / 150 (70B) | 2000 (8B) / 400 (70B) | 2000 | 400 |
Amount of data supported | 500 GB | 700 GB | 1.4 TB | 30 TB | 1.4 TB |
Llama 3.1 70B is recommended for code generation and data analysis.
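The GPU counts in the "Supported LLM Options" row are driven by how much VRAM the model weights and KV cache consume. The sketch below is a rough, illustrative estimate only, under assumptions not stated in the spec: FP16 weights and KV cache, published Llama 3.x layer/KV-head/head-dimension figures, a single concurrent sequence, and a flat 10% runtime overhead. It is not the sizing method used to build the table.

```python
"""Back-of-envelope VRAM estimate for the model options listed above.

Assumptions (not from the spec): FP16 weights and KV cache (2 bytes each),
published Llama 3.x model shapes, one sequence at full context, +10% overhead.
Real deployments often quantise and batch, so treat results as order-of-magnitude.
"""

def weight_gb(params_b: float, bytes_per_param: float = 2.0) -> float:
    """Memory for model weights in GB."""
    return params_b * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: float = 2.0) -> float:
    """KV cache for one sequence: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value / 1e9

MODELS = {
    # name: (params in billions, layers, KV heads, head dim)
    "Llama 3.1 8B":  (8,  32, 8, 128),
    "Llama 3.3 70B": (70, 80, 8, 128),
}

for name, (params_b, layers, kv_heads, head_dim) in MODELS.items():
    for ctx in (20_000, 128_000):
        total = 1.1 * (weight_gb(params_b)
                       + kv_cache_gb(layers, kv_heads, head_dim, ctx))
        print(f"{name:>14} @ {ctx:>7,} tokens ~ {total:6.1f} GB VRAM")
```

Under these assumptions an 8B model fits comfortably on a single 24 GB RTX 4090, while the 70B weights alone exceed any single 24 GB card, which is why the 70B options only appear on the multi-GPU tiers. Quantising to 8-bit or 4-bit roughly halves or quarters these figures, so actual capacity may be higher than this sketch suggests.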