Supported LLM Models
deepseek-ai/DeepSeek-R1-Distill-Llama-8B
vRAM: 24 GB
Disk: 18 GB
Context window: 20k-100k tokens
AWS: g6.xlarge
Runpod cost: $0.22/hr
On-prem: 1x RTX 4090
vLLM launch command:
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --tensor-parallel-size 1 --max-model-len 20000 --enforce-eager
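Once started, the server exposes an OpenAI-compatible API on port 8000. A minimal smoke test using the official openai Python client, assuming the server runs on localhost; the "EMPTY" API key and the prompt are placeholders, not part of the deployment spec:

from openai import OpenAI

# Point the client at the local vLLM server; the SDK requires an API key
# string even though vLLM ignores it by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)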
ai21labs/AI21-Jamba-1.5-Mini
vRAM: 80 GB
Disk: 110 GB
Context window: 100k tokens
AWS: Not feasible
Runpod cost: $0.80/hr
On-prem: 1x A100 80 GB / 4x RTX 4000 Ada
vLLM launch command:
python -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 8000 --model ai21labs/AI21-Jamba-1.5-Mini --tensor-parallel-size 4 --max-model-len 100000 --quantization experts_int8
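Note that --tensor-parallel-size 4 matches the 4x RTX 4000 Ada layout; on a single A100 80 GB it should be set to 1. As a quick check that the model loaded, the served model list can be queried via the same OpenAI-compatible API; a minimal sketch, again assuming the server runs on localhost:8000:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# vLLM reports the loaded model under /v1/models; confirm Jamba is served.
for model in client.models.list():
    print(model.id)  # expected: ai21labs/AI21-Jamba-1.5-Mini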