Supported LLM Models

LLM model VRAM requirements

| Model | VRAM required | Notes |
|---|---|---|
| Llama 3.1 8B | 24 GB | 20k context |
| Llama 3.1 8B | 48 GB | Full 128k context |
| Llama 3.3 70B | 196 GB | |
| DeepSeek-R1-Distill-Llama-8B | 24 GB | 20k context |
| ai21labs/AI21-Jamba-1.5-Mini | 80 GB | 100k context |
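A rough sanity check on the 8B figures: VRAM use is dominated by the fp16 weights plus the KV cache, which grows linearly with context length. The sketch below uses Llama 3.1 8B's published architecture (32 layers, 8 KV heads via GQA, head dim 128); the 20% overhead factor for activations and framework buffers is an assumption, not a measured value.

```python
# Rough VRAM estimate: fp16 weights + KV cache + assumed 20% overhead.
# Architecture constants are from the public Llama 3.1 8B config.

def kv_cache_gib(context_len, n_layers=32, n_kv_heads=8, head_dim=128,
                 bytes_per_elem=2):
    """Size of the K and V caches for one sequence, in GiB."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / 2**30

weights_gib = 8e9 * 2 / 2**30  # ~14.9 GiB of fp16 parameters

for ctx in (20_000, 128_000):
    total = (weights_gib + kv_cache_gib(ctx)) * 1.2  # +20% overhead
    print(f"{ctx:>7} tokens: ~{total:.0f} GiB")
```

Under these assumptions the 20k-context case lands around 21 GiB, which fits a 24 GB card, while 128k context needs roughly 37 GiB, pushing the deployment to the 48 GB tier, consistent with the table above.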

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

- VRAM: 24 GB
- Disk: 18 GB
- Context window: 20k–100k tokens
- AWS: g6.xlarge
- Runpod: $0.22
- On-prem: RTX 4090

--host 0.0.0.0 --port 8000 --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --tensor-parallel-size 1 --max-model-len 20000 --enforce-eager
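The flags above are not a complete command on their own. Assuming they are meant for vLLM's OpenAI-compatible API server (the `--model` and `--tensor-parallel-size` flags match that entrypoint), a full invocation would look like:

```shell
# Launch vLLM's OpenAI-compatible API server on a single 24 GB GPU.
# Assumes vLLM is installed and the weights (~18 GB on disk) can be
# fetched from Hugging Face on first run.
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --tensor-parallel-size 1 \
    --max-model-len 20000 \
    --enforce-eager
```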

ai21labs/AI21-Jamba-1.5-Mini

- VRAM: 80 GB
- Context window: 100k tokens
- Disk: 110 GB
- Runpod cost: $0.80
- AWS: infeasible
- On-prem: 1x A100 80 GB or 4x RTX 4000 Ada

--host 0.0.0.0 --port 8000 --model ai21labs/AI21-Jamba-1.5-Mini --tensor-parallel-size 4 --max-model-len 100000 --quantization experts_int8
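As with the DeepSeek flags, these arguments need a server command in front of them. Assuming the same vLLM OpenAI-compatible entrypoint, a full invocation would be:

```shell
# Launch vLLM across 4 GPUs, sharding the model with tensor parallelism
# and quantizing Jamba's expert weights to int8; this is what lets the
# 80 GB footprint fit on 4x RTX 4000 Ada (4x 20 GB).
python -m vllm.entrypoints.openai.api_server \
    --host 0.0.0.0 \
    --port 8000 \
    --model ai21labs/AI21-Jamba-1.5-Mini \
    --tensor-parallel-size 4 \
    --max-model-len 100000 \
    --quantization experts_int8
```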
