...

Runpod: $0.22

On Prem: RTX 4090

--host 0.0.0.0 --port 8000 --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B --tensor-parallel-size 1 --max-model-len 20000 --enforce-eager
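
The page lists only the arguments, not the command that receives them. The flag names match vLLM's OpenAI-compatible server, so a minimal sketch of assembling the full launch command could look like the following. The entrypoint module and the helper name `build_command` are assumptions, not something stated on this page.

```python
import shlex

# Argument string copied verbatim from the line above.
ARGS = (
    "--host 0.0.0.0 --port 8000 "
    "--model deepseek-ai/DeepSeek-R1-Distill-Llama-8B "
    "--tensor-parallel-size 1 --max-model-len 20000 --enforce-eager"
)


def build_command(args: str) -> list[str]:
    """Return an argv list for launching the server (hypothetical helper).

    Assumption: the arguments are meant for vLLM's OpenAI-compatible
    API server module; the page itself does not name the entrypoint.
    """
    return ["python", "-m", "vllm.entrypoints.openai.api_server", *shlex.split(args)]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_command(ARGS)))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without a GPU; the resulting list could be handed to `subprocess.run` once the entrypoint is confirmed.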

ai21labs/AI21-Jamba-1.5-Mini

...

On Prem: 1x A100 80GB / 4x RTX 4000 Ada

--host 0.0.0.0 --port 8000 --model ai21labs/AI21-Jamba-1.5-Mini --tensor-parallel-size 4 --max-model-len 100000 --quantization experts_int8
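
The hardware line above names two on-prem options, while the arguments set `--tensor-parallel-size 4`. A tensor-parallel size larger than the number of available GPUs cannot be satisfied, so the arguments as written appear to target the four-GPU option. The sketch below checks that with a small flag parser; `parse_flags` and `fits_gpu_count` are hypothetical helpers, and the GPU counts are read off the hardware line rather than detected.

```python
import shlex

# Argument string copied verbatim from the line above.
ARGS = (
    "--host 0.0.0.0 --port 8000 --model ai21labs/AI21-Jamba-1.5-Mini "
    "--tensor-parallel-size 4 --max-model-len 100000 "
    "--quantization experts_int8"
)


def parse_flags(args: str) -> dict:
    """Map each --flag to its value, or to True for value-less switches."""
    tokens = shlex.split(args)
    flags = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        flags[name] = tokens[i + 1] if has_value else True
        i += 2 if has_value else 1
    return flags


def fits_gpu_count(args: str, gpu_count: int) -> bool:
    """True if the requested tensor-parallel size does not exceed gpu_count."""
    tp_size = int(parse_flags(args).get("--tensor-parallel-size", 1))
    return tp_size <= gpu_count


if __name__ == "__main__":
    print("4 GPUs:", fits_gpu_count(ARGS, 4))
    print("1 GPU:", fits_gpu_count(ARGS, 1))
```

Under this check the arguments fit the 4x GPU option but not a single GPU; running on the single A100 would presumably need a tensor-parallel size of 1, which the page does not state.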