Supported Self-Hosted LLM Models
The AWS prices shown below are for on-demand instances. AWS Savings Plans can reduce them by around 30%.
Model* | VRAM | GPU Examples for Purchase | AWS Instance / Cost per Hour** | AWS 24x7 Monthly*** | AWS*** | Features (ratings / 10) | Cloud API Option |
|---|---|---|---|---|---|---|---|
OpenAI GPT OSS 20B | 16 GB | Data Center GPU: | N. Virginia / Stockholm: g4dn.xlarge ($0.52)<br>Israel: g5.xlarge, 1× A10G (24 GB) | $400<br>$900 | $100<br>$240 | Answer Accuracy - 7<br>Code Generation - 8<br>Agent -<br>Translation -<br>Vision - No | Bedrock |
OpenAI GPT OSS 120B | 80 GB | Data Center GPU: 2x L40S, 96 GB total ($16,000)<br>Consumer GPU: 4x 4090, 96 GB total ($9,600) | N. Virginia: g6.12xlarge ($4.60)<br>Stockholm: g5.12xlarge ($6)<br>Israel: g5.12xlarge ($6.64) | $3,500<br>$4,500<br>$5,000 | $900<br>$1,200<br>$1,300 | Answer Accuracy - 8<br>Code Generation - 9<br>Agent -<br>Translation -<br>Vision - No | Bedrock |
Gemma 3 27B | 80 GB | Data Center GPU: 2x L40S, 96 GB total ($16,000; 65,000 NIS in Israel)<br>Consumer GPU: Information upon request | N. Virginia: g6.12xlarge ($4.60)<br>Stockholm: g5.12xlarge ($6)<br>Israel: g5.12xlarge ($6.64) | $3,500<br>$4,500<br>$5,000 | $900<br>$1,200<br>$1,300 | Answer Accuracy - 7<br>Code Generation - 8<br>Agent -<br>Vision - Yes<br>Translation - Yes | OpenRouter |
Llama 4 Maverick | 640 GB | 8× H100 80 GB ($200,000) | p5.48xlarge ($55/hr) | $41,000 | $11,000 | Answer Accuracy - 8<br>Code Generation - 9<br>Agent - Yes<br>Vision - Yes<br>Translation - Yes | Bedrock |
Llama 3.1 8B (Deprecated) | 24 GB | Data Center GPU:<br>Consumer GPU: | N. Virginia: g6.xlarge ($0.80) | $600 | $160 | Answer Accuracy - 5<br>Code Generation - N/A<br>Agent - No<br>Translation - No<br>Vision - No | Bedrock |
* Only official model releases are listed.
** LLM instance only. Windows Dashboard and Linux Gateway instances are also required.
*** Prices are approximate.
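The monthly figures in the table can be roughly reproduced from the hourly on-demand rates. The sketch below is an illustration only, not the method used to build the table: the 730 hours-per-month constant and the `monthly_cost` helper are assumptions, and the 30% discount is the approximate Savings Plans figure quoted at the top of this page.

```python
# Assumption: an average month has 8760 / 12 = 730 hours.
HOURS_PER_MONTH = 730


def monthly_cost(hourly_usd, hours=HOURS_PER_MONTH, discount=0.0):
    """Estimate a monthly bill in USD from an on-demand hourly rate.

    `discount` is a fraction, e.g. 0.30 for a 30% reduction.
    """
    return hourly_usd * hours * (1.0 - discount)


# Hourly on-demand rates copied from the table above.
rates = {
    "g4dn.xlarge (N. Virginia / Stockholm)": 0.52,
    "g6.12xlarge (N. Virginia)": 4.60,
    "g5.12xlarge (Israel)": 6.64,
}

for name, rate in rates.items():
    on_demand = monthly_cost(rate)
    discounted = monthly_cost(rate, discount=0.30)
    print(f"{name}: ~${on_demand:,.0f}/month on-demand, "
          f"~${discounted:,.0f}/month with a 30% discount")
```

The results land slightly below the table's 24x7 column, which appears to be rounded up. Passing a smaller `hours` value gives an estimate for an instance that runs only part of the month.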
Also tested:
GPT OSS 20B: RTX 2000 Ada, RTX A4500 (the RTX A4000 did not pass testing)