Llama 4 Self-Hosting Cost Estimator
Self-hosting Llama 4 can save significantly over API pricing at scale — but only if you choose the right infrastructure. This estimator calculates monthly GPU costs across AWS, GCP, and Azure for Llama 4 Scout, Maverick, and Behemoth variants.
Example estimate: Min VRAM 40 GB (~1× A100 80GB), 1 GPU, $2,160/month, $25,920/year.
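The example figures are consistent with a flat hourly GPU rate. A minimal sketch of the arithmetic, assuming an illustrative on-demand rate of $3.00/hour per A100 and a 720-hour billing month (neither is published cloud pricing):

```python
# Sketch of the estimator's monthly-cost arithmetic.
# The $3.00/hr rate and 720-hour month are illustrative assumptions.
def monthly_gpu_cost(hourly_rate: float, gpu_count: int,
                     hours_per_month: int = 720) -> float:
    """Flat monthly cost for a set of on-demand GPUs."""
    return hourly_rate * gpu_count * hours_per_month

monthly = monthly_gpu_cost(3.00, 1)  # 3.00 * 1 * 720 = 2160.0
annual = monthly * 12                # 25920.0
```

Reserved and spot pricing plug into the same formula with a lower hourly rate.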
How to Use This Tool
- Select the Llama 4 model size (Scout, Maverick, or Behemoth).
- Choose your cloud provider (AWS, GCP, or Azure).
- Set the GPU instance type and quantity.
- Enter your expected requests per second.
- View the monthly hosting cost vs equivalent API pricing.
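The final comparison step can be sketched as a break-even check. All the rates below (requests per second, tokens per request, dollars per million tokens, the $2,160 self-hosting figure from the example above) are placeholder assumptions for illustration, not quoted prices:

```python
# Hypothetical break-even comparison: self-hosting vs. per-token API pricing.
# All numeric inputs are illustrative placeholders.
def api_monthly_cost(req_per_sec: float, tokens_per_req: float,
                     usd_per_million_tokens: float,
                     seconds_per_month: int = 720 * 3600) -> float:
    """Monthly API spend at a sustained request rate."""
    tokens = req_per_sec * tokens_per_req * seconds_per_month
    return tokens / 1e6 * usd_per_million_tokens

api = api_monthly_cost(req_per_sec=5, tokens_per_req=800,
                       usd_per_million_tokens=0.30)  # 3110.4
self_hosted = 2160.0  # example monthly GPU cost
cheaper_to_self_host = self_hosted < api
```

At low, bursty traffic the inequality typically flips, which is why the tool asks for expected requests per second.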
Features
- GPU requirements for each Llama 4 variant
- AWS, GCP, and Azure instance pricing
- On-demand vs reserved vs spot pricing comparison
- Break-even analysis: self-hosting vs API
- Inference throughput estimation (tokens per second)
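The throughput feature can be sketched as capacity sizing: convert request load into required tokens per second, then divide by per-GPU throughput. The ~1,500 tokens/second per-GPU figure below is an illustrative assumption for a batched A100 serving a quantized Scout-class model, not a benchmark result:

```python
import math

# Rough inference-throughput sizing.
# tok_per_sec_per_gpu is an illustrative assumption, not a benchmark.
def required_throughput(req_per_sec: float, output_tokens_per_req: float) -> float:
    """Sustained generation rate the deployment must supply, in tokens/sec."""
    return req_per_sec * output_tokens_per_req

def gpus_needed(req_per_sec: float, output_tokens_per_req: float,
                tok_per_sec_per_gpu: float = 1500.0) -> int:
    """Minimum GPU count to satisfy the required throughput."""
    demand = required_throughput(req_per_sec, output_tokens_per_req)
    return math.ceil(demand / tok_per_sec_per_gpu)

# e.g. 5 req/s * 800 output tokens = 4000 tok/s -> 3 GPUs at 1500 tok/s each
```

Multiplying the resulting GPU count by the per-GPU monthly rate gives the hosting cost the estimator reports.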
Related Tools
GPT-5.4 Token & API Cost Calculator
Calculate API costs for all GPT-5.4 models with current 2026 pricing.
Claude 4.7 API Cost Estimator
Estimate costs for Claude 4.7 Opus, Sonnet, and Haiku models.
RAG vs Fine-Tuning Cost Comparison Calculator
Compare RAG and fine-tuning costs to find the optimal approach for your project.