Loading...
Loading...
DeepInfra provides serverless inference for popular open-source models including Llama, Mistral, Qwen, Stable Diffusion, and Whisper. Known for competitive pricing, fast response times, and a simple OpenAI-compatible API at significantly lower costs than running your own infrastructure.
/v1/openai/chat/completionsOpenAI-compatible chat completions for open-source models.
Models
Pricing
| Model | Input Price | Output Price | Context Window |
|---|---|---|---|
| Llama 3.3 70B | $0.35 | $0.40 | 128.0K |
| Llama 3.1 8B | $0.03 | $0.06 | 128.0K |
| Qwen 2.5 72B | $0.35 | $0.40 | 32.0K |
Test DeepInfra APIs directly in our interactive playground with your own API key.
Category
Pricing Model
Pay per TokenPricing
Input Price
$0.35 per 1M tokens
Output Price
$0.40 per 1M tokens
Context Window
128.0K tokens
Country
US