DeepInfra

DeepInfra provides serverless inference for popular open-source models including Llama, Mistral, Qwen, Stable Diffusion, and Whisper. Known for competitive pricing, fast response times, and a simple OpenAI-compatible API at significantly lower costs than running your own infrastructure.

Official Website Documentation

Endpoints

POST

Chat Completions

/v1/openai/chat/completions

OpenAI-compatible chat completions for open-source models.

Streaming

Vision

Function Calling

Models

meta-llama/Llama-3.3-70B-Instructmeta-llama/Llama-3.1-8B-Instructmistralai/Mistral-7B-Instruct-v0.3Qwen/Qwen2.5-72B-Instruct

Pricing

Model	Input Price	Output Price	Context Window
Llama 3.3 70B	$0.35	$0.4	128.0K
Llama 3.1 8B	$0.03	$0.055	128.0K
Qwen 2.5 72B	$0.35	$0.4	32.0K

Capabilities

Streaming

JSON Mode

Embeddings

Image Input

Try it in Playground

Test DeepInfra APIs directly in our interactive playground with your own API key.

Try it in Playground

DeepInfra

Endpoints

Chat Completions

Capabilities

Try it in Playground

Overview

DeepInfra

Endpoints

Chat Completions

Capabilities

Try it in Playground

Overview