R1 Distill Llama 70B pricing
DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), fine-tuned on outputs from [DeepSeek R1](/deepseek/deepseek-r1). The distillation achieves strong results across multiple benchmarks: AIME 2024 pass@1 of 70.0, MATH-500 pass@1 of 94.5, and a CodeForces rating of 1633, giving it performance comparable to larger models. Live index: 10 priced offers. Best input price: $0.200 per million tokens from DeepInfra. Best output price: $0.375 per million tokens from Nscale.
Pricing across providers
Every row is a seller of R1 Distill Llama 70B with token pricing we track. The cheapest input price in this snapshot comes from DeepInfra. The bar chart shows the same input and output prices, in dollars per million tokens, for a quick scan.
| Provider | Input / 1M | Output / 1M | Cached input | Batch |
|---|---|---|---|---|
| Openrouter | $0.700 | $0.800 | — | — |
| Vercel AI Gateway | $0.750 | $0.990 | — | — |
| Novita | $0.800 | $0.800 | — | — |
| Gradient | $0.990 | $0.990 | — | — |
| Nscale | $0.375 | $0.375 | — | — |
| Nebius | $0.250 | $0.750 | — | — |
| OVHcloud | $0.670 | $0.670 | — | — |
| SambaNova | $0.700 | $1.40 | — | — |
| Fireworks AI | $0.900 | $0.900 | — | — |
| DeepInfra | $0.200 | $0.600 | — | — |
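As a quick sanity check on the "best price" claims above, the snapshot can be scanned programmatically. This is an illustrative sketch (the `PRICES` dict is hand-copied from the table; it is not an API this page exposes):

```python
# Per-million-token prices (input, output) in USD, copied from the table above.
PRICES = {
    "Openrouter":        (0.700, 0.800),
    "Vercel AI Gateway": (0.750, 0.990),
    "Novita":            (0.800, 0.800),
    "Gradient":          (0.990, 0.990),
    "Nscale":            (0.375, 0.375),
    "Nebius":            (0.250, 0.750),
    "OVHcloud":          (0.670, 0.670),
    "SambaNova":         (0.700, 1.40),
    "Fireworks AI":      (0.900, 0.900),
    "DeepInfra":         (0.200, 0.600),
}

# Find the provider with the lowest input price and lowest output price.
cheapest_input = min(PRICES, key=lambda p: PRICES[p][0])
cheapest_output = min(PRICES, key=lambda p: PRICES[p][1])
print(cheapest_input)   # DeepInfra
print(cheapest_output)  # Nscale
```

The result matches the summary at the top of the page: DeepInfra has the best input price ($0.200/1M) and Nscale the best output price ($0.375/1M).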
[Bar chart: input vs output price per provider]
Cost calculator
The calculator uses the same dollars-per-million-token rates as the table. Adjust the sliders to see how R1 Distill Llama 70B costs scale with traffic.
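The per-request arithmetic behind the calculator is straightforward. A minimal sketch, assuming prices are quoted in dollars per million tokens (the function name `request_cost` is illustrative, not part of any API):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in USD for one request, with prices in $ per 1M tokens."""
    return (input_tokens / 1e6) * input_price + (output_tokens / 1e6) * output_price

# Example: a 1,000-token prompt with a 500-token reply on DeepInfra
# ($0.200 input / $0.600 output per 1M tokens):
cost = request_cost(1_000, 500, 0.200, 0.600)
print(f"${cost:.6f}")  # $0.000500
```

Multiply by requests per day to project monthly spend; the sliders on this page do the same computation interactively.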
Model specifications
Context length, caps, and capability flags for R1 Distill Llama 70B. Family: DeepSeek R1. Values follow the main provider (Meta) record in our index.
- Context window: 131,072 tokens
- Max output: 131,072 tokens
- Vision (images): No
- Tool / function calling: No
- Streaming: No
- Released: Jan 2025
- Primary provider: Meta
- Model family: DeepSeek R1
Compare R1 Distill Llama 70B
Jump into a comparison when you want one table for two models instead of two tabs. 6 curated matches for R1 Distill Llama 70B.
- R1 Distill Llama 70B vs Llama 3.1 70B: compare pricing side by side
- R1 Distill Llama 70B vs Llama 3.1 8B: compare pricing side by side
- R1 Distill Llama 70B vs GPT-4o: compare pricing side by side
- R1 Distill Llama 70B vs GPT-4o mini: compare pricing side by side
- R1 Distill Llama 70B vs Claude Sonnet 4.6: compare pricing side by side
- R1 Distill Llama 70B vs Gemini 2.0 Flash: compare pricing side by side
Frequently asked questions
Answers pull from the same numbers you see on this page. The short model note from our index: DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1).
Also from Meta
Other models by Meta with live pricing in our catalog.