Pricing
We currently offer three tiers of pricing, each suited for different types of users. Please reach out to us to discuss your requirements, and we can work with you to find a solution that fits your needs.
Free
$0
- Lower rate limits
- Community support
Exploration
Pay per token (coming soon)
- Up to 100 RPM
- Community support
Enterprise
Custom pricing
- Custom rate limits
- Fine-tuned models
- Custom SLAs
- Dedicated support
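On the Exploration tier, a client-side throttle can keep you under the 100 RPM limit so that requests are not rejected. The sketch below assumes the limit is enforced over a rolling 60-second window, which this page does not specify. `SlidingWindowLimiter` and `acquire_blocking` are hypothetical helpers and are not part of any official SDK.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle: admit at most `max_requests`
    calls in any rolling `window_s`-second window (assumed semantics)."""

    def __init__(self, max_requests: int = 100, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock                  # injectable for testing
        self._sent: Deque[float] = deque()  # timestamps of admitted requests

    def _evict(self, now: float) -> None:
        # Drop timestamps that have fallen out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()

    def try_acquire(self) -> bool:
        """Record and admit one request if the window has room."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next request would be admitted (0.0 if now)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self._sent[0])


def acquire_blocking(limiter: SlidingWindowLimiter) -> None:
    """Sleep until the limiter admits one request."""
    while not limiter.try_acquire():
        time.sleep(limiter.wait_time())
```

Call `acquire_blocking(limiter)` before each API request while sharing one `SlidingWindowLimiter(max_requests=100, window_s=60.0)` across your client. The rolling window is stricter than a fixed per-minute counter, so it should also stay under the limit if the server counts calendar minutes instead.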
Our Free tier supports a context length of 8,192 tokens. For all supported models, we also offer context lengths of up to 128K tokens on request. To gain access, please contact us.
Exploration Tier Pricing
| Model | Speed (approx.) | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Llama 4 Scout | ~2,600 tokens/s | $0.65 | $0.85 |
| Llama 3.1 8B | ~2,200 tokens/s | $0.10 | $0.10 |
| Llama 3.3 70B | ~2,100 tokens/s | $0.85 | $1.20 |
| Qwen 3 32B | ~2,100 tokens/s | $0.40 | $0.80 |
| DeepSeek R1 Distill Llama 70B | ~1,700 tokens/s | $2.20 | $2.50 |
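Because input and output tokens are each billed per million, the cost of a request is a weighted sum: cost = (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. The sketch below copies the Exploration-tier rates from the table and assumes purely linear billing with no minimums, rounding, or per-request fees. `estimate_cost` is a hypothetical helper, and an actual invoice may differ.

```python
from decimal import Decimal

# Exploration-tier rates in USD per 1M tokens: (input, output).
# Decimal avoids binary floating-point rounding in money arithmetic.
RATES_PER_M = {
    "Llama 4 Scout": (Decimal("0.65"), Decimal("0.85")),
    "Llama 3.1 8B": (Decimal("0.10"), Decimal("0.10")),
    "Llama 3.3 70B": (Decimal("0.85"), Decimal("1.20")),
    "Qwen 3 32B": (Decimal("0.40"), Decimal("0.80")),
    "DeepSeek R1 Distill Llama 70B": (Decimal("2.20"), Decimal("2.50")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimated USD cost of one request, assuming linear per-token billing."""
    in_rate, out_rate = RATES_PER_M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

For example, a request to Llama 3.3 70B with 10,000 input tokens and 2,000 output tokens comes to (10,000 × 0.85 + 2,000 × 1.20) / 1,000,000 = $0.0109.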
Enterprise Tier Pricing
Our enterprise tier offers flat monthly pricing with flexible contract terms of 3, 6, or 12 months. Your monthly rate is based on your required token processing capacity, specifically the maximum number of input and output tokens you need to process per minute. Contact us for a trial package.
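Because the enterprise rate is keyed to peak per-minute throughput, one way to arrive at a starting figure is to bucket your existing traffic by minute and take the maximum. The sketch below assumes a hypothetical log of `(timestamp, input_tokens, output_tokens)` records and fixed calendar-minute buckets. It reports the input and output peaks separately, so they may come from different minutes. This page does not say how contracted capacity is measured, so treat the result only as a rough input to that conversation.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

# Hypothetical log record: (unix_timestamp_seconds, input_tokens, output_tokens).
Record = Tuple[float, int, int]


def peak_tokens_per_minute(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (peak input tokens/min, peak output tokens/min), each taken
    independently over fixed calendar-minute buckets."""
    per_minute: DefaultDict[int, List[int]] = defaultdict(lambda: [0, 0])
    for ts, n_in, n_out in records:
        bucket = per_minute[int(ts // 60)]  # minute index since the epoch
        bucket[0] += n_in
        bucket[1] += n_out
    if not per_minute:
        return (0, 0)
    peak_in = max(b[0] for b in per_minute.values())
    peak_out = max(b[1] for b in per_minute.values())
    return (peak_in, peak_out)
```

Fixed buckets can understate a burst that straddles a minute boundary. A rolling 60-second window gives a more conservative estimate if that matters for your workload.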
In addition to the models available in the Free and Exploration tiers, enterprise customers have access to:
- Llama 3.1 405B
- Mixtral 8x22B
- Mixtral 8x7B