Access state-of-the-art open-source AI models through a simple, unified API. From chat to embeddings, find the right model for your use case.
29 models
chat
Smallest available chat model for on-device and extreme-latency scenarios.
chat
Ultra-low-latency chat for simple tasks. Ideal for autocomplete, short summaries, and high-volume pipelines. 2-4B-parameter edge model.
chat
Ultra-lightweight model for high-throughput, cost-sensitive text generation.
chat
Upgraded compact model with strong multilingual support.
chat
Ultra-fast compact chat. Optimized for low-latency tasks.
chat
Compact open-weight chat with automatic multi-tier failover.
chat
Lightweight instruct chat with automatic multi-tier failover.
chat
Highest-throughput free chat tier for high-frequency lightweight tasks. Automatic multi-tier failover.
chat
Balanced MoE chat with automatic multi-tier failover.
chat
Efficient general chat with automatic multi-tier failover.
chat
Indic-languages-optimized chat with automatic multi-tier failover.
chat
Japanese-optimized chat with automatic multi-tier failover.
chat
Multilingual frontier chat with automatic multi-tier failover.
chat
Long-context agentic chat with automatic multi-tier failover.
chat
Classic sparse-MoE chat with automatic multi-tier failover.
chat
Mid-tier chat with quality between the turbo and pro tiers.
chat
Flagship MoE chat with automatic multi-tier failover.
chat
Frontier-grade 550B MoE general chat with automatic multi-tier failover.
chat
Broad-capability open-weight chat. Free tier.
chat
Mixture-of-Experts model optimized for instruction-following, function calling, and agentic tool use. Best speed-to-capability ratio.
chat
Advanced conversational AI with frontier-class reasoning. 128K context window with 500+ tokens/second throughput.
chat
Enhanced 70B model with stronger instruction-following, code comprehension, and multilingual support. Ideal for agentic workflows.
chat
High-capacity 120B chat model. Strong reasoning and instruction following.
chat
Highest-quality free chat tier. Best for complex reasoning and long-context tasks.
chat
Fast, multimodal chat with 1M-token context. Free tier with automatic multi-tier failover.
chat
Mixture-of-Experts smart routing. Automatically selects the best model from the pool based on task type, load, and quality requirements. Ideal for unpredictable workloads.
chat
Frontier-grade 675B MoE model for the most demanding reasoning, analysis, and generation tasks. Ideal for enterprise workloads requiring maximum accuracy.
chat
Fast, low-latency MoE chat with automatic multi-tier failover.
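Many of the models above advertise automatic multi-tier failover. This page does not describe how that failover is implemented, so the following is only a minimal sketch of the general idea: try each tier in order and return the first successful reply. Every name here (`complete_with_failover`, `AllTiersFailed`, `primary_tier`, `fallback_tier`) is hypothetical and is not part of the actual API.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a backend takes a prompt and returns a completion string.
Backend = Callable[[str], str]


class AllTiersFailed(Exception):
    """Raised when every tier in the chain has failed."""

    def __init__(self, errors: List[Exception]) -> None:
        super().__init__(f"all {len(errors)} tiers failed")
        self.errors = list(errors)


def complete_with_failover(prompt: str, tiers: Sequence[Backend]) -> str:
    """Try each tier in order and return the first successful reply."""
    errors: List[Exception] = []
    for tier in tiers:
        try:
            return tier(prompt)
        except Exception as exc:  # assumption: any exception means "try the next tier"
            errors.append(exc)
    raise AllTiersFailed(errors)


# Stub backends standing in for real model tiers (illustrative only).
def primary_tier(prompt: str) -> str:
    raise TimeoutError("primary tier timed out")


def fallback_tier(prompt: str) -> str:
    return f"fallback reply to: {prompt}"


reply = complete_with_failover("Hello", [primary_tier, fallback_tier])
print(reply)  # fallback reply to: Hello
```

In this sketch the caller sees a single reply and never learns which tier produced it. A real service would likely also distinguish retryable errors from permanent ones and apply timeouts per tier.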
We're constantly adding new models. Let us know what you need and we'll work on adding it.
Request a Model