Access state-of-the-art open-source AI models through a simple, unified API. From chat to embeddings, find the right model for your use case.
74 models
chat
Smallest available chat model for on-device and extreme-latency scenarios.
chat
Ultra-low-latency chat for simple tasks. Ideal for autocomplete, short summaries, and high-volume pipelines. 2-4B-parameter edge model.
chat
Ultra-lightweight model for high-throughput, cost-sensitive text generation.
chat
Upgraded compact model with strong multilingual support.
chat
Ultra-fast compact chat. Optimized for low-latency tasks.
chat
Compact open-weight chat with automatic multi-tier failover.
chat
Lightweight instruct chat with automatic multi-tier failover.
chat
Highest-throughput free chat tier for high-frequency lightweight tasks. Automatic multi-tier failover.
chat
Balanced MoE chat with automatic multi-tier failover.
chat
Efficient general chat with automatic multi-tier failover.
chat
Indic-languages-optimized chat with automatic multi-tier failover.
chat
Japanese-optimized chat with automatic multi-tier failover.
chat
Multilingual frontier chat with automatic multi-tier failover.
chat
Long-context agentic chat with automatic multi-tier failover.
chat
Classic sparse-MoE chat with automatic multi-tier failover.
chat
Chat quality positioned between the turbo and pro tiers.
chat
Flagship MoE chat with automatic multi-tier failover.
chat
Frontier-grade 550B MoE general chat with automatic multi-tier failover.
chat
Broad-capability open-weight chat. Free tier.
chat
Mixture-of-Experts model optimized for instruction-following, function calling, and agentic tool use. Best speed-to-capability ratio.
chat
Advanced conversational AI with frontier-class reasoning. 128K context window with 500+ tokens/second throughput.
chat
Enhanced 70B model with stronger instruction-following, code comprehension, and multi-language support. Ideal for agentic workflows.
chat
High-capacity 120B chat model. Strong reasoning and instruction following.
chat
Highest-quality free chat tier. Best for complex reasoning and long-context tasks.
chat
Fast, multimodal chat with 1M-token context. Free tier with automatic multi-tier failover.
chat
Mixture-of-Experts smart routing. Automatically selects the best model from the pool based on task type, load, and quality requirements. Ideal for unpredictable workloads.
chat
Frontier-grade 675B MoE model for the most demanding reasoning, analysis, and generation tasks. Ideal for enterprise workloads requiring maximum accuracy.
chat
Fast, low-latency MoE chat with automatic multi-tier failover.
chat
High-quality MoE chat with automatic multi-tier failover.
classification
Active speaker detection — identifies which person is speaking in broadcast and video conference streams.
classification
Sparse end-to-end autonomous-vehicle perception and planning model.
code
Enhanced 70B code model fine-tuned for code generation, summarization, and multi-language tasks including Python, Java, Go, and Rust.
code
Instant code completion and simple generation. Optimized for IDE integrations and high-throughput code suggestion pipelines.
code
Strong general coder, faster than the flagship tier.
code
Expert code generation model optimized for software development. Generate, debug, and refactor code in 90+ programming languages. 128K context window.
code
World-class 480B agentic coding model with 256K context. Tops leaderboards for software engineering tasks, browser use, and complex multi-file refactors.
code
MisarCoder free MoE pipeline tuned for code. Routes across multiple expert tiers based on task size and latency target.
embedding
Text embeddings with high retrieval accuracy for RAG, semantic search, and classification. Supports long documents.
embedding
State-of-the-art multilingual text embeddings with 1024 dimensions. Supports 100+ languages for semantic search and retrieval.
embedding
Top-ranked embedding model on MTEB. 3072-dimensional embeddings with Matryoshka Representation Learning (MRL). Free tier with automatic failover.
embedding
Specialized code embedding model optimized for code search and retrieval. Supports text, code, and hybrid queries across 20+ programming languages.
moderation
Content-safety classifier. Dedicated safety model; no chat fallback.
moderation
Topic-control classifier. Flags off-topic or policy-violating conversations.
moderation
Content safety classifier. Returns safe/unsafe labels. Applied automatically on open-source model routes.
moderation
Advanced content safety model detecting harmful content across 14 categories. Fast, accurate, and production-ready for content moderation.
moderation
Multi-modal content safety classifier covering text and images. Identifies unsafe inputs and outputs across 13+ safety categories.
moderation
Detect and classify Personally Identifiable Information (PII) in text. Identifies names, emails, phone numbers, addresses, and 20+ PII categories.
moderation
Context-aware content safety model with reasoning capabilities. Applies domain-specific policies and explains safety decisions.
reasoning
Compact reasoning with automatic multi-tier failover.
reasoning
Fast reasoning with automatic multi-tier failover.
reasoning
Multimodal reasoning with automatic multi-tier failover.
reasoning
Rapid reasoning with lower latency than v3.
reasoning
Fast mid-tier reasoning model.
reasoning
Deep multi-step reasoning with automatic multi-tier failover.
reasoning
Deep reasoning model with 230B parameters excelling in coding, complex reasoning, and office productivity tasks.
reasoning
Fast reasoning model built on a 200B sparse MoE architecture. Excels at multi-step math, logic, and scientific problem-solving with chain-of-thought.
rerank
High-precision document reranking for improved search relevance. Handles up to 1000 documents per request and works across languages.
rerank
GPU-accelerated passage reranker providing probability scores for QA relevance. Ideal for two-stage retrieval pipelines.
science
End-to-end protein 3D structure prediction from amino acid sequences. Suited to drug discovery and structural biology.
science
Protein language model generating high-quality embeddings from amino acid sequences. Ideal for drug discovery, protein function prediction, and bioinformatics pipelines.
speech
Ultra-fast speech-to-text transcription supporting 100+ languages. Industry-leading accuracy with 10× real-time processing speed.
translation
Neural machine translation across 12 language pairs. Supports few-shot example prompts for domain-specific terminology control.
tts
High-quality neural text-to-speech with 300+ natural-sounding voices across 100+ languages. Speed and pitch control included.
tts
Zero-shot voice cloning TTS — generate expressive speech in any voice from a short audio sample. No fine-tuning required.
video
Physical-world simulation model for generating realistic synthetic video data.
video
Physics-aware world-state video generation from text prompts and spatial control inputs. Designed for physical AI development and simulation.
vision
Efficient vision-language model with automatic multi-tier failover.
vision
Compact vision-language model with automatic multi-tier failover.
vision
Multilingual multimodal model with 128 experts and 17B active parameters. Processes images and text for detailed visual analysis, OCR, and multimodal Q&A.
vision
Premium multimodal model accepting image and audio inputs. Best for complex image reasoning, document analysis, and audio-visual tasks.
vision
Advanced multimodal vision model with image understanding, OCR, visual Q&A, and document analysis capabilities. 128K context window.
vision
High-quality multimodal understanding.
voice
Real-time voice-optimized conversational model for voice assistants and telephony.
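One embedding entry above mentions 3072-dimensional vectors with Matryoshka Representation Learning (MRL). With MRL-trained embeddings, a common pattern is to keep only the leading components of a vector and re-normalize it, trading some accuracy for smaller storage and faster search. The sketch below illustrates that general technique only; it does not call this service's API, and the function names, the 256-dimension target, and the synthetic input vectors are all assumptions made for illustration.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, L2-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical stand-ins for 3072-dim vectors returned by an embedding model.
full_a = [math.sin(i * 0.01) for i in range(3072)]
full_b = [math.sin(i * 0.01 + 0.05) for i in range(3072)]

# Shorten both vectors to 256 dimensions before comparing them.
small_a = truncate_embedding(full_a, 256)
small_b = truncate_embedding(full_b, 256)
similarity = cosine(small_a, small_b)
```

Whether a given truncation size preserves enough retrieval quality depends on the model and the data, so it is worth measuring on your own queries before committing to a smaller dimension.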
We're constantly adding new models. Let us know what you need and we'll work on adding it.