What is Assisters API?

Assisters API is an OpenAI-compatible AI inference platform with 9 proprietary models: Chat (128K context), Vision, Code, Embeddings (100+ languages), Moderation (14 categories), Reranking, Speech-to-Text, Text-to-Speech, and Image Generation. Just change your base URL to migrate from OpenAI.

How much does Assisters API cost?

Start free with 100K tokens/month. Paid plans: Developer $29/mo (5M tokens), Startup $99/mo (25M tokens), and Enterprise (custom pricing). Per-model prices range from $0.01/M tokens for embeddings to $0.20/M tokens for chat output. Annual plans save 17%.

Is Assisters API compatible with OpenAI?

Yes, Assisters API is 100% OpenAI-compatible. Use the official OpenAI SDKs (Python, JavaScript, Ruby, Go) with base_url='https://api.assisters.dev/v1'. To migrate existing code, change only the base URL and API key; it takes minutes.

What models does Assisters offer?

Assisters offers 9 proprietary models: assisters-chat-v1, assisters-vision-v1, assisters-code-v1, assisters-embed-v1, assisters-moderation-v1, assisters-rerank-v1, assisters-whisper-v1, assisters-tts-v1, and assisters-image-v1. All are optimized for performance and reliability.

Assisters

Now live: 50+ models, OpenAI-compatible

One API, every model,
zero training.

Drop-in OpenAI replacement. Route across 50+ models, build fallback chains, and fine-tune — all under one key. Your data stays yours.
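A fallback chain can also be approximated on the client side: try models in order and return the first reply that succeeds. A minimal sketch, assuming a hypothetical `callModel` function that sends one request and rejects on failure; the helper name `withFallback` is illustrative, and this does not describe how Assisters implements fallback chains on its side.

```typescript
// Hypothetical signature: sends one request to `model` and resolves with the reply text.
type CallModel = (model: string, prompt: string) => Promise<string>;

// Try each model in order; return the first successful reply and the model that produced it.
async function withFallback(
  models: string[],
  prompt: string,
  callModel: CallModel,
): Promise<{ model: string; text: string }> {
  const errors: unknown[] = [];
  for (const model of models) {
    try {
      const text = await callModel(model, prompt);
      return { model, text };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      errors.push(err);
    }
  }
  throw new Error(`All ${models.length} models failed: ${errors.map(String).join('; ')}`);
}
```

With the OpenAI SDK, `callModel` could wrap `client.chat.completions.create({ model, messages: [{ role: 'user', content: prompt }] })` and return `choices[0].message.content ?? ''`.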


import OpenAI from 'openai';
const ai = new OpenAI({
  baseURL: 'https://api.assisters.dev/v1',
  apiKey: process.env.ASSISTERS_API_KEY!,
});
const r = await ai.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [{ role: 'user', content: 'Hello' }],
});


Works seamlessly with your stack

99.9% uptime SLA: enterprise-grade reliability
<100ms P95 latency: fast inference globally
10+ AI models: chat, embed, moderate, rerank
100K free tokens: no credit card required
OpenAI-compatible: drop-in replacement

Start building in 3 steps

From zero to production in under a minute.

1. Create Account
2. Get API Key: generate your key instantly from the developer dashboard.
3. First API Call: production-ready in under 60 seconds.

Under 60 seconds: average time from signup to first API response.

Every API you need to build AI apps

Four production-ready endpoints. One unified API. One API key.

Chat Completions

Build conversational AI applications with streaming support. OpenAI-compatible API — migrate in minutes.

Customer support automation
AI writing & coding assistants
Interactive tutoring systems


chat.ts

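import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.assisters.dev/v1',
  apiKey: process.env.ASSISTERS_API_KEY,
});

// Request a streamed completion; text arrives as incremental chunks.
const stream = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}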
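If you need the complete reply as well as live output, you can accumulate the deltas while printing them. A minimal sketch, assuming chunks follow the OpenAI streaming shape (`choices[0].delta.content`); `StreamChunk` is a hand-written subset rather than the SDK's own type, and `collectStream` is an illustrative helper name.

```typescript
// Hand-written subset of an OpenAI-style streaming chunk (assumption: Assisters mirrors this shape).
interface StreamChunk {
  choices: { delta?: { content?: string | null } }[];
}

// Concatenate the text deltas of a stream, optionally echoing each piece as it arrives.
async function collectStream(
  stream: AsyncIterable<StreamChunk>,
  onDelta?: (text: string) => void,
): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    if (delta) {
      full += delta;
      onDelta?.(delta);
    }
  }
  return full;
}
```

For example, `const reply = await collectStream(stream, (t) => process.stdout.write(t));` prints the reply live and also keeps it for later use.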

Embeddings

Generate dense vector representations for semantic search, RAG pipelines, and similarity matching at scale.

Semantic search & RAG
Document clustering
Recommendation systems


embeddings.ts

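import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.assisters.dev/v1',
  apiKey: process.env.ASSISTERS_API_KEY,
});

const result = await client.embeddings.create({
  model: 'assisters-embed-v1',
  input: 'The quick brown fox',
});

// 1024-dimensional vector
const vector = result.data[0].embedding;
console.log(vector.length); // 1024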
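For semantic search, embedding vectors are usually compared with cosine similarity. A minimal sketch; `cosineSimilarity` and `rankBySimilarity` are plain helpers written for illustration, not part of the Assisters API, and they work for any equal-length vectors, including the 1024-dimensional ones shown above.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents by how similar their embeddings are to the query embedding, best match first.
function rankBySimilarity(query: number[], docs: { id: string; embedding: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}
```

In practice you embed your documents once, store the vectors, and embed only the query at search time.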

Moderation

Detect harmful, unsafe, or policy-violating content with sub-100ms response times. Protect your users at scale.

Community content safety
Real-time chat moderation
User-generated content filtering


moderation.ts

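import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.assisters.dev/v1',
  apiKey: process.env.ASSISTERS_API_KEY,
});
const userMessage = 'Text submitted by a user'; // example input

const result = await client.moderations.create({
  model: 'assisters-moderation-v1',
  input: userMessage,
});

if (result.results[0].flagged) {
  console.warn('Content flagged');
}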
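Beyond the top-level `flagged` boolean, OpenAI-style moderation responses also report a boolean per category, which is useful for deciding how to respond. A minimal sketch, assuming Assisters returns the OpenAI `results[].categories` shape; `ModerationResponse` is a hand-written subset and `flaggedCategories` is an illustrative helper name.

```typescript
// Subset of an OpenAI-style moderation response (assumption: Assisters uses the same fields).
interface ModerationResponse {
  results: { flagged: boolean; categories: object }[];
}

// Return the names of every category that fired for the first input, or [] if nothing was flagged.
function flaggedCategories(response: ModerationResponse): string[] {
  const result = response.results[0];
  if (!result?.flagged) return [];
  return Object.entries(result.categories)
    .filter(([, hit]) => hit === true)
    .map(([name]) => name);
}
```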

Reranking

Boost search relevance by re-scoring candidate documents with a cross-encoder. Dramatically improves RAG accuracy.

Improve RAG retrieval quality
Enterprise search relevance
Hybrid search pipelines


reranking.ts

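// Candidate documents to re-score (example values).
const docs = ['Assisters API pricing', 'AI inference API overview', 'Cooking recipes'];

const response = await fetch('https://api.assisters.dev/v1/rerank', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${process.env.ASSISTERS_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'assisters-rerank-v1',
    query: 'AI inference API',
    documents: docs,
  }),
});
if (!response.ok) throw new Error(`Rerank failed: ${response.status}`);
const ranked = await response.json();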
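The rerank request above does not show how to use the scores that come back. Many rerank APIs return a list of `{ index, relevance_score }` entries pointing back into the submitted `documents` array; assuming Assisters follows that convention (the field names are an assumption, not confirmed here), mapping scores back to documents could look like this. `applyRerank` is an illustrative helper name.

```typescript
// Assumed response entry: `index` refers to a position in the submitted `documents` array.
interface RerankResult {
  index: number;
  relevance_score: number;
}

// Pair each document with its score, sort best-first, and keep the top `k`.
function applyRerank(docs: string[], results: RerankResult[], k = results.length) {
  return results
    .filter((r) => r.index >= 0 && r.index < docs.length) // ignore out-of-range indices
    .map((r) => ({ document: docs[r.index], score: r.relevance_score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Pass the parsed list of results together with the same `docs` array you submitted, so indices line up.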

Drop-in replacement. Works with any language.

One API, every runtime. Migrate from OpenAI in under 2 minutes.


chat.ts (TypeScript)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.assisters.dev/v1',
  apiKey: process.env.ASSISTERS_API_KEY,
});

const chat = await client.chat.completions.create({
  model: 'assisters-chat-v1',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(chat.choices[0].message.content);

chat.py (Python)

import os

from openai import OpenAI

client = OpenAI(
    base_url='https://api.assisters.dev/v1',
    api_key=os.environ['ASSISTERS_API_KEY'],
)

response = client.chat.completions.create(
    model='assisters-chat-v1',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.choices[0].message.content)

request.sh (cURL)

curl https://api.assisters.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ASSISTERS_API_KEY" \
  -d '{
    "model": "assisters-chat-v1",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

chat.go (Go)

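package main

import (
	"context"
	"fmt"
	"log"
	"os"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	// Point the go-openai client at the Assisters endpoint.
	cfg := openai.DefaultConfig(os.Getenv("ASSISTERS_API_KEY"))
	cfg.BaseURL = "https://api.assisters.dev/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(
		context.Background(),
		openai.ChatCompletionRequest{
			Model: "assisters-chat-v1",
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
			},
		},
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}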

Explore our models

State-of-the-art models for every use case.

Model	Type	Context	Price per 1M tokens	Latency
Assisters Chat v1	Chat	128K	$0.10	Fast
Assisters Embed v1	Embeddings	not listed	$0.01	Ultra-fast
Assisters Moderation v1	Moderation	not listed	$0.05	Ultra-fast
Assisters Rerank v1	Reranking	not listed	$0.05	Fast


Integrate in minutes

Works with every framework and tool your team already uses.

LangChain, Vercel AI SDK, LlamaIndex, OpenAI Python, Next.js, Supabase Edge Functions, Cloudflare Workers, Docker, FastAPI, LiteLLM, Deno, Bun

Simple, transparent pricing

Start free, pay as you grow. No hidden fees, no surprises.

Free

$0/mo

Perfect for prototyping

100K tokens/month
10 RPM rate limit
All 4 API endpoints
Community support


Developer

$29/mo

For production apps

5M tokens/month
100 RPM rate limit
All 4 API endpoints
Email support
Usage dashboard
Webhook support


Enterprise

Custom

For teams at scale

Unlimited tokens
Custom rate limits
Dedicated endpoints
Dedicated support
99.9% SLA
Custom contracts


Feature	Free	Developer	Enterprise
Included tokens/month	100K	5M	Unlimited
Rate limit (RPM)	10	100	Custom
Chat completions	Yes	Yes	Yes
Embeddings	Yes	Yes	Yes
Moderation	Yes	Yes	Yes
Reranking	Yes	Yes	Yes
Usage dashboard	No	Yes	Yes
Email support	No	Yes	Yes
SLA guarantee	No	No	Yes
Dedicated support	No	No	Yes
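Plans are capped in requests per minute (10 RPM on Free, 100 RPM on Developer), so a client may want to pace its own calls. A minimal client-side sketch using a sliding one-minute window; `SlidingWindowLimiter` is a hypothetical helper, and how the API itself signals that a limit was exceeded is not documented on this page.

```typescript
// Allows at most `limit` requests in any rolling window of `windowMs` milliseconds.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs = 60_000, now: () => number = Date.now) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for testing
  }

  // Milliseconds to wait before the next request is allowed (0 means send now).
  delayMs(): number {
    const t = this.now();
    // Drop timestamps that have left the window.
    this.timestamps = this.timestamps.filter((ts) => t - ts < this.windowMs);
    if (this.timestamps.length < this.limit) return 0;
    return this.windowMs - (t - this.timestamps[0]);
  }

  // Record that a request was sent at the current time.
  record(): void {
    this.timestamps.push(this.now());
  }

  // Wait until a slot is free, then record the request.
  async acquire(): Promise<void> {
    let wait = this.delayMs();
    while (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
      wait = this.delayMs();
    }
    this.record();
  }
}
```

For the Free tier you would create `new SlidingWindowLimiter(10)` and `await limiter.acquire()` before each request.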


Technical Insights: AI Inference APIs

How does Assisters API differ from OpenAI?

Assisters API is a drop-in replacement for OpenAI that uses the same SDKs and API format. The key differences are proprietary models optimized for specific tasks, usage-based pricing that is up to 70% cheaper than OpenAI, and a 99.9% uptime SLA on the Enterprise plan. Migration takes minutes: change your base URL to https://api.assisters.dev/v1 and swap in your Assisters API key.
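Because only the base URL and key change, the switch can live in configuration rather than code. A minimal sketch; the `AI_PROVIDER` environment variable and the `resolveClientOptions` helper are hypothetical names chosen for illustration, while the returned `baseURL` and `apiKey` fields match the options the OpenAI TypeScript SDK constructor accepts.

```typescript
type Provider = 'openai' | 'assisters';

// Base URL for each OpenAI-compatible provider.
const BASE_URLS: Record<Provider, string> = {
  openai: 'https://api.openai.com/v1',
  assisters: 'https://api.assisters.dev/v1',
};

// Environment variable that holds each provider's API key.
const KEY_VARS: Record<Provider, string> = {
  openai: 'OPENAI_API_KEY',
  assisters: 'ASSISTERS_API_KEY',
};

// Build client options from the environment; hypothetical AI_PROVIDER picks the backend.
function resolveClientOptions(env: Record<string, string | undefined>) {
  const provider = (env.AI_PROVIDER ?? 'assisters') as Provider;
  if (!Object.prototype.hasOwnProperty.call(BASE_URLS, provider)) {
    throw new Error(`Unknown AI_PROVIDER: ${provider}`);
  }
  const apiKey = env[KEY_VARS[provider]];
  if (!apiKey) {
    throw new Error(`Missing ${KEY_VARS[provider]}`);
  }
  return { baseURL: BASE_URLS[provider], apiKey };
}
```

Usage: `const client = new OpenAI(resolveClientOptions(process.env));` lets you move between providers by changing environment variables only.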

What makes Assisters embeddings better for multilingual RAG?

The assisters-embed-v1 model is optimized for multilingual retrieval, with support for 100+ languages and 1024-dimensional vectors. Unlike English-first models, it maintains semantic accuracy across language pairs, which makes it well suited to global RAG systems. Combined with assisters-rerank-v1, it forms a complete multilingual search pipeline at a fraction of competitors' costs.
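A common way to combine the two models is a two-stage pipeline: retrieve a generous candidate set by embedding similarity, then send only those candidates to the reranker. A minimal sketch of the hand-off, assuming similarity scores have already been computed; `buildRerankRequest` is a hypothetical helper, and its output uses the `model`, `query`, and `documents` fields from the rerank request shown earlier on this page.

```typescript
interface Candidate {
  text: string;
  score: number; // e.g. cosine similarity between query and document embeddings
}

// Stage 1 -> stage 2 hand-off: keep the top `k` candidates by embedding score
// and package them as a rerank request body.
function buildRerankRequest(query: string, candidates: Candidate[], k: number) {
  const top = [...candidates].sort((a, b) => b.score - a.score).slice(0, k);
  return {
    model: 'assisters-rerank-v1',
    query,
    documents: top.map((c) => c.text),
  };
}
```

Keeping `k` modest limits how many documents the slower, more precise reranker has to score.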

Can I use Assisters for production applications with SLA requirements?

Yes. Assisters API is enterprise-ready, with a 99.9% uptime SLA backed by service agreements, <100ms P95 latency for inference endpoints, and SOC 2 Type II compliance. Dedicated support is available on the Enterprise plan. Companies process millions of tokens daily through our infrastructure without issues.

How does token-based pricing work at Assisters?

Assisters uses transparent per-token pricing with no hidden fees. You pay for exactly what you use: input tokens (prompts) and output tokens (completions) are priced separately. Free tier includes 100K tokens/month. Subscription plans include token allowances plus volume discounts. Wallet credits never expire, letting you prepay for predictable budgeting.
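Because input and output tokens are billed separately, a request's cost is each token count divided by one million, multiplied by that direction's per-million price, then summed. A minimal sketch; `estimateCostUSD` is an illustrative helper, it ignores free allowances and plan discounts, and treating the $0.10/M chat price as the input rate (alongside the $0.20/M chat output rate) is an assumption, so check current pricing before budgeting.

```typescript
interface Usage {
  inputTokens: number; // prompt tokens
  outputTokens: number; // completion tokens
}

interface PricePerMillion {
  input: number; // USD per 1M input tokens
  output: number; // USD per 1M output tokens
}

// cost = inputTokens / 1e6 * price.input + outputTokens / 1e6 * price.output
function estimateCostUSD(usage: Usage, price: PricePerMillion): number {
  return (
    (usage.inputTokens / 1_000_000) * price.input +
    (usage.outputTokens / 1_000_000) * price.output
  );
}
```

For example, 1M input tokens and 500K output tokens at $0.10/M and $0.20/M come to $0.10 + $0.10 = $0.20.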

100K free tokens — no credit card

Ship AI features today

Get your API key in 30 seconds. Build your first integration in under a minute. Scale to millions of requests without changing a line of code.

