Rate Limiting

This page covers Mixedbread's API rate limits: the limits for each tier, how to handle rate limit errors, best practices for efficient API usage, and how to request higher limits as your needs grow.

Quick Overview

Each endpoint has its own rate limits
Limits are based on requests per minute, tokens per minute, and requests per day, plus a burst allowance
Exceeding limits may result in request throttling or rejection
Need higher limits? Contact us

Rate Limit Tiers

We offer five tiers with increasing limits. Here's a breakdown for the Embeddings & Reranking endpoint:

Tier	Requests/Min	Tokens/Min	Requests/Day	Burst
Home Baker (Free)	100	250,000	5,000	10
Professional Baker	300	500,000	10,000	20
Bakery Shop	500	1,000,000	10,000	50
Bakery Chain	1,000	10,000,000	50,000	100
Bakery Franchise	2,000	10,000,000	100,000	100

Custom tiers are available upon request.
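
To stay under a tier's Requests/Min figure, a client can throttle itself before sending. The following is a minimal sketch of a client-side token bucket, not a description of how the API enforces its limits. The `TokenBucket` class and its parameters are hypothetical, and using the Burst column as the bucket capacity is an assumption, since this page does not define how Burst is applied.

```python
import time


class TokenBucket:
    """Client-side throttle: refills at a steady rate, allows short bursts."""

    def __init__(self, rate_per_minute, capacity, clock=time.monotonic):
        self.rate = rate_per_minute / 60.0   # tokens added per second
        self.capacity = float(capacity)      # assumed to map to the Burst column
        self.tokens = float(capacity)        # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def try_acquire(self):
        """Return True if a request may be sent now, False otherwise."""
        now = self.clock()
        # Refill according to the time elapsed since the last call.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

For example, `TokenBucket(rate_per_minute=100, capacity=10)` would mirror the Home Baker row under the assumption above. Call `try_acquire()` before each request and wait briefly whenever it returns `False`.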

Handling Rate Limits

When you hit a rate limit:

You'll receive a 429 Too Many Requests response
The response will include a Retry-After header
Wait for the specified time before retrying
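
The steps above can be sketched as a small helper that turns a Retry-After header value into a wait time. HTTP allows this header to carry either a number of seconds or an HTTP date. This page does not say which form the API sends, so the sketch handles both. The function name `retry_after_seconds` and the one-second fallback are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, now=None, default=1.0):
    """Convert a Retry-After header value into seconds to wait."""
    if header_value is None:
        return default
    value = header_value.strip()
    # Form 1: delay in whole seconds, e.g. "60".
    if value.isdigit():
        return float(value)
    # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable value: fall back to the default wait
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # Never return a negative wait if the date is already in the past.
    return max(0.0, (retry_at - now).total_seconds())
```

A client would then call `time.sleep(retry_after_seconds(response_headers.get("Retry-After")))` before retrying, where `response_headers` stands for whatever header mapping your HTTP library exposes.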

Example error response:

{
    "type": "too_many_requests_error",
    "url": "https://www.mixedbread.ai/api-reference",
    "message": "Rate limit exceeded. Please try again later.",
    "details": {
        "retry_after": 60,
        "limit": "100",
        "remaining": "0",
        "reset": "1630000000",
        "tier": "1"
    }
}
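
In the example above, `limit`, `remaining`, `reset`, and `tier` arrive as strings while `retry_after` is a number, so a client may want to normalize them before use. The following is a minimal sketch. The `RateLimitInfo` class and the `parse_rate_limit_error` function are hypothetical names, and treating `reset` as a Unix timestamp in seconds is an assumption based on the sample value.

```python
import json
from dataclasses import dataclass


@dataclass
class RateLimitInfo:
    retry_after: int  # seconds to wait before retrying
    limit: int
    remaining: int
    reset: int        # assumed to be a Unix timestamp in seconds
    tier: str


def parse_rate_limit_error(body):
    """Parse a JSON error body; return None if it is not a rate limit error."""
    payload = json.loads(body)
    if payload.get("type") != "too_many_requests_error":
        return None
    details = payload.get("details", {})
    return RateLimitInfo(
        retry_after=int(details.get("retry_after", 0)),
        limit=int(details.get("limit", 0)),
        remaining=int(details.get("remaining", 0)),
        reset=int(details.get("reset", 0)),
        tier=str(details.get("tier", "")),
    )
```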

Best Practices

If you are not using an SDK, implement exponential backoff in your client code
Cache results when possible to reduce API calls
Optimize your requests to use fewer tokens
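
For clients that do not use an SDK, exponential backoff could look like the following sketch. It doubles the maximum delay on each attempt, picks a random delay up to that maximum (often called full jitter), and prefers the Retry-After value when it is given in seconds. The `send` callable is hypothetical: it stands for your own request function and is assumed to return a `(status_code, headers, body)` tuple. The retry count and delay bounds are illustrative defaults, not recommended values from Mixedbread.

```python
import random
import time


def backoff_delay(attempt, base_delay=1.0, max_delay=60.0):
    """Random delay in [0, min(max_delay, base_delay * 2**attempt)]."""
    ceiling = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def call_with_backoff(send, max_retries=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning (status_code, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        # Return on success, on other errors, or when retries are exhausted.
        if status != 429 or attempt == max_retries:
            return status, headers, body
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)      # server-provided wait in seconds
        else:
            delay = backoff_delay(attempt)  # exponential backoff with jitter
        sleep(delay)
```

The jitter spreads out retries from many clients so they do not all hit the API again at the same moment.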

Need Higher Limits?

If you need higher limits:

Contact us or join our Discord community
Provide details about your use case and expected request volume
We'll review and adjust your limits if feasible