Rate Limiting

Understand Mixedbread's API rate limiting policies, including tier-based limits, handling rate limit errors, and best practices for optimal API usage. Learn how to manage your requests efficiently and upgrade your limits as your needs grow.

Quick Overview

  • Each endpoint has its own rate limits
  • Limits are based on requests per minute, tokens per minute, and requests per day
  • Exceeding limits may result in request throttling or rejection
  • Need higher limits? See "Need Higher Limits?" below

Rate Limit Tiers

We offer five tiers with increasing limits. Here's a breakdown for the Embeddings & Reranking endpoint:

  Home Baker (Free)       100       250,000      5,000     10
  Professional Baker      300       500,000     10,000     20
  Bakery Shop             500     1,000,000     10,000     50
  Bakery Chain          1,000    10,000,000     50,000    100
  Bakery Franchise      2,000    10,000,000    100,000    100

Custom tiers are available upon request.

Handling Rate Limits

When you hit a rate limit:

  1. You'll receive a 429 Too Many Requests response
  2. The response will include a Retry-After header
  3. Wait for the specified time before retrying

Example error response:

    {
        "type": "too_many_requests_error",
        "url": "https://www.mixedbread.ai/api-reference",
        "message": "Rate limit exceeded. Please try again later.",
        "details": {
            "retry_after": 60,
            "limit": "100",
            "remaining": "0",
            "reset": "1630000000",
            "tier": "1"
        }
    }
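
As a rough illustration of step 3, the sketch below works out how long to wait after a 429 response. It is a minimal Python sketch, not Mixedbread's SDK behavior: the function name `retry_delay_seconds` and its parameters are hypothetical, and it assumes the `details.retry_after` field in the body is a number of seconds, which the example response suggests but does not state. The standard `Retry-After` header may carry either a number of seconds or an HTTP date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(headers, body, default=1.0, now=None):
    """Return how many seconds to wait after a 429 response.

    headers: mapping of response header names to values.
    body:    parsed JSON error body (a dict), possibly empty.
    """
    # Header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        value = value.strip()
        if value.isdigit():
            # Form 1: delay in seconds, e.g. "60".
            return float(value)
        try:
            # Form 2: HTTP date, e.g. "Thu, 26 Aug 2021 17:47:40 GMT".
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    # Fallback (assumption): details.retry_after in the body is in seconds.
    details = body.get("details") or {}
    retry_after = details.get("retry_after")
    if isinstance(retry_after, (int, float)):
        return float(retry_after)
    return default
```

Preferring the header over the body is a design choice in this sketch; if the two ever disagree, waiting for the longer of the two would be the more cautious option.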

Best Practices

  • If you are not using an SDK, implement exponential backoff in your client code
  • Cache results when possible to reduce API calls
  • Optimize your requests to use fewer tokens
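
For the first practice, a minimal Python sketch of exponential backoff with jitter might look like the following. Everything named here is hypothetical and not part of any Mixedbread SDK: `RateLimited` stands for whatever your client raises on a 429 response, `send` is any zero-argument callable that performs the request, and the default delays are illustrative values rather than recommended settings.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception your client code raises on a 429 response."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds from Retry-After, if known


def call_with_backoff(send, max_retries=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep, rand=random.random):
    """Call send(); on RateLimited, retry with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            # Exponential growth: base, 2*base, 4*base, ... capped at max_delay.
            backoff = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: pick a random delay in [0, backoff) so that many
            # clients do not all retry at the same instant.
            delay = backoff * rand()
            # Never retry sooner than the server asked for.
            if exc.retry_after is not None:
                delay = max(delay, exc.retry_after)
            sleep(delay)
```

A call such as `call_with_backoff(lambda: client.embed(texts))` would then retry up to five times, where `client.embed` is a placeholder for your own request function. The `sleep` and `rand` parameters exist only so the timing can be replaced in tests.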

Need Higher Limits?

If you need higher limits:

  1. Contact the Mixedbread team
  2. Provide details about your use case and expected request volume
  3. We'll review and adjust your limits if feasible

Remember, we're here to help you succeed. Don't hesitate to reach out if you have any questions or need assistance optimizing your API usage!
