Embeddings API

Create embeddings

This endpoint provides access to our embedding models. It returns embeddings for the input text you provide, which can be used for tasks such as semantic similarity, clustering, and retrieval.

The endpoint is a superset of the OpenAI embeddings API, so you can use the OpenAI API client by pointing it to https://api.mixedbread.ai. However, note that some mixedbread-specific features may not be available through the OpenAI client.
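As a sketch, here is what a raw HTTP call to the endpoint could look like. The "/v1/embeddings" path and the model name are assumptions based on the OpenAI-compatible shape described above; verify both against the official reference before relying on them.

```python
import json
import os
import urllib.request

API_URL = "https://api.mixedbread.ai/v1/embeddings"  # assumed path, per OpenAI compatibility

payload = {
    "model": "mxbai-embed-large-v1",  # hypothetical model name
    "input": ["Who is german and likes bread?"],
    "normalized": True,
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {os.environ.get('MXBAI_API_KEY', 'YOUR_API_KEY')}",
        "Content-Type": "application/json",
    },
)

# To actually send the request:
#   with urllib.request.urlopen(req) as resp:
#       embeddings = json.load(resp)["data"]
print(req.data.decode("utf-8"))
```

The same request can be made with the OpenAI client by setting its `base_url` to the address above.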

Request Body

  • Name
    input*
    Type
    string|string[]
    Description

    A string or a list of strings, where each string represents a sentence or chunk of text to be embedded.

    • Between 1 and 256 items when passing a list
    • Texts longer than the model's maximum sequence length will be truncated
  • Name
    model*
    Type
    string
    Description

    The model to be used for generating embeddings.

  • Name
    prompt
    Type
    string
    Description

    An optional prompt to provide context to the model. Refer to the model's documentation for more information.

    • A string between 1 and 256 characters
  • Name
    normalized
    Type
    boolean
    Description

    Whether to normalize the embeddings. Defaults to true.

  • Name
    dimensions
    Type
    number
    Description

    The desired number of dimensions in the output vectors. Defaults to the model's maximum.

    • A number between 1 and the model's maximum output dimensions
    • Only applicable for Matryoshka-based models
  • Name
    encoding_format
    Type
    string|string[]
    Description

    The desired format for the embeddings. Defaults to "float". If multiple formats are requested, the response will include an object with each format for each embedding.

    • Options: float, float16, binary, ubinary, int8, uint8, base64
  • Name
    truncation_strategy
    Type
    string
    Description

    The strategy for truncating input text that exceeds the model's maximum length. Defaults to "start". Setting it to "none" will result in an error if the text is too long.

    • Options: start, end, none
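The documented constraints can be checked client-side before a request is sent. The following sketch is our own helper (the function name and error messages are not part of the API); the limits it enforces come straight from the parameter list above.

```python
ENCODING_FORMATS = {"float", "float16", "binary", "ubinary", "int8", "uint8", "base64"}
TRUNCATION_STRATEGIES = {"start", "end", "none"}

def validate_embedding_request(body: dict) -> list[str]:
    """Return a list of validation errors (an empty list means the body looks OK)."""
    errors = []
    texts = body.get("input")
    if isinstance(texts, str):
        texts = [texts]
    if not isinstance(texts, list) or not (1 <= len(texts) <= 256):
        errors.append("input must be a string or a list of 1-256 strings")
    if not body.get("model"):
        errors.append("model is required")
    prompt = body.get("prompt")
    if prompt is not None and not (1 <= len(prompt) <= 256):
        errors.append("prompt must be 1-256 characters")
    dims = body.get("dimensions")
    if dims is not None and dims < 1:
        errors.append("dimensions must be at least 1")
    fmt = body.get("encoding_format", "float")
    fmts = [fmt] if isinstance(fmt, str) else fmt
    if any(f not in ENCODING_FORMATS for f in fmts):
        errors.append("encoding_format must be one of: " + ", ".join(sorted(ENCODING_FORMATS)))
    if body.get("truncation_strategy", "start") not in TRUNCATION_STRATEGIES:
        errors.append("truncation_strategy must be 'start', 'end', or 'none'")
    return errors

print(validate_embedding_request({"model": "mxbai-embed-large-v1", "input": "hello"}))  # []
print(validate_embedding_request({"input": [], "encoding_format": "float64"}))
```

Note that the upper bound on `dimensions` is model-dependent, so it cannot be checked without knowing the model's maximum output dimensions.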

Response Body

  • Name
    model*
    Type
    string
    Description

    The embedding model used, which can be one of our hosted models or a custom fine-tuned model.

  • Name
    object*
    Type
    string
    Description

    The type of the returned object. Always "list".

  • Name
    data*
    Type
    object[]
    Description

    A list of the generated embeddings.

  • Name
    data[x].embedding*
    Type
    number[]|object
    Description

    The vector representing the embedding, or an object with different encodings if multiple formats were requested.

  • Name
    data[x].index*
    Type
    number
    Description

    The index of the input text corresponding to this embedding.

  • Name
    data[x].object*
    Type
    string
    Description

    The type of the returned object. Always "embedding".

  • Name
    usage*
    Type
    object
    Description

    Information about API usage for this request.

  • Name
    usage.prompt_tokens*
    Type
    number
    Description

    The number of prompt tokens used to generate the embeddings.

  • Name
    usage.total_tokens*
    Type
    number
    Description

    The total number of tokens used to generate the embeddings.

  • Name
    normalized*
    Type
    boolean
    Description

    Indicates whether the embeddings are normalized.
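Putting the response fields together, here is a sketch of walking the documented response shape. The sample response is hand-written for illustration; the values are made up.

```python
sample_response = {
    "object": "list",
    "model": "mxbai-embed-large-v1",  # hypothetical model name
    "normalized": True,
    "data": [
        {"object": "embedding", "index": 0, "embedding": [0.6, 0.8]},
        {"object": "embedding", "index": 1, "embedding": [1.0, 0.0]},
    ],
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
}

# data[x].index ties each embedding back to its input text.
vectors = [
    item["embedding"]
    for item in sorted(sample_response["data"], key=lambda d: d["index"])
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# When normalized is true, cosine similarity reduces to a plain dot product.
if sample_response["normalized"]:
    similarity = dot(vectors[0], vectors[1])
    print(f"cosine similarity: {similarity:.2f}")  # 0.6*1.0 + 0.8*0.0 = 0.60
```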

Rate Limiting

To ensure smooth operation for all users, we enforce rate limits. If you exceed your limit, you will receive a 429 Too Many Requests error; wait briefly before retrying. The table below outlines the rate limits for each tier:

Tier                    Requests per Minute  Tokens per Minute  Requests per Day  Burst
1 - Home Baker (Free)   100                  250,000            5,000             10
2 - Professional Baker  300                  500,000            10,000            20
3 - Bakery Shop         500                  1,000,000          10,000            50
4 - Bakery Chain        1,000                10,000,000         50,000            100
5 - Bakery Franchise    2,000                10,000,000         100,000           100
Custom                  Custom               Custom             Custom            Custom
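On the client side, 429 responses are usually handled with exponential backoff. The following is a generic sketch; `RateLimitError` and the wrapped call are stand-ins to adapt to your HTTP client of choice.

```python
import random
import time

class RateLimitError(Exception):
    """Raised by the (hypothetical) client when the API returns 429."""

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Exponential backoff with a little jitter: ~1s, ~2s, ~4s, ...
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo with a fake call that is rate-limited twice before succeeding.
attempts = {"n": 0}
def flaky_embed():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return {"object": "list", "data": []}

result = with_backoff(flaky_embed, sleep=lambda s: None)  # skip real sleeping in the demo
print(attempts["n"], result["object"])  # 3 list
```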

Requesting a Rate Limit Increase

If you require a higher rate limit, we're here to help! Please contact us and provide the following information:

  • Your use case and what you're working on
  • The estimated number of requests you anticipate needing
  • Any additional details that would help us understand your requirements

We will review your request and collaborate with you to determine an appropriate limit.