mxbai-rerank-base-v1

Model description

mxbai-rerank-base-v1 is part of the Mixedbread rerank model family, a set of best-in-class reranking models that are fully open-source under the Apache 2.0 license. These models are designed to boost search results by adding a semantic layer to existing search systems, making it easier to find relevant results.

The models were trained using a large collection of real-life search queries and the top-10 results from search engines for these queries. First, a large language model ranked the results according to their relevance to the query. These signals were then used to train the rerank models. Experiments show that these models significantly boost search performance, particularly for complex and domain-specific queries.

When used in combination with a keyword-based search engine, such as Elasticsearch, OpenSearch, or Solr, the rerank model can be added to the end of an existing search workflow, allowing users to incorporate semantic relevance into their keyword-based search system without changing the existing infrastructure. This is an easy, low-complexity method of improving search results by introducing semantic search technology into a user's stack with one line of code.
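For illustration, here is a minimal sketch of that pattern using the sentence-transformers CrossEncoder interface; the hard-coded candidate list stands in for the output of an existing Elasticsearch/OpenSearch/Solr query:

```python
from sentence_transformers import CrossEncoder

# Load the reranker; CrossEncoder scores (query, document) pairs.
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

def rerank(query, documents, top_k=3):
    """Score every (query, document) pair and return the top_k documents by relevance."""
    scores = model.predict([(query, doc) for doc in documents])
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# In a real setup, `candidates` would be the top results of the existing
# keyword search step (Elasticsearch, OpenSearch, Solr, ...).
query = "Who wrote 'To Kill a Mockingbird'?"
candidates = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "Harper Lee was an American novelist born in Monroeville, Alabama.",
    "The novel 'Moby-Dick' was written by Herman Melville.",
]
for doc, score in rerank(query, candidates):
    print(f"{score:.3f}  {doc}")
```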

mxbai-rerank-base-v1 offers the best balance between size and performance in the Mixedbread rerank model family. On a subset of 11 BEIR datasets, it achieves an NDCG@10 score of 46.9 and an Accuracy@3 score of 72.3, outperforming lexical search and other reranking models of similar size.

  • Recommended Sequence Length: 512
  • Language: English

Suitable Scoring Methods

  • Model Output: The model directly scores the relevance of each document to the query, so you can use the model output as-is. If you want a score between 0 and 1, apply a sigmoid function to the raw scores (see the sketch after this list).

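A brief sketch of both options, assuming the model loads through the standard Transformers sequence-classification head (which is what CrossEncoder uses under the hood):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mixedbread-ai/mxbai-rerank-base-v1")
model = AutoModelForSequenceClassification.from_pretrained("mixedbread-ai/mxbai-rerank-base-v1")

query = "Who wrote 'To Kill a Mockingbird'?"
document = "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960."

# Encode the (query, document) pair; the raw logit is the relevance score.
inputs = tokenizer(query, document, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    raw_score = model(**inputs).logits.squeeze()

# Apply a sigmoid to map the raw score into the 0-1 range.
normalized = torch.sigmoid(raw_score)
print(float(raw_score), float(normalized))
```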
Limitations

  • Language: mxbai-rerank-base-v1 is trained on English text and is specifically designed for the English language.
  • Sequence Length: The recommended maximum sequence length is 512 tokens. Longer sequences may be truncated, leading to a loss of information. Note that this limit applies to the query and document combined: len(query) + len(document) should not exceed 512 tokens (a quick length check is sketched after this list).

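One way to guard against silent truncation is to count tokens for the combined pair before scoring. A small sketch using the model's own tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mixedbread-ai/mxbai-rerank-base-v1")

def fits_in_context(query, document, max_length=512):
    # Encode the pair exactly as the reranker would see it (including special tokens)
    # and check whether it stays within the recommended 512-token budget.
    encoded = tokenizer(query, document, add_special_tokens=True)
    return len(encoded["input_ids"]) <= max_length

query = "Who wrote 'To Kill a Mockingbird'?"
document = "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960."
print(fits_in_context(query, document))  # True for short pairs like this one
```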
Examples

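A minimal usage sketch with the sentence-transformers CrossEncoder interface; the rank helper (available in recent sentence-transformers releases) returns documents sorted by relevance:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

query = "Who wrote 'To Kill a Mockingbird'?"
documents = [
    "'To Kill a Mockingbird' is a novel by Harper Lee published in 1960.",
    "Harper Lee was born in 1926 in Monroeville, Alabama.",
    "'Moby-Dick' was written by Herman Melville.",
    "The novel has become a classic of modern American literature.",
]

# Rank all documents against the query and keep the three most relevant.
results = model.rank(query, documents, return_documents=True, top_k=3)
for result in results:
    print(f"{result['score']:.3f}  {result['text']}")
```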