
Boost Your Search With The Crispy mixedbread Rerank Models

Today, we are releasing a family of best-in-class reranking models. They come with a fully open-source Apache 2.0 license too! The mixedbread team is happy to share these crispy models with the community 🍞

Read on to learn more about our approach and to check out our benchmarks. If you want to skip right to the models instead, you can access them here:

Why Rerank?

Searching data with traditional keyword-based search can be challenging and frustrating. We’ve all looked for a specific piece of information and gotten back results that had almost nothing to do with it. One way to boost your search is an embeddings-based semantic search system, which contextualizes the meaning of a user’s query and can therefore return more relevant and accurate results.

However, many companies have built large pipelines and systems around keyword-based search. Migrating to a semantic embedding search would be resource-intensive and costly.

With our rerank models, companies can leverage their existing search infrastructure and add a semantic boost on top. The models perform extremely well on industry-relevant use cases. What’s more, they’re open source and perform on par with, or even better than, many closed-source competitors.

Two-stage search flow including rerank

Reranking is applied after a first-stage retrieval step. A keyword-based search system like Elasticsearch or Solr retrieves the top 100 or more candidates, and our reranking models are applied at the last stage to bring the most relevant candidates to the top.
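The two-stage flow can be sketched in a few lines of plain Python. Both stages below are toy stand-ins: the first stage scores by term overlap (where a real system would call Elasticsearch or Solr), and the second stage takes an arbitrary `score_fn` (where a real system would plug in a rerank model):

```python
import re

def tokenize(text):
    """Lowercase and strip punctuation, returning a set of terms."""
    return set(re.sub(r"[^\w\s]", " ", text.lower()).split())

def first_stage_retrieve(query, corpus, k=100):
    """Toy first-stage retriever (stand-in for Elasticsearch/Solr): term overlap."""
    q = tokenize(query)
    scored = sorted(
        ((len(q & tokenize(doc)), i) for i, doc in enumerate(corpus)),
        reverse=True,
    )
    return [i for score, i in scored[:k] if score > 0]

def rerank(query, corpus, candidate_ids, score_fn, top_k=3):
    """Second stage: re-score only the retrieved candidates with score_fn."""
    rescored = sorted(
        ((score_fn(query, corpus[i]), i) for i in candidate_ids),
        reverse=True,
    )
    return [i for _, i in rescored[:top_k]]

corpus = [
    "Harper Lee wrote To Kill a Mockingbird.",
    "The mockingbird is a common bird in North America.",
    "George Orwell wrote the novel 1984.",
]
query = "Who wrote To Kill a Mockingbird?"
candidates = first_stage_retrieve(query, corpus)  # first stage: cheap keyword recall
top = rerank(                                     # second stage: precise re-scoring
    query, corpus, candidates,
    score_fn=lambda q, d: len(tokenize(q) & tokenize(d)),
)
```

The key property of the pattern: the expensive second-stage model only ever sees the small candidate set, not the whole corpus.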

Introducing The mixedbread Rerank Model Family

We are thrilled to share our rerank model family with you. The models are fully open-source and you can host them yourself or use them with our upcoming API. These models can become an integral part of any high-performing search system.

The models were trained on a large collection of real-life search queries and the top-10 results that search engines returned for them. A large language model ranked those results by their relevance to the query, and these signals were then used to train our rerank models. Our experiments show that the models significantly boost search performance, particularly for complex and domain-specific queries.

When used in combination with a keyword-based search engine such as Elasticsearch, OpenSearch, or Solr, our rerank models can be added to the end of an existing search workflow, letting users incorporate semantic relevance into their keyword-based search system without changing the existing infrastructure. It is an easy, low-complexity way to improve search results by bringing semantic search technology into your stack with one line of code.

Boosting Search Quality

Our models are extremely easy to use with your existing search stack. Once you get the initial results from your existing search engine, pass the query and the list of results to the model. You have two options: use the model offline by hosting it yourself, or online via our (upcoming) API. Our models come in three sizes:

  • mxbai-rerank-xsmall-v1
  • mxbai-rerank-base-v1
  • mxbai-rerank-large-v1

Using It Locally

To get started, install the necessary packages:
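The models load with the sentence-transformers library (a common choice for cross-encoder rerankers; the snippet below assumes that is the recommended package):

```shell
pip install -U sentence-transformers
```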

Here is a quick example: Given the query “Who wrote 'To Kill a Mockingbird'?”, we want to retrieve the most relevant passage to that query.

This will yield a list of documents sorted by their score. Each entry contains the corpus_id (the index of the document in the input list), the relevance score, and the input text.
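When you don’t return the document text itself (or need metadata beyond it), the corpus_id maps each result back into your original list. A small illustration with hard-coded, hypothetical scores:

```python
documents = [
    "Harper Lee wrote 'To Kill a Mockingbird'.",
    "'Moby-Dick' was written by Herman Melville.",
    "'The Great Gatsby' is by F. Scott Fitzgerald.",
]

# Hypothetical reranker output: entries already sorted by descending score.
results = [
    {"corpus_id": 0, "score": 0.97},
    {"corpus_id": 2, "score": 0.12},
    {"corpus_id": 1, "score": 0.05},
]

# corpus_id indexes back into the original documents list.
best = documents[results[0]["corpus_id"]]
```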

You can try it out directly in your browser. Big thanks to the Hugging Face team and Joshua Lochner for providing the web interface and helping out in general!

The reranking process from query to ranking

Upcoming API Integration

We are currently working hard to make the models available through our endpoint, so you won’t have to worry about hosting and infrastructure on your end. The usage via the API will also provide some additional benefits, which we’ll announce soon. Stay tuned!

Evaluation: Best In Class

We benchmarked our models against other models on common benchmarks, using a subset of BEIR. First, we measured NDCG@10, a measure of the overall quality of search results: it factors in the position of relevant documents in the result list and their relevance grades, weighting results near the top more heavily. Additionally, we measured Accuracy@3, the fraction of search queries for which the model places a highly relevant result in the top three. Accuracy is a particularly relevant benchmark for real-life search use cases and other tasks like RAG.
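For concreteness, both metrics can be computed in a few lines. This is a minimal sketch over graded relevance labels; the grades, cutoffs, and the "highly relevant" threshold below are illustrative, not the benchmark's exact configuration:

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain: position i (1-based) is discounted by log2(i + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=10):
    """NDCG@k: DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

def accuracy_at_k(ranked_relevances_per_query, k=3, threshold=2):
    """Fraction of queries with at least one highly relevant result in the top k."""
    hits = sum(
        any(rel >= threshold for rel in rels[:k])
        for rels in ranked_relevances_per_query
    )
    return hits / len(ranked_relevances_per_query)

# Relevance grades of the returned results, in ranked order, for three queries.
queries = [[3, 2, 0, 1], [0, 0, 3], [1, 0, 0, 0]]
print(round(ndcg(queries[0]), 4))   # ranking quality of the first query
print(accuracy_at_k(queries, k=3))  # share of queries with a grade >= 2 in the top 3
```

Note how NDCG rewards putting the highest grades first, while Accuracy@k only asks whether any sufficiently relevant result made the cutoff.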

Below, we present the evaluation results of our models on a subset of 11 BEIR datasets, chosen for its balance between computational demand and real-world applicability. Please note that our models have never seen any samples from these evaluation datasets, while many current models suffer from severe data leakage.

First, we compare performance using NDCG@10:

Comparison of overall relevance scores between the mixedbread rerank family and other models

Clearly, all of our models provide a significant boost over plain lexical (keyword-based) search in the overall relevance of search results. Moreover, they consistently outperform current models of the same size or even larger, including embeddings-based semantic search models. Next, we benchmark our rerank models for accuracy:

Model                       BEIR Accuracy (11 datasets)
Lexical Search (Pyserini)   66.4
Comparison of accuracy scores between the mixedbread rerank family and other models

As the data shows, the mixedbread rerank models again consistently perform on par with or even stronger than the other currently available models, especially when model sizes are taken into account. This also includes embeddings-based semantic search models. The accuracy metric is particularly relevant because it reflects the real-world experience of searching for information and expecting the most relevant result to show up at first glance. You can find more information regarding the benchmarks.

It is also worth noting that using the rerank models as a second stage after embeddings-based semantic search, rather than keyword-based search, yields even better results!

Build Amazing Things With Rerank Models

It’s our firm belief that open-sourcing the mixedbread rerank models will help the community build amazing things, given the clear benefits of our model family:

  • Simplicity: The rerank step is just one line of code away from boosting your search performance.
  • Practicability: Our models can boost existing systems instead of requiring their replacement.
  • Performance: We deliver State-of-the-Art performance, built for real-world use cases.

So, what are you waiting for? Try the models and see for yourself!

Give Us Feedback

This is our first open model release, and we welcome any feedback that helps us improve our models and refine their user-friendliness and capabilities. Please let us know if you’re hungry for new features or have encountered any issues. We value your feedback!

Please share your feedback and thoughts with us. We are here to help and always happy to chat about the exciting field of machine learning!