Mixedbread

Reranking Models

The Mixedbread rerank family is a collection of state-of-the-art, open-source reranking models designed to significantly enhance search accuracy across various domains. These models can be seamlessly integrated into existing search systems, offering best-in-class performance and easy implementation for improved user satisfaction in search results.

Boost your search with our crispy reranking models! The Mixedbread rerank family offers state-of-the-art performance across a wide variety of domains and can be easily integrated into your existing search stack.

What's new in the Mixedbread rerank family?

We recently finished baking a fresh set of rerank models, the mxbai-rerank-v1 series. After receiving a wave of interest from the community, we're now happy to provide access to the model with the highest demand via our API:

| Model | Status | Context Length (tokens) | Description |
| --- | --- | --- | --- |
| mxbai-rerank-large-v1 | Available via API | 512 | Delivers the highest accuracy and performance |
| mxbai-rerank-base-v1 | API unavailable | 512 | Strikes a balance between size and performance |
| mxbai-rerank-xsmall-v1 | API unavailable | 512 | Focuses on efficiency while retaining performance |
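Since every model in the family has a 512-token context window, long passages should be truncated or chunked before reranking. The following is a minimal sketch of one way to do this, assuming the reranker's tokenizer can be loaded from the Hugging Face Hub under the mixedbread-ai/mxbai-rerank-base-v1 model id; it is an illustration, not an official snippet.

from transformers import AutoTokenizer

# Assumption for illustration: the reranker's tokenizer is published on the
# Hugging Face Hub under the same id as the open-source model.
tokenizer = AutoTokenizer.from_pretrained("mixedbread-ai/mxbai-rerank-base-v1")

def fit_to_context(query: str, document: str, max_tokens: int = 512) -> str:
    # The reranker sees query and document together, so reserve room for the
    # query (including special tokens) and clip the document to what is left.
    query_tokens = tokenizer(query)["input_ids"]
    budget = max(max_tokens - len(query_tokens), 0)
    doc_tokens = tokenizer(document, add_special_tokens=False)["input_ids"][:budget]
    return tokenizer.decode(doc_tokens)

# Example: a document far longer than the context window gets clipped.
long_document = "Paris is the capital of France. " * 200
clipped = fit_to_context("capital of France?", long_document)
print(len(tokenizer(clipped)["input_ids"]))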

Why Mixedbread rerank?

Not only are the Mixedbread rerank models powerful and fully open-source, but they're also extremely easy to integrate into your current search stack. All you need to do is pass the original search query along with your search system's results to our reranking models, and they will tremendously boost your search accuracy - your users will love it!
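As a rough illustration of that drop-in step, the sketch below reranks a handful of candidate results with one of the open-source models, loaded here with the sentence-transformers CrossEncoder class; the query, candidates, and loading details are assumptions for the example rather than official usage.

from sentence_transformers import CrossEncoder

# Hypothetical output of an existing first-stage search system.
query = "how do I reset my password?"
candidates = [
    "To reset your password, open Settings and choose 'Reset password'.",
    "Our office hours are 9 am to 5 pm on weekdays.",
    "Password resets require access to your registered email address.",
]

# Load one of the open-source rerank models from the Hugging Face Hub.
model = CrossEncoder("mixedbread-ai/mxbai-rerank-xsmall-v1")

# Score every (query, candidate) pair and re-order the candidates by relevance.
scores = model.predict([(query, doc) for doc in candidates])
for score, doc in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{score:.4f}  {doc}")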

We evaluated our models by letting them rerank the top 100 lexical search results on a subset of the BEIR benchmark, a commonly used collection of evaluation datasets. We report two metrics: NDCG@10, which measures the overall relevance of the search results given the order in which the model ranks them, and accuracy@3, which measures the likelihood of a highly relevant result appearing in the top 3 - in our opinion, the most important metric for anticipating user satisfaction.
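To make the two metrics concrete, here is a small, illustration-only sketch of how NDCG@10 and accuracy@3 can be computed for a single query; the relevance labels are invented and unrelated to the benchmark numbers below.

import math

def ndcg_at_k(relevances, k=10):
    # relevances: graded relevance of the returned results, in ranked order.
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def accuracy_at_k(relevances, k=3):
    # 1.0 if at least one relevant result appears in the top k, else 0.0.
    return 1.0 if any(rel > 0 for rel in relevances[:k]) else 0.0

# Relevance of ten reranked results (2 = highly relevant, 1 = relevant, 0 = not).
ranked = [0, 2, 1, 0, 0, 1, 0, 0, 0, 0]
print(ndcg_at_k(ranked))      # overall relevance of the ordering
print(accuracy_at_k(ranked))  # was a relevant result placed in the top 3?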

For illustrative purposes, we also included classic keyword search and a current full semantic search model in the evaluation. The results make us confident that our models show best-in-class performance in their size category:

Comparison of overall relevance scores between the Mixedbread rerank family and other models

| Model | BEIR Accuracy (11 datasets) |
| --- | --- |
| Lexical Search (Pyserini) | 66.4 |
| bge-reranker-base | 66.9 |
| bge-reranker-large | 70.6 |
| cohere-embed-v3 | 70.9 |
| mxbai-rerank-xsmall-v1 | 70.0 |
| mxbai-rerank-base-v1 | 72.3 |
| mxbai-rerank-large-v1 | 74.9 |
Comparison of accuracy scores between the Mixedbread rerank family and other models

Why should you use our API?

To get started, you can easily use the open-source version of the models. However, the models provided through the API are trained on new data every month. This ensures that they understand ongoing developments in the world and can identify the most relevant information for any question they might be asked, without a knowledge cutoff. Naturally, our quality control ensures that the models' performance always remains at least on par with previous versions.

Reranking Models: Your Secret Weapon for Killer Search Results

Hey there, search aficionado! πŸ‘‹ Ready to take your search game to the next level? Let's dive into the Mixedbread rerank family – your new best friends in the quest for perfect search results.

Meet the family: Mixedbread rerank models

We've baked up a fresh batch of reranking models, each with its own special flavor. Let's break 'em down:

| Model | Status | Context Length | Superpower |
| --- | --- | --- | --- |
| mxbai-rerank-large-v1 | Available via API | 512 tokens | The heavyweight champ. Highest accuracy and performance. |
| mxbai-rerank-base-v1 | API coming soon | 512 tokens | The all-rounder. Great balance of size and performance. |
| mxbai-rerank-xsmall-v1 | API coming soon | 512 tokens | The compact powerhouse. Small but mighty. |

Why Choose Mixedbread Rerank?

  1. Open-source goodness: Peek under the hood, tweak to your heart's content.
  2. State-of-the-art performance: We're not just tooting our own horn – check out the benchmarks below!
  3. Easy peasy integration: Slip these models into your existing search stack faster than you can say "relevant results."

Show Me the Numbers!

Alright, data nerds (we say that with love), feast your eyes on these performance metrics:

Reranking model performance comparison: NDCG@10 scores across 11 BEIR datasets. Higher is better!

| Model | BEIR Accuracy (11 datasets) |
| --- | --- |
| Lexical Search (Pyserini) | 66.4 |
| bge-reranker-base | 66.9 |
| bge-reranker-large | 70.6 |
| cohere-embed-v3 | 70.9 |
| mxbai-rerank-xsmall-v1 | 70.0 |
| mxbai-rerank-base-v1 | 72.3 |
| mxbai-rerank-large-v1 | 74.9 |

As you can see, our models are bringing their A-game, even outperforming some closed-source heavyweights!

Real-World Magic: Use Cases and Integration

These reranking models aren't just pretty numbers – they're problem solvers. Here's where they shine:

  1. E-commerce search: Help customers find that perfect product faster.
  2. Content recommendation: Serve up the most relevant articles, videos, or podcasts.
  3. Enterprise search: Make finding that needle in the company's document haystack a breeze.
  4. Customer support: Surface the most helpful support documents for customer queries.

Integration tip: These models work best as a second-stage reranker. Use your favorite first-stage retrieval method (Elasticsearch, anyone?) to grab the top 100-1000 results, then let our reranking models work their magic to bubble up the crème de la crème.
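To make that two-stage flow concrete, here's a minimal sketch in which the rank_bm25 package stands in for your first-stage retriever (Elasticsearch, OpenSearch, whatever you already run) and one of the open-source rerankers reorders the shortlist; the tiny corpus, the shortlist size, and the CrossEncoder loading are assumptions for illustration, not an official recipe.

from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is the capital of Germany.",
    "France is a country in Europe.",
]
query = "What is the capital of France?"

# Stage 1: cheap lexical retrieval over the whole corpus.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
bm25_scores = bm25.get_scores(query.lower().split())
top_n = 3  # in production this would be more like 100-1000 candidates
order = sorted(range(len(corpus)), key=lambda i: bm25_scores[i], reverse=True)
shortlist = [corpus[i] for i in order[:top_n]]

# Stage 2: the reranker reorders only the shortlist.
reranker = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")
pair_scores = reranker.predict([(query, doc) for doc in shortlist])
for score, doc in sorted(zip(pair_scores.tolist(), shortlist), reverse=True):
    print(f"{score:.4f}  {doc}")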

Show Me the Code!

Enough talk – let's see some action! Here's a quick example of how to use the reranking API:

import requests
import json

API_KEY = "your_api_key_here"
API_URL = "https://api.mixedbread.ai/v1/rerank"

def rerank_documents(query, documents):
    # Authenticate with a bearer token and send JSON.
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}"
    }

    # The request carries the query, the candidate documents, and the model to use.
    data = {
        "query": query,
        "documents": documents,
        "model": "mxbai-rerank-large-v1"
    }

    response = requests.post(API_URL, headers=headers, data=json.dumps(data))
    response.raise_for_status()  # fail loudly on authentication or request errors
    return response.json()["results"]

# Example usage
query = "What's the capital of France?"
documents = [
    "Paris is the capital of France.",
    "London is the capital of the UK.",
    "Berlin is the capital of Germany.",
    "France is a country in Europe.",
    "The Eiffel Tower is in Paris."
]

reranked_results = rerank_documents(query, documents)

# Print the documents in their new, relevance-sorted order.
for result in reranked_results:
    print(f"Score: {result['score']}, Document: {result['document']}")

FAQs: You've Got Questions, We've Got Answers

Q: Can I fine-tune these models on my own data? A: You bet! Our models are open-source, so fine-tune away. We're also exploring custom fine-tuning services, so reach out if you're interested!

Q: How do I choose between the different model sizes? A: It's all about balancing performance and resources. Start with the largest model your system can handle, then scale down if needed. The xsmall model is great for edge devices or resource-constrained environments.

Q: Can these models handle non-English text? A: Currently, our models are optimized for English. Multilingual support is on our roadmap – stay tuned!

What's Next?

Ready to supercharge your search? Here are your next steps:

  1. Grab an API key if you haven't already.
  2. Check out our documentation for all the nitty-gritty details.
  3. Join our community to share your experiences and get tips from other developers.

Happy reranking, and may your search results always be relevant! πŸš€πŸ”