Mixedbread - Blog

NLP is a fast-moving field. We want to share our insights and continue to learn with you, so we write about research, software, and more.

Open Source Gets DE-licious: Mixedbread x deepset German/English Embeddings

Introducing deepset-mxbai-embed-large-v1, a new open-source German/English embedding model developed in collaboration between deepset and Mixedbread. The model sets a new performance standard among its open-source peers and supports binary quantization and Matryoshka representation learning (MRL) for significant cost reductions. It outperforms domain-specific alternatives in real-world applications and offers infrastructure cost savings of more than 97% through binary MRL.

July 18, 2024 · 6 min read

64 bytes per embedding, yee-haw 🤠

Binary MRL combines two popular approaches to deal with the scalability issues of embeddings. It helps our embedding model achieve a 64x gain in efficiency while retaining more than 90% of performance, drastically reducing infrastructure costs and enabling new applications.
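As a rough illustration of where "64 bytes" and "64x" can come from, here is a minimal sketch in Python with NumPy. It assumes a 1024-dimensional float32 embedding that is truncated to its first 512 dimensions (the Matryoshka step) and then reduced to one bit per dimension (the binary step). The dimensions, the zero threshold, and the helper names `binary_mrl` and `hamming_distance` are illustrative assumptions, not details taken from the article.

```python
import numpy as np

FULL_DIM = 1024   # assumed size of the original float32 embedding
TRUNC_DIM = 512   # assumed Matryoshka truncation length


def binary_mrl(embedding: np.ndarray, dim: int = TRUNC_DIM) -> np.ndarray:
    """Truncate to the first `dim` values, then keep one bit per value."""
    truncated = embedding[:dim]                 # Matryoshka-style truncation
    bits = (truncated > 0).astype(np.uint8)     # 1 if positive, else 0
    return np.packbits(bits)                    # 8 bits per byte


def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two packed binary embeddings."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


# Random vector standing in for a real model output.
rng = np.random.default_rng(0)
full = rng.standard_normal(FULL_DIM).astype(np.float32)

packed = binary_mrl(full)
full_bytes = full.nbytes        # 1024 values * 4 bytes
packed_bytes = packed.nbytes    # 512 bits / 8
ratio = full_bytes // packed_bytes

print(full_bytes, packed_bytes, ratio)
```

Under these assumptions the float32 vector takes 4096 bytes and the packed binary vector takes 64 bytes, which is the 64x reduction. Packed vectors are typically compared with Hamming distance, as in the helper above.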

April 12, 2024 · 8 min read

ColBERTus Maximus - Introducing mxbai-colbert-large-v1

mxbai-colbert-large-v1 is a state-of-the-art ColBERT model for reranking and retrieval tasks. It is based on the mxbai-embed-large-v1 model and achieves state-of-the-art performance on 13 publicly available BEIR benchmarks. It's available on Hugging Face.

March 19, 2024 · 6 min read

Open Source Strikes Bread - New Fluffy Embedding Model

Our English embedding model provides state-of-the-art performance among efficiently sized models. It outperforms closed-source models like OpenAI's text-embedding-v3.

March 8, 2024 · 6 min read

Fresh 2D-Matryoshka Embedding Model

The 2D-🪆 model introduces a novel approach that enables you to reduce both the number of layers and the dimensions of embeddings within the model. This dual reduction strategy allows for a more compact model size while still delivering competitive performance compared to leading models, such as Nomic's embedding model. Specifically, reducing the model's layers by approximately 50% retains up to 85% of its original performance, even without additional training.

March 4, 2024 · 8 min read

Boost Your Search With The Crispy mixedbread Rerank Models

Upgrade your search results with mixedbread's new open-source reranking models. Available in three sizes, they make it easier to find relevant results by adding a semantic layer on top of existing search systems. They're simple to use, work with your current setup, and boost performance when combined with many traditional and semantic search models. Check them out for a more accurate, efficient search experience.
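To make "adding a semantic layer to existing search systems" concrete, here is a minimal two-stage sketch in Python. The `rerank` helper and the `overlap_score` function are hypothetical: `overlap_score` is a simple word-overlap stand-in for a real reranking model, used only so the example runs without downloading anything. It is not how the mixedbread models score documents.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Re-order first-stage candidates by a query/document relevance score."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query words found in the doc."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


# Candidates as they might come back from an existing keyword search.
first_stage = [
    "Bread prices rose last year",
    "How to bake sourdough bread at home",
    "Sourdough starter care and feeding",
]

results = rerank("sourdough bread recipe", first_stage, overlap_score, top_k=2)
for doc, s in results:
    print(f"{s:.2f}  {doc}")
```

The point of the design is that the first-stage search stays untouched: the reranker only re-orders the candidates it is handed, so swapping `overlap_score` for a real model's scoring call would not change the rest of the pipeline.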

February 29, 2024 · 6 min read
