What is BGE-M3?

BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that supports three retrieval modes simultaneously: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and is one of the highest-performing general-purpose embedding models available.


Why is BGE-M3 significant?

Most embedding models only support one retrieval mode, typically dense (single vector) retrieval. BGE-M3 is unusual in that a single model can produce all three types of representations:

  • Dense vectors: a single fixed-size vector per text, used for standard semantic search
  • Sparse vectors: term-weighted representations similar to BM25, good for keyword-sensitive queries
  • Multi-vectors: one vector per token (ColBERT-style), enabling fine-grained token-level matching

This means you can run hybrid retrieval using a single model, and combine all three signals for maximum accuracy without deploying multiple models.
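
To make the three signals concrete, here is a minimal sketch of how each one is commonly scored and how the scores might be combined. It assumes the representations are already computed: a dense vector is a list of floats, a sparse vector is a token-to-weight dict, and a multi-vector is a list of per-token vectors. The function names and the fusion weights are illustrative assumptions, not part of BGE-M3 or of any library.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(query: Vector, doc: Vector) -> float:
    """Dense mode: one vector per text, compared by inner product."""
    return dot(query, doc)


def sparse_score(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Sparse mode: sum of weight products over terms that appear in both texts."""
    return sum(w * doc[term] for term, w in query.items() if term in doc)


def multi_vector_score(query: List[Vector], doc: List[Vector]) -> float:
    """ColBERT-style late interaction: each query token takes its
    best-matching document token, and the maxima are averaged."""
    return sum(max(dot(q, d) for d in doc) for q in query) / len(query)


def hybrid_score(
    dense: float,
    sparse: float,
    multi: float,
    weights: Sequence[float] = (0.4, 0.2, 0.4),  # hypothetical weights, tune per dataset
) -> float:
    """Weighted sum of the three per-mode scores."""
    w_dense, w_sparse, w_multi = weights
    return w_dense * dense + w_sparse * sparse + w_multi * multi


# Toy inputs, chosen only to exercise the functions
d = dense_score([1.0, 0.0], [0.6, 0.8])
s = sparse_score({"bge": 0.5, "m3": 0.4}, {"bge": 0.8, "model": 0.3})
m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.6, 0.8], [1.0, 0.0]])
print(hybrid_score(d, s, m))
```

The per-mode scores live on different scales, so in practice the weights need tuning against a labelled evaluation set rather than being fixed in advance.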


BGE-M3 capabilities at a glance

| Capability | Detail |
| --- | --- |
| Languages | 100+ |
| Max input length | 8,192 tokens |
| Retrieval modes | Dense, sparse, multi-vector |
| Model size | ~570M parameters |
| Open source | ✓ (MIT) |
| MTEB performance | Top-tier general-purpose |

How does BGE-M3 work?

BGE-M3 is based on the XLM-RoBERTa architecture, extended and fine-tuned using a multi-stage training process:

  1. RetroMAE pre-training: improves the model’s general text understanding
  2. Multi-task fine-tuning: trains the model across dense, sparse, and multi-vector objectives simultaneously
  3. Self-knowledge distillation: uses the combined relevance score from all three modes as a teacher signal to improve each individual mode

The result, as reported by the BGE-M3 authors, is a single model that is competitive with specialised models in each individual retrieval mode.
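
The self-knowledge distillation step can be sketched roughly as follows. This is a simplified illustration, not the training code: it assumes the teacher signal is the plain sum of the dense, sparse, and multi-vector scores for each candidate passage (the BGE-M3 paper describes a weighted combination), and that each mode is trained to match the teacher's softmax distribution. All function names are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student: List[float], teacher: List[float]) -> float:
    """Cross-entropy between the teacher's and the student's
    distributions over the same candidate passages."""
    p_teacher = softmax(teacher)
    p_student = softmax(student)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def self_distillation_loss(
    dense: List[float], sparse: List[float], multi: List[float]
) -> float:
    """Average distillation loss of the three modes against their own
    combined score. In real training the teacher is treated as a constant
    (no gradient flows through it); equal weighting is an assumption."""
    teacher = [d + s + m for d, s, m in zip(dense, sparse, multi)]
    losses = [distillation_loss(mode, teacher) for mode in (dense, sparse, multi)]
    return sum(losses) / len(losses)


# Scores for one query against three candidate passages (toy numbers)
print(self_distillation_loss([2.0, 0.5, 0.1], [1.0, 1.2, 0.0], [2.5, 0.4, 0.2]))
```

A mode whose ranking agrees with the combined score incurs a small loss, while a mode that ranks the candidates differently is pulled towards the ensemble.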


When should you use BGE-M3?

BGE-M3 is a strong default choice for:

  • Multilingual search: supports 100+ languages with a single model
  • Long documents: the 8,192-token context window fits entire pages or long legal clauses in a single pass
  • Hybrid retrieval: produce dense + sparse vectors from one model
  • RAG pipelines: reliable performance across diverse document types
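
For the hybrid retrieval case, the dense and sparse result lists still have to be merged into one ranking. One common option, not prescribed by BGE-M3 itself, is reciprocal rank fusion (RRF), which uses only rank positions and so avoids having to calibrate scores from different modes. The sketch below assumes each retriever returns an ordered list of document IDs; the IDs and the constant `k=60` (a conventional default) are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1. Ties are broken alphabetically by ID so the
    output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results from the dense and sparse retrievers
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]

print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
```

Documents that appear near the top of both lists rise to the top of the fused ranking, while documents found by only one retriever are kept but ranked lower.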

For highly specialised domains (legal, medical, code), consider a LoRA adapter fine-tuned on domain data on top of BGE-M3.


How do you run BGE-M3 with SIE?

SIE supports BGE-M3 out of the box across all three retrieval modes:

from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own texts
documents = [
    "BGE-M3 supports dense, sparse, and multi-vector retrieval.",
    "XLM-RoBERTa is a multilingual transformer encoder.",
]

# Dense retrieval (standard semantic search)
dense_results = client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])
dense_vectors = [r["dense"] for r in dense_results]

# Dense and sparse vectors together, for hybrid retrieval
hybrid_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)
hybrid_dense = [r["dense"] for r in hybrid_results]
hybrid_sparse = [r["sparse"] for r in hybrid_results]

# With a domain LoRA adapter
legal_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    options={"lora_id": "org/bge-m3-legal-lora"},
)
legal_vectors = [r["dense"] for r in legal_results]

SIE’s self-hosted deployment means your documents never leave your AWS or GCP environment, and GPU batching makes encoding large corpora significantly faster than managed API calls.


BGE-M3 vs other embedding models

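| Model | Multilingual | Max tokens | Retrieval modes | Self-hostable |
| --- | --- | --- | --- | --- |
| BGE-M3 | ✓ (100+) | 8,192 | Dense + sparse + multi-vector | |
| E5-large | Limited | 512 | Dense | |
| OpenAI text-embedding-3 | | 8,191 | Dense | |
| Cohere Embed v3 | | 512 | Dense + sparse | |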

Frequently asked questions

Is BGE-M3 free to use? Yes. BGE-M3 is released under the MIT licence and can be used freely for commercial applications. SIE is Apache 2.0 licensed.

How does BGE-M3 compare to OpenAI’s embedding models? On MTEB benchmarks, BGE-M3 is competitive with OpenAI’s text-embedding-3-large, particularly for multilingual and long-document tasks. The key advantage is that BGE-M3 is fully self-hostable, with no per-token fees and no data leaving your infrastructure.

Can BGE-M3 be fine-tuned for specific domains? Yes. SIE supports LoRA hot-loading, allowing you to apply domain-specific fine-tuned adapters to BGE-M3 at inference time without restarting the server.

