What is BGE-M3?

BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). A single model produces representations for three retrieval modes: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and inputs of up to 8,192 tokens, and it is a strong general-purpose choice among openly available embedding models.

Why is BGE-M3 significant?

Most embedding models support only one retrieval mode, typically dense (single-vector) retrieval. BGE-M3 is unusual in that a single model can produce all three types of representation:

Dense vectors: a single fixed-size vector per text, used for standard semantic search
Sparse vectors: learned per-term weights that play a role similar to BM25 term scores, good for keyword-sensitive queries
Multi-vectors: one vector per token (ColBERT-style), enabling fine-grained token-level matching

This means you can run hybrid retrieval using a single model, and combine all three signals for maximum accuracy without deploying multiple models.
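To make the three signals concrete, here is a minimal sketch of how each score could be computed and combined. The function names, the toy vectors, and the weights are illustrative assumptions, not part of any BGE-M3 or SIE API. The multi-vector score is averaged over query tokens here; the original ColBERT formulation sums instead.

```python
from typing import Dict, List

Vector = List[float]


def dense_score(q: Vector, d: Vector) -> float:
    """Inner product of two single-vector embeddings."""
    return sum(a * b for a, b in zip(q, d))


def sparse_score(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Sum of weight products over terms that occur in both texts."""
    return sum(w * d[t] for t, w in q.items() if t in d)


def multi_vector_score(q: List[Vector], d: List[Vector]) -> float:
    """Late interaction: each query token takes its best-matching
    document token (MaxSim), then the maxima are averaged."""
    return sum(max(dense_score(qv, dv) for dv in d) for qv in q) / len(q)


def hybrid_score(s_dense: float, s_sparse: float, s_multi: float,
                 weights=(1.0, 0.3, 1.0)) -> float:
    """Weighted sum of the three signals (weights are illustrative)."""
    w1, w2, w3 = weights
    return w1 * s_dense + w2 * s_sparse + w3 * s_multi


# Toy inputs standing in for real model outputs
s_d = dense_score([0.6, 0.8], [0.8, 0.6])
s_s = sparse_score({"bge": 0.5, "license": 0.2}, {"bge": 0.4, "model": 0.1})
s_m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]])
combined = hybrid_score(s_d, s_s, s_m)
```

In practice the weights would be tuned on a validation set for the target workload.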

BGE-M3 capabilities at a glance

Capability	Detail
Languages	100+
Max input length	8,192 tokens
Retrieval modes	Dense, sparse, multi-vector
Model size	~570M parameters
Open source	✓ (MIT)
MTEB performance	Top-tier general-purpose

How does BGE-M3 work?

BGE-M3 is based on the XLM-RoBERTa architecture, extended and fine-tuned using a multi-stage training process:

RetroMAE pre-training: improves the model’s general text understanding
Multi-task fine-tuning: trains the model across dense, sparse, and multi-vector objectives simultaneously
Self-knowledge distillation: combines the relevance scores from the dense, sparse, and multi-vector modes into a single teacher signal that is used to improve each individual mode

The result is a single model that its authors report to be competitive with, and often better than, specialised models in each individual retrieval mode.
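The self-knowledge distillation step can be pictured with a small sketch. It assumes the teacher is a weighted sum of the three modes' scores over a list of candidate passages, and that each mode is trained to match the teacher's softmax distribution. The function names and weights are hypothetical, and real training would operate on batched tensors with gradients.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Cross-entropy between the teacher and student distributions."""
    p_teacher = softmax(teacher)
    p_student = softmax(student)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


def self_distillation(dense: Sequence[float], sparse: Sequence[float],
                      multi: Sequence[float], weights=(1.0, 1.0, 1.0)) -> float:
    """Average distillation loss of the three modes against their
    combined score (weights are illustrative assumptions)."""
    w1, w2, w3 = weights
    teacher = [w1 * a + w2 * b + w3 * c for a, b, c in zip(dense, sparse, multi)]
    losses = [distillation_loss(s, teacher) for s in (dense, sparse, multi)]
    return sum(losses) / 3


# Scores of one query against three candidate passages, per mode
loss = self_distillation([2.0, 0.5, 0.1], [1.5, 0.2, 0.3], [1.8, 0.4, 0.2])
```

A mode whose ranking already agrees with the combined signal contributes a smaller loss than one that disagrees, which is what pulls the three modes toward each other.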

When should you use BGE-M3?

BGE-M3 is a strong default choice for:

Multilingual search: supports 100+ languages with a single model
Long documents: the 8,192-token context window handles entire pages or long legal clauses in one pass
Hybrid retrieval: produce dense + sparse vectors from one model
RAG pipelines: reliable performance across diverse document types

For highly specialised domains (legal, medical, code), consider a LoRA adapter fine-tuned on domain data on top of BGE-M3.
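For readers unfamiliar with LoRA, the sketch below shows the core idea on a toy weight matrix: the base weights W stay frozen and the adapter adds a low-rank update, W + (alpha / r) * B @ A. The helper names and the tiny matrices are illustrative assumptions; real adapters are applied to selected layers of the model by a training or serving library.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x k) and b (k x n)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    A has shape (r x in_features), B has shape (out_features x r).
    """
    r = len(a)
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Rank-1 adapter on a 2 x 2 base weight matrix
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # 1 x 2
lora_b = [[0.5], [0.0]]      # 2 x 1
adapted = apply_lora(base, lora_a, lora_b, alpha=2.0)
```

Because only A and B are trained, an adapter is small compared with the base model, which is what makes swapping adapters per domain practical.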

How do you run BGE-M3 with SIE?

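SIE supports BGE-M3 out of the box across all three retrieval modes. The example below shows dense encoding, dense + sparse encoding for hybrid retrieval, and encoding with a LoRA adapter: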

from sie_sdk import SIEClient
from sie_sdk.types import Item

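client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own texts
documents = [
    "BGE-M3 supports dense, sparse, and multi-vector retrieval.",
    "Hybrid search combines semantic and keyword signals.",
]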

# Dense retrieval (standard semantic search)
dense_results = client.encode("BAAI/bge-m3", [Item(text=d) for d in documents])
dense_vectors = [r["dense"] for r in dense_results]

# With sparse vectors for hybrid retrieval
hybrid_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse"],
)
hybrid_dense = [r["dense"] for r in hybrid_results]
hybrid_sparse = [r["sparse"] for r in hybrid_results]

# With a domain LoRA adapter
legal_results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    options={"lora_id": "org/bge-m3-legal-lora"},
)
legal_vectors = [r["dense"] for r in legal_results]

SIE’s self-hosted deployment means your documents never leave your AWS or GCP environment, and GPU batching can make encoding large corpora considerably faster than sending the same texts through per-request managed API calls.
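Once dense and sparse vectors have each produced a ranked list of document IDs, the two lists still need to be merged. One common option, not something the SIE SDK is known to prescribe, is reciprocal rank fusion (RRF). The sketch below is standalone Python; the document IDs are made up, and k = 60 is a conventional default rather than a tuned value.

```python
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]],
                           k: int = 60) -> List[Hashable]:
    """Fuse several ranked lists: each document scores 1 / (k + rank)
    per list it appears in, and the scores are summed."""
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical top-3 results from a dense index and a sparse index
dense_ranking = ["doc_a", "doc_b", "doc_c"]
sparse_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
```

RRF uses only rank positions, so it avoids having to normalise dense and sparse scores onto a common scale.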

BGE-M3 vs other embedding models

Model	Multilingual	Max tokens	Retrieval modes	Self-hostable
BGE-M3	✓ (100+)	8,192	Dense + sparse + multi-vector	✓
E5-large	Limited	512	Dense	✓
OpenAI text-embedding-3	✓	8,191	Dense	✗
Cohere Embed v3	✓	512	Dense	✗

Frequently asked questions

Is BGE-M3 free to use? Yes. BGE-M3 is released under the MIT licence and can be used freely for commercial applications. SIE itself is Apache 2.0 licensed.

How does BGE-M3 compare to OpenAI’s embedding models? On published retrieval benchmarks, BGE-M3 is reported to be competitive with OpenAI’s text-embedding-3-large, particularly for multilingual and long-document tasks. The key advantage is that BGE-M3 is fully self-hostable, with no per-token fees and no data leaving your infrastructure.

Can BGE-M3 be fine-tuned for specific domains? Yes. SIE supports LoRA hot-loading, allowing you to apply domain-specific fine-tuned adapters to BGE-M3 at inference time without restarting the server.