Search & Retrieval

What is Multi-Vector Search?

Multi-vector search is a retrieval technique in which each document is represented by multiple vectors (one per token or passage) rather than a single fixed-size vector. At query time, each of the query's token vectors is compared against the document's token vectors. This enables fine-grained, token-level matching that captures relevance signals single-vector retrieval can miss.

Why does multi-vector search matter?

Single-vector retrieval compresses an entire document into one vector, losing fine-grained detail in the process. A query about a specific clause in a legal contract, or a precise technical term in a research paper, may not match well against a document-level summary vector, even if the exact answer is present in the document.

Multi-vector search solves this by preserving token-level representations. The matching happens at the token level, so a specific query term can find its exact counterpart in a long document, even if the overall document is only partially relevant.
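To make the information-loss argument concrete, here is a toy example with made-up 3-dimensional vectors (not real model output). One document token matches the query term exactly, but once the document is mean-pooled into a single vector, that match is diluted by the unrelated tokens. Mean pooling is an assumption standing in for "single-vector"; real models may use other pooling schemes, such as taking the [CLS] token.

```python
import numpy as np

def normalize(rows: np.ndarray) -> np.ndarray:
    # L2-normalize along the last axis so dot products are cosine similarities
    return rows / np.linalg.norm(rows, axis=-1, keepdims=True)

# Hypothetical token vectors: three unrelated tokens plus one that matches the query term
doc_tokens = normalize(np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0],  # the token that matches the query term
]))
query_token = normalize(np.array([0.0, 0.0, 1.0]))

# Single-vector: pool the document into one vector first, then compare
pooled = normalize(doc_tokens.mean(axis=0))
pooled_score = float(pooled @ query_token)

# Token-level: compare against every document token and keep the best match
token_score = float((doc_tokens @ query_token).max())

print(f"pooled: {pooled_score:.2f}, token-level: {token_score:.2f}")
# prints: pooled: 0.32, token-level: 1.00
```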

How does multi-vector search work?

Instead of pooling token representations into one vector:

Encode document → retain one vector per token: [d₁, d₂, ..., dₙ]
Encode query → retain one vector per token: [q₁, q₂, ..., qₘ]
Score with MaxSim → for each query token, find its maximum similarity across all document tokens, then sum:

Score(Q, D) = Σᵢ max_j (qᵢ · dⱼ)

This is the ColBERT scoring mechanism. Every query token gets matched to its best corresponding document token, and these scores are summed into a final relevance score.
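The MaxSim formula above can be sketched in a few lines of NumPy. This is a brute-force illustration that assumes L2-normalized token vectors (so each dot product is a cosine similarity); it is not ColBERT's own implementation, and production engines index token vectors rather than scoring every document exhaustively. The random vectors are stand-ins for real token embeddings.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Score(Q, D) = sum over query tokens i of max over doc tokens j of (q_i . d_j).

    query_vecs: [m, dim] array, one row per query token.
    doc_vecs:   [n, dim] array, one row per document token.
    """
    sim = query_vecs @ doc_vecs.T        # [m, n]: every query token vs every doc token
    return float(sim.max(axis=1).sum())  # best doc token per query token, then sum

def unit_rows(rng: np.random.Generator, n: int, dim: int = 128) -> np.ndarray:
    # Random L2-normalized rows standing in for real token embeddings
    x = rng.normal(size=(n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
query = unit_rows(rng, 4)
docs = [
    unit_rows(rng, 50),
    unit_rows(rng, 120),
    np.vstack([unit_rows(rng, 30), query]),  # contains the query's token vectors verbatim
]

# Score every document, then rank by score (highest first)
scores = [maxsim_score(query, d) for d in docs]
order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(order, [round(s, 2) for s in scores])
```

The third document ranks first with a score of about 4.0, because each of the 4 query tokens finds an exact match and contributes a similarity of 1.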

Multi-vector vs single-vector vs sparse retrieval

	Single-vector	Multi-vector (ColBERT)	Sparse (BM25)
Vectors per doc	1	N (one per token)	1 sparse vector (vocabulary-sized)
Captures semantics	✓	✓ (token-level)	✗
Handles exact terms	✗	✓	✓
Storage cost	Low	High	Medium
Retrieval speed	Fastest	Slower	Fast
Accuracy	Good	Typically highest	Good for keywords

Multi-vector retrieval typically achieves the highest accuracy of the three, but at a significant storage cost: a 512-token document produces 512 vectors instead of 1.

What is BGE-M3’s multi-vector capability?

BGE-M3 supports all three retrieval modes (dense, sparse, and multi-vector) from a single model. This means you can produce ColBERT-style multi-vector representations without running a separate model:

from sie_sdk import SIEClient
from sie_sdk.types import Item

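client = SIEClient("http://localhost:8080")

# Example corpus; replace with your own documents
documents = [
    "The indemnification clause survives termination of this agreement.",
    "ColBERT scores documents with token-level late interaction.",
]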

# Encode with multi-vector (ColBERT-style) output
results = client.encode(
    "BAAI/bge-m3",
    [Item(text=d) for d in documents],
    output_types=["dense", "sparse", "multivector"],
)

dense_vectors = [r["dense"] for r in results]
sparse_vectors = [r["sparse"] for r in results]
colbert_vectors = [r["multivector"] for r in results]  # one [num_tokens, dim] array per doc

You can then combine all three signals for the best retrieval accuracy; this hybrid approach is behind BGE-M3's strongest reported benchmark results, such as on MIRACL.
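One way to fuse the three signals is a weighted sum of per-mode scores. The sketch below assumes each query and document is a dict with "dense", "sparse", and "multivector" entries like the results above, where the sparse entry is a token-to-weight mapping. That format is an assumption, so check what SIE actually returns. The weights (0.4, 0.2, 0.4) are illustrative placeholders, not values published by the BGE-M3 authors or SIE. The multi-vector score is averaged over query tokens, a choice made here so it stays on a scale comparable to the dense score.

```python
import numpy as np

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    # Dot product of the pooled vectors (cosine similarity if both are normalized)
    return float(q @ d)

def sparse_score(q: dict, d: dict) -> float:
    # Lexical matching: sum of weight products over tokens present in both
    return sum(w * d[t] for t, w in q.items() if t in d)

def multivector_score(q: np.ndarray, d: np.ndarray) -> float:
    # MaxSim, averaged (not summed) over query tokens to keep the scale near [0, 1]
    return float((q @ d.T).max(axis=1).mean())

def hybrid_score(q: dict, d: dict, weights=(0.4, 0.2, 0.4)) -> float:
    # Illustrative weights; tune them on your own evaluation data
    w_dense, w_sparse, w_multi = weights
    return (w_dense * dense_score(q["dense"], d["dense"])
            + w_sparse * sparse_score(q["sparse"], d["sparse"])
            + w_multi * multivector_score(q["multivector"], d["multivector"]))

# Toy inputs with hypothetical values
q = {"dense": np.array([1.0, 0.0]),
     "sparse": {"clause": 0.5, "contract": 0.3},
     "multivector": np.array([[1.0, 0.0], [0.0, 1.0]])}
d = {"dense": np.array([0.6, 0.8]),
     "sparse": {"clause": 0.4, "liability": 0.2},
     "multivector": np.array([[1.0, 0.0], [0.6, 0.8]])}

print(round(hybrid_score(q, d), 2))  # 0.4*0.6 + 0.2*0.2 + 0.4*0.9 = 0.64
```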

When should you use multi-vector search?

Multi-vector retrieval is worth the extra storage and compute when:

High-precision retrieval is critical: legal, medical, or compliance document search where missing a relevant clause has real consequences
Long documents: single vectors compress too much information out of long texts; token-level matching preserves it
Specific term lookup: when queries contain precise technical terms that need exact matching alongside semantic understanding
You’re combining with reranking: use multi-vector for first-stage retrieval to maximise recall, then a reranker for precision

For most general-purpose search, single-vector retrieval plus a reranker often achieves comparable quality at lower infrastructure cost.
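The first-stage-then-rerank pattern from the list above can be sketched as follows. Multi-vector MaxSim picks a candidate set for recall, and a reranker orders only those candidates for precision. The `rerank` callable is a hypothetical placeholder for whatever cross-encoder or reranking service you use. Here it is a lookup table, so the example runs on its own.

```python
from typing import Callable, List, Sequence
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    # Late-interaction score: best-matching doc token per query token, summed
    return float((query_vecs @ doc_vecs.T).max(axis=1).sum())

def retrieve_then_rerank(
    query_text: str,
    query_vecs: np.ndarray,
    doc_texts: Sequence[str],
    doc_vecs: Sequence[np.ndarray],
    rerank: Callable[[str, str], float],  # hypothetical reranker: (query, doc) -> relevance
    k: int = 100,
    top_n: int = 10,
) -> List[int]:
    # Stage 1 (recall): score every document with MaxSim and keep the top k
    candidates = sorted(range(len(doc_vecs)),
                        key=lambda i: maxsim(query_vecs, doc_vecs[i]),
                        reverse=True)[:k]
    # Stage 2 (precision): rerank only those candidates and return the best top_n
    return sorted(candidates,
                  key=lambda i: rerank(query_text, doc_texts[i]),
                  reverse=True)[:top_n]

# Toy usage: 2-d unit vectors and a lookup-table "reranker"
query_vecs = np.array([[1.0, 0.0]])
doc_vecs = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[0.8, 0.6]])]
doc_texts = ["a", "b", "c"]
toy_scores = {"a": 0.1, "b": 0.9, "c": 0.7}
result = retrieve_then_rerank("q", query_vecs, doc_texts, doc_vecs,
                              rerank=lambda q, t: toy_scores[t], k=2)
print(result)  # [2, 0]: doc 1 never reaches the reranker because stage 1 dropped it
```

The example also shows the trade-off. The reranker can only reorder what the first stage retrieves, which is why first-stage recall matters.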

Storage considerations for multi-vector

A 512-token document produces 512 vectors of 128 dimensions each (ColBERT uses smaller per-token dimensions than typical single-vector models). For 1 million documents, raw float32 storage is roughly:

Single-vector (768 dims, float32): ~3GB
Multi-vector ColBERT (512 tokens × 128 dims, float32): ~262GB (256KB per document)
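The storage figures are simple multiplication. The helper below (a hypothetical name, for illustration) counts raw vector bytes only. It ignores index structures, metadata, and any compression, and it reports decimal gigabytes (10⁹ bytes).

```python
def index_size_bytes(num_docs: int, vectors_per_doc: int, dims: int,
                     bytes_per_value: int = 4) -> int:
    # Raw vector storage: docs × vectors per doc × dimensions × bytes per value
    # (4 bytes = float32; no index overhead, metadata, or compression)
    return num_docs * vectors_per_doc * dims * bytes_per_value

single = index_size_bytes(1_000_000, 1, 768)   # one 768-d vector per document
multi = index_size_bytes(1_000_000, 512, 128)  # 512 token vectors of 128 dims each

print(f"single-vector: {single / 1e9:.1f} GB")  # 3.1 GB
print(f"multi-vector:  {multi / 1e9:.1f} GB")   # 262.1 GB
print(f"ratio: {multi / single:.0f}x")          # 85x
```

Switching `bytes_per_value` to 2 (float16) halves both numbers, but the roughly 85x ratio between the two approaches stays the same.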

This is why multi-vector is used selectively, often for a high-value subset of your corpus, with single-vector covering the rest.

Qdrant and Weaviate both support multi-vector indexing natively.

Frequently asked questions

Is multi-vector search the same as ColBERT? ColBERT is the most prominent multi-vector retrieval architecture. Multi-vector search is the broader category; ColBERT is one implementation using late interaction (MaxSim scoring).

Can I use multi-vector retrieval with any vector database? Not all vector databases support multi-vector natively. Qdrant supports it through multivector fields scored with MaxSim, and Weaviate supports ColBERT-style multi-vector embeddings. Check your vector DB's documentation before committing to a multi-vector approach.

Does SIE support multi-vector encoding? Yes. BGE-M3 on SIE can return ColBERT-style token vectors alongside dense and sparse representations in a single encode call.