---
title: Choosing a Model
description: How to pick the right embedding model for your use case.
canonical_url: https://superlinked.com/docs/choosing
last_updated: 2026-05-18
---

SIE supports 85+ models. This guide helps you pick the right one based on your use case, language requirements, and performance needs.

## Quick Recommendations

| Use Case | Recommended Model | Why |
|----------|-------------------|-----|
| **English-only, balanced** | `NovaSearch/stella_en_400M_v5` | Strong MTEB scores, efficient size |
| **English-only, max quality** | `nvidia/NV-Embed-v2` | Top MTEB scores, 4096 dims |
| **Speed-optimized** | `sentence-transformers/all-MiniLM-L6-v2` | 22M params, 384 dims, very fast |
| **Multilingual** | `BAAI/bge-m3` | 100+ languages, also supports sparse + multivector |
| **Multilingual, fast** | `Qwen/Qwen3-Embedding-0.6B` | 1024 dims, 32K context, 100+ languages |
| **Hybrid search** | `BAAI/bge-m3` or `naver/splade-v3` | Dense + sparse from one model, or dedicated sparse |
| **Late interaction (ColBERT)** | `jinaai/jina-colbert-v2` | Best ColBERT quality, multilingual |
| **Vision / image search** | `google/siglip-so400m-patch14-384` | Image-text similarity |
| **Document vision (PDF)** | `vidore/colpali-v1.3-hf` | Visual document retrieval |
| **ColBERT reranking** | `answerdotai/answerai-colbert-small-v1` | Fast MaxSim reranking; also `jina-colbert-v2`, `GTE-ModernColBERT-v1` |
| **Reranking (multilingual)** | `BAAI/bge-reranker-v2-m3` | Strong cross-language reranking |
| **Reranking (English)** | `mixedbread-ai/mxbai-rerank-large-v2` | High quality, 8192 max length |
| **Entity extraction** | `urchade/gliner_multi-v2.1` | Zero-shot NER, multilingual |

---

## Decision Guide

| Use Case | Scenario | Recommended Models |
|----------|----------|--------------------|
| **Semantic Search / RAG** | English-only | `stella_en_400M_v5`, `NV-Embed-v2`, `all-MiniLM-L6-v2` |
| | Multilingual | `BAAI/bge-m3` |
| | Hybrid (dense + sparse) | `BAAI/bge-m3` or `naver/splade-v3` |
| **Image Search** | Text ↔ Image | `SigLIP`, `CLIP` |
| | Visual docs | `ColPali` |
| **Reranking** | Multilingual | `BAAI/bge-reranker-v2-m3` |
| | English | `mixedbread-ai/mxbai-rerank-large-v2` |
| **Entity Extraction** | NER | `GLiNER` |
| | Relations | `GLiREL` |
| | Classification | `GLiClass` |

---

## Tradeoff Axes

### Quality vs Speed vs Memory

| Model | Params | Dims | VRAM | Relative Speed | Quality |
|-------|--------|------|------|----------------|---------|
| all-MiniLM-L6-v2 | 22M | 384 | ~200MB | Fastest | Good |
| stella_en_400M_v5 | 400M | 1024 | ~1.5GB | Fast | Very good |
| bge-m3 | 568M | 1024 | ~2GB | Fast | Very good |
| NV-Embed-v2 | 7B | 4096 | ~14GB | Slow | Best |

**Rule of thumb:** For English, start with `stella_en_400M_v5`. For multilingual or hybrid search, use `BAAI/bge-m3`. Only move to 7B+ models if benchmarks show a meaningful gap on your data.
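
The VRAM column roughly follows parameter count. As a sanity check, here is a minimal sketch that computes the memory needed for the weights alone, assuming 16-bit weights (2 bytes per parameter). That precision is an assumption, not something the table states, and the helper name `fp16_weight_gb` is illustrative. The result is a lower bound: the table's figures for the smaller models are higher, presumably because they include activations and runtime overhead.

```python
def fp16_weight_gb(params: float, bytes_per_param: int = 2) -> float:
    """Lower-bound memory for model weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


# Parameter counts taken from the table above.
models = {
    "all-MiniLM-L6-v2": 22e6,
    "stella_en_400M_v5": 400e6,
    "bge-m3": 568e6,
    "NV-Embed-v2": 7e9,
}

for name, params in models.items():
    print(f"{name}: ~{fp16_weight_gb(params):.2f} GB of weights")
```

If the weights-only figure is already close to your GPU's VRAM, the model is unlikely to fit once batches and other loaded models are accounted for.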

### Dense vs Sparse vs Multi-vector

| Output Type | Storage | Search Speed | Quality | Best For |
|-------------|---------|-------------|---------|----------|
| Dense | Small (e.g. 1024 floats per item) | Fast | Good | Standard semantic search |
| Sparse | Variable | Fast | Good for keywords | Hybrid search, keyword matching |
| Multi-vector (ColBERT) | Large (N tokens × 128 floats per item) | Slower | Best | When accuracy is critical |

**Recommendation:** Use dense for most cases. Add sparse for hybrid search if you need keyword matching. Use multi-vector only when you need the best possible retrieval quality and can afford the storage.
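
To make the storage difference concrete, the sketch below works through the arithmetic for a single document. It assumes unquantized float32 storage (4 bytes per value) and a hypothetical 300-token document; both numbers are illustrative and are not taken from SIE.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as unquantized float32


def dense_bytes(dims: int = 1024) -> int:
    """Storage for one dense vector."""
    return dims * BYTES_PER_FLOAT32


def multivector_bytes(num_tokens: int, dims_per_token: int = 128) -> int:
    """Storage for one multi-vector embedding: one small vector per token."""
    return num_tokens * dims_per_token * BYTES_PER_FLOAT32


doc_tokens = 300  # hypothetical document length
dense = dense_bytes()
multi = multivector_bytes(doc_tokens)
print(f"dense: {dense} bytes, multi-vector: {multi} bytes, ratio: {multi / dense:.1f}x")
```

Under these assumptions, the multi-vector representation takes more space than the dense vector as soon as a document exceeds 8 tokens (8 × 128 = 1024), and the gap grows linearly with document length.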

---

## Language Support

| Language Need | Models |
|--------------|--------|
| English only | Stella, NV-Embed-v2, all-MiniLM, GTE-Qwen2 |
| Multilingual (100+ languages) | BGE-M3, multilingual-e5-large, Qwen3-Embedding-0.6B |
| Chinese-focused | GTE-Qwen2, BGE-M3 |

---

## GPU Memory Planning

| GPU | VRAM | Models That Fit |
|-----|------|-----------------|
| T4 | 16GB | Most models up to ~1B params |
| L4 | 24GB | All standard models, 2-3 loaded simultaneously |
| A100 40GB | 40GB | Large models, 5+ loaded simultaneously |
| A100 80GB | 80GB | 7B+ parameter models (NV-Embed-v2, e5-mistral-7b) |

With LRU eviction, a single GPU can serve all 85+ models: only the most recently used models stay in memory, and the least recently used are evicted to make room when another model is requested.
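
The idea behind LRU eviction can be illustrated with a small cache. This is a conceptual sketch only, not SIE's implementation: the `LRUModelCache` class and its `loader` callback are hypothetical, and capacity is counted in models here, whereas a real server would presumably budget by VRAM.

```python
from collections import OrderedDict
from typing import Callable


class LRUModelCache:
    """Keeps at most `capacity` models loaded; evicts the least recently used."""

    def __init__(self, capacity: int, loader: Callable[[str], object]):
        self.capacity = capacity
        self.loader = loader  # hypothetical function that loads a model by name
        self._loaded = OrderedDict()  # oldest entry first, newest last

    def get(self, name: str) -> object:
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return self._loaded[name]
        if len(self._loaded) >= self.capacity:
            self._loaded.popitem(last=False)  # evict the least recently used
        model = self.loader(name)
        self._loaded[name] = model
        return model

    def loaded(self) -> list:
        return list(self._loaded)


cache = LRUModelCache(capacity=2, loader=lambda name: f"<{name} weights>")
cache.get("BAAI/bge-m3")
cache.get("naver/splade-v3")
cache.get("BAAI/bge-m3")              # refreshes bge-m3
cache.get("BAAI/bge-reranker-v2-m3")  # cache is full, so splade-v3 is evicted
print(cache.loaded())
```

The practical consequence is that the first request to an evicted model pays the load time again, so frequently used models should fit in memory together.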

---

## When to Add Reranking

Almost always. Two-stage retrieval (retrieve with embeddings, then rerank with a cross-encoder) consistently improves quality:

1. **Retrieve** 20-50 candidates with dense embeddings (fast)
2. **Rerank** to top 5-10 with a cross-encoder (more accurate)

The reranker sees both query and document together, enabling deeper semantic matching than embedding similarity alone.

#### Python

```python
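# Assumes `client` is an initialized SIE client, `Item` comes from the SDK,
# and `vector_db` / `query_embedding` come from your own retrieval setup.
query_text = "What is machine learning?"

# Stage 1: Fast retrieval of candidates using the precomputed query embedding
results = vector_db.search(query_embedding, k=20)

# Stage 2: Accurate reranking of those candidates
reranked = client.score(
    "mixedbread-ai/mxbai-rerank-large-v2",
    query=Item(text=query_text),
    items=[Item(text=r.text) for r in results],
)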
```

#### TypeScript

```typescript
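// Assumes `client`, `vectorDb`, and `queryEmbedding` are already set up.
const queryText = "What is machine learning?";

// Stage 1: Fast retrieval of candidates using the precomputed query embedding
const results = await vectorDb.search(queryEmbedding, { k: 20 });

// Stage 2: Accurate reranking of those candidates
const reranked = await client.score(
  "mixedbread-ai/mxbai-rerank-large-v2",
  { text: queryText },
  results.map(r => ({ text: r.text })),
);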
```

---

## When to Use Hybrid Search

Add sparse embeddings when your data or queries involve:
- **Domain-specific terminology** that dense models might miss
- **Exact keyword matching** requirements (product codes, identifiers)
- **Mixed content** where some queries are keyword-like and others are semantic

#### Python

```python
# Get both dense and sparse from one model
result = client.encode(
    "BAAI/bge-m3",
    Item(text="your text"),
    output_types=["dense", "sparse"]
)
# Use dense for semantic search, sparse for keyword matching
# Combine scores for hybrid retrieval
```

#### TypeScript

```typescript
// Get both dense and sparse from one model
const result = await client.encode(
  "BAAI/bge-m3",
  { text: "your text" },
  { outputTypes: ["dense", "sparse"] },
);
// Use dense for semantic search, sparse for keyword matching
// Combine scores for hybrid retrieval
```
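
One common way to combine the two result lists is reciprocal rank fusion (RRF), which merges rankings without needing dense and sparse scores to be on the same scale. The sketch below is a generic illustration, not an SIE API: the function name, the document IDs, and the two hit lists are hypothetical, and `k=60` is the constant conventionally used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked ID lists (each ordered best-first) into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Returns (doc_id, score) pairs sorted by descending score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense search results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical sparse search results

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept. A weighted sum of normalized scores is an alternative when you want to tune how much each signal contributes.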

---

## What's Next

- [Model Catalog](/models) - full list of all supported models
- [Sparse Embeddings](/docs/encode/sparse/) - hybrid search patterns
- [Multi-vector / ColBERT](/docs/encode/multivector/) - late interaction retrieval
- [Quantization](/docs/encode/quantization/) - reduce embedding size for storage