---
title: Adding Models
description: Configure custom models for the SIE inference engine.
canonical_url: https://superlinked.com/docs/engine/adding-models
last_updated: 2026-05-18
---

Add any HuggingFace model by creating a config file. No code changes required.

---

## Directory Layout

Source: [packages/sie_server/models/](https://github.com/superlinked/sie/blob/main/packages/sie_server/models/)

Model configs are flat YAML files in the models directory. Each file is named `{Org}__{name}.yaml`: the org and model name from the HuggingFace ID, joined by a double underscore, with the original casing preserved.

```
models/
  BAAI__bge-m3.yaml
  my-org__my-custom-model.yaml
```
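
The naming rule above can be sketched as a small helper. This is an illustration only, and `config_filename` is a hypothetical name, not a function from the SIE codebase:

```python
def config_filename(hf_id: str) -> str:
    """Map a HuggingFace ID such as 'BAAI/bge-m3' to its config file name."""
    org, sep, name = hf_id.partition("/")
    if not sep or not org or not name or "/" in name:
        raise ValueError(f"expected an ID of the form 'org/name', got {hf_id!r}")
    # Casing is preserved; only the slash becomes a double underscore.
    return f"{org}__{name}.yaml"


print(config_filename("BAAI/bge-m3"))  # BAAI__bge-m3.yaml
```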

For Docker deployments, mount your custom models directory:

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

---

## Config File Structure

Source: [packages/sie_server/src/sie_server/config/model.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/config/model.py)

Each model needs a config YAML file. Here is a minimal example:

```yaml
name: my-org/my-model
hf_id: my-org/my-model
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 768
max_sequence_length: 512
```

---

## Required Fields

| Field | Type | Description |
|-------|------|-------------|
| `name` | string | Model name used in API requests |
| `hf_id` | string | HuggingFace model ID for weight download |
| `adapter` | string | Adapter class path (see adapters below) |
| `inputs` | list | Input modalities: `text`, `image`, `audio`, `video` |
| `outputs` | list | Output types: `dense`, `sparse`, `multivector`, `score`, `extract` |
| `dims` | object | Embedding dimensions per output type |
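
As a quick pre-flight check, the table above can be turned into a few lines of Python that report which required fields a parsed config is missing. This is a sketch, not the server's own validation: `missing_required_fields` is a hypothetical helper, and it deliberately ignores the `weights_path` and `base_model` alternatives described in the next subsections.

```python
# Field names copied from the Required Fields table.
REQUIRED_FIELDS = ("name", "hf_id", "adapter", "inputs", "outputs", "dims")


def missing_required_fields(config: dict) -> list[str]:
    """Return the required field names that are absent from a parsed config."""
    return [field for field in REQUIRED_FIELDS if field not in config]


minimal = {
    "name": "my-org/my-model",
    "hf_id": "my-org/my-model",
    "adapter": "sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter",
    "inputs": ["text"],
    "outputs": ["dense"],
    "dims": {"dense": 768},
}
print(missing_required_fields(minimal))  # []
```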

### Weight Source

At least one weight source is required, unless the config uses `base_model`:

| Field | Description |
|-------|-------------|
| `hf_id` | HuggingFace model ID (e.g., `BAAI/bge-m3`) |
| `weights_path` | Local path to weights (takes precedence over `hf_id`) |
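
The precedence rule can be pictured as follows. This is a minimal sketch under assumptions: `resolve_weight_source` and the `"local"`, `"huggingface"`, and `"inherited"` labels are hypothetical, and the server may resolve weights differently in detail.

```python
def resolve_weight_source(config: dict) -> tuple[str, str]:
    """Return (kind, location) for the weights a config points at."""
    # weights_path takes precedence over hf_id when both are set.
    if config.get("weights_path"):
        return ("local", config["weights_path"])
    if config.get("hf_id"):
        return ("huggingface", config["hf_id"])
    # Assumption: with base_model, the weight source comes from that model.
    if config.get("base_model"):
        return ("inherited", config["base_model"])
    raise ValueError("config needs hf_id, weights_path, or base_model")
```

For example, a config that sets both `hf_id: BAAI/bge-m3` and a local `weights_path` resolves to the local path in this sketch.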

### Adapter Resolution

Specify how the model should be loaded:

| Field | Description |
|-------|-------------|
| `adapter` | Adapter path: `module:Class` or `file.py:Class` |
| `base_model` | Inherit adapter from another model |
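
To make the two adapter path forms concrete, here is one way such a string could be split. It is a sketch, not the loader the server uses: `parse_adapter_spec` is a hypothetical helper, and `my_adapter.py` in the test values is a made-up file name.

```python
def parse_adapter_spec(spec: str) -> tuple[str, str, bool]:
    """Split an adapter path into (target, class_name, is_file)."""
    # Split on the last colon so the class name is always the final part.
    target, sep, class_name = spec.rpartition(":")
    if not sep or not target or not class_name:
        raise ValueError(f"expected 'module:Class' or 'file.py:Class', got {spec!r}")
    # Assumption: a target ending in .py is a file path, anything else a module.
    return target, class_name, target.endswith(".py")


print(parse_adapter_spec("sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter"))
```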

---

## Optional Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `max_sequence_length` | int | 512 | Maximum input tokens |
| `pooling` | string | null | Pooling strategy: `cls`, `mean`, `last_token`, `splade`, `none` |
| `normalize` | bool | true | L2-normalize output embeddings |
| `max_batch_tokens` | int | 16384 | Maximum tokens per batch |
| `compute_precision` | string | null | Override precision: `float16`, `bfloat16`, `float32` |
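
The effect of these defaults can be sketched as a dictionary merge in which values from the config file win over the defaults. `with_defaults` is a hypothetical helper for illustration; the server's config model may apply defaults differently.

```python
# Defaults copied from the Optional Fields table (null becomes None).
OPTIONAL_DEFAULTS = {
    "max_sequence_length": 512,
    "pooling": None,
    "normalize": True,
    "max_batch_tokens": 16384,
    "compute_precision": None,
}


def with_defaults(config: dict) -> dict:
    """Return a copy of config with unset optional fields filled in."""
    return {**OPTIONAL_DEFAULTS, **config}


resolved = with_defaults({"name": "my-org/my-model", "max_sequence_length": 256})
print(resolved["max_sequence_length"], resolved["normalize"])  # 256 True
```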

---

## Profiles

Source: [packages/sie_server/models/BAAI__bge-m3.yaml](https://github.com/superlinked/sie/blob/main/packages/sie_server/models/BAAI__bge-m3.yaml)

Profiles define named combinations of runtime options. One profile must have `is_default: true`.

```yaml
profiles:
  default:
    is_default: true
  sparse:
    output_types:
      - sparse
  banking:
    lora_id: saivamshiatukuri/bge-m3-banking77-lora
    instruction: "Classify banking intent"
```
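
A check for the default-profile rule might look like the sketch below. It assumes "one profile" means exactly one, and `default_profile_name` is a hypothetical helper rather than part of the server.

```python
def default_profile_name(profiles: dict) -> str:
    """Return the name of the profile marked is_default: true."""
    defaults = [
        name
        for name, options in profiles.items()
        if (options or {}).get("is_default") is True
    ]
    # Assumption: exactly one default is allowed, not merely at least one.
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default profile, found {defaults}")
    return defaults[0]


profiles = {
    "default": {"is_default": True},
    "sparse": {"output_types": ["sparse"]},
    "banking": {
        "lora_id": "saivamshiatukuri/bge-m3-banking77-lora",
        "instruction": "Classify banking intent",
    },
}
print(default_profile_name(profiles))  # default
```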

### Adapter Options

Adapter options are split into load-time options, which require a model reload to change, and runtime options, which can be overridden per request:

```yaml
adapter_options_loadtime:
  attn_implementation: sdpa
  compute_precision: bfloat16

adapter_options_runtime:
  query_template: 'Instruct: {instruction}\nQuery:{text}'
  default_instruction: "Retrieve relevant passages"
```

---

## Available Adapters

Source: [packages/sie_server/src/sie_server/adapters/](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/adapters/)

| Adapter | Use Case |
|---------|----------|
| `sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter` | Standard embedding models |
| `sie_server.adapters.bge_m3_flash:BGEM3FlashAdapter` | BGE-M3 with flash attention |
| `sie_server.adapters.cross_encoder:CrossEncoderAdapter` | Reranking models |
| `sie_server.adapters.gliner:GLiNERAdapter` | Entity extraction models |
| `sie_server.adapters.clip:CLIPAdapter` | CLIP vision-text models |
| `sie_server.adapters.colbert:ColBERTAdapter` | Multi-vector (ColBERT) models |

---

## Complete Example

A full config with profiles and runtime options:

```yaml
name: sentence-transformers/all-MiniLM-L6-v2
hf_id: sentence-transformers/all-MiniLM-L6-v2
adapter: sie_server.adapters.pytorch_embedding:PyTorchEmbeddingAdapter
inputs:
  - text
outputs:
  - dense
dims:
  dense: 384
max_sequence_length: 256
pooling: mean
normalize: true
max_batch_tokens: 16384

profiles:
  default:
    is_default: true

adapter_options_runtime:
  pooling: mean
  normalize: true
```

---

## Testing Your Model

After creating the config, verify the model loads and produces correct outputs.

### 1. Start the server

```bash
docker run --gpus all -p 8080:8080 \
  -v /path/to/custom-models:/app/models:ro \
  ghcr.io/superlinked/sie-server:latest-cuda12-default
```

### 2. Check model is listed

```bash
curl http://localhost:8080/v1/models | jq '.models[].name'
```
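
The same check can be scripted with the Python standard library. This sketch assumes the response has the shape implied by the `jq` filter above, an object with a `models` array whose entries carry a `name`; both function names are hypothetical.

```python
import json
from urllib.request import urlopen


def model_names(payload: dict) -> list[str]:
    """Extract model names, mirroring the jq filter '.models[].name'."""
    return [model["name"] for model in payload["models"]]


def fetch_model_names(base_url: str = "http://localhost:8080") -> list[str]:
    """Fetch /v1/models from a running server and return the model names."""
    with urlopen(f"{base_url}/v1/models") as response:
        return model_names(json.load(response))
```

With the server running, `"my-org/my-model" in fetch_model_names()` should evaluate to `True` once the config has been picked up.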

### 3. Generate embeddings

#### Python

```python
from sie_sdk import SIEClient
from sie_sdk.types import Item

client = SIEClient("http://localhost:8080")
result = client.encode("my-org/my-model", Item(text="test input"))
print(result["dense"].shape)  # Should match dims.dense
```

#### TypeScript

```typescript
import { SIEClient } from "@superlinked/sie-sdk";

const client = new SIEClient("http://localhost:8080");
const result = await client.encode("my-org/my-model", { text: "test input" });
console.log(result.dense?.length); // Should match dims.dense
```

### 4. Run quality eval

```bash
mise run eval my-org/my-model -t mteb/NanoFiQA2018Retrieval --type quality -s sie
```

---

## Hot Reload

Source: [packages/sie_server/src/sie_server/core/hot_reload.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/core/hot_reload.py)

The server monitors the models directory for changes. Add new configs without restarting:

1. Create a new `models/{Org}__{name}.yaml` file
2. The server detects the new config automatically
3. Model weights load on first request

For Docker, new files in the mounted volume are detected in the same way. Changes to existing configs are not picked up and require a server restart.
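
Conceptually, detecting new configs amounts to comparing the file names in the models directory against the names already seen. The sketch below shows that idea only; it is not the implementation in `hot_reload.py`, and `new_configs` is a hypothetical helper.

```python
from pathlib import Path


def new_configs(models_dir: Path, known: set[str]) -> set[str]:
    """Return config file names in models_dir that are not in known."""
    current = {path.name for path in models_dir.glob("*.yaml")}
    # Only names are compared, so edits to an existing file go unnoticed.
    return current - known
```

A watcher loop would call this periodically, register each returned config, and add the names to `known`.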

For adding models to a running cluster without filesystem changes, use the [Config API](/docs/engine/config-api/).

---

## What's Next

- [Config API](/docs/engine/config-api/) - add models at runtime via REST
- [Model Catalog](/models) - browse 85+ supported models
- [Benchmarking](/docs/evals/) - evaluate model quality and performance
