---
title: How to deploy SIE
description: SIE deploys as a single Docker container with no external dependencies. Start with Docker for simple or single-server deployments. Move to Kubernetes when you need autoscaling or high availability.
canonical_url: https://superlinked.com/docs/deployment
last_updated: 2026-05-18
---

**SIE deploys as a single container with no external dependencies.** There are two deployment paths: Docker for simplicity, and Kubernetes for scaling and high availability. Both paths run the same images; there is no separate production build.

---

## Which Deployment Path Should I Use?

**Use Docker if:**
- You are running on a single server or VM
- You are in development or running a low-traffic service
- You want the simplest possible setup

**Use Kubernetes if:**
- You need horizontal autoscaling or scale-to-zero
- You need high availability across multiple nodes
- You are deploying on GCP or AWS with GPU node pools

| | Docker | Kubernetes |
|---|---|---|
| Setup time | Minutes | Hours |
| Scaling | Manual | Automatic |
| High availability | No | Yes |
| Scale-to-zero | No | Yes |
| Best for | Dev, single-server | Production, high traffic |

See [Kubernetes on GCP](https://superlinked.com/docs/deployment/cloud-gcp/) and [Kubernetes on AWS](https://superlinked.com/docs/deployment/cloud-aws/) for cloud-specific guides.

---

## Getting Started With Docker

The fastest way to run SIE is a single `docker run`:

```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# With GPU (recommended)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

The server listens on port 8080. Models load on first request, so no pre-configuration is needed.

### Common Options

```bash
# Persistent SIE cache directory (avoids re-downloading on restart)
docker run --gpus all \
  -p 8080:8080 \
  -v ~/.cache/sie:/root/.cache/sie \
  ghcr.io/superlinked/sie-server:latest-cuda12-default

# Custom port
docker run --gpus all -p 3000:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# Specific models only (faster startup)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default \
  sie-server serve -m BAAI/bge-m3,BAAI/bge-reranker-v2-m3

# Persistent Hugging Face cache directory (skips re-downloads)
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie-server:latest-cuda12-default

# Different bundle (e.g. SGLang backend for large LLM embeddings)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang
```
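
The image tags in the examples above appear to follow a `latest-<platform>-<bundle>` pattern. As a minimal sketch under that assumption, the helper below prints (but does not run) a `docker run` command for a given platform, bundle, and host port. The function names `sie_image` and `sie_run_cmd` are hypothetical, and the pattern is inferred from the examples rather than documented, so not every platform and bundle combination is guaranteed to be published.

```bash
# Print the image reference for a platform/bundle pair.
# ASSUMPTION: tags follow latest-<platform>-<bundle>, as inferred from the
# examples above; check the registry before relying on a combination.
sie_image() {
  platform=${1:-cuda12}
  bundle=${2:-default}
  echo "ghcr.io/superlinked/sie-server:latest-${platform}-${bundle}"
}

# Print (not run) a docker run command that maps HOST_PORT to container port 8080.
# Usage: sie_run_cmd [PLATFORM] [BUNDLE] [HOST_PORT]
sie_run_cmd() {
  platform=${1:-cuda12}
  bundle=${2:-default}
  host_port=${3:-8080}
  gpu_flag=""
  # Only request GPUs for non-CPU images.
  if [ "$platform" != "cpu" ]; then
    gpu_flag="--gpus all "
  fi
  echo "docker run ${gpu_flag}-p ${host_port}:8080 $(sie_image "$platform" "$bundle")"
}
```

For example, `sie_run_cmd cuda12 sglang 3000` prints a command equivalent to the SGLang example above, but exposed on host port 3000. Printing the command instead of executing it makes it easy to review before running.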

See the full [Docker deployment guide](https://superlinked.com/docs/deployment/docker/).

---

## What Hardware Does SIE Need?

### Minimum and Recommended Specs

| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8GB | 16GB+ |
| GPU | Optional | Any NVIDIA GPU with 16GB+ VRAM |
| Disk | 20GB | 100GB+ for model cache |

### GPU Recommendations by Workload

| GPU | VRAM | Best for |
|---|---|---|
| T4 | 16GB | Development, light production |
| L4 | 24GB | Standard production (recommended starting point) |
| A100 40GB | 40GB | High-throughput or large model serving |
| A100 80GB | 80GB | 7B+ parameter models |

See [Hardware and Capacity](https://superlinked.com/docs/deployment/resources/) for full sizing guidance.
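
As a rough illustration of reading the GPU table by VRAM alone, the sketch below prints the smallest listed GPU that meets a given VRAM requirement. The function name `pick_gpu` is hypothetical, the VRAM figure is something you have to estimate for your own models, and the check ignores throughput, so treat it as a first filter rather than sizing guidance.

```bash
# Print the smallest GPU from the table above with at least NEED_GB of VRAM.
# Usage: pick_gpu NEED_GB
pick_gpu() {
  need_gb=$1
  # name:vram pairs copied from the table, ordered from smallest to largest.
  for entry in "T4:16" "L4:24" "A100-40GB:40" "A100-80GB:80"; do
    name=${entry%%:*}
    vram=${entry##*:}
    if [ "$need_gb" -le "$vram" ]; then
      echo "$name"
      return 0
    fi
  done
  echo "no single GPU in the table has ${need_gb} GB of VRAM" >&2
  return 1
}
```

Calling `pick_gpu 20` prints `L4`, because the T4's 16GB is too small and the L4 is the next entry that fits.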

---

## When Should I Move to Kubernetes?

Move from Docker to Kubernetes when you need:

- **Autoscaling** to handle traffic spikes by spinning up additional workers
- **Scale-to-zero** to save costs by scaling down during idle periods
- **High availability** with multiple replicas to survive node failures
- **Multi-region** deployment to serve users in different geographies

Note: a Kubernetes cluster that has scaled to zero takes 5 to 7 minutes to cold start. Set `wait_for_capacity=True` in the Python SDK (or `waitForCapacity: true` in TypeScript) so the client waits for capacity during that window. See [Scale-from-Zero and Autoscaling](https://superlinked.com/docs/deployment/autoscaling/).
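
For clients that do not use the SDK, one way to tolerate a cold start is to retry a request until it succeeds or a deadline passes. The sketch below is a generic shell helper along those lines. It is not how the SDK implements `wait_for_capacity`; the name `retry_until` is hypothetical, and how SIE responds while capacity is still starting is not covered here, so the command you retry must signal failure through its exit status.

```bash
# Retry a command until it succeeds or a deadline (in seconds) passes.
# Usage: retry_until DEADLINE_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
retry_until() {
  deadline=$1
  interval=$2
  shift 2
  start=$(date +%s)
  # Keep running the command until it exits with status 0.
  while ! "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$deadline" ]; then
      echo "gave up after ${deadline}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

With a cold start of 5 to 7 minutes, a deadline comfortably above that range (for example 600 seconds, with a 15 second interval) leaves some headroom. The retried command could be a `curl --fail` call against your own endpoint.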

---

## Frequently Asked Questions

**Can SIE run without a GPU?**
Yes. SIE runs on CPU and works well for development and low-traffic workloads. For production inference at scale, a GPU is strongly recommended, especially for batch encoding. See [Hardware and Capacity](https://superlinked.com/docs/deployment/resources/).

**How do I monitor a SIE deployment?**
SIE exposes Prometheus metrics and structured logs. See [Monitoring and Observability](https://superlinked.com/docs/deployment/monitoring/) for dashboards, alerting, and log configuration.
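
For a quick look at a single metric without a dashboard, Prometheus text-format output can be filtered in the shell. The sketch below reads that format on stdin and prints the samples for one metric name. The function name `metric_samples` and the metric `demo_requests_total` are hypothetical, and the path where SIE serves its metrics is not stated on this page (`/metrics` is the usual Prometheus convention), so check the monitoring guide for the real names.

```bash
# Read Prometheus text-format metrics on stdin and print every sample
# whose metric name matches NAME exactly (labels are kept in the output).
# Usage: some_command | metric_samples NAME
metric_samples() {
  awk -v name="$1" '
    /^#/ { next }                 # skip HELP and TYPE comment lines
    {
      i = index($1, "{")          # position of the label set, if any
      metric = (i > 0) ? substr($1, 1, i - 1) : $1
      if (metric == name) print
    }'
}
```

A typical use would pipe the output of `curl -s` on the metrics endpoint into `metric_samples demo_requests_total`, replacing the hypothetical metric name with one the server actually exports.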

**How do I tune SIE for better performance?**
The main levers are batch size, worker concurrency, and model preloading. See [Performance Tuning](https://superlinked.com/docs/deployment/tuning/) for a step-by-step guide.

**How do I upgrade SIE without downtime?**
See the [Upgrade Runbook](https://superlinked.com/docs/deployment/upgrades/) for rolling upgrade procedures on both Docker and Kubernetes.

**Is there a managed cloud option?**
Superlinked offers managed SIE deployments for teams that do not want to manage infrastructure themselves. [Contact us](https://superlinked.com/) to learn more.
