How to deploy SIE
SIE deploys as a single container with no external dependencies. There are two deployment paths: Docker for simplicity, and Kubernetes for scaling and high availability. Both use the same image. There is no separate production build.
Which Deployment Path Should I Use?
Use Docker if:
- You are running on a single server or VM
- You are in development or running a low-traffic service
- You want the simplest possible setup
Use Kubernetes if:
- You need horizontal scaling or autoscaling to zero
- You need high availability across multiple nodes
- You are deploying on GCP or AWS with GPU node pools
| | Docker | Kubernetes |
|---|---|---|
| Setup time | Minutes | Hours |
| Scaling | Manual | Automatic |
| High availability | No | Yes |
| Scale-to-zero | No | Yes |
| Best for | Dev, single-server | Production, high traffic |
See Kubernetes on GCP and Kubernetes on AWS for cloud-specific guides.
Getting Started With Docker
The fastest way to run SIE is a single docker run command:

```bash
# CPU only
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# With GPU (recommended)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```

The server starts on port 8080. Models load on the first request, so no pre-configuration is needed.
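If the same startup script has to run on both GPU and CPU-only hosts, it can pick the bundle itself. The sketch below is a minimal example that assumes a working nvidia-smi on the host means Docker can reach the GPU; in practice --gpus all also needs the NVIDIA Container Toolkit installed. The function names sie_image and run_sie are hypothetical helpers, not part of SIE.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of SIE): choose the image bundle for this host.
# Assumption: if nvidia-smi exists and runs cleanly, the GPU is usable from
# Docker (which additionally requires the NVIDIA Container Toolkit for --gpus).

sie_image() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    echo "ghcr.io/superlinked/sie-server:latest-cuda12-default"
  else
    echo "ghcr.io/superlinked/sie-server:latest-cpu-default"
  fi
}

# Start the server on port 8080, passing --gpus only for the CUDA bundle.
run_sie() {
  local image
  image="$(sie_image)"
  if [[ "$image" == *cuda12* ]]; then
    docker run --gpus all -p 8080:8080 "$image"
  else
    docker run -p 8080:8080 "$image"
  fi
}
```

Source the file, then call run_sie. Keeping the detection in its own function makes it easy to check which image a host would get without starting a container.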
Common Options
```bash
# Custom port (host port 3000 mapped to container port 8080)
docker run --gpus all -p 3000:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default

# Specific models only (faster startup)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default \
  sie-server serve -m BAAI/bge-m3,BAAI/bge-reranker-v2-m3

# Persistent model cache (avoids re-downloading models on restart)
docker run --gpus all -p 8080:8080 \
  -v ~/.cache/huggingface:/app/.cache/huggingface \
  ghcr.io/superlinked/sie-server:latest-cuda12-default

# Different bundle (e.g. SGLang backend for large LLM embeddings)
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-sglang
```

See the full Docker deployment guide.
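The options above can be combined in a single invocation. As a sketch, the hypothetical helper below assembles one docker run from three environment variables (SIE_PORT, SIE_CACHE_DIR and SIE_MODELS are names invented for this example, not settings SIE reads) and prints the arguments one per line so they can be inspected before anything runs. It reuses the Hugging Face cache mount and image tag shown above.

```bash
# Hypothetical helper (not part of SIE): build docker run arguments from
# SIE_PORT, SIE_CACHE_DIR and SIE_MODELS, printing one argument per line.
sie_docker_args() {
  local port="${SIE_PORT:-8080}"
  local cache="${SIE_CACHE_DIR:-$HOME/.cache/huggingface}"
  local image="ghcr.io/superlinked/sie-server:latest-cuda12-default"
  local args=(run --gpus all -p "${port}:8080" -v "${cache}:/app/.cache/huggingface" "$image")
  # Only override the container command when a model list is given.
  if [[ -n "${SIE_MODELS:-}" ]]; then
    args+=(sie-server serve -m "$SIE_MODELS")
  fi
  printf '%s\n' "${args[@]}"
}

# Usage: mapfile -t args < <(SIE_PORT=3000 SIE_MODELS=BAAI/bge-m3 sie_docker_args)
#        docker "${args[@]}"
```

Printing the arguments instead of running them keeps the helper side-effect free, so the same function works for a dry run and for the real launch.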
What Hardware Does SIE Need?
Minimum Specs
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 8+ cores |
| RAM | 8GB | 16GB+ |
| GPU | Optional | Any NVIDIA with 16GB+ VRAM |
| Disk | 20GB | 100GB+ for model cache |
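Before installing, a quick preflight check can compare a Linux host against the minimums in this table (4 cores, 8 GB RAM, 20 GB disk). This is a sketch under stated assumptions: it reads /proc/meminfo and uses GNU df, so it will not work as-is on macOS, and check_min_specs and host_specs are hypothetical helpers. It measures free space on whichever path you pass, ideally the filesystem that will hold the model cache.

```bash
# Hypothetical helpers (not part of SIE). Minimums come from the table above:
# 4 CPU cores, 8 GB RAM, 20 GB disk. Linux only: needs /proc/meminfo and GNU df.

# check_min_specs CORES RAM_GB DISK_GB: print each shortfall, return 1 if any.
check_min_specs() {
  local cores="$1" ram_gb="$2" disk_gb="$3" status=0
  if (( cores < 4 )); then echo "CPU: ${cores} cores (< 4 minimum)"; status=1; fi
  if (( ram_gb < 8 )); then echo "RAM: ${ram_gb} GB (< 8 GB minimum)"; status=1; fi
  if (( disk_gb < 20 )); then echo "Disk: ${disk_gb} GB free (< 20 GB minimum)"; status=1; fi
  return "$status"
}

# host_specs [PATH]: print "CORES RAM_GB FREE_DISK_GB" for this host.
host_specs() {
  local cores ram_kb disk_gb
  cores="$(nproc)"
  ram_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  # Free space on the filesystem holding PATH (default: $HOME), in whole GiB.
  disk_gb="$(df --output=avail -BG "${1:-$HOME}" | tail -n 1 | tr -dc '0-9')"
  # Round RAM to the nearest GiB: the kernel reserves some memory, so an
  # 8 GB machine usually reports a little under 8 GiB in MemTotal.
  echo "${cores} $(( (ram_kb + 524288) / 1048576 )) ${disk_gb}"
}

# Usage: read -r c r d < <(host_specs "$HOME") && check_min_specs "$c" "$r" "$d"
```

Splitting measurement from the comparison keeps check_min_specs portable: on another OS only host_specs needs replacing.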
GPU Recommendations by Workload
| GPU | VRAM | Best for |
|---|---|---|
| T4 | 16GB | Development, light production |
| L4 | 24GB | Standard production (recommended starting point) |
| A100 40GB | 40GB | High-throughput or large model serving |
| A100 80GB | 80GB | 7B+ parameter models |
See Hardware and Capacity for full sizing guidance.
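To see which row of the GPU table an existing card falls into, nvidia-smi can report each GPU's total memory. The sketch below is not an SIE tool: vram_tier and gpu_report are hypothetical names, and the MiB thresholds are approximations set a little below the nominal 16/24/40/80 GB because nvidia-smi typically reports slightly less than the marketed capacity (a T4, for example, shows roughly 15 GiB).

```bash
# Hypothetical helpers (not part of SIE). Thresholds are approximate and sit
# below the nominal sizes because nvidia-smi reports usable memory in MiB.
vram_tier() {
  local mib="$1"
  if   (( mib >= 76000 )); then echo "80GB class: 7B+ parameter models"
  elif (( mib >= 38000 )); then echo "40GB class: high-throughput or large model serving"
  elif (( mib >= 22000 )); then echo "24GB class: standard production"
  elif (( mib >= 15000 )); then echo "16GB class: development, light production"
  else echo "under 16GB: below the recommended VRAM"
  fi
}

# Print one line per visible NVIDIA GPU. Requires the NVIDIA driver.
gpu_report() {
  local name mib
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits |
    while IFS=, read -r name mib; do
      mib="${mib// /}"                      # drop the space after the comma
      [[ "$mib" =~ ^[0-9]+$ ]] || continue  # skip non-numeric values such as [N/A]
      printf '%s (%s MiB): %s\n' "$name" "$mib" "$(vram_tier "$mib")"
    done
}
```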
When Should I Move to Kubernetes?
Move from Docker to Kubernetes when you need:
- Autoscaling to handle traffic spikes by spinning up additional workers
- Scale-to-zero to save costs by scaling down during idle periods
- High availability with multiple replicas to survive node failures
- Multi-region deployment to serve users in different geographies
Note: On Kubernetes clusters with scale-to-zero, a cold start (scaling up from zero workers) takes 5 to 7 minutes. Use wait_for_capacity=True in the Python SDK (or waitForCapacity: true in TypeScript) so the client waits for capacity during the cold start. See Scale-from-Zero and Autoscaling.
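The SDK flags handle the wait for you. For clients that call SIE without the SDK (a shell script or a CI job, for example), a generic retry-until-deadline loop is one way to ride out the same cold start. This is a sketch, not an SIE feature: wait_until_ready is a hypothetical helper, and the URL and /health path in the usage comment are placeholders for whatever readiness check your deployment actually exposes.

```bash
# Hypothetical helper (not part of SIE): rerun a command until it succeeds
# or TIMEOUT_S seconds pass, sleeping INTERVAL_S between attempts.
# wait_until_ready TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
wait_until_ready() {
  local timeout_s="$1" interval_s="$2"
  shift 2
  local deadline=$(( SECONDS + timeout_s ))   # SECONDS is a bash builtin counter
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "gave up after ${timeout_s}s: $*" >&2
      return 1
    fi
    sleep "$interval_s"
  done
}

# Usage (hypothetical endpoint; 480 s leaves headroom over a 5 to 7 minute cold start):
#   wait_until_ready 480 15 curl -fsS -o /dev/null "$SIE_URL/health"
```

A fixed interval keeps the sketch simple; with many clients waking the same cluster, adding jitter to the sleep would avoid synchronized retries.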
Frequently Asked Questions
Can SIE run without a GPU? Yes. SIE runs on CPU and works well for development and low-traffic workloads. For production inference at scale, a GPU is strongly recommended, especially for batch encoding. See Hardware and Capacity.
How do I monitor a SIE deployment? SIE exposes Prometheus metrics and structured logs. See Monitoring and Observability for dashboards, alerting, and log configuration.
How do I tune SIE for better performance? The main levers are batch size, worker concurrency, and model preloading. See Performance Tuning for a step-by-step guide.
How do I upgrade SIE without downtime? See the Upgrade Runbook for rolling upgrade procedures on both Docker and Kubernetes.
Is there a managed cloud option? Superlinked offers managed SIE deployments for teams that do not want to manage infrastructure themselves. Contact us to learn more.