Offline / Air-Gapped Deployment

Bring SIE up in a cluster with no public internet access. The worker pods normally pull model weights from HuggingFace and container images from GHCR; both of those need to come from inside your network instead.

This guide covers a typical air-gapped flow:

1. Snapshot model weights on a workstation that has internet access.
2. Mirror the snapshot to private S3-compatible storage reachable from the cluster.
3. Configure the chart to read weights from that store and skip HuggingFace.
4. Mirror the SIE container images to a private registry.
5. Verify first inference with no egress.

The same pattern works for “restricted egress” clusters that allow private object storage but block public HuggingFace.

1. Snapshot model weights

Use the hf CLI from the huggingface_hub package (huggingface-cli is the deprecated alias of the same tool and now prints a deprecation warning):

export HF_HUB_CACHE=./offline-weights

# One model
hf download BAAI/bge-m3 --cache-dir ./offline-weights

# A bundle's worth of models, repeated for each model in the bundle
hf download intfloat/e5-base-v2 --cache-dir ./offline-weights
hf download mixedbread-ai/mxbai-rerank-large-v1 --cache-dir ./offline-weights

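The result is a directory in HuggingFace cache layout (./offline-weights/models--BAAI--bge-m3/snapshots/<sha>/...) that the chart can mount as HF_HUB_CACHE. The layout stores each file once under blobs/ and exposes it through a symlink under snapshots/. The symlinks take almost no space on disk, but sync tools that follow symlinks upload the content under both paths, so expect the mirrored size to be roughly 2x the model’s raw byte count. This is expected and does not indicate a broken snapshot.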
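
To sanity-check a snapshot before mirroring it, a short script can walk the cache directory and report which models and revisions it holds. This is a minimal sketch that assumes the models--<org>--<name>/snapshots/<sha>/ layout described above; list_snapshots is an illustrative helper name, not part of huggingface_hub.

```python
from pathlib import Path


def list_snapshots(cache_dir: str) -> dict[str, list[str]]:
    """Map each cached model id to the snapshot revisions found on disk."""
    found: dict[str, list[str]] = {}
    for repo_dir in sorted(Path(cache_dir).glob("models--*")):
        # "models--BAAI--bge-m3" -> "BAAI/bge-m3"
        # (assumes the repo name itself contains no "--")
        model_id = repo_dir.name[len("models--"):].replace("--", "/")
        snapshots = repo_dir / "snapshots"
        if snapshots.is_dir():
            found[model_id] = sorted(
                p.name for p in snapshots.iterdir() if p.is_dir()
            )
    return found


if __name__ == "__main__":
    for model_id, revisions in list_snapshots("./offline-weights").items():
        print(model_id, revisions)
```

A model that appears with an empty revision list most likely had its download interrupted and should be fetched again before mirroring.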

For gated models, set HF_TOKEN before running the download commands.

2. Mirror to private storage

Push the snapshot to S3-compatible storage that the cluster can reach. AWS S3, GCS, MinIO, and Ceph all work; the chart treats them the same.

# AWS S3
aws s3 sync ./offline-weights s3://sie-models-private/weights/

# MinIO (in-cluster or on-prem)
mc mirror ./offline-weights minio/sie-models-private/weights/

# GCS
gsutil -m rsync -r ./offline-weights gs://sie-models-private/weights/

Whatever you choose, the URL handed to the chart in the next step must be reachable from worker pods.
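
When the snapshot passes through removable media or several hops before it reaches the bucket, it can help to record checksums before the transfer and compare them afterwards. The sketch below is one possible approach, not something the chart requires; build_manifest is a hypothetical helper name. It hashes file content through symlinks, so the entries under snapshots/ are covered as well.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict[str, str]:
    """Return {relative_path: sha256_hex} for every file under root."""
    root_path = Path(root)
    manifest: dict[str, str] = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():  # is_file() follows symlinks to their target
            continue
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so multi-gigabyte weights fit in memory.
            for chunk in iter(lambda: fh.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root_path).as_posix()] = digest.hexdigest()
    return manifest
```

Run it once on ./offline-weights and once on a copy downloaded back from the bucket; the two dictionaries should be equal.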

3. Configure the cluster cache

Point the chart’s workers.common.clusterCache at the mirrored bucket. The Python sie-server adapter containers in worker pods read weights from there instead of HuggingFace.

# values-offline.yaml
workers:
  common:
    extraEnv:
      - name: SIE_HF_FALLBACK
        value: "false"

    clusterCache:
      enabled: true
      url: s3://sie-models-private/weights/   # or gs:// for GCS

    hfCache:
      home: /models/huggingface
      tokenSecret: ""

# Skip HF token wiring entirely in air-gapped clusters
hfToken:
  create: false

For S3, worker pods authenticate via IRSA (EKS) or static credentials supplied through extraEnv. For GCS, they use Workload Identity (GKE). For MinIO or other S3-compatibles, mount credentials via a secret and pass them through workers.common.extraEnv.
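
A typo in clusterCache.url only surfaces once a worker tries to load a model, so a quick pre-flight check of the value can save a deploy cycle. The sketch below validates the URL shape only. The accepted schemes (s3 and gs) are taken from the values example above; how the chart itself parses the URL is not covered in this guide, and split_cache_url is an illustrative name.

```python
from urllib.parse import urlparse


def split_cache_url(url: str) -> tuple[str, str, str]:
    """Split a cluster-cache URL into (scheme, bucket, key_prefix)."""
    parsed = urlparse(url)
    if parsed.scheme not in {"s3", "gs"}:
        raise ValueError(f"unsupported scheme in {url!r}; expected s3:// or gs://")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {url!r}")
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


print(split_cache_url("s3://sie-models-private/weights/"))
```

The returned bucket and prefix can then be compared against the destination used in the mirror step.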

4. Mirror container images

The chart pulls these public SIE images from GHCR by default:

| Image | Where it’s set | Tag form |
| --- | --- | --- |
| `ghcr.io/superlinked/sie-server` | `workers.common.image.repository` | `vX.Y.Z-{platform}-{bundle}` (e.g. `v0.6.6-cuda12-default`) |
| `ghcr.io/superlinked/sie-server-sidecar` | `workers.common.workerSidecar.image.repository` | plain `vX.Y.Z` |
| `ghcr.io/superlinked/sie-gateway` | `gateway.image.repository` | plain `vX.Y.Z` |
| `ghcr.io/superlinked/sie-config` | `config.image.repository` | plain `vX.Y.Z` |

sie-server is only published with -{platform}-{bundle} suffixes. A plain ghcr.io/superlinked/sie-server:v0.6.6 tag does not exist; at install time the chart’s worker template assembles the full tag by appending -{platform}-{bundle} to workers.common.image.tag.
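
The tag assembly can be pictured as plain string concatenation. The function below mirrors the rule as this guide describes it; it is not the chart’s actual template code, and worker_image_ref is an illustrative name.

```python
def worker_image_ref(repository: str, tag: str, platform: str, bundle: str) -> str:
    """Build the full sie-server image reference from its parts."""
    return f"{repository}:{tag}-{platform}-{bundle}"


print(worker_image_ref("ghcr.io/superlinked/sie-server", "v0.6.6", "cuda12", "default"))
# ghcr.io/superlinked/sie-server:v0.6.6-cuda12-default
```

This is the reference that has to exist in whichever registry the worker pods pull from.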

The ghcr.io/superlinked/sie-server-sidecar image backs the SIE server sidecar in Kubernetes. Helm renders that sidecar as the worker-sidecar container for release compatibility.

The chart also pulls NATS images via the bundled nats sub-chart when nats.install=true, which is the default. In a truly air-gapped cluster, where no node has public egress, these must be mirrored too:

| Image | Source |
| --- | --- |
| `nats:2.12.6-alpine` | docker.io / nats.io |
| `natsio/nats-server-config-reloader:0.21.1` | docker.io |
| `natsio/nats-box:0.19.3` | docker.io |

If you enable optional sub-charts (keda.install=true, kube-prometheus-stack.install=true, dcgm-exporter.install=true, loki.install=true, alloy.install=true), each pulls additional images. Run helm template oci://ghcr.io/superlinked/charts/sie-cluster --version 0.6.6 -f values-offline.yaml | grep -oE 'image:.*' | sort -u to extract the full set for your config.
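
The same extraction can be done in Python when the image list feeds a mirroring script. This sketch assumes the rendered output has already been captured into a string (for example by saving the helm template output to a file); extract_images is an illustrative helper, and the sample manifest below is made up for demonstration.

```python
import re

# Matches "image: ref" and "- image: ref", with or without quotes.
IMAGE_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?image:[ \t]*[\"']?([^\"'\s#]+)", re.MULTILINE
)


def extract_images(rendered_manifests: str) -> list[str]:
    """Return the sorted, de-duplicated image references in rendered YAML."""
    return sorted(set(IMAGE_LINE.findall(rendered_manifests)))


sample = """
containers:
  - name: gateway
    image: "private-registry.example.com/sie/sie-gateway:v0.6.6"
  - image: nats:2.12.6-alpine
    name: nats
  - name: reloader
    image: nats:2.12.6-alpine
"""
print(extract_images(sample))
```

Any reference in the result that still points at a public registry is an image that has not been mirrored or not been overridden in the values file.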

Mirror the SIE images once:

TAG=v0.6.6
PLATFORM=cuda12   # or `cpu` for a CPU-only worker pool
BUNDLE=default

# sie-server: platform/bundle suffix is required. There is no plain `:$TAG` tag.
docker pull ghcr.io/superlinked/sie-server:${TAG}-${PLATFORM}-${BUNDLE}
docker tag  ghcr.io/superlinked/sie-server:${TAG}-${PLATFORM}-${BUNDLE} \
            private-registry.example.com/sie/sie-server:${TAG}-${PLATFORM}-${BUNDLE}
docker push private-registry.example.com/sie/sie-server:${TAG}-${PLATFORM}-${BUNDLE}

# sie-gateway, sie-config, and sie-server-sidecar: plain version tag
for img in sie-gateway sie-config sie-server-sidecar; do
  docker pull ghcr.io/superlinked/$img:$TAG
  docker tag  ghcr.io/superlinked/$img:$TAG private-registry.example.com/sie/$img:$TAG
  docker push private-registry.example.com/sie/$img:$TAG
done

Note on architecture mismatch: docker pull on a host whose architecture differs from the cluster nodes’ (e.g. an arm64 Mac mirroring images for an amd64 EKS cluster) silently pulls the host’s platform unless you pass --platform, and the subsequent docker push publishes only the platform that was pulled. Worker pods on nodes of a different architecture then fail with a “no match for platform in manifest” error. For arch-safe mirroring, use crane (brew install crane); it copies all platforms without going through the host’s container runtime:

crane copy ghcr.io/superlinked/sie-server:${TAG}-${PLATFORM}-${BUNDLE} \
           private-registry.example.com/sie/sie-server:${TAG}-${PLATFORM}-${BUNDLE}
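
With more than a handful of images, generating the copy commands is less error-prone than typing them. The sketch below only builds command strings and runs nothing. It assumes every image is flattened to <private prefix>/<name>:<tag>, which is a naming choice made for this example rather than a chart requirement; mirror_target and crane_copy_commands are hypothetical helper names.

```python
def mirror_target(source_ref: str, private_prefix: str) -> str:
    """Map a public image reference to its location in the private registry.

    Only the final path component (name:tag) is kept, so
    ghcr.io/superlinked/sie-gateway:v0.6.6 becomes <prefix>/sie-gateway:v0.6.6.
    """
    name_and_tag = source_ref.rsplit("/", 1)[-1]
    return f"{private_prefix.rstrip('/')}/{name_and_tag}"


def crane_copy_commands(source_refs: list[str], private_prefix: str) -> list[str]:
    """Build one `crane copy SRC DST` command line per source image."""
    return [
        f"crane copy {ref} {mirror_target(ref, private_prefix)}"
        for ref in source_refs
    ]


for cmd in crane_copy_commands(
    ["ghcr.io/superlinked/sie-gateway:v0.6.6", "natsio/nats-box:0.19.3"],
    "private-registry.example.com/sie",
):
    print(cmd)
```

Flattening drops the original namespace, so two images that share a final name and tag would collide; keep the full path if that is a concern.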

Then point the chart at your registry. Note workers.common.image.tag stays as the plain version; the chart appends -{platform}-{bundle} automatically:

# values-offline.yaml (continued)
gateway:
  image:
    repository: private-registry.example.com/sie/sie-gateway
    tag: v0.6.6

config:
  image:
    repository: private-registry.example.com/sie/sie-config
    tag: v0.6.6

workers:
  common:
    image:
      repository: private-registry.example.com/sie/sie-server
      tag: v0.6.6          # chart appends -${platform}-${bundle} at install time
    platform: cuda12       # or "cpu"
    bundle: default
    workerSidecar:
      image:
        repository: private-registry.example.com/sie/sie-server-sidecar
        tag: v0.6.6

global:
  imagePullSecrets:
    - name: regcred
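
A leftover public repository in the values file is easy to miss. The sketch below walks a values mapping and reports every repository key that does not point at the private registry. It assumes the values have already been loaded into a Python dict (for example with a YAML parser); public_repositories is a hypothetical helper, and the sample dict is a trimmed illustration rather than a full values file.

```python
def public_repositories(values: dict, private_host: str) -> list[str]:
    """Return dotted paths of `repository` values not hosted on private_host."""
    offenders: list[str] = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + [key])
        elif path and path[-1] == "repository" and isinstance(node, str):
            if not node.startswith(private_host + "/"):
                offenders.append(".".join(path))

    walk(values, [])
    return offenders


sample_values = {
    "gateway": {"image": {"repository": "private-registry.example.com/sie/sie-gateway"}},
    "config": {"image": {"repository": "ghcr.io/superlinked/sie-config"}},
}
print(public_repositories(sample_values, "private-registry.example.com"))
```

The check covers only keys named repository inside nested mappings; sub-charts that name their image fields differently would need their own handling.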

If your registry needs auth, create the regcred Docker secret in the sie namespace before installing the chart:

kubectl create secret docker-registry regcred \
  --docker-server=private-registry.example.com \
  --docker-username=... \
  --docker-password=... \
  -n sie
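
For pipelines that template the secret instead of calling kubectl, it helps to know what a docker-registry secret holds: a single .dockerconfigjson key containing an auths map. The sketch below builds that JSON payload. The function name and the credentials are placeholders; wrapping the payload in a Secret manifest and keeping it out of version control are left to your tooling.

```python
import base64
import json


def docker_config_json(server: str, username: str, password: str) -> str:
    """Build the JSON stored under the .dockerconfigjson key of the secret."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps(
        {"auths": {server: {"username": username, "password": password, "auth": auth}}}
    )


# Placeholder credentials for illustration only.
print(docker_config_json("private-registry.example.com", "user", "example-password"))
```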

5. Install and verify

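Install the chart with the offline values. The cluster needs no internet egress for this, but pulling the chart straight from oci://ghcr.io, as in the command below, requires the machine running helm to reach GHCR: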

helm upgrade --install sie oci://ghcr.io/superlinked/charts/sie-cluster \
  --version 0.6.6 \
  -f values-offline.yaml \
  -n sie --create-namespace

For a fully air-gapped setup, mirror the chart itself as well (recommended): pull it once on a connected workstation, then install from the local .tgz:

helm pull oci://ghcr.io/superlinked/charts/sie-cluster --version 0.6.6
# Move sie-cluster-0.6.6.tgz onto the air-gapped workstation, then:
helm upgrade --install sie ./sie-cluster-0.6.6.tgz \
  -f values-offline.yaml \
  -n sie --create-namespace

Verify first inference. Install the SDK and run the GPU or CPU smoke test depending on your worker pool:

kubectl -n sie port-forward svc/sie-sie-cluster-gateway 8080:8080 &

# Install the Python SDK. Requires Python 3.12; see the SDK README for newer or older Python notes.
pip install sie-sdk

For a GPU worker pool (workers.common.platform: cuda12, workers.pools.l4.enabled: true):

python3 -c "
from sie_sdk import SIEClient

client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'hello world'},
                       gpu='l4', wait_for_capacity=True, provision_timeout_s=600)
print(result['dense'].shape)  # (1024,)
"

For a CPU worker pool (workers.common.platform: cpu, workers.pools.cpu.enabled: true, useful for local clusters or small offline deployments without a GPU):

python3 -c "
from sie_sdk import SIEClient

client = SIEClient('http://localhost:8080')
result = client.encode('BAAI/bge-m3', {'text': 'hello world'},
                       gpu='cpu', wait_for_capacity=True, provision_timeout_s=600)
print(result['dense'].shape)  # (1024,)
"

The first request still pays the cold-start cost, but the weight load now comes from your private store rather than HuggingFace. CPU inference will be substantially slower than GPU for the same model.

Troubleshooting

| Symptom | Likely cause |
| --- | --- |
| Worker pod stuck in `Init` with `403 Forbidden` from S3/GCS | IRSA/Workload Identity missing the bucket-read permission |
| `ImagePullBackOff` on a worker pod | Registry credentials missing, or `imagePullSecrets` not wired |
| Worker logs show `OSError: Couldn't reach huggingface.co` | `clusterCache` URL typo or bucket missing the requested model |
| Chart install hangs on dependency download | Sub-charts (KEDA, kube-prometheus-stack, DCGM) trying to fetch from public Artifact Hub. Use `helm pull` with `--untar` and install the local copy. |
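
The first three rows of the table lend themselves to a simple text match against pod events or worker logs. This is a rough triage sketch with the symptom strings and causes copied from the table; it assumes the text has already been collected (for example from kubectl describe or kubectl logs output), and triage is an illustrative name.

```python
LIKELY_CAUSES = [
    ("403 Forbidden", "IRSA/Workload Identity missing the bucket-read permission"),
    ("ImagePullBackOff", "Registry credentials missing, or imagePullSecrets not wired"),
    ("Couldn't reach huggingface.co",
     "clusterCache URL typo or bucket missing the requested model"),
]


def triage(text: str) -> list[str]:
    """Return the likely causes whose symptom string appears in the text."""
    return [cause for needle, cause in LIKELY_CAUSES if needle in text]


print(triage("Back-off pulling image: ImagePullBackOff"))
```

An empty result means none of the known symptoms matched, not that the deployment is healthy.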

What’s Next

Kubernetes in GCP for the online quickstart this builds on
Kubernetes in AWS for the EKS counterpart
Config GitOps Workflow for managing model configs without redeploying the chart
Upgrade Runbook for rolling updates and rollback