
Self-hosted product search in 5 min

A complete product search demo you can clone and run on a laptop in five minutes. Type wireless bluetooth headphones and get ranked Amazon products back with extracted brand, color, and material filters. The whole pipeline (zero-shot attribute extraction, dense embeddings, cross-encoder reranking) runs through one local SIE server with three SDK calls. No vector DB to provision, no separate reranker service, no hand-rolled regex for attributes.

E-Commerce Product Search: extract, encode, score on a single SIE cluster

Built by Vipul Maheshwari.

Once the server is running at http://localhost:8888, open it in your browser and try:

| Query | Filter | What it exercises |
| --- | --- | --- |
| wireless bluetooth headphones | All Electronics | Dense retrieval plus cross-encoder rerank |
| lightweight waterproof hiking boots | none | Multi-attribute semantic match |
| gold jewelry for women | none | Style and material phrasing, rerank matters |
| ceramic coffee mug | Amazon Home | Category filter applied to extracted attributes |
| power drill cordless | brand=DEWALT | Filter on an attribute that was extracted zero-shot |

Each query flows through the full pipeline: encode() the query, vector search against the saved matrix, filter by extracted attributes, score() rerank with a cross-encoder, return the top 10.

You need Docker, Python 3.12, and roughly 3 GB of disk for the model weights. No API keys, no signup, no cluster.

# 1. Start a local SIE server on CPU.
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default
# With an NVIDIA GPU:
# docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
# 2. In another terminal: clone, install, fetch data.
git clone https://github.com/superlinked/sie
cd sie/examples/ecommerce-product-search
pip install -r python/requirements.txt
python data/fetch_dataset.py
# 3. Build the index and start the search server.
python python/ingest.py
uvicorn --app-dir python server:app --port 8888

Open http://localhost:8888 and start searching.

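To target a remote SIE cluster instead of the local server, set two environment variables before running ingest and the search server:

export SIE_CLUSTER_URL="https://your-cluster-url"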
export SIE_API_KEY="your-api-key"

Everything else stays identical. The defaults in config.yaml point at http://localhost:8080 so env vars are only needed when you are hitting something remote.
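
As a rough illustration of that precedence, the sketch below lets the two environment variables override the values from config.yaml. The function name `resolve_cluster` and the exact lookup order are assumptions made for this example, not code taken from the repository.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_URL = "http://localhost:8080"  # local default named in config.yaml


def resolve_cluster(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, str]:
    """Return (url, api_key); environment variables win over config values.

    Hypothetical helper: the real example may resolve these differently.
    """
    env = os.environ if env is None else env
    cluster = config.get("cluster", {})
    url = env.get("SIE_CLUSTER_URL") or cluster.get("url") or DEFAULT_URL
    api_key = env.get("SIE_API_KEY") or cluster.get("api_key") or ""
    return url, api_key
```

With no variables set, the call returns the local URL and an empty key, which matches the "no API keys" local setup.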

The TypeScript version follows the same flow and requires Node 22+ and pnpm. See the TypeScript README for the Express version.

Ingest builds a searchable index once:

  1. extract() pulls structured attributes (brand, color, material, size, product_type) from raw product descriptions with urchade/gliner_multi-v2.1. Zero-shot, no training data, no custom schema work.
  2. encode() embeds every product’s title and description into 1,024-dim dense vectors with NovaSearch/stella_en_400M_v5.

The index is just two files on disk (data/embeddings.npy and data/metadata.json).
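
To make the extraction step more concrete, here is a minimal sketch of how low-confidence spans could be dropped using the `confidence_threshold` value from config.yaml. The entity shape (dicts with `label`, `text`, and `score`) and the helper name `select_attributes` are assumptions for illustration; the actual output of extract() may look different.

```python
from typing import Dict, List


def select_attributes(entities: List[dict], threshold: float = 0.5) -> Dict[str, str]:
    """Keep the highest-scoring span per label and drop spans below threshold.

    Hypothetical helper; the entity dict shape is assumed, not taken from SIE.
    """
    best: Dict[str, dict] = {}
    for ent in entities:
        if ent["score"] < threshold:
            continue  # too uncertain to use as a filterable attribute
        current = best.get(ent["label"])
        if current is None or ent["score"] > current["score"]:
            best[ent["label"]] = ent
    return {label: ent["text"] for label, ent in best.items()}


entities = [
    {"label": "brand", "text": "DEWALT", "score": 0.93},
    {"label": "color", "text": "yellow", "score": 0.41},  # below 0.5, dropped
    {"label": "brand", "text": "Tools", "score": 0.55},   # loses to DEWALT
]
print(select_attributes(entities))  # {'brand': 'DEWALT'}
```

Keeping one value per label makes the later brand filter a simple equality check, at the cost of discarding products that legitimately mention several brands.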

Search runs this on every incoming query:

  1. encode() the query with a query-side prefix (Stella is asymmetric).
  2. Cosine similarity against the in-memory matrix gives the top 100 candidates.
  3. Filter by extracted attributes if the request specifies any.
  4. score() reranks the candidates with the BAAI/bge-reranker-v2-m3 cross-encoder and returns the top 10.
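
Steps 2 through 4 can be sketched in plain Python as follows. This is a simplified illustration, not the code in search.py: the query vector is assumed to come from encode() already, the `rerank` callable is a hypothetical stand-in for score(), and the metadata shape is assumed. The real example works on a NumPy matrix; lists are used here only to keep the sketch dependency-free.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: Vector,
    matrix: Sequence[Vector],
    metadata: List[dict],
    rerank: Callable[[List[dict]], List[float]],  # stand-in for score()
    filters: Optional[Dict[str, str]] = None,
    top_k_candidates: int = 100,
    top_k_results: int = 10,
) -> List[dict]:
    # Step 2: rank every product by cosine similarity, keep the best candidates.
    scored = sorted(
        ((cosine(query_vec, row), i) for i, row in enumerate(matrix)),
        reverse=True,
    )[:top_k_candidates]
    # Step 3: keep only candidates whose extracted attributes match every filter.
    if filters:
        scored = [
            (s, i)
            for s, i in scored
            if all(metadata[i]["attributes"].get(k) == v for k, v in filters.items())
        ]
    # Step 4: rerank the survivors and return the top results with both scores.
    rerank_scores = rerank([metadata[i] for _, i in scored])
    ranked = sorted(zip(rerank_scores, scored), key=lambda pair: pair[0], reverse=True)
    return [
        {"id": metadata[i]["id"], "scores": {"vector": s, "rerank": r}}
        for r, (s, i) in ranked[:top_k_results]
    ]
```

Filtering before reranking keeps the cross-encoder, the most expensive stage, working only on candidates that can actually be returned.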

Three model families, three roles, one endpoint. No per-model container to ship, no GPU scheduling to babysit, no separate reranker service.

Both backends expose the same three endpoints.

| Parameter | Required | Description |
| --- | --- | --- |
| q | yes | Search query |
| category | no | Category filter (matches against the product category) |
| brand | no | Brand filter (matches against the extracted brand attribute) |

curl "http://localhost:8888/api/search?q=waterproof+boots&brand=Columbia"

Response:

{
  "query": "waterproof boots",
  "filters": { "brand": "Columbia" },
  "results": [
    {
      "id": "<product-id>",
      "title": "<product-title>",
      "description": "<truncated-product-description>",
      "category": "AMAZON FASHION",
      "price": "<number-or-null>",
      "rating": "<number-or-null>",
      "image": "<image-url-or-null>",
      "features": ["<feature-text>", "<feature-text>"],
      "attributes": {
        "brand": "Columbia",
        "color": "<extracted-color>"
      },
      "scores": {
        "vector": "<similarity-score>",
        "rerank": "<rerank-score>"
      }
    }
  ]
}
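
For calling the search endpoint from a script rather than curl, a small standard-library client could look like the sketch below. The helper names (`build_search_url`, `fetch_results`, `summarize`) are made up for this example; only the path, the three query parameters, and the response fields come from the description above.

```python
import json
from typing import List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8888"  # the search server started with uvicorn


def build_search_url(
    q: str,
    category: Optional[str] = None,
    brand: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build the /api/search URL, adding only the filters that were given."""
    params = {"q": q}
    if category:
        params["category"] = category
    if brand:
        params["brand"] = brand
    return f"{base_url}/api/search?{urlencode(params)}"


def fetch_results(q: str, **filters: str) -> dict:
    """Call the running search server and decode its JSON response."""
    with urlopen(build_search_url(q, **filters), timeout=30) as resp:
        return json.load(resp)


def summarize(payload: dict) -> List[Tuple[str, object]]:
    """Reduce a response to (title, rerank score) pairs for quick inspection."""
    return [(r["title"], r["scores"]["rerank"]) for r in payload["results"]]


print(build_search_url("waterproof boots", brand="Columbia"))
# http://localhost:8888/api/search?q=waterproof+boots&brand=Columbia
```

`fetch_results` needs the server from the quick start to be running; `build_search_url` and `summarize` work offline.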

Returns category counts as [name, count] tuples (feeds the dropdown in the UI).

Returns index size, embedding dims, and the three model names.

| Symptom | Fix |
| --- | --- |
| Connection refused on port 8080 | The SIE container is not running or has crashed. Check docker ps. |
| Ingest seems stuck | First run is downloading ~3 GB of model weights. Check docker logs for progress. |
| Port 8888 already in use | Pick another: uvicorn --app-dir python server:app --port 9000 |
| Filter returns 0 matches | Server logs a warning and falls back to unfiltered top-k so the UI still shows something. |
| On CPU, ingest is too slow | Edit config.yaml, set dataset.sample_size: 300. |
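
The zero-match fallback listed among the fixes above can be sketched as follows. This is a guess at the behavior described, not the server's code: the function name, the logger name, and the candidate shape are assumptions, and only attribute filters such as brand are covered.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("search")  # hypothetical logger name


def apply_filters(candidates: List[dict], filters: Dict[str, str]) -> List[dict]:
    """Filter candidates by extracted attributes, falling back when nothing matches."""
    if not filters:
        return candidates
    matched = [
        c
        for c in candidates
        if all(c.get("attributes", {}).get(k) == v for k, v in filters.items())
    ]
    if not matched:
        # Prefer showing unfiltered results over an empty page.
        logger.warning(
            "Filters %s matched 0 of %d candidates; returning unfiltered results",
            filters,
            len(candidates),
        )
        return candidates
    return matched
```

The trade-off is that a caller cannot tell an empty match from a real one by the result list alone, so the warning in the server log is the only signal.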

Both implementations read the same config.yaml. The defaults work out of the box for local Docker. Tune only when you need to.

cluster:
  url: "http://localhost:8080"
  api_key: ""
  gpu: "" # set to "l4-spot" or similar when targeting a managed multi-GPU cluster
  provision_timeout_s: 600
models:
  embedding: "NovaSearch/stella_en_400M_v5"
  reranker: "BAAI/bge-reranker-v2-m3"
  extractor: "urchade/gliner_multi-v2.1"
extract_labels:
  - brand
  - color
  - material
  - size
  - product_type
search:
  top_k_candidates: 100
  top_k_results: 10
ingest:
  batch_size_extraction: 8
  batch_size_encoding: 32
  confidence_threshold: 0.5
  junk_text_max_len: 15
dataset:
  name: "milistu/AMAZON-Products-2023"
  sample_size: 3000
  min_description_length: 100

Swap any of the three models for another one in the SIE catalog and the pipeline keeps working. That is the point.

examples/ecommerce-product-search/
├── config.yaml
├── data/
│   ├── fetch_dataset.py
│   └── products.json      # generated by fetch_dataset.py
├── python/
│   ├── ingest.py
│   ├── search.py
│   ├── server.py
│   └── requirements.txt
├── static/
│   └── index.html         # shared browser UI
└── typescript/
    ├── package.json
    └── src/
        ├── ingest.ts
        ├── search.ts
        └── server.ts
