---
title: CrewAI
description: Use SIE sparse embeddings in CrewAI hybrid search workflows.
canonical_url: https://superlinked.com/docs/integrations/crewai
last_updated: 2026-05-20
---

The `sie-crewai` package provides CrewAI tools and embedders: `SIERerankerTool` for reranking, `SIEExtractorTool` for extraction (entities, relations, classifications, and object detection), and `SIESparseEmbedder` for hybrid search.

## Installation

```bash
pip install sie-crewai
```

This installs `sie-sdk` and `crewai` as dependencies.

## Start the Server

Source: [packages/sie_server/src/sie_server/cli.py](https://github.com/superlinked/sie/blob/main/packages/sie_server/src/sie_server/cli.py)

```bash
# Docker (recommended)
docker run -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cpu-default

# Or with GPU
docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
```
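
The container can take a moment to start accepting connections. The sketch below is one way to wait for it from Python before creating any tools or embedders. `wait_for_port` is a hypothetical helper, not part of `sie-crewai`, and it only checks that the TCP port accepts connections, not that a model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)  # brief pause before the next attempt
```

For the Docker commands above, a call such as `wait_for_port("localhost", 8080)` would be the natural usage.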

## Embedders

Source: [integrations/sie_crewai/src/sie_crewai/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_crewai/src/sie_crewai/embedders.py)

SIE integrates with CrewAI through two embedding approaches:

1. **Dense embeddings** - Use SIE's OpenAI-compatible API with CrewAI's built-in embedder config
2. **Sparse embeddings** - Use `SIESparseEmbedder` for hybrid search workflows

### Dense Embeddings

Configure CrewAI to use SIE's OpenAI-compatible endpoint:

```python
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    embedder={
        "provider": "openai",
        "config": {
            "api_base": "http://localhost:8080/v1",
            "model": "BAAI/bge-m3"
        }
    }
)
```

### Sparse Embeddings

Use `SIESparseEmbedder` for sparse vectors in hybrid search:

```python
from sie_crewai import SIESparseEmbedder

sparse_embedder = SIESparseEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# Embed documents
sparse_vectors = sparse_embedder.embed_documents([
    "Machine learning uses algorithms to learn from data.",
    "The weather is sunny today."
])
print(sparse_vectors[0].keys())  # dict_keys(['indices', 'values'])

# Embed a query (uses is_query=True for asymmetric models)
query_vector = sparse_embedder.embed_query("What is machine learning?")
```
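
To illustrate how vectors in this `indices`/`values` form are typically compared, here is a minimal sketch of a sparse dot product. Vector databases normally do this for you; `sparse_dot` is a hypothetical helper, and the vectors are hand-written stand-ins rather than real model output.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors in {'indices': [...], 'values': [...]} form."""
    weights = dict(zip(a["indices"], a["values"]))
    # Only indices present in both vectors contribute to the score.
    return sum(weights.get(i, 0.0) * v for i, v in zip(b["indices"], b["values"]))


# Hand-written stand-ins for embedder output (assumption, not real model output)
doc = {"indices": [3, 17, 42], "values": [0.5, 1.0, 2.0]}
query = {"indices": [17, 42, 99], "values": [2.0, 0.5, 1.0]}

score = sparse_dot(query, doc)
print(score)  # 3.0
```

Indices 17 and 42 overlap, so the score is `2.0 * 1.0 + 0.5 * 2.0`.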

## Full Example

Source: [integrations/sie_crewai/src/sie_crewai/embedders.py](https://github.com/superlinked/sie/blob/main/integrations/sie_crewai/src/sie_crewai/embedders.py)

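A complete example that configures dense embeddings for a CrewAI crew and computes sparse embeddings for the corpus. Storing the sparse vectors and running the hybrid query are left to your vector database: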

```python
from crewai import Agent, Crew, Task
from sie_crewai import SIESparseEmbedder

# 1. Configure dense embeddings via OpenAI-compatible API
embedder_config = {
    "provider": "openai",
    "config": {
        "api_base": "http://localhost:8080/v1",
        "model": "BAAI/bge-m3"
    }
}

# 2. Set up sparse embedder for hybrid search
sparse_embedder = SIESparseEmbedder(
    base_url="http://localhost:8080",
    model="BAAI/bge-m3"
)

# 3. Prepare your corpus with both dense and sparse embeddings
corpus = [
    "Machine learning is a branch of artificial intelligence.",
    "Neural networks are inspired by biological neurons.",
    "Deep learning uses multiple layers of neural networks.",
]

# Get sparse embeddings for your vector database
sparse_vectors = sparse_embedder.embed_documents(corpus)
# Store sparse_vectors in your vector DB (Qdrant, Weaviate, etc.)

# 4. Create a research agent
researcher = Agent(
    role="Research Analyst",
    goal="Find and analyze information from the knowledge base",
    backstory="Expert at finding relevant information using semantic search.",
    verbose=True
)

# 5. Define the research task
research_task = Task(
    description="Search the knowledge base for information about deep learning.",
    expected_output="A summary of findings about deep learning.",
    agent=researcher
)

# 6. Create and run the crew
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    embedder=embedder_config,
    verbose=True
)

result = crew.kickoff()
print(result)
```
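
The example leaves the hybrid query itself to the vector database. If you need to merge a dense ranking and a sparse ranking yourself, reciprocal rank fusion (RRF) is one common choice. The sketch below is a generic implementation, not something `sie-crewai` provides; the document IDs, the rankings, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it contains.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from a dense search and a sparse search
dense_ranking = ["c", "a", "b", "d"]
sparse_ranking = ["c", "b", "d", "a"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['c', 'b', 'a', 'd']
```

RRF uses only rank positions, so it avoids having to normalize dense and sparse scores onto a common scale.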

## Reranker Tool

Source: [integrations/sie_crewai/src/sie_crewai/tools.py](https://github.com/superlinked/sie/blob/main/integrations/sie_crewai/src/sie_crewai/tools.py)

`SIERerankerTool` is a CrewAI `BaseTool` that reranks documents by relevance to a query. Agents can use it to improve search quality.

```python
from crewai import Agent, Crew, Task
from sie_crewai import SIERerankerTool

reranker = SIERerankerTool(
    base_url="http://localhost:8080",
    model="jinaai/jina-reranker-v2-base-multilingual",
)

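researcher = Agent(
    role="Research Analyst",
    goal="Find the most relevant information",
    backstory="Specialist in ranking search results by relevance.",  # CrewAI agents require a backstory
    tools=[reranker],
)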

task = Task(
    description="Rerank these documents for the query 'What is deep learning?'",
    expected_output="The most relevant documents.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
result = crew.kickoff()
```
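
The task description in the example names a query but lists no documents, so the agent needs the candidates from somewhere in its context. One option is to put them in the description itself. `build_rerank_description` below is a hypothetical helper, shown only as a sketch of that approach.

```python
def build_rerank_description(query: str, documents: list[str]) -> str:
    """Build a task description that includes the candidate documents."""
    numbered = "\n".join(f"{i}. {doc}" for i, doc in enumerate(documents, start=1))
    return (
        f"Rerank these documents for the query {query!r} "
        "and return them from most to least relevant:\n"
        f"{numbered}"
    )


description = build_rerank_description(
    "What is deep learning?",
    [
        "Deep learning uses multiple layers of neural networks.",
        "The weather is sunny today.",
    ],
)
print(description)
```

The resulting string can be passed as the `description` argument of `Task`.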

## Extractor Tool

Source: [integrations/sie_crewai/src/sie_crewai/tools.py](https://github.com/superlinked/sie/blob/main/integrations/sie_crewai/src/sie_crewai/tools.py)

`SIEExtractorTool` is a CrewAI `BaseTool` that extracts structured data from text. It supports every extraction type: entities (GLiNER), relations (GLiREL), classifications (GLiClass), and object detection (GroundingDINO/OWL-v2). The `_run()` method returns a string with a separate section for each of the four result types: entities, relations, classifications, and objects.

### Entity Extraction

```python
from crewai import Agent, Crew, Task
from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="urchade/gliner_multi-v2.1",
    labels=["person", "organization", "location"],
)

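analyst = Agent(
    role="Data Analyst",
    goal="Extract key entities from documents",
    backstory="Specialist in pulling structured facts out of text.",  # CrewAI agents require a backstory
    tools=[extractor],
)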

task = Task(
    description="Extract all people, organizations, and locations from: 'Tim Cook announced new products at Apple Park in Cupertino.'",
    expected_output="A list of extracted entities.",
    agent=analyst,
)

crew = Crew(agents=[analyst], tasks=[task])
result = crew.kickoff()
```

### Relation Extraction

Extract relationships between entities using GLiREL:

```python
from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="jackboyla/glirel-large-v0",
    labels=["works_for", "ceo_of", "founded"],
)

# Use with an agent, or call directly:
result = extractor._run("Tim Cook is the CEO of Apple Inc.")
print(result)
# Relations:
# Tim Cook --ceo_of--> Apple Inc. (score: 0.92)
```

### Text Classification

Classify text into categories using GLiClass:

```python
from sie_crewai import SIEExtractorTool

extractor = SIEExtractorTool(
    base_url="http://localhost:8080",
    model="knowledgator/gliclass-base-v1.0",
    labels=["positive", "negative", "neutral"],
)

result = extractor._run("I absolutely loved this movie! The acting was superb.")
print(result)
# Classifications:
# positive (score: 0.94)
# neutral (score: 0.04)
# negative (score: 0.02)
```
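
Because `_run()` returns a string, downstream code may want the labels and scores as data. The sketch below parses lines of the form `label (score: 0.94)`. It assumes the output looks exactly like the commented sample above; `parse_scored_lines` is a hypothetical helper, and the real format should be checked against your installed version.

```python
import re

# Matches lines such as "positive (score: 0.94)"; section headers do not match.
SCORED_LINE = re.compile(r"^(?P<label>.+?) \(score: (?P<score>\d+(?:\.\d+)?)\)$")


def parse_scored_lines(output: str) -> list[tuple[str, float]]:
    """Extract (label, score) pairs from the tool's string output."""
    results = []
    for line in output.splitlines():
        match = SCORED_LINE.match(line.strip())
        if match:
            results.append((match["label"], float(match["score"])))
    return results


# Sample text copied from the commented output above (assumed format)
sample = """Classifications:
positive (score: 0.94)
neutral (score: 0.04)
negative (score: 0.02)"""

parsed = parse_scored_lines(sample)
top_label, top_score = max(parsed, key=lambda pair: pair[1])
print(top_label, top_score)  # positive 0.94
```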

## Configuration Options

### SIESparseEmbedder

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `BAAI/bge-m3` | Model to use for sparse embeddings |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

### SIERerankerTool

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `jinaai/jina-reranker-v2-base-multilingual` | Reranker model |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |

### SIEExtractorTool

The extraction model determines which result types are included in the output. Use GLiNER models for entities, GLiREL for relations, GLiClass for classifications, and GroundingDINO/OWL-v2 for object detection.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `base_url` | `str` | `http://localhost:8080` | SIE server URL |
| `model` | `str` | `urchade/gliner_multi-v2.1` | Extraction model (GLiNER, GLiREL, GLiClass, GroundingDINO, OWL-v2) |
| `labels` | `list[str]` | `["person", "organization", "location"]` | Labels for extraction (entity types, relation types, or classification categories) |
| `gpu` | `str` | `None` | Target GPU type for routing |
| `options` | `dict` | `None` | Model-specific options |
| `timeout_s` | `float` | `180.0` | Request timeout in seconds |
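
All three classes share `base_url`, `gpu`, `options`, and `timeout_s`, so it can be convenient to build those keyword arguments once. The sketch below reads them from environment variables. The variable names `SIE_BASE_URL` and `SIE_TIMEOUT_S` are assumptions made for this example, not settings that `sie-crewai` reads itself.

```python
import os


def sie_common_kwargs(env=os.environ) -> dict:
    """Collect keyword arguments shared by the SIE tools and embedder."""
    kwargs = {"base_url": env.get("SIE_BASE_URL", "http://localhost:8080")}
    if "SIE_TIMEOUT_S" in env:
        # Only override the timeout when the variable is explicitly set.
        kwargs["timeout_s"] = float(env["SIE_TIMEOUT_S"])
    return kwargs


kwargs = sie_common_kwargs({"SIE_BASE_URL": "http://sie:8080", "SIE_TIMEOUT_S": "30"})
print(kwargs)  # {'base_url': 'http://sie:8080', 'timeout_s': 30.0}

# Intended usage (requires sie-crewai to be installed):
# SIESparseEmbedder(model="BAAI/bge-m3", **sie_common_kwargs())
```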

## What's Next

- [Encode Text](/docs/encode/) - dense and sparse embedding details
- [Score / Rerank](/docs/score/) - reranking details
- [Extract](/docs/extract/) - extraction details (NER, relations, classification, vision)
- [Model Catalog](/models) - all supported models
- [Troubleshooting](/docs/reference/troubleshooting/) - common errors and solutions