Why did we open-source our inference engine? Read the post

← Home

Run the agent loop

Uses: /generate

Drive a full agent loop on /generate: an open LLM that plans, calls tools, and streams tokens — self-hosted, so the loop never leaves your cluster.

Featured models

Examples

Worked examples coming soon. In the meantime, browse all SIE examples →

Featured picks are still being finalized. Latency, throughput and cost are real where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate — computed from list GPU prices; your actual price depends on the provider you deploy SIE with.

Open source inference for agents

Open-source inference for the models behind your agents. Run it yourself, or let us run it for you.

Github 2.1K

Contact us

Tell us about your use case and we'll get back to you shortly.

Apply for an inference grant

Free capacity on our hosted cluster for selected projects. Tell us what you run and we reply by email.