Run the agent loop

Uses: /generate

Drive a full agent loop on /generate: an open LLM that plans, calls tools, and streams tokens — self-hosted, so the loop never leaves your cluster.

Featured models

Generate · `/generate`

	Model	Size	Quality	Latency	Throughput	Cost $/1M
	Qwen/Qwen3.6-27B MultimodalTool callingConstrained outputStreamingCodeSQL	27.0B	0.6000acc	1.7 s	222 tok/s	$3.80
	Qwen/Qwen3-4B-Instruct-2507 Long contextTool callingConstrained outputStreamingCodeSQL	4.0B	0.6033acc	576 ms	472 tok/s	$1.78
	Qwen/Qwen3-0.6B Streaming	600M	0.4600acc	413 ms	595 tok/s	$1.41
No models match.

Measured on RTX-PRO-6000; other hardware shows "—" until benchmarked. Pick a benchmark to rank by quality.

For similar models, browse the full /generate catalog →

Examples

Worked examples coming soon. In the meantime, browse all SIE examples →

Featured picks are still being finalized. Latency, throughput and cost are real where we've benchmarked the model on the selected GPU; "—" means no measurement there. Cost is approximate — computed from list GPU prices; your actual price depends on the provider you deploy SIE with.

Run the agent loop

Featured models

Generate · /generate

Examples

Open source inference for agents

Generate · `/generate`