Qwen/Qwen3.6-27B
Primitive: /generate
Qwen3 MoE
Multimodal · Tool calling · Constrained output · Streaming · Code · SQL
Overview
Hardware: (unspecified). The hardware choice drives the latency, throughput, and cost figures below.
| Metric | Value |
|---|---|
| Size | 27.0B params |
| Tasks | /generate |
| License | apache-2.0 |
| Latency | 1.7 s |
| Throughput | 222 tok/s |
| Cost | $3.80 / 1M tokens |
Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
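The listed price and throughput can be turned into a rough budget for a workload. This is a minimal sketch that assumes the $3.80 per 1M tokens rate applies to every token processed and that the 222 tok/s throughput holds for the whole run; both are simplifications, and `estimate` is a hypothetical helper, not part of SIE.

```python
# Figures from the Overview table; treat them as rough list values.
PRICE_PER_MILLION_TOKENS = 3.80  # USD, approximate
THROUGHPUT_TOK_PER_S = 222.0     # tokens per second on the selected hardware


def estimate(total_tokens: int) -> tuple[float, float]:
    """Return (cost_usd, seconds) for processing total_tokens.

    Assumes a flat per-token price and a constant throughput,
    which real deployments will not match exactly.
    """
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    cost_usd = total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS
    seconds = total_tokens / THROUGHPUT_TOK_PER_S
    return cost_usd, seconds


if __name__ == "__main__":
    # Example: 10,000 requests of about 500 tokens each.
    cost, secs = estimate(10_000 * 500)
    print(f"~${cost:.2f}, ~{secs / 3600:.1f} h at constant throughput")
    # prints: ~$19.00, ~6.3 h at constant throughput
```

Because throughput is measured per deployment, the time estimate only holds for the hardware the figures were taken on.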
Generation
| Property | Value |
|---|---|
| Capabilities | Tool calling · Constrained output (JSON Schema, Regex) · Streaming · Code · SQL |
| Context length | 4,096 tokens |
| Max output tokens | 4,096 |
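With a 4,096-token context length and a 4,096-token output cap, a request has to leave room for its own completion. This sketch assumes that prompt and generated tokens share the same context window, as they do for decoder-only generation generally; `max_new_tokens_for` is a hypothetical helper, and real token counts would come from the model's own tokenizer.

```python
CONTEXT_LENGTH = 4096     # tokens, from the Generation table
MAX_OUTPUT_TOKENS = 4096  # tokens, from the Generation table


def max_new_tokens_for(prompt_tokens: int, requested: int = MAX_OUTPUT_TOKENS) -> int:
    """Clamp a requested output length so prompt + output fit the context.

    Assumes prompt and completion share one CONTEXT_LENGTH window.
    Raises if the prompt alone already fills the window.
    """
    if prompt_tokens >= CONTEXT_LENGTH:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; context is {CONTEXT_LENGTH}"
        )
    remaining = CONTEXT_LENGTH - prompt_tokens
    return max(0, min(requested, MAX_OUTPUT_TOKENS, remaining))


print(max_new_tokens_for(1000))       # 3096
print(max_new_tokens_for(1000, 512))  # 512
```

Under this assumption the 4,096-token output cap is only reachable with an empty prompt; any prompt shrinks the usable output budget.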
Benchmarks
| Benchmark | Accuracy | Hardware (config) | Throughput | p50 latency |
|---|---|---|---|---|
| CaseHOLD | 0.6000 | RTX-PRO-6000 b1 c4 | 146 tok/s | 1.5 s |
| GPQA Diamond | 0.3889 | RTX-PRO-6000 b1 c4 | 219 tok/s | 2.0 s |
| MedQA | 0.6900 | RTX-PRO-6000 b1 c4 | 225 tok/s | 1.9 s |
| MMLU-Pro | 0.6600 | RTX-PRO-6000 b1 c4 | 230 tok/s | 1.3 s |
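To compare the four benchmark runs at a glance, the reported figures can be collected and summarized. This sketch only re-enters the numbers listed above; the unweighted mean is an illustrative summary, not a score published for the model, and it ignores that the benchmarks differ in size and difficulty.

```python
from statistics import mean

# (accuracy, throughput in tok/s, p50 latency in s) as reported above,
# all measured on RTX-PRO-6000 with the "b1 c4" configuration.
RESULTS = {
    "CaseHOLD": (0.6000, 146, 1.5),
    "GPQA Diamond": (0.3889, 219, 2.0),
    "MedQA": (0.6900, 225, 1.9),
    "MMLU-Pro": (0.6600, 230, 1.3),
}

mean_accuracy = mean(acc for acc, _, _ in RESULTS.values())
best = max(RESULTS, key=lambda name: RESULTS[name][0])       # highest accuracy
fastest = min(RESULTS, key=lambda name: RESULTS[name][2])    # lowest p50 latency

print(f"mean accuracy: {mean_accuracy:.4f}")  # mean accuracy: 0.5847
print(f"highest accuracy: {best}")            # highest accuracy: MedQA
print(f"lowest p50 latency: {fastest}")       # lowest p50 latency: MMLU-Pro
```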