ibm-granite/granite-guardian-3.0-2b

Primitive: /generate · Generate · Granite

Long context · Streaming · Guard

Overview

Hardware: not selected (drives latency, throughput, and cost)

Size: 2.5B params
Tasks: /generate
License: apache-2.0
Latency, Throughput, Cost /1M tok: not shown (depend on the selected hardware)

Cost is approximate: it is computed from list GPU prices, and your actual price depends on the provider you deploy SIE with.
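As a rough illustration of how a per-token cost can follow from a GPU list price, here is a minimal sketch. It assumes one fully utilized GPU at a sustained generation throughput; the function name and the input numbers are hypothetical placeholders, not figures from this catalog, and the catalog's own formula may differ.

```python
def cost_per_million_tokens(gpu_price_per_hour: float, tokens_per_second: float) -> float:
    """Approximate cost of generating 1M tokens on a single GPU.

    Assumption: the GPU runs at the given sustained throughput for the
    whole hour, so idle time and batching effects are ignored.
    """
    if gpu_price_per_hour < 0:
        raise ValueError("gpu_price_per_hour must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_price_per_hour / tokens_per_hour * 1_000_000


# Hypothetical inputs for illustration only (not measured values).
example_cost = cost_per_million_tokens(gpu_price_per_hour=2.0, tokens_per_second=1000.0)
print(f"{example_cost:.4f} per 1M tokens")
```

Because the estimate scales linearly with the hourly price, a provider that charges more or less than list price shifts the result by the same ratio.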

Generation

Capabilities: Streaming · Guard
Context length: 8,192
Max output tokens: 512
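The two limits above interact when sizing a request. The sketch below assumes the context length covers both the prompt and the generated tokens, which is common for decoder-only models but is not stated on this page; `prompt_token_budget` is a hypothetical helper, not part of any SIE API.

```python
CONTEXT_LENGTH = 8192      # from the listing above
MAX_OUTPUT_TOKENS = 512    # from the listing above


def prompt_token_budget(context_length: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt if the full output allowance is reserved.

    Assumption: prompt and output share one context window.
    """
    if max_output_tokens > context_length:
        raise ValueError("max_output_tokens cannot exceed context_length")
    return context_length - max_output_tokens


budget = prompt_token_budget(CONTEXT_LENGTH, MAX_OUTPUT_TOKENS)
print(budget)
```

Under that assumption, a prompt longer than the computed budget would need to be truncated or split before being sent for a guard check.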

Benchmarks

ToxicChat

safety · generation · en

Quality: guard F1 0.2780
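For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion counts; the counts are made up for illustration and are unrelated to the ToxicChat score reported here, and how the catalog treats the positive class is not specified on this page.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """F1 from confusion counts: 2*TP / (2*TP + FP + FN).

    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


# Hypothetical counts for illustration only.
example_f1 = f1_score(true_positives=8, false_positives=2, false_negatives=2)
print(f"{example_f1:.4f}")
```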
