
Qwen/Qwen3-VL-Embedding-2B

The Qwen3-VL-Embedding and Qwen3-VL-Reranker model series are the latest additions to the Qwen family, built on the open-source Qwen3-VL foundation model.

Overview

Architecture: qwen3_vl
Parameters: 2.1B
Tasks: Encode
Outputs: Dense
Dimensions: 2,048 (dense)
Max sequence length: 32,768 tokens
License: apache-2.0
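
Because the model emits one dense vector per input (2,048 dimensions, per the overview), retrieval with it comes down to nearest-neighbour search over those vectors. The following is a minimal sketch only: it assumes cosine similarity as the scoring function (this page does not say which one is used), and it uses random vectors as stand-ins for real embeddings. `cosine` and `top_k` are hypothetical helper names, not part of any library.

```python
import math
import random

DIM = 2048  # dense output size listed in the overview


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k=10):
    """Return the indices of the k corpus vectors most similar to the query."""
    order = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query, corpus[i]),
        reverse=True,
    )
    return order[:k]


rng = random.Random(0)
# Placeholder vectors (assumption): real ones would come from the embedding model.
corpus = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(50)]
# A query built as a lightly perturbed copy of document 7.
query = [x + 0.1 * rng.gauss(0, 1) for x in corpus[7]]

ranking = top_k(query, corpus, k=10)
print("best match:", ranking[0])
```

A brute-force scan like this is only practical for small corpora; at the corpus sizes listed in the benchmarks, an approximate nearest-neighbour index would be the usual choice.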

Benchmarks

FiQA2018

finance, retrieval, en

Financial opinion mining and question answering

Corpus: 57,599 Queries: 648
Performance (L4 b1 c4)
Corpus throughput: 494 tok/s
Corpus p50 latency: 35.9 ms

Flickr30kI2TRetrieval

general, retrieval, en

Image-to-text retrieval: retrieve captions from images

Corpus: 31,783 Queries: 1,000
Quality
nDCG@10: 0.8751
MAP@10: 0.8017
MRR@10: 0.9653
Performance (L4 b1 c4)
Corpus throughput: 494 tok/s
Corpus p50 latency: 35.9 ms
Query throughput: 0.0 mpix/s
Query p50 latency: 4.3 s
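
The quality figures for this benchmark are standard rank-based metrics cut off at the top 10 results. Below is a minimal sketch of how they are commonly computed for a single query with binary relevance. The benchmark's own evaluation code is not shown on this page, so the function names and the MAP normalisation (dividing by the smaller of the number of relevant items and k) are assumptions; the reported scores would be averages over all queries.

```python
import math


def ndcg_at_k(ranked_ids, relevant, k=10):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, doc in enumerate(ranked_ids[:k])
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def average_precision_at_k(ranked_ids, relevant, k=10):
    """Mean of precision values taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)  # assumed normalisation
    return total / denom if denom else 0.0


def reciprocal_rank_at_k(ranked_ids, relevant, k=10):
    """1 / rank of the first relevant item, or 0 if none is in the top k."""
    for rank, doc in enumerate(ranked_ids[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical caption IDs for one image query.
ranked = ["c3", "c9", "c1", "c4", "c7"]
relevant = {"c3", "c1"}

ndcg = ndcg_at_k(ranked, relevant)
ap = average_precision_at_k(ranked, relevant)
rr = reciprocal_rank_at_k(ranked, relevant)
print(round(ndcg, 4), round(ap, 4), round(rr, 4))
```

When a query has several relevant items, MRR only rewards the first hit while MAP and nDCG also account for where the remaining ones land, which is one reason MRR can sit well above the other two.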
