naver-clova-ix/donut-base-finetuned-docvqa

Donut model fine-tuned on DocVQA. It was introduced in the paper "OCR-free Document Understanding Transformer" by Geewook Kim et al. and first released in this repository.
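
As a rough illustration of how a DocVQA-tuned Donut checkpoint is usually queried, the sketch below builds the task prompt and decodes an answer with the Hugging Face transformers classes `DonutProcessor` and `VisionEncoderDecoderModel`. The helper names `build_prompt` and `answer_question` are hypothetical, and the generation settings are assumptions based on common usage rather than anything stated on this page.

```python
import re

# Prompt template conventionally used for Donut DocVQA checkpoints.
TASK_PROMPT = "<s_docvqa><s_question>{user_input}</s_question><s_answer>"


def build_prompt(question: str) -> str:
    """Insert the user's question into the DocVQA task prompt (hypothetical helper)."""
    return TASK_PROMPT.replace("{user_input}", question)


def answer_question(image, question,
                    model_id="naver-clova-ix/donut-base-finetuned-docvqa"):
    """Run one question against one PIL image (hypothetical helper).

    Heavy imports are kept local so the prompt helper above can be used
    without transformers or torch installed. Calling this downloads the weights.
    """
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    # The encoder consumes pixels; the decoder is primed with the task prompt.
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        build_prompt(question), add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
    )

    # Strip special tokens and the leading task tag, then parse the tagged output.
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "")
    sequence = sequence.replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
```

Under these assumptions, `answer_question(Image.open("invoice.png").convert("RGB"), "What is the total?")` would return a dictionary with the question and the predicted answer; the file name here is only a placeholder.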

Overview

Architecture: Encoder-Decoder
Parameters: 110M
Tasks: Extract
Outputs: text_regions
License: mit


Benchmarks

DocVQA

Tags: general, kie, en

Visual question answering on document images

Corpus: 5,188
Queries: 5,188

Quality

ANLS: 0.6350
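
ANLS (Average Normalized Levenshtein Similarity) is the metric usually reported for DocVQA. The sketch below shows one common way to compute it, assuming the conventional threshold of 0.5 and case-insensitive, whitespace-trimmed comparison. The exact preprocessing behind the score above is not stated here, so treat these choices, and the function names `levenshtein` and `anls`, as illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average, over questions, of the best similarity against any accepted answer.

    predictions: list of predicted answer strings.
    references:  list of lists of accepted answers, one list per question.
    threshold:   normalized distances at or above this value score 0 (assumed 0.5).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    total = 0.0
    for prediction, accepted in zip(predictions, references):
        p = prediction.strip().lower()
        best = 0.0
        for answer in accepted:
            g = answer.strip().lower()
            longest = max(len(p), len(g))
            distance = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - distance if distance < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

An exact match (ignoring case) scores 1.0, while an answer whose normalized edit distance reaches the threshold scores 0, so near misses such as OCR-style typos still earn partial credit.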

Performance L4-SPOT b1 c4

Performance L4 b1 c16
