CHURRO#

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

It works with all major OCR providers and vision-language models, and provides first-party support for the CHURRO 3B model and CHURRO-DS dataset.


  • CHURRO 3B exceeds the accuracy of Gemini 2.5 Pro at 15.5x lower cost.

  • CHURRO-DS contains ~100K pages from 155 historical collections spanning 22 centuries and 46 language clusters.

Cost vs. accuracy: CHURRO (3B) achieves higher accuracy than much larger commercial and open-weight VLMs while being substantially cheaper.

Quick Try#

pip install "churro-ocr[hf]"
churro-ocr transcribe --image scan.png --backend hf --model stanford-oval/churro-3B
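
To run the same command over a folder of scans, a small Python wrapper around the CLI is one option. The sketch below reuses only the flags shown above (`--image`, `--backend`, `--model`). The helper names `build_command`, `find_images`, and `transcribe_folder` are hypothetical, the list of image suffixes is an assumption, and so is the idea that the transcription is printed to standard output; check the Getting Started guide for the actual output behavior.

```python
import subprocess
import sys
from pathlib import Path

# Backend and model ID copied from the Quick Try command.
BACKEND = "hf"
MODEL = "stanford-oval/churro-3B"
# Assumed set of scan formats; adjust to whatever the toolkit accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def build_command(image: Path) -> list[str]:
    """Return the argv list that transcribes a single image."""
    return [
        "churro-ocr", "transcribe",
        "--image", str(image),
        "--backend", BACKEND,
        "--model", MODEL,
    ]


def find_images(folder: Path) -> list[Path]:
    """List image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def transcribe_folder(folder: Path) -> dict[str, str]:
    """Run the CLI once per image and collect whatever it prints.

    Assumption: the transcription is written to stdout.
    """
    results: dict[str, str] = {}
    for image in find_images(folder):
        completed = subprocess.run(
            build_command(image),
            capture_output=True,
            text=True,
            check=True,  # raise if the CLI exits with an error
        )
        results[image.name] = completed.stdout
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, text in transcribe_folder(Path(sys.argv[1])).items():
        print(f"== {name} ==")
        print(text)
```

Saved as, say, `batch_transcribe.py` (a hypothetical file name), it would be invoked as `python batch_transcribe.py scans/`. Each image starts a separate CLI process, so the model is likely loaded once per page; that is simple but slow for large collections.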

For more in-depth information, see the Getting Started guide.