# Core Concepts
Churro is easier to understand if you think about it as a document pipeline, not just an OCR call. The library takes an input source, turns it into one or more page objects, runs OCR on those page objects, and returns results that still preserve page-level structure.
If you keep that model in mind, the rest of the API becomes much simpler.
## The Mental Model
Most workflows follow the same shape:
1. Choose an OCR backend.
2. Start from an input source such as an image, a photographed spread, or a PDF.
3. Optionally detect page boundaries.
4. Work with one or more `DocumentPage` objects.
5. Attach OCR output to those pages.
In practice, the flow looks like this:
```
raw image or PDF
  -> optional page detection
  -> one or more DocumentPage objects
  -> OCR
  -> pages with text, model info, and metadata
```
The key idea is that Churro keeps the page object all the way through the pipeline. You do not lose the cropped page image, page ordering, or page-level metadata just because OCR has been run.
## Start With The Shape Of Your Input
The easiest way to choose an API is to ask what your input already looks like.
| If your input looks like this | Start with | Why |
|---|---|---|
| One image already equals one page | `OCRClient` | no page detection is needed |
| One image may contain multiple pages | `DocumentOCRPipeline` | detect crops first, then OCR |
| A PDF | `DocumentOCRPipeline` | rasterization, page detection, and OCR are handled for you |
| You only want page crops, not text yet | `DocumentPageDetector` | detection-only workflow |
| You want fine control over provider setup | `OCRBackendSpec` with `build_ocr_backend` | backend configuration stays explicit |
This is the most important rule of thumb in the library: do not add page detection unless your input actually needs it.
## The Main Building Blocks
These are the public types most users need to understand.
| Object | What it represents | When you use it |
|---|---|---|
| `OCRBackendSpec` | a declarative description of which provider and model to use | when configuring OCR backends |
| `build_ocr_backend` | the factory that turns a spec into a runnable backend | right after choosing a provider |
| `OCRClient` | OCR for a single page image or a single `DocumentPage` | when each image is already one page |
| `DocumentPageDetector` | page detection without OCR | when you need crops only |
| `DocumentOCRPipeline` | page detection plus OCR in one workflow | when working with photographed spreads or PDFs |
| `DocumentPage` | the central page object passed through detection and OCR | almost everywhere |
For most applications, `DocumentPage` is the type to pay attention to. The other APIs mainly exist to create, transform, or enrich `DocumentPage` objects.
## `DocumentPage` Is The Core Object
A `DocumentPage` is one page image plus whatever Churro knows about that page.
Before OCR, a page may only have:
- an image
- a page position
- source metadata
- crop information such as `bbox` or `polygon`
After OCR, the same page object can also have:
- `text`
- `provider_name`
- `model_name`
- `ocr_metadata`
That makes it easy to treat page detection and OCR as one continuous workflow instead of converting between unrelated result types.
```python
from churro_ocr import DocumentPage, OCRClient
from churro_ocr.providers import OCRBackendSpec, build_ocr_backend

backend = build_ocr_backend(
    OCRBackendSpec(
        provider="litellm",
        model="vertex_ai/gemini-2.5-flash",
    )
)

page = DocumentPage.from_image_path("scan.png")
ocr_page = OCRClient(backend).ocr(page)

print(ocr_page.text)
print(ocr_page.provider_name)
print(ocr_page.model_name)
```
If your input is already one page per image, this is the simplest mental model: create or load a page, run OCR, then read the text from the returned page object.
## How Detection And OCR Fit Together
Churro separates page detection from OCR, but the two parts compose cleanly.
- `DocumentPageDetector` answers: “What are the page crops in this source?”
- `OCRClient` answers: “What text is on this one page?”
- `DocumentOCRPipeline` answers: “Take this document-shaped input and do the whole thing.”
That means the library works well across very different input shapes:
- scanned pages where each image is already clean and single-page
- photographed book spreads where one image contains two visible pages
- PDFs that must be rasterized before OCR
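The composition can be sketched with plain functions standing in for the detector, the per-page client, and the pipeline. This is an illustration of how the three roles fit together, not Churro's implementation:

```python
def detect(source: str) -> list[str]:
    # Placeholder detector: a photographed spread yields two page crops,
    # a clean single-page scan yields one.
    if "spread" in source:
        return [f"{source}:left", f"{source}:right"]
    return [source]

def ocr_one(crop: str) -> str:
    # Placeholder per-page OCR call.
    return f"text({crop})"

def pipeline(source: str) -> list[str]:
    # The pipeline role: detection first, then OCR on every crop.
    return [ocr_one(crop) for crop in detect(source)]

print(pipeline("scan.png"))    # one page in, one text out
print(pipeline("spread.jpg"))  # one image in, two page texts out
```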
If you want concrete usage examples for each case, see OCR Workflows and Page Detection.
## What The Result Types Mean
The result containers are small, but they serve different purposes.
| Type | What you get | Typical use |
|---|---|---|
| `DocumentPage` | one page image, with or without OCR attached | most page-level code |
| `OCRResult` | plain OCR output without the page image | backend-facing code |
| | detected pages from one image or PDF | detection-only workflows |
| `DocumentOCRResult` | OCR output across all pages in a document workflow | PDFs, spreads, or batched page flows |
`OCRResult` is the least user-facing type here. Most application code can stay at the `DocumentPage` or `DocumentOCRResult` level.
`DocumentOCRResult` is especially useful when you want both page structure and convenience helpers:
- `result.pages` keeps the full page objects
- `result.texts()` returns plain text per page
- `result.as_ocr_results()` converts to lightweight OCR-only results
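The three access patterns look like this in a minimal mock. `MockPage` and `MockDocumentResult` are stand-ins written for this example; only the field and method names mirror the description above:

```python
from dataclasses import dataclass

@dataclass
class MockPage:
    text: str
    page_index: int

@dataclass
class MockDocumentResult:
    pages: list  # full page objects, images and metadata included

    def texts(self) -> list[str]:
        # Plain text, one entry per page.
        return [p.text for p in self.pages]

    def as_ocr_results(self) -> list[dict]:
        # Lightweight, image-free view of the OCR output.
        return [{"text": p.text} for p in self.pages]

result = MockDocumentResult(pages=[MockPage("page one", 0), MockPage("page two", 1)])
print(result.texts())   # ['page one', 'page two']
```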
## Understanding `page_index` And `source_index`
These two fields are easy to confuse, but they capture different ideas.
- `page_index` is the page position in the current output.
- `source_index` is the index of the original source item that produced that page.
Examples make this clearer:
- If `scan.png` is a single-page image, the page will usually have `page_index=0` and `source_index=0`.
- If `spread.jpg` contains two detected pages, the output pages may have `page_index=0` and `page_index=1`, but both still came from the same source image, so both have `source_index=0`.
- If a PDF has 10 pages and each PDF page becomes one detected page, the output pages will usually have matching `page_index` and `source_index`.
- If one PDF page is split into multiple detected crops, those crops get different `page_index` values but share the same `source_index` because they came from the same original PDF page.
In short, `page_index` tells you where a page ended up in the output sequence. `source_index` tells you where it came from.
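A small, runnable sketch of the numbering (plain dicts stand in for page objects; the counting logic is the point, not the data model):

```python
# One clean scan plus one spread that splits into two detected pages.
sources = ["scan.png", "spread.jpg"]
pages_per_source = {"scan.png": 1, "spread.jpg": 2}

pages = []
page_index = 0
for source_index, src in enumerate(sources):
    for _ in range(pages_per_source[src]):
        # page_index counts across the whole output;
        # source_index repeats for pages from the same source.
        pages.append({"source": src,
                      "page_index": page_index,
                      "source_index": source_index})
        page_index += 1

for p in pages:
    print(p)
```

Here `page_index` runs 0, 1, 2 across the output, while both pages from `spread.jpg` share `source_index=1`.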
## Understanding `metadata` And `ocr_metadata`
Churro keeps caller-side and provider-side metadata separate on purpose.
- `metadata` is your own metadata, or metadata produced during page detection.
- `ocr_metadata` is metadata returned by the OCR provider for that page.
This separation matters because the two kinds of metadata usually have different meanings.
Examples of `metadata`:

- a job ID you attach when submitting work
- page detection hints
- page ordering or dataset labels
Examples of `ocr_metadata`:

- provider response fields
- usage or timing information
- model-specific OCR details
On document-level results, `source_type` tells you whether the document came from an `"image"` or a `"pdf"` workflow.
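The separation is easy to picture with a mock page (a plain dict here, with invented field values). Caller-side metadata is written once by you and never touched; provider-side metadata is only ever written by the OCR step:

```python
page = {
    "metadata": {"job_id": "batch-42", "dataset": "archive-2023"},  # yours
    "ocr_metadata": {},                                             # provider's
}

def fake_ocr(page: dict) -> dict:
    # Stand-in for an OCR call: it populates ocr_metadata from the
    # provider response and leaves caller metadata alone.
    page["ocr_metadata"] = {"model": "example-model", "latency_ms": 120}
    return page

fake_ocr(page)
print(page["metadata"]["job_id"])      # still yours, untouched
print(page["ocr_metadata"]["model"])   # provider-side
```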
## Sync And Async
Every high-level sync entrypoint has an async equivalent.
The default choice for most users is still the sync API. Use the async forms when:
- you are already inside an async application
- you want to coordinate OCR with other async work
- you want to manage concurrency explicitly
If you use `DocumentOCRPipeline`, the `max_concurrency` setting controls how many page OCR jobs run at once inside that pipeline.
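Bounded page-level concurrency of that kind is typically implemented with a semaphore. A sketch with `asyncio.Semaphore` (a stand-in OCR coroutine, not Churro's pipeline internals):

```python
import asyncio

async def ocr_page(i: int, sem: asyncio.Semaphore) -> str:
    async with sem:                # at most N OCR calls in flight at once
        await asyncio.sleep(0)     # stand-in for the provider round-trip
        return f"text of page {i}"

async def run(n_pages: int, max_concurrency: int) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)
    # gather preserves page order even though jobs overlap in time
    return await asyncio.gather(*(ocr_page(i, sem) for i in range(n_pages)))

texts = asyncio.run(run(5, max_concurrency=2))
print(texts)
```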
## Practical Rules Of Thumb
- Use `OCRClient` when each input image is already a page.
- Use `DocumentOCRPipeline` for PDFs and photographed spreads.
- Use `DocumentPageDetector` when you want crops without OCR.
- Build your backend once and reuse it across calls.
- Pass exactly one of `image` or `image_path` when an API accepts both.
When you want concrete recipes, continue with OCR Workflows. When you need exact signatures and fields, use the API Reference.