Core Concepts

Churro is easier to understand if you think about it as a document pipeline, not just an OCR call. The library takes an input source, turns it into one or more page objects, runs OCR on those page objects, and returns results that still preserve page-level structure.

If you keep that model in mind, the rest of the API becomes much simpler.

The Mental Model

Most workflows follow the same shape:

Choose an OCR backend.
Start from an input source such as an image, a photographed spread, or a PDF.
Optionally detect page boundaries.
Work with one or more DocumentPage objects.
Attach OCR output to those pages.

In practice, the flow looks like this:

raw image or PDF
  -> optional page detection
  -> one or more DocumentPage objects
  -> OCR
  -> pages with text, model info, and metadata

The key idea is that Churro keeps the page object all the way through the pipeline. You do not lose the cropped page image, page ordering, or page-level metadata just because OCR has been run.
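As a rough illustration of that flow, the sketch below models the pipeline with stand-in types. `PageSketch`, `detect_pages`, and `run_ocr` are hypothetical names invented for this example and are not part of Churro's API. The only point is that one page object travels through detection and OCR and gains fields along the way.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PageSketch:
    """Hypothetical stand-in for a page object; not Churro's DocumentPage."""
    image: str                  # placeholder for real pixel data
    page_index: int
    source_index: int
    text: Optional[str] = None  # filled in by the OCR step


def detect_pages(source: str, source_index: int = 0) -> list:
    """Toy detection: treat 'left|right' as a two-page spread."""
    crops = source.split("|")
    return [
        PageSketch(image=crop, page_index=i, source_index=source_index)
        for i, crop in enumerate(crops)
    ]


def run_ocr(page: PageSketch) -> PageSketch:
    """Toy OCR: return a copy of the same page with text attached."""
    return replace(page, text=f"text of {page.image}")


# Detection produces page objects; OCR enriches them without dropping fields.
pages = [run_ocr(page) for page in detect_pages("left|right")]
for page in pages:
    print(page.page_index, page.source_index, page.image, page.text)
```

After the OCR step each page still has its image and indices, which is the property the real pipeline is described as preserving.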

Start With The Shape Of Your Input

The easiest way to choose an API is to ask what your input already looks like.

| If your input looks like this | Start with | Why |
| --- | --- | --- |
| One image already equals one page | `OCRClient` | no page detection is needed |
| One image may contain multiple pages | `DocumentOCRPipeline` or `DocumentPageDetector` | detect crops first, then OCR |
| A PDF | `DocumentOCRPipeline` | rasterization, page detection, and OCR are handled for you |
| You only want page crops, not text yet | `DocumentPageDetector` | detection-only workflow |
| You want fine control over provider setup | `OCRBackendSpec` + `build_ocr_backend(...)` | backend configuration stays explicit |

This is the most important rule of thumb in the library: do not add page detection unless your input actually needs it.
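If it helps to see the table as a decision, the lookup below restates it in code. The input-shape labels and the `suggest_entry_point` helper are invented for this sketch; only the returned names come from the table.

```python
# Hypothetical lookup mirroring the table; the keys are labels made up here.
ENTRY_POINTS = {
    "single_page_image": "OCRClient",
    # DocumentPageDetector is the alternative if you want the crops first.
    "multi_page_image": "DocumentOCRPipeline",
    "pdf": "DocumentOCRPipeline",
    "crops_only": "DocumentPageDetector",
}


def suggest_entry_point(input_shape: str) -> str:
    """Return the suggested starting point for a given input shape label."""
    try:
        return ENTRY_POINTS[input_shape]
    except KeyError:
        raise ValueError(f"unknown input shape: {input_shape!r}") from None


print(suggest_entry_point("pdf"))
```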

The Main Building Blocks

These are the public types most users need to understand.

| Object | What it represents | When you use it |
| --- | --- | --- |
| `OCRBackendSpec` | a declarative description of which provider and model to use | when configuring OCR backends |
| `build_ocr_backend(...)` | the factory that turns a spec into a runnable backend | right after choosing a provider |
| `OCRClient` | OCR for a single page image or a single `DocumentPage` | when each image is already one page |
| `DocumentPageDetector` | page detection without OCR | when you need crops only |
| `DocumentOCRPipeline` | page detection plus OCR in one workflow | when working with photographed spreads or PDFs |
| `DocumentPage` | the central page object passed through detection and OCR | almost everywhere |

For most applications, DocumentPage is the type to pay attention to. The other APIs mainly exist to create, transform, or enrich DocumentPage objects.

`DocumentPage` Is The Core Object

A DocumentPage is one page image plus whatever Churro knows about that page.

Before OCR, a page may only have:

an image
a page position
source metadata
crop information such as bbox or polygon

After OCR, the same page object can also have:

text
provider_name
model_name
ocr_metadata

That makes it easy to treat page detection and OCR as one continuous workflow instead of converting between unrelated result types.

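from churro_ocr import DocumentPage, OCRClient
from churro_ocr.providers import OCRBackendSpec, build_ocr_backend

# Describe the provider and model, then build a runnable backend from the spec.
backend = build_ocr_backend(
    OCRBackendSpec(
        provider="litellm",
        model="vertex_ai/gemini-2.5-flash",
    )
)

# Load a single-page image as a page object and run OCR on it.
page = DocumentPage.from_image_path("scan.png")
ocr_page = OCRClient(backend).ocr(page)

# The returned page carries the OCR text plus provider and model info.
print(ocr_page.text)
print(ocr_page.provider_name)
print(ocr_page.model_name)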

If your input is already one page per image, this is the simplest mental model: create or load a page, run OCR, then read the text from the returned page object.

How Detection And OCR Fit Together

Churro separates page detection from OCR, but the two parts compose cleanly.

DocumentPageDetector answers: “What are the page crops in this source?”
OCRClient answers: “What text is on this one page?”
DocumentOCRPipeline answers: “Take this document-shaped input and do the whole thing.”

That means the library works well across very different input shapes:

scanned pages where each image is already clean and single-page
photographed book spreads where one image contains two visible pages
PDFs that must be rasterized before OCR

If you want concrete usage examples for each case, see OCR Workflows and Page Detection.

What The Result Types Mean

The result containers are small, but they serve different purposes.

| Type | What you get | Typical use |
| --- | --- | --- |
| `DocumentPage` | one page image, with or without OCR attached | most page-level code |
| `OCRResult` | plain OCR output without the page image | backend-facing code or `DocumentOCRResult.as_ocr_results()` |
| `PageDetectionResult` | detected pages from one image or PDF | detection-only workflows |
| `DocumentOCRResult` | OCR output across all pages in a document workflow | PDFs, spreads, or batched page flows |

OCRResult is the least user-facing type here. Most application code can stay at the DocumentPage or DocumentOCRResult level.

DocumentOCRResult is especially useful when you want both page structure and convenience helpers:

result.pages keeps the full page objects
result.texts() returns plain text per page
result.as_ocr_results() converts to lightweight OCR-only results
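To make the difference between the three views concrete, here is a minimal sketch using stand-in classes. `PageSketch`, `OcrOnlySketch`, and `DocumentResultSketch` are hypothetical and only imitate the behavior described above; the real `DocumentOCRResult` and `OCRResult` may carry other fields.

```python
from dataclasses import dataclass


@dataclass
class OcrOnlySketch:
    """Stand-in for a lightweight OCR-only result (no image)."""
    text: str
    provider_name: str
    model_name: str


@dataclass
class PageSketch:
    """Stand-in for a page that carries its image plus OCR output."""
    image: bytes
    text: str
    provider_name: str
    model_name: str


@dataclass
class DocumentResultSketch:
    """Stand-in for a document-level result holding full page objects."""
    pages: list

    def texts(self) -> list:
        # Plain text per page, in page order.
        return [page.text for page in self.pages]

    def as_ocr_results(self) -> list:
        # Drop the image and keep only the OCR fields.
        return [
            OcrOnlySketch(page.text, page.provider_name, page.model_name)
            for page in self.pages
        ]


result = DocumentResultSketch(
    pages=[
        PageSketch(b"<pixels-0>", "first page", "demo-provider", "demo-model"),
        PageSketch(b"<pixels-1>", "second page", "demo-provider", "demo-model"),
    ]
)
print(result.texts())
print(hasattr(result.as_ocr_results()[0], "image"))
```

The full pages keep their images, while the converted results do not, which is why the lightweight form suits backend-facing code.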

Understanding `page_index` And `source_index`

These two fields are easy to confuse, but they capture different ideas.

page_index is the page position in the current output.
source_index is the index of the original source item that produced that page.

Examples make this clearer:

If scan.png is a single-page image, the page will usually have page_index=0 and source_index=0.
If spread.jpg contains two detected pages, the output pages may have page_index=0 and page_index=1, but both still came from the same source image, so both have source_index=0.
If a PDF has 10 pages and each PDF page becomes one detected page, the output pages will usually have matching page_index and source_index.
If one PDF page is split into multiple detected crops, those crops get different page_index values but share the same source_index because they came from the same original PDF page.

In short, page_index tells you where a page ended up in the output sequence. source_index tells you where it came from.
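The examples above can be reproduced with a small counter. This is a sketch under one assumption: that `page_index` is a running position over the whole output, as the descriptions suggest. The `assign_indices` helper is hypothetical and not part of Churro.

```python
def assign_indices(detected_per_source):
    """Return (page_index, source_index) pairs.

    detected_per_source[i] is how many pages were detected in source item i.
    Assumes page_index is a running counter over the whole output.
    """
    pairs = []
    page_index = 0
    for source_index, count in enumerate(detected_per_source):
        for _ in range(count):
            pairs.append((page_index, source_index))
            page_index += 1
    return pairs


print(assign_indices([1]))        # single-page image
print(assign_indices([2]))        # one spread, two detected pages
print(assign_indices([1, 2, 1]))  # 3-page PDF whose second page is split
```

In the last call, the two crops from the split page share `source_index=1` but receive `page_index` values 1 and 2, so the two fields stop matching from that point on.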

Understanding `metadata` And `ocr_metadata`

Churro keeps caller-side and provider-side metadata separate on purpose.

metadata is your own metadata, or metadata produced during page detection.
ocr_metadata is metadata returned by the OCR provider for that page.

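This separation matters because the two answer different questions: metadata describes where a page came from and how you want to track it, while ocr_metadata describes what the provider reported when it processed that page.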

Examples of metadata:

a job ID you attach when submitting work
page detection hints
page ordering or dataset labels

Examples of ocr_metadata:

provider response fields
usage or timing information
model-specific OCR details

On document-level results, source_type tells you whether the document came from an "image" or a "pdf" workflow.

Sync And Async

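Every high-level sync entrypoint has an async equivalent. The naming is not uniform: the `ocr` methods add an `a` prefix for the async form, while for the `detect_` and `process_` methods the unsuffixed name is the async form and the sync form adds a `_sync` suffix.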

| Sync | Async |
| --- | --- |
| `ocr(...)` | `aocr(...)` |
| `ocr_image(...)` | `aocr_image(...)` |
| `detect_image_sync(...)` | `detect_image(...)` |
| `process_image_sync(...)` | `process_image(...)` |
| `process_pdf_sync(...)` | `process_pdf(...)` |

The sync API is the default choice for most users. Use the async forms when:

you are already inside an async application
you want to coordinate OCR with other async work
you want to manage concurrency explicitly

If you use DocumentOCRPipeline, the max_concurrency setting controls how many page OCR jobs run at once inside that pipeline.
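For readers who want to manage concurrency themselves, a common general pattern is to bound in-flight work with `asyncio.Semaphore`. The sketch below uses a fake OCR coroutine and is not a description of how Churro implements `max_concurrency`; `fake_ocr` and `ocr_all` are hypothetical names.

```python
import asyncio


async def fake_ocr(page_id: int, tracker: dict) -> str:
    """Stand-in for an async OCR call; records peak concurrency."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    tracker["active"] -= 1
    return f"text for page {page_id}"


async def ocr_all(page_ids, max_concurrency: int):
    """Run fake OCR on every page with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)
    tracker = {"active": 0, "peak": 0}

    async def bounded(page_id: int) -> str:
        async with semaphore:
            return await fake_ocr(page_id, tracker)

    # gather returns results in the order the awaitables were passed in.
    texts = await asyncio.gather(*(bounded(i) for i in page_ids))
    return list(texts), tracker["peak"]


texts, peak = asyncio.run(ocr_all(range(6), max_concurrency=2))
print(texts[0], peak)
```

Because `asyncio.gather` preserves input order, the returned texts line up with the page order even though pages finish at different times.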

Practical Rules Of Thumb

Use OCRClient when each input image is already a page.
Use DocumentOCRPipeline for PDFs and photographed spreads.
Use DocumentPageDetector when you want crops without OCR.
Build your backend once and reuse it across calls.
Pass exactly one of image or image_path when an API accepts both.
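The last rule is an "exactly one of" contract. If you wrap such calls in your own helper, a guard like the following catches mistakes early. `resolve_image_source` is a hypothetical helper for illustration, and how Churro itself reacts to both or neither argument is not shown here.

```python
from typing import Optional


def resolve_image_source(
    image: Optional[bytes] = None,
    image_path: Optional[str] = None,
) -> str:
    """Return which argument was supplied, or raise if not exactly one."""
    # Both None or both set are equally invalid.
    if (image is None) == (image_path is None):
        raise ValueError("pass exactly one of image or image_path")
    return "image" if image is not None else "image_path"


print(resolve_image_source(image_path="scan.png"))
```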

When you want concrete recipes, continue with OCR Workflows. When you need exact signatures and fields, use the API Reference.