`churro_ocr.ocr`

Public OCR interfaces.

class churro_ocr.ocr.OCRResult

Bases: object

Provider-agnostic OCR result.

Parameters:

text – OCR text after any backend-specific postprocessing.
provider_name – Stable provider identifier attached to the result.
model_name – Human-readable model name attached to the result.
metadata – Provider-returned metadata for this OCR call.

__init__(text, provider_name, model_name, metadata=<factory>)

Parameters:

text (str)
provider_name (str)
model_name (str)
metadata (MetadataDict)

Return type:

None
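To make the field layout concrete, the following sketch mirrors the documented `OCRResult` signature with a local dataclass. It is not the library's source: it assumes `MetadataDict` can be treated as `dict[str, Any]` and that `metadata=<factory>` denotes `dataclasses.field(default_factory=dict)`, which is how Sphinx usually renders such defaults.

```python
from dataclasses import dataclass, field
from typing import Any

# Assumption: MetadataDict is treated here as a plain dict[str, Any].
MetadataDict = dict[str, Any]


@dataclass
class OCRResult:
    """Local stand-in mirroring the documented OCRResult fields."""

    text: str
    provider_name: str
    model_name: str
    # default_factory gives each result its own empty dict; Sphinx renders
    # this kind of default as `metadata=<factory>`.
    metadata: MetadataDict = field(default_factory=dict)


a = OCRResult(text="Hello", provider_name="example", model_name="Example Model")
b = OCRResult(text="World", provider_name="example", model_name="Example Model")
a.metadata["tokens"] = 12
print(b.metadata)  # {}  (each instance has a separate dict)
```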

class churro_ocr.ocr.OCRBackend

Bases: Protocol

Async OCR backend interface.

async ocr(page)

Run OCR for one page.

Parameters:

page (DocumentPage) – Page image and page metadata to transcribe.

Returns:

Provider-agnostic OCR result for the page.

Return type:

OCRResult
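Because `OCRBackend` is a `typing.Protocol`, any object with a matching `async ocr(page)` method should satisfy it without subclassing. The sketch below uses simplified local stand-ins for `DocumentPage` and `OCRResult`; `EchoBackend` is hypothetical, and `@runtime_checkable` is added here only so `isinstance` can demonstrate the structural match (the documented protocol may not be runtime-checkable).

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class DocumentPage:
    # Hypothetical stand-in: only the fields this sketch needs.
    image: bytes
    page_index: int = 0
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class OCRResult:
    text: str
    provider_name: str
    model_name: str
    metadata: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class OCRBackend(Protocol):
    async def ocr(self, page: DocumentPage) -> OCRResult: ...


class EchoBackend:
    """Hypothetical backend that 'transcribes' by decoding the image bytes."""

    async def ocr(self, page: DocumentPage) -> OCRResult:
        return OCRResult(
            text=page.image.decode("utf-8"),
            provider_name="echo",
            model_name="Echo (demo)",
            metadata={"page_index": page.page_index},
        )


backend = EchoBackend()
# No inheritance needed: Protocols match on structure.
print(isinstance(backend, OCRBackend))  # True
result = asyncio.run(backend.ocr(DocumentPage(image=b"hello", page_index=3)))
print(result.text, result.metadata)  # hello {'page_index': 3}
```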

__init__(*args, **kwargs)

class churro_ocr.ocr.BatchOCRBackend

Bases: Protocol

Async batch OCR backend interface.

async ocr_batch(pages)

Run OCR for multiple pages in one batch.

Parameters:

pages (list[DocumentPage]) – Pages to transcribe, in batch order.

Returns:

OCR results in the same order as pages.

Return type:

list[OCRResult]
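One way a backend could meet the `ocr_batch` ordering contract is to fan out to a single-page backend and collect results with `asyncio.gather`, which returns results in argument order regardless of completion order. `SlowEchoBackend` and `GatherBatchBackend` are hypothetical; a real provider might instead send all pages in a single batched request.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentPage:
    # Hypothetical stand-in for churro_ocr's DocumentPage.
    image: bytes
    page_index: int = 0


@dataclass
class OCRResult:
    text: str
    provider_name: str
    model_name: str
    metadata: dict[str, Any] = field(default_factory=dict)


class SlowEchoBackend:
    """Hypothetical single-page backend whose latency varies per page."""

    async def ocr(self, page: DocumentPage) -> OCRResult:
        # Later pages finish first, to show that ordering still holds.
        await asyncio.sleep(0.01 * (3 - page.page_index))
        return OCRResult(page.image.decode(), "echo", "Echo (demo)")


class GatherBatchBackend:
    """Hypothetical adapter exposing ocr_batch() over a single-page backend."""

    def __init__(self, inner: SlowEchoBackend) -> None:
        self.inner = inner

    async def ocr_batch(self, pages: list[DocumentPage]) -> list[OCRResult]:
        # gather() returns results in argument order, not completion order,
        # which matches the "same order as pages" contract.
        return list(await asyncio.gather(*(self.inner.ocr(p) for p in pages)))


pages = [DocumentPage(image=t.encode(), page_index=i) for i, t in enumerate(["a", "b", "c"])]
results = asyncio.run(GatherBatchBackend(SlowEchoBackend()).ocr_batch(pages))
print([r.text for r in results])  # ['a', 'b', 'c']
```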

__init__(*args, **kwargs)

churro_ocr.ocr.prepare_ocr_page(page)

Return a page copy with the shared OCR image preprocessing applied.

Parameters:

page (DocumentPage) – Page to preprocess for OCR.

Returns:

Copy of page with its image replaced by the preprocessed image.

Return type:

DocumentPage
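The documented contract is that the caller's page is not modified: a copy is returned with only the image replaced. The sketch below illustrates that copy-and-replace pattern with `dataclasses.replace`. The shared preprocessing steps are not described here, so `binarize` is a hypothetical placeholder, and the toy `DocumentPage` stores pixels as nested tuples rather than a PIL image.

```python
from dataclasses import dataclass, field, replace
from typing import Any

# Toy grayscale image: rows of 0-255 values (not a PIL image).
Pixels = tuple[tuple[int, ...], ...]


@dataclass
class DocumentPage:
    # Hypothetical stand-in for churro_ocr's DocumentPage.
    image: Pixels
    page_index: int = 0
    metadata: dict[str, Any] = field(default_factory=dict)


def binarize(image: Pixels, threshold: int = 128) -> Pixels:
    """Placeholder preprocessing step; NOT the library's actual pipeline."""
    return tuple(tuple(255 if v >= threshold else 0 for v in row) for row in image)


def prepare_ocr_page_sketch(page: DocumentPage) -> DocumentPage:
    # replace() builds a new DocumentPage; the caller's page is left untouched.
    return replace(page, image=binarize(page.image))


original = DocumentPage(image=((10, 200), (130, 90)), page_index=1)
prepared = prepare_ocr_page_sketch(original)
print(original.image)  # ((10, 200), (130, 90))
print(prepared.image)  # ((0, 255), (255, 0))
```

Note that `replace` makes a shallow copy, so mutable fields such as `metadata` are shared between the two pages in this sketch.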

class churro_ocr.ocr.OCRClient

Bases: object

User-facing OCR client with page-first sync and async entry points.

__init__(backend)

Create an OCR client.

Parameters:

backend (OCRBackend | Callable[[DocumentPage], Awaitable[OCRResult]]) – OCR backend or async callable used for page OCR.

Return type:

None
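The `backend` parameter accepts either an object implementing `OCRBackend` or a bare async callable. The hypothetical `ClientSketch` below shows one way such a union could be normalized into a single coroutine function; it is an illustration under that assumption, not `OCRClient`'s actual constructor, and the `DocumentPage`/`OCRResult` classes are simplified stand-ins.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Awaitable, Callable, Union


@dataclass
class DocumentPage:
    image: bytes


@dataclass
class OCRResult:
    text: str
    provider_name: str
    model_name: str


PageOCRFn = Callable[[DocumentPage], Awaitable[OCRResult]]


class ClientSketch:
    """Hypothetical illustration of accepting either backend form."""

    def __init__(self, backend: Union[object, PageOCRFn]) -> None:
        # An object with an async `ocr` method is treated as an OCRBackend;
        # otherwise the argument itself must be callable.
        ocr_method = getattr(backend, "ocr", None)
        if ocr_method is not None and inspect.iscoroutinefunction(ocr_method):
            self._run = ocr_method
        elif callable(backend):
            self._run = backend
        else:
            raise TypeError("backend must be an OCRBackend or an async callable")

    async def run(self, page: DocumentPage) -> OCRResult:
        return await self._run(page)


async def upper_ocr(page: DocumentPage) -> OCRResult:
    return OCRResult(page.image.decode().upper(), "fn", "Upper (demo)")


class UpperBackend:
    async def ocr(self, page: DocumentPage) -> OCRResult:
        return await upper_ocr(page)


page = DocumentPage(image=b"abc")
print(asyncio.run(ClientSketch(upper_ocr).run(page)).text)       # ABC
print(asyncio.run(ClientSketch(UpperBackend()).run(page)).text)  # ABC
```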

async aocr(page)

Run OCR asynchronously for one page.

Parameters:

page (DocumentPage) – Page to transcribe.

Returns:

Copy of page with OCR output attached.

Return type:

DocumentPage

ocr(page)

Run OCR synchronously for one page.

Parameters:

page (DocumentPage) – Page to transcribe.

Returns:

Copy of page with OCR output attached.

Return type:

DocumentPage
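`ocr` is the synchronous counterpart of `aocr`. A common way to provide such a pair is to implement the async version and wrap it with `asyncio.run`; the sketch below assumes that design (the library may do it differently) and uses a hypothetical `text` field on a stand-in `DocumentPage` to hold the OCR output.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DocumentPage:
    image: bytes
    # Hypothetical field: where this sketch attaches the OCR text.
    text: Optional[str] = None


class EchoClient:
    """Hypothetical client with an async entry point plus a sync wrapper."""

    async def aocr(self, page: DocumentPage) -> DocumentPage:
        await asyncio.sleep(0)  # stand-in for a real backend call
        # Return a copy; the input page is not modified.
        return replace(page, text=page.image.decode())

    def ocr(self, page: DocumentPage) -> DocumentPage:
        # asyncio.run starts a fresh event loop, so this wrapper must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(self.aocr(page))


client = EchoClient()
page = DocumentPage(image=b"hello")
done = client.ocr(page)
print(page.text, done.text)  # None hello
```

In this sketch, calling `ocr` from inside a running event loop raises `RuntimeError`, so async code should `await client.aocr(page)` directly.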

async aocr_image(*, image=None, image_path=None, page_index=0, source_index=0, metadata=None)

Create a single page from an image input and OCR it.

Parameters:

image (Image.Image | None) – In-memory page image. Mutually exclusive with image_path.
image_path (str | Path | None) – Path to a page image on disk. Mutually exclusive with image.
page_index (int) – Page position to attach to the generated page.
source_index (int) – Original source index to attach to the generated page.
metadata (MetadataDict | None) – Optional caller-side metadata attached before OCR runs.

Returns:

OCR-enriched page object.

Raises:

ConfigurationError – If both or neither of image and image_path are provided.

Return type:

DocumentPage
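The `image`/`image_path` pair must be supplied exactly once. The sketch below shows that validation with a local stand-in for `ConfigurationError`; `resolve_image_source` is a hypothetical helper, not part of `churro_ocr`.

```python
from pathlib import Path
from typing import Optional, Union


class ConfigurationError(Exception):
    """Local stand-in for churro_ocr's ConfigurationError."""


def resolve_image_source(
    image: Optional[object] = None,
    image_path: Optional[Union[str, Path]] = None,
) -> str:
    # Reject the call when both inputs are given or when neither is.
    if (image is None) == (image_path is None):
        raise ConfigurationError("Provide exactly one of `image` or `image_path`.")
    return "image" if image is not None else "image_path"


print(resolve_image_source(image_path="scan.png"))  # image_path
for kwargs in ({}, {"image": object(), "image_path": "scan.png"}):
    try:
        resolve_image_source(**kwargs)
    except ConfigurationError as exc:
        print("rejected:", exc)
```

With the documented client, the corresponding calls are keyword-only, for example `client.ocr_image(image_path="scan.png")` or `await client.aocr_image(image=img, page_index=2)`, where `img` is a hypothetical in-memory image.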

ocr_image(*, image=None, image_path=None, page_index=0, source_index=0, metadata=None)

Create a single page from an image input and OCR it synchronously.

Parameters:

image (Image.Image | None) – In-memory page image. Mutually exclusive with image_path.
image_path (str | Path | None) – Path to a page image on disk. Mutually exclusive with image.
page_index (int) – Page position to attach to the generated page.
source_index (int) – Original source index to attach to the generated page.
metadata (MetadataDict | None) – Optional caller-side metadata attached before OCR runs.

Returns:

OCR-enriched page object.

Raises:

ConfigurationError – If both or neither of image and image_path are provided.

Return type:

DocumentPage
