churro_ocr.ocr
Public OCR interfaces.
- class churro_ocr.ocr.OCRResult
Bases: object
Provider-agnostic OCR result.
- Parameters:
text – OCR text after any backend-specific postprocessing.
provider_name – Stable provider identifier attached to the result.
model_name – Human-readable model name attached to the result.
metadata – Provider-returned metadata for this OCR call.
- class churro_ocr.ocr.OCRBackend
Bases: Protocol
Async OCR backend interface.
- async ocr(page)
Run OCR for one page.
- Parameters:
page (DocumentPage) – Page image and page metadata to transcribe.
- Returns:
Provider-agnostic OCR result for the page.
- Return type:
OCRResult
- __init__(*args, **kwargs)
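Because OCRBackend is a Protocol, any object with a matching async `ocr(page)` method should qualify as a backend without subclassing it. The sketch below illustrates that structural typing with hypothetical stand-ins for DocumentPage and OCRResult (the real classes carry more fields); `EchoBackend` is an invented toy, not part of churro_ocr.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


# Hypothetical stand-ins for churro_ocr's DocumentPage and OCRResult.
@dataclass
class DocumentPage:
    page_index: int
    image_label: str  # placeholder for the real page image


@dataclass
class OCRResult:
    text: str
    provider_name: str
    model_name: str
    metadata: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class OCRBackend(Protocol):
    """Mirror of the documented protocol: one async ocr(page) method."""

    async def ocr(self, page: DocumentPage) -> OCRResult: ...


class EchoBackend:
    """Toy backend (hypothetical) that 'transcribes' a page by echoing its label.

    It does not inherit from OCRBackend; matching the method shape is enough.
    """

    async def ocr(self, page: DocumentPage) -> OCRResult:
        return OCRResult(
            text=f"text of {page.image_label}",
            provider_name="echo",
            model_name="Echo v0",
            metadata={"page_index": page.page_index},
        )


backend = EchoBackend()
# runtime_checkable only checks that the named methods exist, not their signatures.
print(isinstance(backend, OCRBackend))  # True
result = asyncio.run(backend.ocr(DocumentPage(page_index=0, image_label="scan-001")))
print(result.text)  # text of scan-001
```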
- class churro_ocr.ocr.BatchOCRBackend
Bases: Protocol
Async batch OCR backend interface.
- async ocr_batch(pages)
Run OCR for multiple pages in one batch.
- Parameters:
pages (list[DocumentPage]) – Pages to transcribe in batch order.
- Returns:
OCR results in the same order as pages.
- Return type:
list[OCRResult]
- __init__(*args, **kwargs)
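The ordering contract of `ocr_batch` (results in the same order as `pages`) matters when pages are processed concurrently and finish out of order. One way a backend might meet it is `asyncio.gather`, which returns results in argument order rather than completion order. This is a sketch with hypothetical stand-in types and an invented `ConcurrentBatchBackend`; it does not describe how any real churro_ocr backend batches requests.

```python
import asyncio
from dataclasses import dataclass


# Hypothetical stand-ins; the real DocumentPage/OCRResult have more fields.
@dataclass
class DocumentPage:
    page_index: int
    image_label: str


@dataclass
class OCRResult:
    text: str
    provider_name: str
    model_name: str


class ConcurrentBatchBackend:
    """Hypothetical batch backend that fans out one task per page."""

    async def _ocr_one(self, page: DocumentPage) -> OCRResult:
        # Simulated latency so that later pages finish first.
        await asyncio.sleep(0.01 * (3 - page.page_index % 3))
        return OCRResult(text=f"text of {page.image_label}", provider_name="demo", model_name="Demo")

    async def ocr_batch(self, pages: list[DocumentPage]) -> list[OCRResult]:
        # gather preserves the order of its arguments, not completion order.
        return list(await asyncio.gather(*(self._ocr_one(p) for p in pages)))


pages = [DocumentPage(i, f"scan-{i:03d}") for i in range(3)]
results = asyncio.run(ConcurrentBatchBackend().ocr_batch(pages))
print([r.text for r in results])  # ['text of scan-000', 'text of scan-001', 'text of scan-002']
```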
- churro_ocr.ocr.prepare_ocr_page(page)
Return a page copy with the shared OCR image preprocessing applied.
- Parameters:
page (DocumentPage) – Page to preprocess for OCR.
- Returns:
Copy of page with its image replaced by the preprocessed image.
- Return type:
DocumentPage
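`prepare_ocr_page` returns a copy rather than mutating its argument. The sketch below shows that copy-and-replace pattern with `dataclasses.replace`. The actual shared preprocessing steps are not documented here, so `binarize` is a hypothetical placeholder, and the image is modelled as rows of 0-255 gray values instead of a PIL image so the example runs with the standard library only.

```python
import dataclasses
from dataclasses import dataclass


# Hypothetical stand-in: image as rows of 0-255 gray values.
@dataclass(frozen=True)
class DocumentPage:
    page_index: int
    image: tuple[tuple[int, ...], ...]


def binarize(image, threshold=128):
    """Placeholder preprocessing step (assumption, not churro_ocr's pipeline)."""
    return tuple(tuple(255 if px >= threshold else 0 for px in row) for row in image)


def prepare_page_sketch(page: DocumentPage) -> DocumentPage:
    # dataclasses.replace builds a new instance; the caller's page is untouched.
    return dataclasses.replace(page, image=binarize(page.image))


original = DocumentPage(page_index=0, image=((10, 200), (130, 90)))
prepared = prepare_page_sketch(original)
print(original.image)  # ((10, 200), (130, 90))
print(prepared.image)  # ((0, 255), (255, 0))
```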
- class churro_ocr.ocr.OCRClient
Bases: object
User-facing OCR client with page-first sync and async entrypoints.
- __init__(backend)
Create an OCR client.
- Parameters:
backend (OCRBackend | Callable[[DocumentPage], Awaitable[OCRResult]]) – OCR backend or async callable used for page OCR.
- Return type:
None
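The `backend` argument accepts either an OCRBackend object or a bare async callable. A client could normalize both forms into a single page-to-result coroutine function, as in this sketch. `resolve_backend` is a hypothetical helper and the types are stand-ins; this is not churro_ocr's actual implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class DocumentPage:  # hypothetical stand-in
    page_index: int


@dataclass
class OCRResult:  # hypothetical stand-in
    text: str


PageOCRFn = Callable[[DocumentPage], Awaitable[OCRResult]]


def resolve_backend(backend) -> PageOCRFn:
    """Hypothetical helper: accept a backend object or a bare async callable.

    Objects exposing an ``ocr`` method are used through that method; any other
    callable is assumed to be an async function taking one page.
    """
    ocr_method = getattr(backend, "ocr", None)
    if callable(ocr_method):
        return ocr_method
    if callable(backend):
        return backend
    raise TypeError("backend must define async ocr(page) or be an async callable")


class UpperBackend:
    async def ocr(self, page: DocumentPage) -> OCRResult:
        return OCRResult(text=f"PAGE {page.page_index}")


async def function_backend(page: DocumentPage) -> OCRResult:
    return OCRResult(text=f"page {page.page_index}")


outputs = [asyncio.run(resolve_backend(b)(DocumentPage(1))).text for b in (UpperBackend(), function_backend)]
print(outputs)  # ['PAGE 1', 'page 1']
```

Checking for an `ocr` attribute first keeps objects that happen to also define `__call__` routed through their documented method.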
- async aocr(page)
Run OCR asynchronously for one page.
- Parameters:
page (DocumentPage) – Page to transcribe.
- Returns:
Copy of page with OCR output attached.
- Return type:
DocumentPage
- ocr(page)
Run OCR synchronously for one page.
- Parameters:
page (DocumentPage) – Page to transcribe.
- Returns:
Copy of page with OCR output attached.
- Return type:
DocumentPage
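A common way to pair an async entrypoint (`aocr`) with a sync one (`ocr`) is for the sync method to drive the coroutine with `asyncio.run`. The sketch below assumes that design purely for illustration; `SketchClient`, `fake_ocr`, and the `text` field on the stand-in page are hypothetical, and churro_ocr may implement its sync path differently.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DocumentPage:  # hypothetical stand-in
    page_index: int
    text: Optional[str] = None  # assumed name for the OCR output slot


class SketchClient:
    """Hypothetical client pairing async and sync entrypoints."""

    def __init__(self, ocr_fn):
        self._ocr_fn = ocr_fn  # async callable: page -> str

    async def aocr(self, page: DocumentPage) -> DocumentPage:
        text = await self._ocr_fn(page)
        return replace(page, text=text)  # return a copy; leave the input unchanged

    def ocr(self, page: DocumentPage) -> DocumentPage:
        # asyncio.run starts a fresh event loop and raises RuntimeError if one is
        # already running, so async callers should await aocr instead.
        return asyncio.run(self.aocr(page))


async def fake_ocr(page: DocumentPage) -> str:
    return f"transcript {page.page_index}"


client = SketchClient(fake_ocr)
page = DocumentPage(page_index=2)
out = client.ocr(page)
print(out.text)   # transcript 2
print(page.text)  # None
```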
- async aocr_image(*, image=None, image_path=None, page_index=0, source_index=0, metadata=None)
Create a single page from an image input and OCR it.
- Parameters:
image (Image | None) – In-memory page image. Mutually exclusive with image_path.
image_path (str | Path | None) – Path to a page image on disk. Mutually exclusive with image.
page_index (int) – Page position to attach to the generated page.
source_index (int) – Original source index to attach to the generated page.
metadata (dict[str, Any] | None) – Optional caller-side metadata attached before OCR runs.
- Returns:
OCR-enriched page object.
- Raises:
ConfigurationError – If both or neither of image and image_path are provided.
- Return type:
DocumentPage
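The "both or neither" rule for `image` and `image_path` is an exactly-one-of check. A compact way to express it is to compare the two `is None` tests: they are equal exactly when both or neither argument was supplied. `check_image_inputs` and the local `ConfigurationError` class below are hypothetical stand-ins for illustration; the real exception's base class is not documented here.

```python
from pathlib import Path
from typing import Any, Optional, Union


class ConfigurationError(Exception):
    """Stand-in for churro_ocr's ConfigurationError."""


def check_image_inputs(*, image: Optional[Any] = None,
                       image_path: Optional[Union[str, Path]] = None) -> str:
    """Return which input was given, enforcing exactly one of image / image_path."""
    # True when both are None or both are set, i.e. the invalid cases.
    if (image is None) == (image_path is None):
        raise ConfigurationError("Provide exactly one of `image` or `image_path`.")
    return "image" if image is not None else "image_path"


print(check_image_inputs(image_path="page.png"))  # image_path
for bad in ({}, {"image": object(), "image_path": "page.png"}):
    try:
        check_image_inputs(**bad)
    except ConfigurationError as exc:
        print("rejected:", exc)
```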
- ocr_image(*, image=None, image_path=None, page_index=0, source_index=0, metadata=None)
Create a single page from an image input and OCR it synchronously.
- Parameters:
image (Image | None) – In-memory page image. Mutually exclusive with image_path.
image_path (str | Path | None) – Path to a page image on disk. Mutually exclusive with image.
page_index (int) – Page position to attach to the generated page.
source_index (int) – Original source index to attach to the generated page.
metadata (dict[str, Any] | None) – Optional caller-side metadata attached before OCR runs.
- Returns:
OCR-enriched page object.
- Raises:
ConfigurationError – If both or neither of image and image_path are provided.
- Return type:
DocumentPage