churro_ocr.page_detection
Public page detection interfaces.
- class churro_ocr.page_detection.PageCandidate
Bases: object
Intermediate page candidate returned by a page detector.
- Parameters:
bbox – Bounding box in source-image coordinates.
image – Optional already-cropped page image. When provided, detection callers use this image directly instead of cropping from bbox or polygon.
polygon – Optional polygon in source-image coordinates.
metadata – Detector-side metadata attached to the candidate.
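The precedence described for image over bbox can be illustrated with a minimal, library-free sketch. Everything here is a hypothetical stand-in: CandidateSketch, crop, and resolve_page_image are not part of churro_ocr, the "image" is a plain list of pixel rows, and the bbox is assumed to be (left, top, right, bottom). Polygon handling is left out.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical stand-ins: an "image" is a list of pixel rows and a bbox is
# (left, top, right, bottom). The real churro_ocr types may differ.
Grid = list[list[int]]
BBox = tuple[int, int, int, int]


@dataclass
class CandidateSketch:
    bbox: BBox
    image: Optional[Grid] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def crop(source: Grid, bbox: BBox) -> Grid:
    """Cut the bbox region out of the source grid."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in source[top:bottom]]


def resolve_page_image(source: Grid, candidate: CandidateSketch) -> Grid:
    """Prefer the detector-supplied crop; otherwise crop by bbox."""
    if candidate.image is not None:
        return candidate.image
    return crop(source, candidate.bbox)


source = [[10 * r + c for c in range(4)] for r in range(3)]
from_bbox = resolve_page_image(source, CandidateSketch(bbox=(1, 0, 3, 2)))
precropped = resolve_page_image(
    source, CandidateSketch(bbox=(0, 0, 4, 3), image=[[99]])
)
```

A detector that already produces rectified or deskewed crops would set image, so the caller does not re-crop from the original coordinates.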
- class churro_ocr.page_detection.DocumentPage
Bases: object
A document page image with optional OCR output attached.
- Parameters:
page_index – Page position in the current output sequence.
image – Page image.
source_index – Index of the original source item that produced the page.
bbox – Bounding box in source-image coordinates when available.
polygon – Polygon in source-image coordinates when available.
metadata – Caller-side or detector-side metadata for the page.
text – OCR text attached to the page when OCR has been run.
provider_name – Provider identifier attached by OCR.
model_name – Model name attached by OCR.
ocr_metadata – Provider-returned OCR metadata for this page.
- classmethod from_image(image, *, page_index=0, source_index=0, metadata=None)
Create a document page from an in-memory image.
- Parameters:
- Returns:
New page object with a copied image.
- Return type:
- classmethod from_image_path(path, *, page_index=0, source_index=0, metadata=None)
Create a document page from an image path.
- Parameters:
- Returns:
New page object loaded from path.
- Return type:
- with_ocr(*, text, provider_name, model_name, ocr_metadata=None)
Return a copy of the page with OCR output attached.
- __init__(page_index, image, source_index, bbox=None, polygon=(), metadata=<factory>, text=None, provider_name=None, model_name=None, ocr_metadata=<factory>)
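The "return a copy" behaviour of with_ocr can be sketched with a small dataclass. This is only an illustration under assumptions: PageSketch is a hypothetical stand-in carrying a subset of the documented fields, and the use of a frozen dataclass with dataclasses.replace is one plausible way to get copy semantics, not a statement about how the library implements it.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Optional


@dataclass(frozen=True)
class PageSketch:
    # Hypothetical stand-in with a subset of the documented fields.
    page_index: int
    image: Any
    source_index: int
    text: Optional[str] = None
    provider_name: Optional[str] = None
    model_name: Optional[str] = None
    ocr_metadata: dict[str, Any] = field(default_factory=dict)

    def with_ocr(self, *, text, provider_name, model_name, ocr_metadata=None):
        """Return a new page with OCR fields set; the original is untouched."""
        return replace(
            self,
            text=text,
            provider_name=provider_name,
            model_name=model_name,
            ocr_metadata=dict(ocr_metadata or {}),
        )


page = PageSketch(page_index=0, image="<image placeholder>", source_index=0)
ocr_page = page.with_ocr(
    text="Hello",
    provider_name="example-provider",  # placeholder value
    model_name="example-model",  # placeholder value
)
```

Because the original page is left unchanged, the same detected page can be passed to several OCR providers and the results compared side by side.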
- class churro_ocr.page_detection.PageDetectionRequest
Bases: object
Request payload for image page detection.
- Parameters:
image – In-memory image to detect pages from. Mutually exclusive with image_path.
image_path – Path to an image on disk. Mutually exclusive with image.
trim_margin – Margin in pixels to add around detected crops.
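As a rough illustration of what a pixel margin around a crop means, the sketch below grows a bounding box by a margin on every side. The helper expand_bbox is hypothetical, the bbox layout (left, top, right, bottom) is an assumption, and clamping to the image bounds is a design choice of this sketch that the documentation above does not confirm.

```python
def expand_bbox(bbox, margin, width, height):
    """Grow a (left, top, right, bottom) box by `margin` pixels per side.

    The result is clamped to the image bounds (an assumption of this sketch).
    """
    left, top, right, bottom = bbox
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


# A 1000 x 1210 image with a detected box near the bottom edge.
expanded = expand_bbox((100, 50, 900, 1200), margin=30, width=1000, height=1210)
```

Here the bottom edge would reach 1230 without clamping, so it is cut back to the image height of 1210.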
- require_image()
Return the input image, loading it from disk when needed.
- Returns:
Copy of the requested image.
- Raises:
ConfigurationError – If both or neither of image and image_path are provided.
- Return type:
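The "exactly one of image and image_path" rule can be sketched as a small check. The names ConfigurationErrorSketch and check_exactly_one are hypothetical stand-ins; the real method additionally loads the image from disk and returns a copy, which this sketch does not attempt.

```python
from pathlib import Path
from typing import Any, Optional


class ConfigurationErrorSketch(ValueError):
    """Hypothetical stand-in for the library's ConfigurationError."""


def check_exactly_one(image: Optional[Any], image_path: Optional[Path]) -> str:
    """Return which input was supplied, or raise if zero or two were."""
    provided = (image is not None) + (image_path is not None)
    if provided != 1:
        raise ConfigurationErrorSketch(
            "Provide exactly one of image or image_path."
        )
    return "image" if image is not None else "image_path"


chosen = check_exactly_one(None, Path("scan.png"))
```

Counting the supplied inputs handles both failure modes (neither and both) with a single comparison.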
- class churro_ocr.page_detection.PageDetectionResult
Bases: object
Page detection output for an image or PDF.
- Parameters:
pages – Detected pages in output order.
source_type – Input source type, typically "image" or "pdf".
metadata – Detection-level metadata, such as PDF rasterization settings.
- class churro_ocr.page_detection.PageDetectionBackend
Bases: Protocol
Async interface for page detection.
- async detect(image)
Detect page candidates from one image.
- Parameters:
image (Image) – Source image to analyze.
- Returns:
Page candidates in reading order.
- Return type:
- __init__(*args, **kwargs)
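Because the backend is a Protocol, a class can satisfy it structurally without inheriting from it. The sketch below shows that idea with stand-ins only: BackendSketch and WholeImageBackend are hypothetical, the image is a plain dict, and candidates are dicts rather than PageCandidate objects. Marking the protocol runtime_checkable is an assumption made here so that isinstance can be demonstrated; the documentation does not say whether the real protocol supports that.

```python
import asyncio
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BackendSketch(Protocol):
    """Hypothetical stand-in for an async page detection protocol."""

    async def detect(self, image: Any) -> list: ...


class WholeImageBackend:
    """Matches BackendSketch structurally; no inheritance needed."""

    async def detect(self, image: Any) -> list:
        # One candidate covering the whole image. A dict stands in for
        # a PageCandidate, and the image is a dict with width and height.
        return [{"bbox": (0, 0, image["width"], image["height"])}]


backend = WholeImageBackend()
candidates = asyncio.run(backend.detect({"width": 640, "height": 480}))
```

Structural typing keeps custom detectors decoupled from the library: any object with a matching async detect method is acceptable to a type checker.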
- class churro_ocr.page_detection.PageDetector
Bases: object
Detect one or more page crops from an input image.
Create a page detector.
- Parameters:
backend – Optional low-level backend or async callable. When not provided, the full input image is treated as a single page.
- __init__(backend=None)
Create a page detector.
- Parameters:
backend (PageDetectionBackend | Callable[[Image], Awaitable[list[PageCandidate]]] | None) – Optional low-level backend or async callable. When not provided, the full input image is treated as a single page.
- Return type:
None
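The three accepted forms of backend (a backend object, a bare async callable, or nothing) could be dispatched roughly as follows. This is a sketch under assumptions, not the library's code: run_backend and split_in_two are hypothetical, and candidates are represented as dicts.

```python
import asyncio
from typing import Any


async def run_backend(backend: Any, image: Any) -> list:
    """Dispatch on the three documented forms of `backend`."""
    if backend is None:
        # No backend: treat the full input image as a single page.
        return [{"bbox": None, "image": image}]
    if hasattr(backend, "detect"):
        # Object exposing an async detect(image) method.
        return await backend.detect(image)
    # Otherwise assume a bare async callable taking the image.
    return await backend(image)


async def split_in_two(image: Any) -> list:
    """Toy async callable that reports a two-page spread."""
    return [{"bbox": "left half"}, {"bbox": "right half"}]


default_pages = asyncio.run(run_backend(None, "scan"))
callable_pages = asyncio.run(run_backend(split_in_two, "scan"))
```

Accepting a plain async function keeps simple detectors lightweight, while the object form leaves room for backends that hold model state.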
- async adetect(request)
Asynchronously detect pages for a single image.
- Parameters:
request (PageDetectionRequest) – Detection request describing the source image.
- Returns:
Detected page crops in reading order.
- Return type:
- detect(request)
Synchronously detect pages for a single image.
- Parameters:
request (PageDetectionRequest) – Detection request describing the source image.
- Returns:
Detected page crops in reading order.
- Return type:
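The relationship between an async method and its synchronous twin is commonly a thin wrapper that drives the coroutine to completion. The sketch below shows that general pattern with a hypothetical DetectorSketch class; whether PageDetector.detect is implemented with asyncio.run is an assumption, not something the documentation states.

```python
import asyncio


class DetectorSketch:
    """Hypothetical detector with paired async and sync entry points."""

    async def adetect(self, request):
        await asyncio.sleep(0)  # placeholder for real async work
        return [f"page from {request}"]

    def detect(self, request):
        # asyncio.run starts a fresh event loop and raises RuntimeError
        # if one is already running in this thread.
        return asyncio.run(self.adetect(request))


pages = DetectorSketch().detect("scan.png")
```

With a wrapper of this kind, code that already runs inside an event loop should await the async variant rather than call the sync one.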
- class churro_ocr.page_detection.DocumentPageDetector
Bases: object
Detect pages from raw images or PDFs.
Create a document page detector.
- Parameters:
backend – Optional low-level detection backend or async callable.
- __init__(*, backend=None)
Create a document page detector.
- Parameters:
backend (PageDetectionBackend | Callable[[Image], Awaitable[list[PageCandidate]]] | None) – Optional low-level detection backend or async callable.
- Return type:
None
- async detect_image(request)
Detect pages in a single image.
- Parameters:
request (PageDetectionRequest) – Detection request describing the source image.
- Returns:
Detection result for one image input.
- Return type:
- detect_image_sync(request)
Synchronously detect pages in a single image.
- Parameters:
request (PageDetectionRequest) – Detection request describing the source image.
- Returns:
Detection result for one image input.
- Return type:
- async detect_pdf(path, *, dpi=300, trim_margin=30)
Rasterize a PDF and detect pages on each image.
- Parameters:
- Returns:
Detection result containing all detected pages from the PDF.
- Return type:
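To give a feel for what the dpi setting implies, the sketch below converts PDF page dimensions to pixel dimensions. PDF user space units are 1/72 inch by default, so the scale factor is dpi / 72. The helper raster_size is hypothetical, and the exact rounding used by whichever rasterizer the library relies on may differ by a pixel.

```python
def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Approximate pixel size of a PDF page rendered at `dpi`.

    PDF page dimensions are in points, with 72 points per inch.
    """
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
letter_at_300 = raster_size(612, 792, 300)
letter_at_150 = raster_size(612, 792, 150)
```

At the default of 300 dpi a Letter page becomes 2550 x 3300 pixels, so the default trim_margin of 30 pixels corresponds to a tenth of an inch.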