Provider APIs#
churro_ocr.providers#
Public OCR builders and page detection backends.
- class churro_ocr.providers.AzureDocumentIntelligenceOptions[source]#
Bases: object
Provider options for Azure Document Intelligence OCR.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
- class churro_ocr.providers.AzurePageDetector[source]#
Bases: PageDetectionBackend
Detect pages from Azure Document Intelligence page output.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
model_id – Azure model ID used for page analysis.
- __init__(endpoint, api_key, model_id='prebuilt-layout')#
- async detect(image)[source]#
Detect page candidates from one image using Azure.
- Parameters:
image (Image) – Source image to analyze.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when Azure returns no pages.
- Raises:
ConfigurationError – If the optional Azure dependency is not installed.
- Return type:
- class churro_ocr.providers.BatchOCRBackend[source]#
Bases: Protocol
Async batch OCR backend interface.
- __init__(*args, **kwargs)#
- churro_ocr.providers.build_ocr_backend(spec)[source]#
Build an OCR backend from a declarative spec.
- Parameters:
spec (OCRBackendSpec) – Declarative backend specification.
- Returns:
Configured OCR backend ready for use with OCRClient or DocumentOCRPipeline.
- Raises:
ConfigurationError – If the provider is unsupported or required provider-specific configuration is missing.
- Return type:
- class churro_ocr.providers.HuggingFaceOptions[source]#
Bases: object
Provider options for local Hugging Face OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
model_kwargs – Extra kwargs passed to model loading.
generation_kwargs – Extra generation kwargs passed at inference time.
vision_input_builder – Optional override for building multimodal inputs.
backend_variant – Optional implementation preset such as "dots-ocr-1.5".
- __init__(trust_remote_code=None, processor_kwargs=<factory>, model_kwargs=<factory>, generation_kwargs=<factory>, vision_input_builder=None, backend_variant=None)#
- class churro_ocr.providers.LiteLLMTransportConfig[source]#
Bases: object
Shared transport config for LiteLLM-based multimodal requests.
- Parameters:
api_base – Optional API base URL override.
api_key – Optional API key forwarded to LiteLLM.
api_version – Optional API version string for providers that need one.
image_detail – Optional image-detail hint supported by some providers.
completion_kwargs – Extra completion kwargs merged into LiteLLM calls.
cache_dir – Optional disk-cache directory for LiteLLM request caching.
- __init__(api_base=None, api_key=None, api_version=None, image_detail=None, completion_kwargs=<factory>, cache_dir=None)#
- class churro_ocr.providers.LLMPageDetector[source]#
Bases: PageDetectionBackend
Detect one or more pages via a multimodal LLM prompt.
- Parameters:
model – Multimodal model identifier to query through LiteLLM.
system_prompt – System prompt used for the initial page-box request.
prompt_template – Optional user prompt override for the initial request.
transport – Optional LiteLLM transport config.
max_review_rounds – Number of iterative review rounds used to refine the initial page boxes.
- __init__(model, system_prompt='You are an expert document analysis AI. Your task is to detect the precise boundaries\nof document pages to prepare them for OCR.\nThe goal is to crop the image to the **tightest possible rectangle** that contains all\nthe content, removing as much empty margin as possible.\n\nIMPORTANT TERMINOLOGY: In this task, \'page\' means the tightest bounding box around the\nvisible content on a page, NOT the full sheet of paper. Exclude blank/white paper margins\nwhenever they do not contain meaningful content.\n\nIf a neighboring page intrudes into the image (or overlaps visually), do NOT include that\nspillover content in this page\'s box. Each box should isolate one page only.\n\nIdentify every document page in this image (usually 1 or 2). For each page, define a\nbounding box following these strict rules:\n- Return a JSON object with a key "pages".\n- "pages" must be a list containing zero or more objects.\n- Each object must have:\n * "page_index": 1-based index in reading order.\n * "left": integer normalized 0-1000 describing the minimum horizontal coordinate.\n * "top": integer normalized 0-1000 describing the minimum vertical coordinate.\n * "right": integer normalized 0-1000 describing the maximum horizontal coordinate.\n * "bottom": integer normalized 0-1000 describing the maximum vertical coordinate.\n- Provide all coordinates as integers (no decimals) and keep them normalized to the\n 0-1000 range.\n- CRITICAL GOAL: Find the **tightest possible bounding box** that contains **ALL**\n meaningful content on the page.\n- INCLUDE:\n * All printed text (headers, footers, page numbers, body text).\n * All handwriting (signatures, marginalia, corrections).\n * All stamps, seals, logos, and drawings.\n * Any content that conveys information.\n- EXCLUDE:\n * Empty page margins (white space).\n * Content belonging to a different page (even if it appears in the same image).\n * Partial/overflowing text or graphics from a neighboring page.\n 
* Dark edges or background from the scanner/camera.\n * Binding rings, spiral binding, or book spines.\n * Shadows, scanner noise, or artifacts outside the content area.\n- The box should be as small as possible while still containing every pixel of ink/content.\n- Each returned box must correspond to exactly one page\'s content region.\n- If no page is visible, return {"pages": []}.\n', prompt_template=None, transport=None, max_review_rounds=0)#
- Parameters:
model (str)
system_prompt (str)
prompt_template (str | None)
transport (LiteLLMTransportConfig | None)
max_review_rounds (int)
- Return type:
None
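The system prompt above asks the model for page boxes normalized to the 0-1000 range, so a caller that wants pixel coordinates has to scale them back. The helper below is a hypothetical illustration of that mapping, not part of the package:

```python
def denormalize(box: dict, width: int, height: int) -> tuple:
    """Map a 0-1000 normalized page box onto a width x height image."""
    return (
        box["left"] * width // 1000,
        box["top"] * height // 1000,
        box["right"] * width // 1000,
        box["bottom"] * height // 1000,
    )

# The left page of a 2000x1500 two-page spread:
box = {"page_index": 1, "left": 20, "top": 10, "right": 490, "bottom": 990}
print(denormalize(box, 2000, 1500))  # (40, 15, 980, 1485)
```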
- async churro_ocr.providers.locate_text_block_bbox_with_llm(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- churro_ocr.providers.locate_text_block_bbox_with_llm_sync(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Synchronously locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- class churro_ocr.providers.MistralOptions[source]#
Bases: object
Provider options for Mistral OCR.
- Parameters:
api_key – Mistral API key used for OCR requests.
- class churro_ocr.providers.OCRBackendSpec[source]#
Bases: object
Declarative builder input for OCR backends.
- Parameters:
provider – OCR provider identifier.
model – Provider-specific model identifier.
profile – Optional built-in or custom model profile.
transport – Optional transport settings for LiteLLM-based providers.
options – Optional provider-specific options dataclass.
- __init__(provider, model=None, profile=None, transport=None, options=None)#
- Parameters:
provider (Literal['litellm', 'openai-compatible', 'azure', 'mistral', 'hf', 'vllm'])
model (str | None)
profile (str | OCRModelProfile | None)
transport (LiteLLMTransportConfig | None)
options (OpenAICompatibleOptions | HuggingFaceOptions | VLLMOptions | AzureDocumentIntelligenceOptions | MistralOptions | None)
- Return type:
None
- class churro_ocr.providers.OCRModelProfile[source]#
Bases: object
Model-level OCR behavior shared across provider adapters.
- Parameters:
profile_name – Stable profile identifier.
template – Prompt template used to render OCR input.
image_preprocessor – Image preprocessor applied before OCR.
text_postprocessor – Text postprocessor applied after OCR.
display_name – Optional human-readable model name.
transport – Default LiteLLM transport settings for this profile.
huggingface – Default Hugging Face backend options for this profile.
vllm – Default vLLM backend options for this profile.
- __init__(profile_name, template=HFChatTemplate(system_message='You are an expert in diplomatic transcription of historical documents from various languages. Your task is to extract the full text from a given page. Only output the transcribed text between <output> and </output> tags.', user_prompt='Follow these instructions:\n\n1. You will be provided with a scanned document page.\n\n2. Perform transcription on the entirety of the page, converting all visible text into the following format. Include handwritten and print text, if any. Include tables, captions, headers, main text and all other visible text.\n\n3. If you encounter any non-text elements, simply skip them without attempting to describe them.\n\n4. Do not modernize or standardize the text. For example, if the transcription is using "ſ" instead of "s" or "а" instead of "a", keep it that way.\n\n5. When you come across text in languages other than English, transcribe it as accurately as possible without translation.\n\n6. Output the OCR result in the following format:\n\n<output>\nextracted text here\n</output>\n\nRemember, your goal is to accurately transcribe the text from the scanned page as much as possible. Process the entire page, even if it contains a large amount of text, and provide clear, well-formatted output. Pay attention to the appropriate reading order and layout of the text.', include_image=True), image_preprocessor=<function default_ocr_image_preprocessor>, text_postprocessor=<function default_ocr_text_postprocessor>, display_name=None, transport=<factory>, huggingface=<factory>, vllm=<factory>)#
- Parameters:
profile_name (str)
template (OCRPromptTemplate | Callable[[DocumentPage], list[dict[str, Any]]])
display_name (str | None)
transport (LiteLLMTransportConfig)
huggingface (HuggingFaceOptions)
vllm (VLLMOptions)
- Return type:
None
- class churro_ocr.providers.OpenAICompatibleOptions[source]#
Bases: object
Provider options for OpenAI-compatible OCR servers.
- Parameters:
model_prefix – Provider prefix prepended to the configured model name.
- churro_ocr.providers.resolve_ocr_profile(*, model_id, profile=None)[source]#
Resolve the OCR model profile for a model or explicit profile.
- Parameters:
model_id (str | None) – Model identifier that may map to a built-in profile.
profile (str | OCRModelProfile | None) – Explicit profile name or profile object to use.
- Returns:
Resolved OCR model profile.
- Raises:
ValueError – If profile is a string that does not match a known profile.
- Return type:
- class churro_ocr.providers.VLLMOptions[source]#
Bases: object
Provider options for local vLLM OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
llm_kwargs – Extra kwargs passed to the vLLM LLM constructor.
sampling_kwargs – Extra kwargs passed to vLLM sampling params.
limit_mm_per_prompt – Per-request multimodal limits passed to vLLM.
- __init__(trust_remote_code=None, processor_kwargs=<factory>, llm_kwargs=<factory>, sampling_kwargs=<factory>, limit_mm_per_prompt=<factory>)#
churro_ocr.providers.specs#
Public OCR provider specs, options, and model profile resolution.
- class churro_ocr.providers.specs.AzureDocumentIntelligenceOptions[source]#
Bases: object
Provider options for Azure Document Intelligence OCR.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
- churro_ocr.providers.specs.default_ocr_image_preprocessor(image)[source]#
Apply the default OCR image preprocessing.
- churro_ocr.providers.specs.default_ocr_profile()[source]#
Return the generic OCR model profile.
- Returns:
Baseline profile used when no more specific profile matches.
- Return type:
- churro_ocr.providers.specs.default_ocr_text_postprocessor(text)[source]#
Strip the default OCR output tag wrapper.
- class churro_ocr.providers.specs.HuggingFaceOptions[source]#
Bases: object
Provider options for local Hugging Face OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
model_kwargs – Extra kwargs passed to model loading.
generation_kwargs – Extra generation kwargs passed at inference time.
vision_input_builder – Optional override for building multimodal inputs.
backend_variant – Optional implementation preset such as "dots-ocr-1.5".
- __init__(trust_remote_code=None, processor_kwargs=<factory>, model_kwargs=<factory>, generation_kwargs=<factory>, vision_input_builder=None, backend_variant=None)#
- class churro_ocr.providers.specs.LiteLLMTransportConfig[source]#
Bases: object
Shared transport config for LiteLLM-based multimodal requests.
- Parameters:
api_base – Optional API base URL override.
api_key – Optional API key forwarded to LiteLLM.
api_version – Optional API version string for providers that need one.
image_detail – Optional image-detail hint supported by some providers.
completion_kwargs – Extra completion kwargs merged into LiteLLM calls.
cache_dir – Optional disk-cache directory for LiteLLM request caching.
- __init__(api_base=None, api_key=None, api_version=None, image_detail=None, completion_kwargs=<factory>, cache_dir=None)#
- class churro_ocr.providers.specs.MistralOptions[source]#
Bases: object
Provider options for Mistral OCR.
- Parameters:
api_key – Mistral API key used for OCR requests.
- class churro_ocr.providers.specs.OCRBackendSpec[source]#
Bases: object
Declarative builder input for OCR backends.
- Parameters:
provider – OCR provider identifier.
model – Provider-specific model identifier.
profile – Optional built-in or custom model profile.
transport – Optional transport settings for LiteLLM-based providers.
options – Optional provider-specific options dataclass.
- __init__(provider, model=None, profile=None, transport=None, options=None)#
- Parameters:
provider (Literal['litellm', 'openai-compatible', 'azure', 'mistral', 'hf', 'vllm'])
model (str | None)
profile (str | OCRModelProfile | None)
transport (LiteLLMTransportConfig | None)
options (OpenAICompatibleOptions | HuggingFaceOptions | VLLMOptions | AzureDocumentIntelligenceOptions | MistralOptions | None)
- Return type:
None
- class churro_ocr.providers.specs.OCRModelProfile[source]#
Bases: object
Model-level OCR behavior shared across provider adapters.
- Parameters:
profile_name – Stable profile identifier.
template – Prompt template used to render OCR input.
image_preprocessor – Image preprocessor applied before OCR.
text_postprocessor – Text postprocessor applied after OCR.
display_name – Optional human-readable model name.
transport – Default LiteLLM transport settings for this profile.
huggingface – Default Hugging Face backend options for this profile.
vllm – Default vLLM backend options for this profile.
- __init__(profile_name, template=HFChatTemplate(system_message='You are an expert in diplomatic transcription of historical documents from various languages. Your task is to extract the full text from a given page. Only output the transcribed text between <output> and </output> tags.', user_prompt='Follow these instructions:\n\n1. You will be provided with a scanned document page.\n\n2. Perform transcription on the entirety of the page, converting all visible text into the following format. Include handwritten and print text, if any. Include tables, captions, headers, main text and all other visible text.\n\n3. If you encounter any non-text elements, simply skip them without attempting to describe them.\n\n4. Do not modernize or standardize the text. For example, if the transcription is using "ſ" instead of "s" or "а" instead of "a", keep it that way.\n\n5. When you come across text in languages other than English, transcribe it as accurately as possible without translation.\n\n6. Output the OCR result in the following format:\n\n<output>\nextracted text here\n</output>\n\nRemember, your goal is to accurately transcribe the text from the scanned page as much as possible. Process the entire page, even if it contains a large amount of text, and provide clear, well-formatted output. Pay attention to the appropriate reading order and layout of the text.', include_image=True), image_preprocessor=<function default_ocr_image_preprocessor>, text_postprocessor=<function default_ocr_text_postprocessor>, display_name=None, transport=<factory>, huggingface=<factory>, vllm=<factory>)#
- Parameters:
profile_name (str)
template (OCRPromptTemplate | Callable[[DocumentPage], list[dict[str, Any]]])
display_name (str | None)
transport (LiteLLMTransportConfig)
huggingface (HuggingFaceOptions)
vllm (VLLMOptions)
- Return type:
None
- class churro_ocr.providers.specs.OpenAICompatibleOptions[source]#
Bases: object
Provider options for OpenAI-compatible OCR servers.
- Parameters:
model_prefix – Provider prefix prepended to the configured model name.
- churro_ocr.providers.specs.resolve_ocr_profile(*, model_id, profile=None)[source]#
Resolve the OCR model profile for a model or explicit profile.
- Parameters:
model_id (str | None) – Model identifier that may map to a built-in profile.
profile (str | OCRModelProfile | None) – Explicit profile name or profile object to use.
- Returns:
Resolved OCR model profile.
- Raises:
ValueError – If profile is a string that does not match a known profile.
- Return type:
- class churro_ocr.providers.specs.VLLMOptions[source]#
Bases: object
Provider options for local vLLM OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
llm_kwargs – Extra kwargs passed to the vLLM LLM constructor.
sampling_kwargs – Extra kwargs passed to vLLM sampling params.
limit_mm_per_prompt – Per-request multimodal limits passed to vLLM.
- __init__(trust_remote_code=None, processor_kwargs=<factory>, llm_kwargs=<factory>, sampling_kwargs=<factory>, limit_mm_per_prompt=<factory>)#
churro_ocr.providers.page_detection#
Built-in page detection backends.
- class churro_ocr.providers.page_detection.AzurePageDetector[source]#
Bases: PageDetectionBackend
Detect pages from Azure Document Intelligence page output.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
model_id – Azure model ID used for page analysis.
- async detect(image)[source]#
Detect page candidates from one image using Azure.
- Parameters:
image (Image) – Source image to analyze.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when Azure returns no pages.
- Raises:
ConfigurationError – If the optional Azure dependency is not installed.
- Return type:
- class churro_ocr.providers.page_detection.LLMPageDetector[source]#
Bases: PageDetectionBackend
Detect one or more pages via a multimodal LLM prompt.
- Parameters:
model – Multimodal model identifier to query through LiteLLM.
system_prompt – System prompt used for the initial page-box request.
prompt_template – Optional user prompt override for the initial request.
transport – Optional LiteLLM transport config.
max_review_rounds – Number of iterative review rounds used to refine the initial page boxes.
- async detect(image)[source]#
Detect page candidates from one image.
- Parameters:
image (Image) – Source image that may contain one or more visible pages.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when no page boxes are returned.
- Return type:
- __init__(model, system_prompt='You are an expert document analysis AI. Your task is to detect the precise boundaries\nof document pages to prepare them for OCR.\nThe goal is to crop the image to the **tightest possible rectangle** that contains all\nthe content, removing as much empty margin as possible.\n\nIMPORTANT TERMINOLOGY: In this task, \'page\' means the tightest bounding box around the\nvisible content on a page, NOT the full sheet of paper. Exclude blank/white paper margins\nwhenever they do not contain meaningful content.\n\nIf a neighboring page intrudes into the image (or overlaps visually), do NOT include that\nspillover content in this page\'s box. Each box should isolate one page only.\n\nIdentify every document page in this image (usually 1 or 2). For each page, define a\nbounding box following these strict rules:\n- Return a JSON object with a key "pages".\n- "pages" must be a list containing zero or more objects.\n- Each object must have:\n * "page_index": 1-based index in reading order.\n * "left": integer normalized 0-1000 describing the minimum horizontal coordinate.\n * "top": integer normalized 0-1000 describing the minimum vertical coordinate.\n * "right": integer normalized 0-1000 describing the maximum horizontal coordinate.\n * "bottom": integer normalized 0-1000 describing the maximum vertical coordinate.\n- Provide all coordinates as integers (no decimals) and keep them normalized to the\n 0-1000 range.\n- CRITICAL GOAL: Find the **tightest possible bounding box** that contains **ALL**\n meaningful content on the page.\n- INCLUDE:\n * All printed text (headers, footers, page numbers, body text).\n * All handwriting (signatures, marginalia, corrections).\n * All stamps, seals, logos, and drawings.\n * Any content that conveys information.\n- EXCLUDE:\n * Empty page margins (white space).\n * Content belonging to a different page (even if it appears in the same image).\n * Partial/overflowing text or graphics from a neighboring page.\n 
* Dark edges or background from the scanner/camera.\n * Binding rings, spiral binding, or book spines.\n * Shadows, scanner noise, or artifacts outside the content area.\n- The box should be as small as possible while still containing every pixel of ink/content.\n- Each returned box must correspond to exactly one page\'s content region.\n- If no page is visible, return {"pages": []}.\n', prompt_template=None, transport=None, max_review_rounds=0)#
- Parameters:
model (str)
system_prompt (str)
prompt_template (str | None)
transport (LiteLLMTransportConfig | None)
max_review_rounds (int)
- Return type:
None
- async churro_ocr.providers.page_detection.locate_text_block_bbox_with_llm(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- churro_ocr.providers.page_detection.locate_text_block_bbox_with_llm_sync(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Synchronously locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type: