Provider APIs#
churro_ocr.providers#
Public OCR builders and page detection backends.
- class churro_ocr.providers.AzureDocumentIntelligenceOptions[source]#
Bases: object
Provider options for Azure Document Intelligence OCR.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
- class churro_ocr.providers.AzurePageDetector[source]#
Bases: PageDetectionBackend
Detect pages from Azure Document Intelligence page output.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
model_id – Azure model ID used for page analysis.
- __init__(endpoint, api_key, model_id='prebuilt-layout')#
- async detect(image)[source]#
Detect page candidates from one image using Azure.
- Parameters:
image (Image) – Source image to analyze.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when Azure returns no pages.
- Raises:
ConfigurationError – If the optional Azure dependency is not installed.
- Return type:
- class churro_ocr.providers.BatchOCRBackend[source]#
Bases: Protocol
Async batch OCR backend interface.
- __init__(*args, **kwargs)#
- churro_ocr.providers.build_ocr_backend(spec)[source]#
Build an OCR backend from a declarative spec.
- Parameters:
spec (OCRBackendSpec) – Declarative backend specification.
- Returns:
Configured OCR backend ready for use with OCRClient or DocumentOCRPipeline.
- Raises:
ConfigurationError – If the provider is unsupported or required provider-specific configuration is missing.
- Return type:
- class churro_ocr.providers.HuggingFaceOptions[source]#
Bases: object
Provider options for local Hugging Face OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
model_kwargs – Extra kwargs passed to model loading.
generation_kwargs – Extra generation kwargs passed at inference time.
vision_input_builder – Optional override for building multimodal inputs.
backend_variant – Optional implementation preset such as "dots-ocr-1.5".
- __init__(trust_remote_code=None, processor_kwargs=<factory>, model_kwargs=<factory>, generation_kwargs=<factory>, vision_input_builder=None, backend_variant=None)#
- class churro_ocr.providers.LiteLLMTransportConfig[source]#
Bases: object
Shared transport config for LiteLLM-based multimodal requests.
- Parameters:
api_base – Optional API base URL override.
api_key – Optional API key forwarded to LiteLLM.
api_version – Optional API version string for providers that need one.
image_detail – Optional image-detail hint supported by some providers.
completion_kwargs – Extra completion kwargs merged into LiteLLM calls.
cache_dir – Optional disk-cache directory for LiteLLM request caching.
- __init__(api_base=None, api_key=None, api_version=None, image_detail=None, completion_kwargs=<factory>, cache_dir=None)#
- class churro_ocr.providers.LLMPageDetector[source]#
Bases: PageDetectionBackend
Detect one or more pages via a multimodal LLM prompt.
- Parameters:
model – Multimodal model identifier to query through LiteLLM.
system_prompt – System prompt used for the initial page-box request.
prompt_template – Optional user prompt override for the initial request.
transport – Optional LiteLLM transport config.
max_review_rounds – Number of iterative review rounds used to refine the initial page boxes.
- __init__(model, system_prompt='You are an expert document analysis AI. Your task is to detect the precise boundaries\nof document pages to prepare them for OCR.\nThe goal is to crop the image to the **tightest possible rectangle** that contains all\nthe content, removing as much empty margin as possible.\n\nIMPORTANT TERMINOLOGY: In this task, \'page\' means the tightest bounding box around the\nvisible content on a page, NOT the full sheet of paper. Exclude blank/white paper margins\nwhenever they do not contain meaningful content.\n\nIf a neighboring page intrudes into the image (or overlaps visually), do NOT include that\nspillover content in this page\'s box. Each box should isolate one page only.\n\nIdentify every document page in this image (usually 1 or 2). For each page, define a\nbounding box following these strict rules:\n- Return a JSON object with a key "pages".\n- "pages" must be a list containing zero or more objects.\n- Each object must have:\n * "page_index": 1-based index in reading order.\n * "left": integer normalized 0-1000 describing the minimum horizontal coordinate.\n * "top": integer normalized 0-1000 describing the minimum vertical coordinate.\n * "right": integer normalized 0-1000 describing the maximum horizontal coordinate.\n * "bottom": integer normalized 0-1000 describing the maximum vertical coordinate.\n- Provide all coordinates as integers (no decimals) and keep them normalized to the\n 0-1000 range.\n- CRITICAL GOAL: Find the **tightest possible bounding box** that contains **ALL**\n meaningful content on the page.\n- INCLUDE:\n * All printed text (headers, footers, page numbers, body text).\n * All handwriting (signatures, marginalia, corrections).\n * All stamps, seals, logos, and drawings.\n * Any content that conveys information.\n- EXCLUDE:\n * Empty page margins (white space).\n * Content belonging to a different page (even if it appears in the same image).\n * Partial/overflowing text or graphics from a neighboring page.\n 
* Dark edges or background from the scanner/camera.\n * Binding rings, spiral binding, or book spines.\n * Shadows, scanner noise, or artifacts outside the content area.\n- The box should be as small as possible while still containing every pixel of ink/content.\n- Each returned box must correspond to exactly one page\'s content region.\n- If no page is visible, return {"pages": []}.\n', prompt_template=None, transport=None, max_review_rounds=0)#
- Parameters:
model (str)
system_prompt (str)
prompt_template (str | None)
transport (LiteLLMTransportConfig | None)
max_review_rounds (int)
- Return type:
None
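The system prompt above asks the model for page boxes normalized to the 0-1000 range, so a caller that wants pixel coordinates has to scale them back. The helper below is a hypothetical illustration of that mapping, not part of the package:

```python
def denormalize(box: dict, width: int, height: int) -> tuple:
    """Map a 0-1000 normalized page box onto a width x height image."""
    return (
        box["left"] * width // 1000,
        box["top"] * height // 1000,
        box["right"] * width // 1000,
        box["bottom"] * height // 1000,
    )

# The left page of a 2000x1500 two-page spread:
box = {"page_index": 1, "left": 20, "top": 10, "right": 490, "bottom": 990}
print(denormalize(box, 2000, 1500))  # (40, 15, 980, 1485)
```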
- async churro_ocr.providers.locate_text_block_bbox_with_llm(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- churro_ocr.providers.locate_text_block_bbox_with_llm_sync(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Synchronously locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- class churro_ocr.providers.MistralOptions[source]#
Bases: object
Provider options for Mistral OCR.
- Parameters:
api_key – Mistral API key used for OCR requests.
- class churro_ocr.providers.OCRBackendSpec[source]#
Bases: object
Declarative builder input for OCR backends.
- Parameters:
provider – OCR provider identifier.
model – Provider-specific model identifier.
profile – Optional built-in or custom model profile.
transport – Optional transport settings for LiteLLM-based providers.
options – Optional provider-specific options dataclass.
- __init__(provider, model=None, profile=None, transport=None, options=None)#
- Parameters:
provider (Literal['litellm', 'openai-compatible', 'azure', 'mistral', 'hf', 'vllm'])
model (str | None)
profile (str | OCRModelProfile | None)
transport (LiteLLMTransportConfig | None)
options (OpenAICompatibleOptions | HuggingFaceOptions | VLLMOptions | AzureDocumentIntelligenceOptions | MistralOptions | None)
- Return type:
None
- class churro_ocr.providers.OCRModelProfile[source]#
Bases: object
Model-level OCR behavior shared across provider adapters.
- Parameters:
profile_name – Stable profile identifier.
template – Prompt template used to render OCR input.
image_preprocessor – Image preprocessor applied before OCR.
text_postprocessor – Text postprocessor applied after OCR.
display_name – Optional human-readable model name.
transport – Default LiteLLM transport settings for this profile.
huggingface – Default Hugging Face backend options for this profile.
vllm – Default vLLM backend options for this profile.
- __init__(profile_name, template=HFChatTemplate(system_message='You are an expert in diplomatic transcription of historical documents from various languages. Your task is to extract the full text from a given page. Only output the transcribed text between <output> and </output> tags.', user_prompt='Follow these instructions:\n\n1. You will be provided with a scanned document page.\n\n2. Perform transcription on the entirety of the page, converting all visible text into the following format. Include handwritten and print text, if any. Include tables, captions, headers, main text and all other visible text.\n\n3. If you encounter any non-text elements, simply skip them without attempting to describe them.\n\n4. Do not modernize or standardize the text. For example, if the transcription is using "ſ" instead of "s" or "а" instead of "a", keep it that way.\n\n5. When you come across text in languages other than English, transcribe it as accurately as possible without translation.\n\n6. Output the OCR result in the following format:\n\n<output>\nextracted text here\n</output>\n\nRemember, your goal is to accurately transcribe the text from the scanned page as much as possible. Process the entire page, even if it contains a large amount of text, and provide clear, well-formatted output. Pay attention to the appropriate reading order and layout of the text.', include_image=True), image_preprocessor=<function default_ocr_image_preprocessor>, text_postprocessor=<function default_ocr_text_postprocessor>, display_name=None, transport=<factory>, huggingface=<factory>, vllm=<factory>)#
- Parameters:
profile_name (str)
template (OCRPromptTemplate | Callable[[DocumentPage], list[dict[str, Any]]])
display_name (str | None)
transport (LiteLLMTransportConfig)
huggingface (HuggingFaceOptions)
vllm (VLLMOptions)
- Return type:
None
- class churro_ocr.providers.OpenAICompatibleOptions[source]#
Bases: object
Provider options for OpenAI-compatible OCR servers.
- Parameters:
model_prefix – Provider prefix prepended to the configured model name.
- churro_ocr.providers.resolve_ocr_profile(*, model_id, profile=None)[source]#
Resolve the OCR model profile for a model or explicit profile.
- Parameters:
model_id (str | None) – Model identifier that may map to a built-in profile.
profile (str | OCRModelProfile | None) – Explicit profile name or profile object to use.
- Returns:
Resolved OCR model profile.
- Raises:
ValueError – If profile is a string that does not match a known profile.
- Return type:
- class churro_ocr.providers.VLLMOptions[source]#
Bases: object
Provider options for local vLLM OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
llm_kwargs – Extra kwargs passed to the vLLM LLM constructor.
sampling_kwargs – Extra kwargs passed to vLLM sampling params.
limit_mm_per_prompt – Per-request multimodal limits passed to vLLM.
- __init__(trust_remote_code=None, processor_kwargs=<factory>, llm_kwargs=<factory>, sampling_kwargs=<factory>, limit_mm_per_prompt=<factory>)#
churro_ocr.providers.specs#
Public OCR provider specs, options, and model profile resolution.
- class churro_ocr.providers.specs.AzureDocumentIntelligenceOptions[source]#
Bases: object
Provider options for Azure Document Intelligence OCR.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
- churro_ocr.providers.specs.default_ocr_image_preprocessor(image)[source]#
Apply the default OCR image preprocessing.
- churro_ocr.providers.specs.default_ocr_profile()[source]#
Return the generic OCR model profile.
- Returns:
Baseline profile used when no more specific profile matches.
- Return type:
- churro_ocr.providers.specs.default_ocr_text_postprocessor(text)[source]#
Strip the default OCR output tag wrapper.
- class churro_ocr.providers.specs.HuggingFaceOptions[source]#
Bases: object
Provider options for local Hugging Face OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
model_kwargs – Extra kwargs passed to model loading.
generation_kwargs – Extra generation kwargs passed at inference time.
vision_input_builder – Optional override for building multimodal inputs.
backend_variant – Optional implementation preset such as "dots-ocr-1.5".
- __init__(trust_remote_code=None, processor_kwargs=<factory>, model_kwargs=<factory>, generation_kwargs=<factory>, vision_input_builder=None, backend_variant=None)#
- class churro_ocr.providers.specs.LiteLLMTransportConfig[source]#
Bases: object
Shared transport config for LiteLLM-based multimodal requests.
- Parameters:
api_base – Optional API base URL override.
api_key – Optional API key forwarded to LiteLLM.
api_version – Optional API version string for providers that need one.
image_detail – Optional image-detail hint supported by some providers.
completion_kwargs – Extra completion kwargs merged into LiteLLM calls.
cache_dir – Optional disk-cache directory for LiteLLM request caching.
- __init__(api_base=None, api_key=None, api_version=None, image_detail=None, completion_kwargs=<factory>, cache_dir=None)#
- class churro_ocr.providers.specs.MistralOptions[source]#
Bases: object
Provider options for Mistral OCR.
- Parameters:
api_key – Mistral API key used for OCR requests.
- class churro_ocr.providers.specs.OCRBackendSpec[source]#
Bases: object
Declarative builder input for OCR backends.
- Parameters:
provider – OCR provider identifier.
model – Provider-specific model identifier.
profile – Optional built-in or custom model profile.
transport – Optional transport settings for LiteLLM-based providers.
options – Optional provider-specific options dataclass.
- __init__(provider, model=None, profile=None, transport=None, options=None)#
- Parameters:
provider (Literal['litellm', 'openai-compatible', 'azure', 'mistral', 'hf', 'vllm'])
model (str | None)
profile (str | OCRModelProfile | None)
transport (LiteLLMTransportConfig | None)
options (OpenAICompatibleOptions | HuggingFaceOptions | VLLMOptions | AzureDocumentIntelligenceOptions | MistralOptions | None)
- Return type:
None
- class churro_ocr.providers.specs.OCRModelProfile[source]#
Bases: object
Model-level OCR behavior shared across provider adapters.
- Parameters:
profile_name – Stable profile identifier.
template – Prompt template used to render OCR input.
image_preprocessor – Image preprocessor applied before OCR.
text_postprocessor – Text postprocessor applied after OCR.
display_name – Optional human-readable model name.
transport – Default LiteLLM transport settings for this profile.
huggingface – Default Hugging Face backend options for this profile.
vllm – Default vLLM backend options for this profile.
- __init__(profile_name, template=HFChatTemplate(system_message='You are an expert in diplomatic transcription of historical documents from various languages. Your task is to extract the full text from a given page. Only output the transcribed text between <output> and </output> tags.', user_prompt='Follow these instructions:\n\n1. You will be provided with a scanned document page.\n\n2. Perform transcription on the entirety of the page, converting all visible text into the following format. Include handwritten and print text, if any. Include tables, captions, headers, main text and all other visible text.\n\n3. If you encounter any non-text elements, simply skip them without attempting to describe them.\n\n4. Do not modernize or standardize the text. For example, if the transcription is using "ſ" instead of "s" or "а" instead of "a", keep it that way.\n\n5. When you come across text in languages other than English, transcribe it as accurately as possible without translation.\n\n6. Output the OCR result in the following format:\n\n<output>\nextracted text here\n</output>\n\nRemember, your goal is to accurately transcribe the text from the scanned page as much as possible. Process the entire page, even if it contains a large amount of text, and provide clear, well-formatted output. Pay attention to the appropriate reading order and layout of the text.', include_image=True), image_preprocessor=<function default_ocr_image_preprocessor>, text_postprocessor=<function default_ocr_text_postprocessor>, display_name=None, transport=<factory>, huggingface=<factory>, vllm=<factory>)#
- Parameters:
profile_name (str)
template (OCRPromptTemplate | Callable[[DocumentPage], list[dict[str, Any]]])
display_name (str | None)
transport (LiteLLMTransportConfig)
huggingface (HuggingFaceOptions)
vllm (VLLMOptions)
- Return type:
None
- class churro_ocr.providers.specs.OpenAICompatibleOptions[source]#
Bases: object
Provider options for OpenAI-compatible OCR servers.
- Parameters:
model_prefix – Provider prefix prepended to the configured model name.
- churro_ocr.providers.specs.resolve_ocr_profile(*, model_id, profile=None)[source]#
Resolve the OCR model profile for a model or explicit profile.
- Parameters:
model_id (str | None) – Model identifier that may map to a built-in profile.
profile (str | OCRModelProfile | None) – Explicit profile name or profile object to use.
- Returns:
Resolved OCR model profile.
- Raises:
ValueError – If profile is a string that does not match a known profile.
- Return type:
- class churro_ocr.providers.specs.VLLMOptions[source]#
Bases: object
Provider options for local vLLM OCR backends.
- Parameters:
trust_remote_code – Whether to allow remote model code execution.
processor_kwargs – Extra kwargs passed to AutoProcessor.from_pretrained.
llm_kwargs – Extra kwargs passed to the vLLM LLM constructor.
sampling_kwargs – Extra kwargs passed to vLLM sampling params.
limit_mm_per_prompt – Per-request multimodal limits passed to vLLM.
- __init__(trust_remote_code=None, processor_kwargs=<factory>, llm_kwargs=<factory>, sampling_kwargs=<factory>, limit_mm_per_prompt=<factory>)#
churro_ocr.providers.page_detection#
Built-in page detection backends.
- class churro_ocr.providers.page_detection.AzurePageDetector[source]#
Bases: PageDetectionBackend
Detect pages from Azure Document Intelligence page output.
- Parameters:
endpoint – Azure Document Intelligence endpoint URL.
api_key – Azure API key for the configured resource.
model_id – Azure model ID used for page analysis.
- async detect(image)[source]#
Detect page candidates from one image using Azure.
- Parameters:
image (Image) – Source image to analyze.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when Azure returns no pages.
- Raises:
ConfigurationError – If the optional Azure dependency is not installed.
- Return type:
- class churro_ocr.providers.page_detection.LLMPageDetector[source]#
Bases: PageDetectionBackend
Detect one or more pages via a multimodal LLM prompt.
- Parameters:
model – Multimodal model identifier to query through LiteLLM.
system_prompt – System prompt used for the initial page-box request.
prompt_template – Optional user prompt override for the initial request.
transport – Optional LiteLLM transport config.
max_review_rounds – Number of iterative review rounds used to refine the initial page boxes.
- async detect(image)[source]#
Detect page candidates from one image.
- Parameters:
image (Image) – Source image that may contain one or more visible pages.
- Returns:
Detected page candidates in reading order. Falls back to a single full-image candidate when no page boxes are returned.
- Return type:
- __init__(model, system_prompt='You are an expert document analysis AI. Your task is to detect the precise boundaries\nof document pages to prepare them for OCR.\nThe goal is to crop the image to the **tightest possible rectangle** that contains all\nthe content, removing as much empty margin as possible.\n\nIMPORTANT TERMINOLOGY: In this task, \'page\' means the tightest bounding box around the\nvisible content on a page, NOT the full sheet of paper. Exclude blank/white paper margins\nwhenever they do not contain meaningful content.\n\nIf a neighboring page intrudes into the image (or overlaps visually), do NOT include that\nspillover content in this page\'s box. Each box should isolate one page only.\n\nIdentify every document page in this image (usually 1 or 2). For each page, define a\nbounding box following these strict rules:\n- Return a JSON object with a key "pages".\n- "pages" must be a list containing zero or more objects.\n- Each object must have:\n * "page_index": 1-based index in reading order.\n * "left": integer normalized 0-1000 describing the minimum horizontal coordinate.\n * "top": integer normalized 0-1000 describing the minimum vertical coordinate.\n * "right": integer normalized 0-1000 describing the maximum horizontal coordinate.\n * "bottom": integer normalized 0-1000 describing the maximum vertical coordinate.\n- Provide all coordinates as integers (no decimals) and keep them normalized to the\n 0-1000 range.\n- CRITICAL GOAL: Find the **tightest possible bounding box** that contains **ALL**\n meaningful content on the page.\n- INCLUDE:\n * All printed text (headers, footers, page numbers, body text).\n * All handwriting (signatures, marginalia, corrections).\n * All stamps, seals, logos, and drawings.\n * Any content that conveys information.\n- EXCLUDE:\n * Empty page margins (white space).\n * Content belonging to a different page (even if it appears in the same image).\n * Partial/overflowing text or graphics from a neighboring page.\n 
* Dark edges or background from the scanner/camera.\n * Binding rings, spiral binding, or book spines.\n * Shadows, scanner noise, or artifacts outside the content area.\n- The box should be as small as possible while still containing every pixel of ink/content.\n- Each returned box must correspond to exactly one page\'s content region.\n- If no page is visible, return {"pages": []}.\n', prompt_template=None, transport=None, max_review_rounds=0)#
- Parameters:
model (str)
system_prompt (str)
prompt_template (str | None)
transport (LiteLLMTransportConfig | None)
max_review_rounds (int)
- Return type:
None
- async churro_ocr.providers.page_detection.locate_text_block_bbox_with_llm(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type:
- churro_ocr.providers.page_detection.locate_text_block_bbox_with_llm_sync(image, block_text, *, block_tag, model, transport=None, max_review_rounds=0)[source]#
Synchronously locate the tight bbox of a specific rendered text block via a multimodal LLM.
- Parameters:
image (Image) – Source page image containing the rendered block.
block_text (str) – Normalized text content of the target block.
block_tag (str) – HDML-style block tag describing the block type.
model (str) – Multimodal model identifier to query through LiteLLM.
transport (LiteLLMTransportConfig | LiteLLMTransport | None) – Optional LiteLLM transport or transport config.
max_review_rounds (int) – Number of iterative review rounds used to refine the initial box.
- Returns:
Bounding box in source-image coordinates, or None when no unique matching block can be found.
- Raises:
ValueError – If block_text or block_tag is blank.
- Return type: