churro_ocr.prompts

Public prompt defaults used by churro-ocr backends.

churro_ocr.prompts.parse_chandra_response(text)

Extract plain text and metadata from a Chandra HTML-layout response.

Parameters:

text (str)

Return type:

tuple[str, MetadataDict]
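
The shape of this kind of parser can be sketched with the standard library alone. The block below is not the churro-ocr implementation: the function name parse_layout_html_sketch, the assumption that layout blocks are div elements carrying data-label and data-bbox attributes, and the plain dict standing in for MetadataDict are all hypothetical, chosen only to illustrate returning a (plain text, metadata) pair from layout HTML.

```python
from html.parser import HTMLParser


class _BlockTextParser(HTMLParser):
    """Collect visible text and per-block attributes from layout HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self.blocks: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = {name: value or "" for name, value in attrs}
        # Assumption: each layout block is a <div> with a data-label attribute.
        if tag == "div" and "data-label" in attr_map:
            self.blocks.append(attr_map)

    def handle_data(self, data):
        # Keep only non-empty text nodes, trimmed of surrounding whitespace.
        if data.strip():
            self.chunks.append(data.strip())


def parse_layout_html_sketch(text: str) -> tuple[str, dict]:
    """Return (plain_text, metadata) for an HTML-layout response (illustrative only)."""
    parser = _BlockTextParser()
    parser.feed(text)
    parser.close()
    # One text node per line; the metadata keeps the raw block attributes.
    return "\n".join(parser.chunks), {"blocks": parser.blocks}
```

Because every text node becomes its own line, inline markup such as bold spans would split a sentence across lines; a real parser would likely join text per block instead.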

churro_ocr.prompts.parse_olmocr_response(text)

Extract plain text and metadata from an olmOCR YAML-front-matter response.

Parameters:

text (str)

Return type:

tuple[str, MetadataDict]
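
To make the front-matter idea concrete, here is a minimal sketch of splitting a "---" delimited header from the body text. It is an illustration under stated assumptions, not the library's code: parse_front_matter_sketch is a hypothetical name, the header is assumed to hold flat "key: value" lines, and a plain dict of strings stands in for MetadataDict.

```python
def parse_front_matter_sketch(text: str) -> tuple[str, dict[str, str]]:
    """Split a '---' delimited header from the body (illustrative only)."""
    lines = text.strip().splitlines()
    # No opening delimiter: treat the whole response as body text.
    if not lines or lines[0].strip() != "---":
        return text.strip(), {}

    # Find the closing delimiter; without one, fall back to body-only.
    end = None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = index
            break
    if end is None:
        return text.strip(), {}

    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not "key: value"
            metadata[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return body, metadata
```

Values stay as strings here; a real YAML parser would coerce booleans and numbers and handle nested structures.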

churro_ocr.prompts.strip_ocr_output_tag(text, *, output_tag=DEFAULT_OCR_OUTPUT_TAG)

Remove outer OCR output tags and any stray tag tokens when present.

Parameters:
  • text (str) – Raw OCR response text.

  • output_tag (str) – Expected wrapper tag name.

Returns:

OCR text with the outer wrapper removed when present.

Return type:

str
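
The behaviour described above can be approximated with a single regular expression. This is a sketch, not the library's implementation: strip_output_tag_sketch is a hypothetical name, and the tag name "ocr_output" is an assumed placeholder because the value of DEFAULT_OCR_OUTPUT_TAG is not shown on this page.

```python
import re

# Assumption: placeholder tag name; the real default is not documented here.
HYPOTHETICAL_OUTPUT_TAG = "ocr_output"


def strip_output_tag_sketch(text: str, *, output_tag: str = HYPOTHETICAL_OUTPUT_TAG) -> str:
    """Remove opening and closing wrapper tags wherever they appear (illustrative only)."""
    tag = re.escape(output_tag)
    # One pattern covers both the outer wrapper and any stray tag tokens.
    without_tags = re.sub(rf"</?{tag}>", "", text)
    return without_tags.strip()
```

Text without the tag passes through unchanged apart from whitespace trimming, which matches the "when present" wording above.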

churro_ocr.prompts.strip_rich_ocr_markup_to_plain_text(text)

Best-effort plain-text conversion for OCR markdown/HTML output.

Parameters:

text (str)

Return type:

str
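
A best-effort conversion of this kind might look like the sketch below. It is not the churro-ocr implementation: markup_to_plain_text_sketch is a hypothetical name, and the set of handled constructs (line-breaking HTML tags, entities, Markdown headings and emphasis markers) is an assumption made for illustration.

```python
import html
import re


def markup_to_plain_text_sketch(text: str) -> str:
    """Reduce simple HTML/Markdown markup to plain text (illustrative only)."""
    # Turn <br> and common closing block tags into line breaks.
    text = re.sub(r"<br\s*/?>|</(p|div|tr|li|h[1-6])>", "\n", text, flags=re.IGNORECASE)
    # Drop every remaining tag, then decode entities such as &amp;.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    # Remove Markdown heading markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # Remove emphasis and inline-code markers (this also strips literal asterisks).
    text = re.sub(r"\*\*|__|\*|`", "", text)
    # Trim trailing spaces and collapse runs of blank lines.
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Regex-based stripping is lossy by design: tables lose their structure, and literal asterisks or backticks in the transcription are removed along with the markup.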