churro_ocr.prompts
Public prompt defaults used by churro-ocr backends.
-
churro_ocr.prompts.parse_chandra_response(text)
Extract plain text and metadata from a Chandra HTML-layout response.
- Parameters:
text (str)
- Return type:
tuple[str, MetadataDict]
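A minimal sketch of what HTML-layout parsing can look like, using only the standard-library `html.parser`. It is not churro_ocr's implementation. The function name `sketch_parse_html_layout`, the `BLOCK_TAGS` set, and the `tag_counts` metadata key are hypothetical; the real `MetadataDict` contents are not documented here.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical set of tags treated as line breaks (an assumption, not from churro_ocr).
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextCollector(HTMLParser):
    """Collect text nodes and count tags; a sketch, not the library's parser."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &amp;, ... in data
        self.chunks: list[str] = []
        self.tag_counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def sketch_parse_html_layout(text: str) -> tuple[str, dict]:
    """Hypothetical helper: return (plain_text, metadata) for an HTML string."""
    parser = _TextCollector()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    raw = "".join(parser.chunks)
    # Collapse whitespace inside each line and drop empty lines.
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    plain = "\n".join(line for line in lines if line)
    # Hypothetical metadata: how often each tag appeared.
    return plain, {"tag_counts": dict(parser.tag_counts)}
```

For example, `sketch_parse_html_layout('<div><p>Hello <b>world</b></p><p>Line&nbsp;two</p></div>')` yields the text `"Hello world\nLine two"`. Using an event-driven parser rather than regular expressions keeps nested inline tags such as `<b>` from splitting words.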
-
churro_ocr.prompts.parse_olmocr_response(text)
Extract plain text and metadata from an olmOCR YAML-front-matter response.
- Parameters:
text (str)
- Return type:
tuple[str, MetadataDict]
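A sketch of splitting a `---`-delimited YAML front-matter header from the body text, assuming flat `key: value` lines. It is not churro_ocr's implementation: `sketch_split_front_matter` is a hypothetical name, the field names in the usage example are illustrative, and values stay strings because no YAML parser is used (a real implementation would likely parse the header as YAML).

```python
def sketch_split_front_matter(text: str) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: split '---' front matter into (body, metadata)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No front matter: the whole response is body text.
        return text.strip(), {}
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        # Opening delimiter without a closing one: treat everything as body.
        return text.strip(), {}
    metadata: dict[str, str] = {}
    for line in lines[1:end]:
        # Only flat "key: value" pairs are handled; values are kept as strings.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return body, metadata
```

For instance, `"---\nprimary_language: en\nis_table: false\n---\nHello\nworld\n"` splits into the body `"Hello\nworld"` and `{"primary_language": "en", "is_table": "false"}`.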
-
churro_ocr.prompts.strip_ocr_output_tag(text, *, output_tag=DEFAULT_OCR_OUTPUT_TAG)
Remove outer OCR output tags and any stray tag tokens when present.
- Parameters:
text (str)
output_tag (keyword-only; defaults to DEFAULT_OCR_OUTPUT_TAG)
- Returns:
OCR text with the outer wrapper removed when present.
- Return type:
str
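A regex-based sketch of the described behavior: drop one outer `<tag>...</tag>` wrapper, then remove any stray opening or closing tag tokens. It is not churro_ocr's implementation. The value of `DEFAULT_OCR_OUTPUT_TAG` is not shown on this page, so the sketch uses a hypothetical placeholder, `HYPOTHETICAL_OUTPUT_TAG = "ocr_output"`, and a hypothetical function name.

```python
import re

# Hypothetical placeholder; the real DEFAULT_OCR_OUTPUT_TAG value is not documented here.
HYPOTHETICAL_OUTPUT_TAG = "ocr_output"


def sketch_strip_output_tag(text: str, *, output_tag: str = HYPOTHETICAL_OUTPUT_TAG) -> str:
    """Hypothetical helper mirroring the documented behavior of strip_ocr_output_tag."""
    tag = re.escape(output_tag)
    stripped = text.strip()
    # Remove one outer <tag>...</tag> wrapper when it encloses the whole text.
    outer = re.fullmatch(rf"<{tag}>(.*)</{tag}>", stripped, flags=re.DOTALL)
    if outer:
        stripped = outer.group(1)
    # Drop any stray <tag> or </tag> tokens left anywhere in the text.
    stripped = re.sub(rf"</?{tag}>", "", stripped)
    return stripped.strip()
```

Text without tags passes through unchanged apart from surrounding whitespace, so the helper is safe to apply to every model response.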
-
churro_ocr.prompts.strip_rich_ocr_markup_to_plain_text(text)
Best-effort plain-text conversion for OCR markdown/HTML output.
- Parameters:
text (str)
- Return type:
str
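A best-effort sketch of converting mixed Markdown/HTML OCR output to plain text with regular expressions. It is not churro_ocr's implementation; the function name and the particular rules (which tags become line breaks, which Markdown markers are stripped) are assumptions chosen for illustration. Regex stripping can misfire on literal `<` characters in the text, which is one reason such conversions are only best-effort.

```python
import html
import re


def sketch_markup_to_plain_text(text: str) -> str:
    """Hypothetical helper: strip common HTML and Markdown markup."""
    # Turn <br> and closing block-level tags into line breaks before stripping tags.
    text = re.sub(r"(?i)<br\s*/?>|</(?:p|div|li|tr|h[1-6])>", "\n", text)
    # Remove any remaining HTML tags.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &amp; and &lt; (after tag removal, so decoded text survives).
    text = html.unescape(text)
    # Strip Markdown heading markers and list bullets at the start of each line.
    text = re.sub(r"(?m)^[ \t]{0,3}(?:#{1,6}[ \t]+|[-*+][ \t]+)", "", text)
    # Remove **bold** / *italic* markers not embedded in a word (so 2*3*4 is kept).
    text = re.sub(r"(?<!\w)(\*\*|\*)(\S(?:.*?\S)?)\1(?!\w)", r"\2", text)
    # Trim trailing spaces and drop blank lines.
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line).strip()
```

For example, `"# Title\n\n<p>Hello &amp; <b>welcome</b></p>\n- **one**\n- *two*"` becomes `"Title\nHello & welcome\none\ntwo"`.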