End-to-End Text Line Detection and Ordering

2026-06-02
Computer Vision and Pattern Recognition

AI summary

The author introduces Orli (Ordered Regression of Lines), a model that takes a page image of a historical document and outputs its text lines directly in reading order. Rather than detecting lines first and ordering them in a separate step, Orli generates the line baselines autoregressively as a single sequence. Each baseline is described with a chord-frame parameterization that captures the line's position, orientation, extent, and local shape. The model was trained on 196,691 pages spanning ten writing systems. Without dataset-specific training it marginally exceeds the previously reported state of the art on cBAD line detection and reaches near-perfect coverage and ordering on several reading-order benchmarks. More specialized layouts require only limited fine-tuning.

layout analysis, line detection, reading order, autoregressive model, text-line baselines, chord-frame parameterization, historical documents, zero-shot learning, fine-tuning, cBAD dataset
Authors
Benjamin Kiessling
Abstract
Practical text-recognition pipelines for historical documents typically decompose layout analysis into line detection followed by a separate reading-order step, with the latter most often handled by a hand-coded geometric heuristic that struggles with marginalia, multiple columns, tables, and source-specific editorial conventions. This article introduces Orli (Ordered Regression of Lines), an end-to-end model that casts both sub-tasks as a single image-to-sequence problem: from a page image, Orli autoregressively generates text-line baselines directly in reading order. Baselines are represented in a chord-frame parameterization that anchors a line's position, orientation, and extent while encoding local geometry through perpendicular offsets; an iterative refinement head and a local visual refiner produce the final curve. Trained on a heterogeneous corpus of 196,691 pages spanning ten writing systems, Orli marginally exceeds the previously reported state of the art for cBAD line detection without dataset-specific training, reaches near-perfect coverage and ordering on multiple reading-order benchmarks zero-shot, and adapts to more specialized out-of-domain layouts with limited fine-tuning. The method's source code and model weights are available under an open license at https://github.com/mittagessen/orli.
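
To make the chord-frame idea concrete, the following is a minimal sketch of one way such a parameterization could work. It is not taken from the Orli code base. The function names `to_chord_frame` and `from_chord_frame`, the evenly spaced sampling of offsets, the choice of the chord midpoint as the position anchor, and the sign convention of the normal are all assumptions made for illustration. The sketch also assumes the baseline advances monotonically along its chord.

```python
import math


def _interp(s, xs, ys):
    """Linearly interpolate ys over xs at position s (xs assumed increasing)."""
    for i in range(len(xs) - 1):
        a, b = xs[i], xs[i + 1]
        if b > a and a <= s <= b:
            w = (s - a) / (b - a)
            return ys[i] + w * (ys[i + 1] - ys[i])
    return 0.0  # outside the sampled range: fall back to the chord itself


def to_chord_frame(baseline, n_offsets=8):
    """Encode a polyline as (center, angle, length, perpendicular offsets).

    Hypothetical illustration only; the real model may anchor and sample
    differently.
    """
    (x0, y0), (x1, y1) = baseline[0], baseline[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("baseline endpoints coincide; chord is undefined")
    tx, ty = dx / length, dy / length  # unit vector along the chord
    nx, ny = -ty, tx                   # unit vector perpendicular to it
    # Express every polyline vertex in the chord's own coordinate frame.
    along = [(x - x0) * tx + (y - y0) * ty for x, y in baseline]
    perp = [(x - x0) * nx + (y - y0) * ny for x, y in baseline]
    # Sample perpendicular offsets at evenly spaced interior chord positions.
    offsets = [
        _interp(length * k / (n_offsets + 1), along, perp)
        for k in range(1, n_offsets + 1)
    ]
    center = ((x0 + x1) / 2, (y0 + y1) / 2)
    return center, math.atan2(dy, dx), length, offsets


def from_chord_frame(center, angle, length, offsets):
    """Decode chord-frame parameters back into a polyline."""
    tx, ty = math.cos(angle), math.sin(angle)
    nx, ny = -ty, tx
    x0 = center[0] - 0.5 * length * tx
    y0 = center[1] - 0.5 * length * ty
    ds = [0.0, *offsets, 0.0]  # endpoints lie on the chord by construction
    n = len(ds) - 1
    return [
        (x0 + length * k / n * tx + d * nx, y0 + length * k / n * ty + d * ny)
        for k, d in enumerate(ds)
    ]


if __name__ == "__main__":
    params = to_chord_frame([(0, 0), (50, 5), (100, 0)], n_offsets=3)
    print(params)
    print(from_chord_frame(*params))
```

In a representation of this kind, the center, angle, and length fix where a line sits and how far it extends, so the offsets only need to describe small deviations from a straight segment. That separation is plausibly what makes the coarse placement and the fine curvature easy to predict and refine independently, although the abstract does not spell out the exact formulation.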