The Character Error Vector: Decomposable errors for page-level OCR evaluation

2026-04-07 · Computer Vision and Pattern Recognition · Machine Learning
AI summary

The authors explain that the usual way to measure OCR quality, called Character Error Rate (CER), doesn't work well when the text layout isn't perfectly understood. They created a new method called Character Error Vector (CEV) that breaks down errors into different types, helping to identify if mistakes come from reading the text itself or from parsing the page layout. They tested CEV on difficult newspaper images and found it more useful for evaluating OCR on whole pages than older methods. The authors also made the CEV tool available in Python to help others working on document understanding.

Optical Character Recognition · Character Error Rate · Document Parsing · Error Metrics · Page Layout Analysis · Jensen-Shannon Distance · End-to-End Models · Pipeline Approaches · Archival Newspapers · Python Library
Authors
Jonathan Bourne, Mwiza Simbeye, Joseph Nockels
Abstract
The Character Error Rate (CER) is a key metric for evaluating the quality of Optical Character Recognition (OCR). However, this metric assumes that the text has been perfectly parsed, which is often not the case. Under page-parsing errors, CER becomes undefined, limiting its use as a metric and making page-level OCR evaluation challenging, particularly when using data that do not share a labelling schema. We introduce the Character Error Vector (CEV), a bag-of-characters evaluator for OCR. The CEV can be decomposed into parsing, OCR, and interaction error components. This decomposability allows practitioners to focus on the part of the document understanding pipeline that will have the greatest impact on overall text extraction quality. The CEV can be implemented using a variety of methods, of which we demonstrate SpACER (Spatially Aware Character Error Rate) and a character-distribution method using the Jensen-Shannon distance. We validate the CEV's performance against other metrics: first, its relationship with CER; then, parse quality; and finally, as a direct measure of page-level OCR quality. The validation process shows that the CEV is a valuable bridge between parsing metrics and local metrics like CER. We analyse a dataset of archival newspapers comprising degraded images with complex layouts and find that state-of-the-art end-to-end models are outperformed by more traditional pipeline approaches. Whilst the CEV requires character-level positioning for optimal triage, thresholding on easily available values can predict the main error source with an F1 of 0.91. We provide the CEV as part of a Python library to support document understanding research.
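To make the bag-of-characters idea concrete, the following is a minimal sketch of the character-distribution approach described in the abstract: each page's text is reduced to a normalised character-frequency distribution, and two pages are compared with the Jensen-Shannon distance. This is an illustrative reconstruction, not the authors' library; all function names are hypothetical, and the actual CEV additionally decomposes errors into parsing, OCR, and interaction components.

```python
import math
from collections import Counter


def char_distribution(text, alphabet):
    """Normalised character-frequency vector over a fixed alphabet."""
    counts = Counter(text)
    total = sum(counts[c] for c in alphabet) or 1  # avoid division by zero
    return [counts[c] / total for c in alphabet]


def jensen_shannon_distance(p, q):
    """JS distance (base 2): sqrt(0.5*KL(p||m) + 0.5*KL(q||m)), m = (p+q)/2.

    Bounded in [0, 1], symmetric, and defined even when p and q have
    disjoint support, unlike the raw KL divergence.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))


def char_distribution_distance(reference, hypothesis):
    """Order-free page-level comparison of two text extractions.

    Because only character frequencies are compared, this score stays
    well-defined under parsing errors that reorder text blocks, which is
    exactly the failure mode that makes plain CER undefined.
    """
    alphabet = sorted(set(reference) | set(hypothesis))
    p = char_distribution(reference, alphabet)
    q = char_distribution(hypothesis, alphabet)
    return jensen_shannon_distance(p, q)
```

Note the key property motivating a bag-of-characters evaluator: `char_distribution_distance("column A column B", "column B column A")` is zero, because reordering whole blocks (a pure parsing error) leaves the character distribution unchanged, whereas misread characters (a pure OCR error) shift the distribution and increase the distance.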