Bag of Bags: Adaptive Visual Vocabularies for Genizah Join Image Retrieval

2026-04-09 · Computer Vision and Pattern Recognition

AI summary

The authors study how to find other pieces of the same manuscript when given an image of one fragment. They introduce Bag of Bags (BoB), which builds a vocabulary of visual words specific to each fragment from its small components, rather than relying on one global codebook shared by all images. On fragments from the Cairo Genizah, the best BoB variant matches fragments more accurately than the strongest Bag of Words baseline (Hit@1 of 0.78 versus 0.74). A two-stage pipeline, a BoW shortlist followed by BoB-OT reranking, offers a compromise between retrieval accuracy and computational cost.

manuscript join retrieval, Bag of Words (BoW), Bag of Bags (BoB), convolutional autoencoder, local visual words, k-means clustering, set-to-set distance, optimal transport, Cairo Genizah, image retrieval
Authors
Sharva Gogawale, Gal Grudka, Daria Vasyutinsky-Shapira, Omer Ventura, Berat Kurar-Barakat, Nachum Dershowitz
Abstract
A join is a set of manuscript fragments identified as originally emanating from the same manuscript. We study manuscript join retrieval: given a query image of a fragment, retrieve other fragments originating from the same physical manuscript. We propose Bag of Bags (BoB), an image-level representation that replaces the global visual codebook of classical Bag of Words (BoW) with a fragment-specific vocabulary of local visual words. Our pipeline trains a sparse convolutional autoencoder on binarized fragment patches, encodes connected components from each page, clusters the resulting embeddings with per-image $k$-means, and compares images using set-to-set distances between their local vocabularies. Evaluated on fragments from the Cairo Genizah, the best BoB variant (viz. Chamfer) achieves Hit@1 of 0.78 and MRR of 0.84, compared to 0.74 and 0.80, respectively, for the strongest BoW baseline (BoW-RawPatches-$\chi^2$), a 6.1% relative improvement in top-1 accuracy. We furthermore study a mass-weighted BoB-OT variant that incorporates cluster population into prototype matching and present a formal approximation guarantee bounding its deviation from full component-level optimal transport. A two-stage pipeline using a BoW shortlist followed by BoB-OT reranking provides a practical compromise between retrieval strength and computational cost, supporting applicability to larger manuscript collections.
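
To make the set-to-set comparison and the reported metrics concrete, the following is a minimal sketch, not the authors' implementation. It assumes each fragment's local vocabulary has already been computed (for example, as per-image $k$-means centroids) and is given as a list of prototype vectors. The function names `chamfer_distance` and `hit_at_1_and_mrr` are hypothetical, and both the Chamfer normalisation (mean nearest-neighbour Euclidean distance, summed over both directions) and the leave-one-out evaluation protocol are assumptions that may differ from the paper.

```python
import math


def chamfer_distance(vocab_a, vocab_b):
    """Symmetric Chamfer distance between two non-empty sets of prototype vectors.

    Assumption: mean nearest-neighbour Euclidean distance in each direction,
    summed over the two directions. The paper may normalise differently.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(vocab_a, vocab_b) + one_way(vocab_b, vocab_a)


def hit_at_1_and_mrr(vocabs, join_ids):
    """Leave-one-out retrieval: each fragment queries all other fragments.

    vocabs[i] is the local vocabulary of fragment i and join_ids[i] its join
    label. Queries whose join has no other member are skipped (an assumption).
    """
    hits, reciprocal_ranks = 0, []
    for q, query_vocab in enumerate(vocabs):
        candidates = [i for i in range(len(vocabs)) if i != q]
        if not any(join_ids[i] == join_ids[q] for i in candidates):
            continue  # nothing relevant to retrieve for this query
        # Rank candidates by increasing distance to the query vocabulary.
        ranked = sorted(
            candidates, key=lambda i: chamfer_distance(query_vocab, vocabs[i])
        )
        # 1-based rank of the first fragment from the same join.
        first_hit = next(
            rank
            for rank, i in enumerate(ranked, start=1)
            if join_ids[i] == join_ids[q]
        )
        hits += first_hit == 1
        reciprocal_ranks.append(1.0 / first_hit)
    if not reciprocal_ranks:
        raise ValueError("no query has another fragment from the same join")
    n = len(reciprocal_ranks)
    return hits / n, sum(reciprocal_ranks) / n


# Toy data: two joins ("A" and "B"), two fragments each, 2-D prototypes.
toy_vocabs = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1)],
    [(5.0, 5.0), (6.0, 5.0)],
    [(5.0, 5.1), (6.0, 5.1)],
]
toy_join_ids = ["A", "A", "B", "B"]
hit1, mrr = hit_at_1_and_mrr(toy_vocabs, toy_join_ids)
```

Hit@1 is the fraction of scorable queries whose top-ranked result belongs to the same join, and MRR averages the reciprocal rank of the first correct result. In a two-stage setting of the kind the abstract describes, the same ranking step would be applied only to a shortlist produced by a cheaper first-stage retriever.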