QPredSGG: Hybrid Quantum Predicate Learning for Long-Tailed Scene Graph Generation

2026-06-03

Machine Learning
AI summary

The authors address the problem of recognizing relationships between objects in images, which is difficult because some relationship types appear very rarely. They replace the large classical prediction module of an existing model with a much smaller hybrid quantum classifier that works on a compact representation of the features. Their experiments show that this classifier improves performance on rare relationship types while using far fewer parameters. They also analyze how quantum design choices (qubit count, encoding, entanglement, and circuit depth) affect the results. Overall, the study suggests that quantum methods can help make visual relationship classification more parameter-efficient.

Scene Graph Generation, Relational Reasoning, Long-Tail Imbalance, Quantum Computing, Predicate Classification, Amplitude Embedding, Entanglement, Cross-Entropy Loss, Visual Genome Dataset, Hybrid Quantum-Classical Models
Authors
Prerana Ramkumar, Nouhaila Innan, Muhammad Shafique
Abstract
Scene Graph Generation (SGG) requires relational reasoning over objects and their interactions, but its performance is often limited by severe long-tail predicate imbalance. Classical SGG models frequently rely on dataset statistics, which biases predictions toward frequent relations at the expense of fine-grained semantic predicates. Although existing debiasing strategies improve mean recall, predicate classification in current frameworks still often depends on large classical decision modules with high parameter cost. This work introduces a hybrid quantum predicate classifier for SGG by replacing the classical predicate head in the Causal Feature Enhancement Network (CFEN) with a Quantum Predicate Head (QP-Head) trained using weighted cross-entropy. To the best of our knowledge, this is among the first studies to evaluate a hybrid quantum architecture for scene graph predicate classification on Visual Genome 150. We study the effect of qubit count, encoding strategy, entangling structure, and circuit depth on relational prediction. The best 4-qubit QP-Head uses Amplitude Embedding and Strongly Entangling Layers, compressing 4096-dimensional pair features into a 16-dimensional quantum-compatible representation, a 256$\times$ reduction. It achieves a mean recall at 100 (mR@100) of 57.25%, compared with 41.1% for the classical CFEN reference, while using only 96 trainable quantum parameters. Scaling to 8 qubits maintains strong long-tail performance, reaching an mR@100 of 55.38% with 384 quantum parameters, while the depth analysis shows a trade-off between expressibility and runtime overhead. These results suggest that compact hybrid quantum predicate heads can support parameter-efficient long-tail relational classification in complex visual reasoning tasks.
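
The arithmetic behind the abstract's figures can be checked with a short, standard-library Python sketch. It is illustrative only and is not the authors' implementation. The function names (`amplitude_state`, `implied_layers`, `weighted_cross_entropy`) are hypothetical, and so are the class weights in the usage note. The layer counts rest on an assumption: that each Strongly Entangling layer carries three rotation angles per qubit, as in PennyLane's `StronglyEntanglingLayers` template. The abstract does not state the circuit depth or how parameters are counted, so the implied depths may not match the paper.

```python
import math

def amplitude_state(features, n_qubits):
    """L2-normalize a vector so it can serve as the amplitudes of an n-qubit state.

    Amplitude embedding needs exactly 2**n_qubits values with unit L2 norm.
    """
    dim = 2 ** n_qubits
    if len(features) != dim:
        raise ValueError(f"expected {dim} features, got {len(features)}")
    norm = math.sqrt(sum(x * x for x in features))
    if norm == 0.0:
        raise ValueError("cannot embed the zero vector")
    return [x / norm for x in features]

def implied_layers(n_params, n_qubits, angles_per_qubit=3):
    """Layer count implied by a parameter total.

    ASSUMPTION: every layer has `angles_per_qubit` trainable angles per qubit.
    """
    per_layer = n_qubits * angles_per_qubit
    if n_params % per_layer:
        raise ValueError("parameter count is not a whole number of layers")
    return n_params // per_layer

def weighted_cross_entropy(logits, target, class_weights):
    """Per-sample weighted cross-entropy: -w[target] * log_softmax(logits)[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -class_weights[target] * (logits[target] - log_z)

# 4096-dimensional pair feature -> 2**4 = 16 amplitudes on 4 qubits
compression = 4096 // 2 ** 4

# Placeholder 16-dimensional vector standing in for a compressed pair feature
state = amplitude_state([1.0] * 16, n_qubits=4)

# Depths implied by the reported parameter counts (under the assumption above)
layers_4q = implied_layers(96, n_qubits=4)
layers_8q = implied_layers(384, n_qubits=8)
```

Under these assumptions, `compression` reproduces the 256$\times$ reduction, and the reported 96 and 384 parameters would correspond to 8 and 16 layers respectively. In `weighted_cross_entropy`, giving a rare predicate a larger weight than a frequent one (for example, hypothetical weights `[2.0, 1.0]`) scales up the loss on rare-class samples, which is the usual motivation for this loss in long-tailed settings.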