Text-attributed Graph Condensation via Text Selection and Attribute Matching
2026-06-02 • Machine Learning
AI summary
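The authors study Text-Attributed Graphs (TAGs), where each node carries a text description, and note that jointly training a graph neural network and a language model on them is costly in both time and memory. They introduce TAGSAM, a condensation method that shrinks a TAG in two ways: it selects and merges representative text chunks by maximizing mutual information, and it aligns attribute similarity matrices to preserve the graph information. Against six baselines, TAGSAM improves accuracy over the best-performing one by an average of 4.9% at the same compressed size, and it stays competitive even when the graph is condensed to 1% of its size.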
Text-Attributed Graph, Graph Neural Network, Language Model, Graph Condensation, Subgraph Selection, Mutual Information, Attribute Similarity, Training Trajectories, Model Compression
Authors
Haowei Han, Yuxiang Wang, Guojia Wan, Hao Wang, Shanshan Feng, Hao Huang, Jiawei Jiang, Xiao Yan
Abstract
A Text-Attributed Graph (TAG) is an important type of graph-structured data in which each node has a text description. TAG models usually train a Graph Neural Network (GNN) and a language model jointly, which leads to high space and time consumption, especially on large datasets. To mitigate this, we propose TAGSAM, a condensation method that compresses TAGs while preserving training accuracy. TAGSAM has two key designs, subgraph text Selection and Attribute similarity Matching, which compress the text descriptions and the graph topology of a TAG, respectively. For the texts, subgraph text selection selects and merges representative text chunks from multiple related text descriptions by maximizing mutual information. For the graph topology, popular condensation methods based on Matching Training Trajectories (MTT) suffer from high variance, which hinders accuracy. Our attribute similarity matching mitigates this issue by aligning stable similarity matrices. We evaluate TAGSAM against six state-of-the-art baselines and find that it outperforms them. For the same compressed size, TAGSAM improves upon the best-performing baseline by an average of 4.9% in accuracy. Furthermore, it maintains competitive training accuracy even when the TAG is condensed to just 1% of its original size. Our code is available at https://github.com/SundayVHan/TAGSAM
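To make the idea of aligning similarity matrices more concrete, here is a minimal NumPy sketch. It is not taken from the TAGSAM code. The abstract does not specify the similarity measure, how original nodes map to condensed nodes, or the loss, so the cosine similarity, the hypothetical `assignment` vector, the mean pooling, and the squared-error objective are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Row-wise cosine similarity of an (n, d) feature matrix, shape (n, n)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T


def similarity_matching_loss(original, condensed, assignment):
    """Squared gap between two attribute similarity matrices.

    Assumption (not from the paper): each original node is assigned to one
    condensed node, and the target similarity is computed on the mean
    attributes of each assigned group.
    """
    n, k = original.shape[0], condensed.shape[0]
    pool = np.zeros((k, n))
    pool[assignment, np.arange(n)] = 1.0  # one-hot group membership
    pool /= np.clip(pool.sum(axis=1, keepdims=True), 1.0, None)  # group means
    target = cosine_similarity_matrix(pool @ original)
    gap = cosine_similarity_matrix(condensed) - target
    return float(np.mean(gap ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attrs = rng.normal(size=(6, 4))           # six original nodes
    groups = np.array([0, 0, 1, 1, 2, 2])     # hypothetical node assignment
    synthetic = rng.normal(size=(3, 4))       # three condensed nodes
    print(similarity_matching_loss(attrs, synthetic, groups))
```

In a real condensation loop the condensed attributes would be trainable and updated by gradient descent on such a loss. The point of the sketch is only that the target depends on the data alone, not on a model's training trajectory, which is one plausible reading of why the abstract calls the similarity matrices stable.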