Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe

2026-04-24

Machine Learning
AI summary

The authors introduce WG-SRC, a new method for understanding how graph neural networks classify nodes by replacing complex learned processes with clearer, fixed components based on raw features and graph signal processing. WG-SRC uses statistical tools like PCA and ridge classification to make predictions while providing interpretable insights about the role of different graph features such as low-pass smoothing and high-pass differences. Tested on six datasets, their method performs comparably to standard graph neural networks and offers detailed 'feature fingerprints' that explain dataset-specific behaviors. These fingerprints help guide further analysis and adjustments by showing when certain graph aspects, like noise or raw features, are most important.

Graph Neural Networks · Node Classification · Message Passing · Low-pass Propagation · High-pass Graph Differences · Principal Component Analysis (PCA) · Ridge Classification · Fisher Coordinate Selection · Graph Signal Processing · White-box Model
Authors
Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban
Abstract
Graph neural networks achieve strong node-classification accuracy, but their learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier boundaries in an opaque representation. This obscures why a node is classified and what feature-level graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary of raw features, row-normalized and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so prediction and analysis use explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. As a white-box graph-learning instrument, WG-SRC uses predictive performance to validate its diagnostics: across six node-classification datasets, the scaffold remains competitive with reproduced graph baselines and achieves positive average gain under aligned splits. Its atlas, produced by the predictor itself, decomposes behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. These operational feature fingerprints distinguish low-pass-dominated Amazon graphs, mixed high-pass and class-geometrically complex Chameleon behavior, and raw- or boundary-sensitive WebKB graphs. As intrinsic classifier outputs rather than post-hoc explanations, these fingerprints provide post-evaluation guidance for later analysis and dataset-specific modification. Aligned mechanistic interventions support this guidance by indicating when high-pass blocks act as removable noise, when raw features should be preserved, and when ridge-type boundary correction matters.
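The fixed graph-signal dictionary and closed-form ridge readout described in the abstract can be sketched as follows. This is an illustrative NumPy reconstruction under stated assumptions, not the authors' implementation: the function names, the toy path graph, and the simple concatenation of dictionary views are all assumptions, and Fisher coordinate selection, class-wise PCA, and score fusion are omitted for brevity.

```python
# Illustrative sketch (not the authors' code) of two WG-SRC ingredients:
# a fixed graph-signal dictionary and a closed-form ridge readout.
import numpy as np

def signal_dictionary(A, X):
    """Named graph-signal views of node features X under adjacency A."""
    deg = A.sum(axis=1)
    # Row-normalized low-pass: each node replaced by its neighborhood mean.
    P_row = A / np.maximum(deg, 1.0)[:, None]
    # Symmetric-normalized low-pass: D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1.0))
    P_sym = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return {
        "raw": X,                      # ego attributes, untouched
        "low_pass_row": P_row @ X,     # neighborhood smoothing
        "low_pass_sym": P_sym @ X,     # degree-symmetric smoothing
        "high_pass": X - P_row @ X,    # node minus neighborhood mean
    }

def ridge_readout(Z, y, n_classes, alpha=1.0):
    """Closed-form ridge regression onto one-hot labels; returns class scores."""
    Y = np.eye(n_classes)[y]
    W = np.linalg.solve(Z.T @ Z + alpha * np.eye(Z.shape[1]), Z.T @ Y)
    return Z @ W

# Toy example: a path graph on 4 nodes with 2-d features (assumed data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.arange(8, dtype=float).reshape(4, 2)
views = signal_dictionary(A, X)
Z = np.hstack(list(views.values()))          # concatenated dictionary
scores = ridge_readout(Z, np.array([0, 0, 1, 1]), n_classes=2)
```

Because the propagation operators are fixed rather than learned, each column block of `Z` retains its name (raw, low-pass, high-pass), which is what makes the per-block "feature fingerprint" diagnostics possible.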