This Week In Computer Science Papers

Week beginning 20th April 2026


Showing 1–36 of 2346
Spend Less, Fit Better: Budget-Efficient Scaling Law Fitting via Active…
2026-04-24 · Machine Learning · arxiv
Abstract
Scaling laws are used to plan multi-million-dollar training runs, but fitting those laws can itself cost millions. In modern large-scale workflows, assembling a sufficiently informative set of pilot experiments is already a major budget-allocation problem rather than a routine preprocessing step. We formulate scaling-law fitting as budget-aware sequential experimental design: given a finite pool of runnable experiments with heterogeneous costs, choose which runs to execute so as to maximize extrapolation accuracy in a high-cost target region. We then propose an uncertainty-aware method for sequentially allocating experimental budget toward the runs most useful for target-region extrapolation. Across a diverse benchmark of scaling-law tasks, our method consistently outperforms classical design-based baselines, and often approaches the performance of fitting on the full experimental set while using only about 10% of the total training budget. Our code is available at https://github.com/PlanarG/active-sl.
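As a rough illustration of what budget-aware sequential design can look like (not the paper's algorithm; the power-law form, bootstrap scoring, candidate pool, and costs below are placeholder assumptions), a minimal Python sketch might greedily pick the run that most shrinks extrapolation uncertainty per unit cost toward a high-compute target:

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)

    def scaling_law(c, a, b):
        # Hypothetical parametric form: loss = a * compute^(-b).
        return a * c ** (-b)

    def extrapolation_std(compute, loss, target_c, n_boot=50):
        # Bootstrap spread of the extrapolated loss at the target compute scale.
        preds = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(compute), len(compute))
            try:
                (a, b), _ = curve_fit(scaling_law, compute[idx], loss[idx],
                                      p0=(1.0, 0.1), maxfev=2000)
                preds.append(scaling_law(target_c, a, b))
            except RuntimeError:
                continue
        return np.std(preds) if preds else np.inf

    # Synthetic candidate pool: each runnable experiment has a compute scale and a cost.
    pool_compute = np.logspace(1, 4, 30)
    pool_cost = pool_compute.copy()              # assume cost is proportional to compute
    observe = lambda c: 2.0 * c ** -0.12 + rng.normal(0.0, 0.01)

    # Start from the two cheapest runs; extrapolate to a much larger target scale.
    run_c = list(pool_compute[:2])
    run_y = [observe(c) for c in run_c]
    remaining = list(range(2, len(pool_compute)))
    budget, target_c = 5000.0, 1e6

    while budget > 0 and remaining:
        c_arr, y_arr = np.array(run_c), np.array(run_y)
        (a, b), _ = curve_fit(scaling_law, c_arr, y_arr, p0=(1.0, 0.1), maxfev=2000)
        base = extrapolation_std(c_arr, y_arr, target_c)
        # Score candidates by expected uncertainty reduction per unit cost, imputing
        # the unseen outcome from the current fit (a crude plug-in heuristic).
        scores = []
        for i in remaining:
            c_new = np.append(c_arr, pool_compute[i])
            y_new = np.append(y_arr, scaling_law(pool_compute[i], a, b))
            scores.append((base - extrapolation_std(c_new, y_new, target_c)) / pool_cost[i])
        best = remaining[int(np.argmax(scores))]
        if pool_cost[best] > budget:
            break
        budget -= pool_cost[best]
        run_c.append(pool_compute[best])
        run_y.append(observe(pool_compute[best]))
        remaining.remove(best)

    print(f"executed {len(run_c)} runs; leftover budget {budget:.0f}")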
Open 2604.22753v1
How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consu…
2026-04-24 · Computation and Language · Computers and Society · Human-Computer Interaction · arxiv
Abstract
The wide adoption of AI agents in complex human workflows is driving rapid growth in LLM token consumption. When agents are deployed on tasks that require a significant amount of tokens, three questions naturally arise: (1) Where do AI agents spend the tokens? (2) Which models are more token-efficient? and (3) Can agents predict their token usage before task execution? In this paper, we present the first systematic study of token consumption patterns in agentic coding tasks. We analyze trajectories from eight frontier LLMs on SWE-bench Verified and evaluate models' ability to predict their own token costs before task execution. We find that: (1) agentic tasks are uniquely expensive, consuming 1000x more tokens than code reasoning and code chat, with input tokens rather than output tokens driving the overall cost; (2) token usage is highly variable and inherently stochastic: runs on the same task can differ by up to 30x in total tokens, and higher token usage does not translate into higher accuracy; instead, accuracy often peaks at intermediate cost and saturates at higher costs; (3) models vary substantially in token efficiency: on the same tasks, Kimi-K2 and Claude-Sonnet-4.5, on average, consume over 1.5 million more tokens than GPT-5; (4) task difficulty rated by human experts only weakly aligns with actual token costs, revealing a fundamental gap between human-perceived complexity and the computational effort agents actually expend; and (5) frontier models fail to accurately predict their own token usage (with weak-to-moderate correlations, up to 0.39) and systematically underestimate real token costs. Our study offers new insights into the economics of AI agents and can inspire future research in this direction.
Open 2604.22750v1
Representational Harms in LLM-Generated Narratives Against Global Major…
2026-04-24 · Computation and Language · arxiv
Abstract
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating harmful biases about non-dominant communities across the globe. To better evaluate and mitigate such harms, more research examining how LLMs portray diverse individuals is needed. In this work, we study how national origin identities are portrayed by widely-adopted LLMs in response to open-ended narrative generation prompts. Our findings demonstrate the presence of persistent representational harms by national origin, including harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority identities. Minoritized national identities are simultaneously underrepresented in power-neutral stories and overrepresented in subordinated character portrayals, which are over fifty times more likely to appear than dominant portrayals. The degree of harm is amplified when US nationality cues (e.g., ``American'') are present in input prompts. Notably, we find that the harms we identify cannot be explained away via sycophancy, as US-centric biases persist even when replacing US nationality cues with non-US national identities in the prompts. Based on our findings, we call for further exploration of cultural harms in LLMs through methodologies that center Global Majority perspectives and challenge the uncritical adoption of US-based LLMs for the classification, surveillance, and misrepresentation of the majority of our planet.
Open 2604.22749v1
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
2026-04-24 · Artificial Intelligence · arxiv
Abstract
As AI systems move from generating text to accomplishing goals through sustained interaction, the ability to model environment dynamics becomes a central bottleneck. Agents that manipulate objects, navigate software, coordinate with others, or design experiments require predictive environment models, yet the term world model carries different meanings across research communities. We introduce a "levels x laws" taxonomy organized along two axes. The first defines three capability levels: L1 Predictor, which learns one-step local transition operators; L2 Simulator, which composes them into multi-step, action-conditioned rollouts that respect domain laws; and L3 Evolver, which autonomously revises its own model when predictions fail against new evidence. The second identifies four governing-law regimes: physical, digital, social, and scientific. These regimes determine what constraints a world model must satisfy and where it is most likely to fail. Using this framework, we synthesize over 400 works and summarize more than 100 representative systems spanning model-based reinforcement learning, video generation, web and GUI agents, multi-agent social simulation, and AI-driven scientific discovery. We analyze methods, failure modes, and evaluation practices across level-regime pairs, propose decision-centric evaluation principles and a minimal reproducible evaluation package, and outline architectural guidance, open problems, and governance challenges. The resulting roadmap connects previously isolated communities and charts a path from passive next-step prediction toward world models that can simulate, and ultimately reshape, the environments in which agents operate.
Open 2604.22748v1
Code for All: Educational Applications of the "Vibe Coding" Hackathon i…
2026-04-24 · Software Engineering · arxiv
Abstract
The emergence of large language models has enabled vibe coding, a natural language approach to programming in which users describe intent and AI generates or revises code, potentially broadening access to programming while preserving meaningful learning outcomes. We investigate its educational value through a month-long online hackathon that welcomed participants from multiple countries, ranging from complete beginners to experienced developers. The hackathon offered three tracks with increasing technical demands. Spark emphasized basic frontend functionality and dynamic features such as buttons, forms, and API calls. Build required backend or database integration. Launch targeted production ready web applications, including deployment. Participants were required to develop projects using only LLM generated code without manual edits and submitted complete chat histories, source code, demo videos, and functionality reports. We assessed educational effectiveness with a mixed methods design that combined standardized project evaluations across functionality, user interface and user experience design, impact, prompt quality, and code readability, along with post-hackathon surveys of perceived learning outcomes and thematic analysis of open-ended feedback. Our findings describe how participants with different backgrounds engage with vibe coding as task complexity increases, how the no manual editing constraint shapes prompting and debugging practices, and what these patterns imply for integrating AI assisted development into programming education and competitive learning environments.
Open 2604.22747v1
Relaxation-Informed Training of Neural Network Surrogate Models
2026-04-24 · Machine Learning · arxiv
Abstract
ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, i.e., the number of binary variables in associated formulations and the tightness of the continuous LP relaxation. These properties are determined during training, yet standard training objectives (prediction loss with classical weight regularization) offer no mechanism to directly control them. This work studies training regularizers that directly target downstream MILP tractability. Specifically, we propose simple bound-based regularizers that penalize the big-M constants of MILP formulations and/or the number of unstable neurons. Moreover, we introduce an LP relaxation gap regularizer that explicitly penalizes the per-sample gap of the continuous relaxation at training points. We derive its associated gradient and provide an implementation from LP dual variables without custom automatic differentiation tools. We show that combining the above regularizers can approximate the full total derivative of the LP gap with respect to the network parameters, capturing both direct and indirect sensitivities. Experiments on non-convex benchmark functions and a two-stage stochastic programming problem with quantile neural network surrogates demonstrate that the proposed regularizers can reduce MILP solve times by up to four orders of magnitude relative to an unregularized baseline, while maintaining competitive surrogate model accuracy.
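For intuition only, a minimal sketch of a bound-based penalty in the spirit described: interval bound propagation gives pre-activation bounds of a small ReLU network, and the implied big-M constants are penalized alongside the prediction loss. The architecture, bound method, and weighting below are illustrative assumptions, not the paper's formulation.

    import torch
    import torch.nn as nn

    # Small ReLU surrogate network (illustrative architecture).
    net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(),
                        nn.Linear(16, 16), nn.ReLU(),
                        nn.Linear(16, 1))

    def big_m_penalty(model, x_low, x_high):
        # Interval bound propagation: track elementwise lower/upper bounds of the
        # pre-activations and penalize the implied big-M constants max(0, ub) and
        # max(0, -lb) for every neuron.
        lo, hi = x_low, x_high
        penalty = x_low.new_zeros(())
        for layer in model:
            if isinstance(layer, nn.Linear):
                center, radius = (lo + hi) / 2, (hi - lo) / 2
                c = layer(center)
                r = radius @ layer.weight.abs().T
                lo, hi = c - r, c + r
                penalty = penalty + torch.relu(hi).sum() + torch.relu(-lo).sum()
            elif isinstance(layer, nn.ReLU):
                lo, hi = torch.relu(lo), torch.relu(hi)
        return penalty

    # Toy training loop: prediction loss plus the bound-based regularizer.
    x = torch.rand(64, 2) * 2 - 1
    y = (x ** 2).sum(dim=1, keepdim=True)       # stand-in for a non-convex benchmark function
    box_lo, box_hi = torch.full((1, 2), -1.0), torch.full((1, 2), 1.0)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y) + 1e-4 * big_m_penalty(net, box_lo, box_hi)
        loss.backward()
        opt.step()
    print(float(loss))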
Open 2604.22746v1
Multiplex Hypergraph Modeling of Higher Order Structures in Psychometri…
2026-04-24 · Social and Information Networks · Information Theory · arxiv
Abstract
Psychiatric disorders have traditionally been conceptualized as latent conditions producing observable symptoms, but recent studies suggest that psychopathology may emerge from symptom interactions. Psychometric network models capture these relations by focusing on pairwise associations, but they overlook higher-order dependencies arising among groups of variables. These dependencies may reflect synergistic mechanisms, where joint symptom configurations convey more information than pairwise relations, or redundancy, where information overlaps. We introduce an information-theoretic multiplex hypergraph framework to identify and compare higher-order interactions in eating-disorder data across diagnostic groups (e.g., anorexia nervosa). Higher-order structures are quantified using $Ω$-information, a measure that captures the balance between redundancy and synergy. To address the combinatorial growth of candidate subsets, multiple testing, and estimation instability, we propose a structured pipeline comprising: (i) targeted candidate selection based on dyadic network topology and theory-driven subscale information; (ii) a three-stage inferential procedure combining null-model testing with bootstrap robustness assessment; and (iii) the construction and analysis of diagnosis-layered synergistic and redundant multiplex hypergraphs. Results highlight how synergy captures the emergent, higher-order organization of diagnoses, revealing both a stable transdiagnostic core and diagnosis-specific ways in which these domains combine. By contrast, redundancy is confined to eating- and body-image-related content, marking reinforcement rather than broader symptom integration.
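For context, assuming the $Ω$-information referenced above is the standard O-information of Rosas et al. (an assumption on our part), it can be written in terms of joint and marginal entropies as
\[ \Omega(X_1,\dots,X_n) = (n-2)\,H(X_1,\dots,X_n) + \sum_{j=1}^{n}\big[ H(X_j) - H(X_1,\dots,X_{j-1},X_{j+1},\dots,X_n) \big], \]
with $\Omega > 0$ indicating redundancy-dominated and $\Omega < 0$ synergy-dominated higher-order interactions.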
Open 2604.22744v1
Boolean PCSPs through the lens of Fourier Analysis
2026-04-24 · Computational Complexity · arxiv
Abstract
We develop an analytical framework for Boolean Promise Constraint Satisfaction Problems (PCSPs) that studies polymorphisms through the notion of influence from Fourier analysis of Boolean functions. Extending the work of Brakensiek, Guruswami, and Sandeep [ICALP'21] on Ordered PCSPs, we identify two general phenomena in Boolean minions indicative of hardness or tractability: (1) preservation of coordinate influence under random 2-to-1 minors and (2) the presence of sharp thresholds. We demonstrate that these phenomena occur in broader settings than previously established, yielding new hardness/tractability results for minions consisting of unate or polynomial threshold functions.
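For background, the notion of coordinate influence from Fourier analysis of Boolean functions is standard: for $f:\{-1,1\}^n \to \{-1,1\}$,
\[ \mathrm{Inf}_i[f] \;=\; \Pr_{x}\big[f(x) \neq f(x^{\oplus i})\big] \;=\; \sum_{S \ni i} \hat{f}(S)^2, \]
where $x^{\oplus i}$ denotes $x$ with the $i$-th coordinate flipped and $\hat{f}(S)$ are the Fourier coefficients of $f$; how this notion is adapted to polymorphisms and minions is specific to the paper.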
Open 2604.22742v1
Minimax Optimal Procedures for Joint Detection and Estimation
2026-04-24 · Information Theory · arxiv
Abstract
We investigate the problem of jointly testing a pair of composite hypotheses and, depending on the test result, estimating a random parameter under distributional uncertainties. Specifically, it is assumed that the distribution of the data, given the parameter of interest, is subject to uncertainty. Both a Bayesian formulation and a Neyman-Pearson-like formulation are considered. It is shown that the optimal policy induces an $f$-similarity that must be maximized to identify the least favorable distributions. Besides the general results, the implementation is investigated using a band-type uncertainty model. For designing the minimax procedures, existing algorithms are modified to increase convergence speed while maintaining numerical stability. The proposed theory is supplemented by numerical results for both formulations.
Open 2604.22740v1
Inter-Stance: A Dyadic Multimodal Corpus for Conversational Stance Anal…
2026-04-24 · Computer Vision and Pattern Recognition · arxiv
Abstract
Social interactions dominate our perceptions of the world and shape our daily behavior by attaching social meaning to acts as simple and spontaneous as gestures, facial expressions, voice, and speech. People mimic and otherwise respond to each other's postures, facial expressions, mannerisms, and other verbal and nonverbal behavior, and form appraisals or evaluations in the process. Yet no publicly available dataset includes multimodal recordings and self-report measures of multiple persons in social interaction; dyadic recordings and annotations are lacking. We present a new data corpus of multimodal dyadic interaction (45 dyads, 90 persons) that includes synchronized multimodal behavior (2D face video, 3D face geometry, thermal spectrum dynamics, voice and speech behavior, and physiology, including PPG, EDA, heart rate, blood pressure, and respiration) and self-reported affect of all participants in a communicative interaction scenario. Two types of dyads are included: persons with a shared past history and strangers. Annotations include social signals, agreement, disagreement, and neutral stance. With a potent emotion induction, these multimodal data will enable novel modeling of multimodal interpersonal behavior. We present extensive experiments to evaluate multimodal dyadic communication of dyads with and without interpersonal history, and their affect. This new database will enable multimodal modeling of social interaction that was not possible before. The dataset includes 20TB of multimodal data to share with the research community.
Open 2604.22739v1
An Undecidability Proof for the Plan Existence Problem
2026-04-24 · Logic in Computer Science · Artificial Intelligence · arxiv
Abstract
The plan existence problem asks, given a goal in the form of a formula in modal logic, an initial epistemic state (a pointed Kripke model), and a set of epistemic actions, whether there exists a sequence of actions that can be applied to reach the goal. We prove that even in the case where the preconditions of the epistemic actions have modal depth at most 1, and there are no postconditions, the plan existence problem is undecidable. The (un)decidability of this problem was previously unknown.
Open 2604.22736v1
Neural Recovery of Historical Lexical Structure in Bantu Languages from…
2026-04-24 · Machine Learning · Computation and Language · arxiv
Abstract
We investigate whether neural models trained exclusively on modern morphological data can recover cross-lingual lexical structure consistent with historical reconstruction. Using BantuMorph v7, a transformer over Bantu morphological paradigms, we analyze 14 Eastern and Southern Bantu languages, extract encoder embeddings for their noun and verb lemmas, and identify 728 noun and 1,525 verb cognate candidates shared across 5+ languages. Evaluating these candidates against established historical resources, namely the Bantu Lexical Reconstructions database (BLR3; 4,786 reconstructed Proto-Bantu forms) and the ASJP basic vocabulary, we confirm that 10 of the top 11 noun candidates (90.9%) align with previously reconstructed Proto-Bantu forms, including *-ntU 'person' (8 languages), *gombe 'cow' (9 languages), and *mUn (9 languages). Extending to verbs, 12 verb cognates align with reconstructed Proto-Bantu roots, including *-bon- 'see' and *-jIm- 'stand', each attested across wide geographic ranges. Cross-model validation using an independent translation model (NLLB-600M) confirms these patterns: both models recover cognate clusters and phylogenetic groupings consistent with established Guthrie-zone classifications (p < 0.01). Cross-lingual noun class analysis reveals that all 13 productive classes maintain >0.83 cosine similarity across languages (within-class > between-class, p < 10^-9). Our dataset is restricted to Eastern and Southern Bantu, so we interpret these results as recovering shared Bantu lexical structure consistent with Proto-Bantu rather than definitively distinguishing Proto-Bantu retentions from later regional innovations.
Open 2604.22730v1
GCImOpt: Learning efficient goal-conditioned policies by imitating opti…
2026-04-24 · Robotics · arxiv
Abstract
Imitation learning is a well-established approach for machine-learning-based control. However, its applicability depends on having access to demonstrations, which are often expensive to collect and/or suboptimal for solving the task. In this work, we present GCImOpt, an approach to learn efficient goal-conditioned policies by training on datasets generated by trajectory optimization. Our approach for dataset generation is computationally efficient, can generate thousands of optimal trajectories in minutes on a laptop computer, and produces high-quality demonstrations. Further, by means of a data augmentation scheme that treats intermediate states as goals, we are able to increase the training dataset size by an order of magnitude. Using our generated datasets, we train goal-conditioned neural network policies that can control the system towards arbitrary goals. To demonstrate the generality of our approach, we generate datasets and then train policies for various control tasks, namely cart-pole stabilization, planar and three-dimensional quadcopter stabilization, and point reaching using a 6-DoF robot arm. We show that our trained policies can achieve high success rates and near-optimal control profiles, all while being small (less than 80,000 neural network parameters) and fast enough (up to more than 6,000 times faster than a trajectory optimization solver) that they could be deployed onboard resource-constrained controllers. We provide videos, code, datasets and pre-trained policies under a free software license; see our project website https://jongoiko.github.io/gcimopt/.
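A toy sketch of the kind of goal-relabeling augmentation described, where intermediate states of an optimal trajectory are reused as goals for earlier steps (the state and action shapes and the pairing rule are placeholder assumptions, not the released pipeline):

    import numpy as np

    def relabel_trajectory(states, actions, stride=1):
        """Turn one optimal trajectory into many (state, goal, action) examples
        by treating every later state as a valid goal for every earlier step."""
        examples = []
        for t in range(len(actions)):
            for g in range(t + 1, len(states), stride):
                examples.append((states[t], states[g], actions[t]))
        return examples

    # Synthetic 'optimal' trajectory: 50 steps through R^4 and the actions between them.
    rng = np.random.default_rng(0)
    traj_states = np.cumsum(rng.normal(size=(51, 4)) * 0.1, axis=0)
    traj_actions = np.diff(traj_states, axis=0)

    dataset = relabel_trajectory(traj_states, traj_actions)
    print(len(traj_actions), "original steps ->", len(dataset), "goal-conditioned examples")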
Open 2604.22724v1
Zero-Shot Morphological Discovery in Low-Resource Bantu Languages via C…
2026-04-24 · Machine Learning · Computation and Language · arxiv
Abstract
We present a method for discovering morphological features in low-resource Bantu languages by combining cross-lingual transfer learning with unsupervised clustering. Applied to Giriama (nyf), a language with only 91 labeled paradigms, our pipeline discovers noun class assignments for 2,455 words and identifies two previously undocumented morphological patterns: an a- prefix variant for Class 2 (vowel coalescence - the merger of two adjacent vowels - of wa-, 95.1% consistency) and a contracted k'- prefix (98.5% consistency). External validation on 444 known Giriama verb paradigms confirms 78.2% lemmatization accuracy, while a v3 corpus expansion to 19,624 words (9,014 unique lemmas) achieves 97.3% segmentation and 86.7% lemmatization rates across all major word classes. Our ensemble of transfer learning from Swahili and unsupervised clustering, combined via weighted voting, exploits complementary strengths: transfer excels at cognate detection (leveraging ~60% vocabulary overlap) while clustering discovers language-specific innovations invisible to transfer. We release all code and discovered lexicons to support morphological documentation for low-resource Bantu languages.
Open 2604.22723v1
Aligning Dense Retrievers with LLM Utility via Distillation
2026-04-24 · Information Retrieval · Artificial Intelligence · Machine Learning · arxiv
Abstract
Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
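One plausible reading of the distribution-matching objective, sketched for a single query (not the paper's exact loss; the temperatures and the softmax-over-utilities target are assumptions): the retriever's similarity distribution is trained to match a target distribution derived from per-document utility scores.

    import torch
    import torch.nn.functional as F

    def utility_modulated_infonce(q_emb, d_embs, utilities, tau_sim=0.05, tau_util=1.0):
        """Cross-entropy between a utility-derived target distribution over candidate
        documents and the retriever's similarity distribution for one query.
        q_emb: (dim,), d_embs: (n_docs, dim), utilities: (n_docs,)."""
        sims = F.normalize(d_embs, dim=-1) @ F.normalize(q_emb, dim=-1)   # cosine similarities
        log_p_model = F.log_softmax(sims / tau_sim, dim=-1)
        p_target = F.softmax(utilities / tau_util, dim=-1)                # graded, not one-hot
        return -(p_target * log_p_model).sum()

    # Toy example with random embeddings and made-up utility scores.
    q = torch.randn(768)
    docs = torch.randn(8, 768)
    util = torch.tensor([0.9, 0.1, 0.0, 0.4, 0.0, 0.2, 0.0, 0.05])
    print(utility_modulated_infonce(q, docs, util))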
Open 2604.22722v1
ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for P…
2026-04-24 · Robotics · arxiv
Abstract
Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often causes optimization stagnation in highly constrained regions, where a few lagging subproblems delay global convergence. A natural remedy is to adaptively re-split these stagnating segments online. Yet, deciding when, where, and how to split exceeds the capability of rule-based heuristics. To this end, we propose ATRS, a novel framework that embeds a shared Deep Reinforcement Learning policy into the parallel ADMM loop. We formulate this adaptive adjustment as a Multi-Agent Shared-Policy Markov Decision Process, where all trajectory segments act as homogeneous agents and share a unified neural policy network. This parameter-sharing architecture endows the system with size invariance, enabling it to handle dynamically changing segment counts during re-splitting and generalize to arbitrary trajectory lengths. Furthermore, our formulation inherently supports zero-shot generalization to unseen environments, as our network relies solely on the internal states of the numerical solver rather than on the geometric features of the environment. To ensure solver stability, a Confidence-Based Election mechanism selects only the most stagnating segment for re-splitting at each step. Extensive simulations demonstrate that ATRS accelerates convergence, reducing the number of iterations by up to 26.0% and the computation time by up to 19.1%. Real-world experiments further confirm its applicability to both large-scale offline global planning and real-time onboard replanning within 35 ms per cycle, with no sim-to-real degradation.
Open 2604.22715v1
Long-tail Internet photo reconstruction
2026-04-24 · Computer Vision and Pattern Recognition · arxiv
Abstract
Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields robust reconstructions under extreme sparsity, and also enables more reliable reconstruction in symmetric and repetitive scenes, while preserving generalization to standard, dense 3D benchmark datasets.
Open 2604.22714v1
Evaluation of the effects of 3GPP-specific beamforming and channel esti…
2026-04-24 · Networking and Internet Architecture · arxiv
Abstract
Spatial domain exploitation through 3D beamforming serves as a critical technology enabler for performance enhancement in the Fifth Generation New Radio (5G NR) specification. This is realized at the gNodeB (gNB) through the integration of massive antenna element arrays that facilitate 3D spatial multiplexing. However, these systems with highly directional transmissions also represent a threat to incumbent services such as radar and satellites. These incumbents already operate in midband spectrum, including the 4.4-4.9 GHz and 7.125-7.4 GHz bands currently being evaluated for future cellular deployments. Here, we present the first work that evaluates the transmitted Effective Isotropic Radiated Power (EIRP) of a gNB in 3D space using the 3GPP Release-18 standard for FR-1, rather than theoretical analyses of beam nulling, which can be simplistic. We shed light on the problems requiring attention with the EIRP profile in 3D space for existing codebook designs predefined in 3GPP: i) interference from a gNB depends not only on the worst-case beamforming direction, but on a variety of beamforming directions due to side-lobes; ii) the advanced antenna system (AAS) architecture and antenna port configurations, which are implementation dependent, play a crucial role in average 3D EIRP; and iii) we introduce two beam nulling methods, which achieve an 11 dB power reduction toward a target direction with a 3.5-4.5 dB SNR loss in UE link performance at a $10^{-4}$ bit error rate (BER) across modulation schemes under ideal and practical channel estimation, a higher loss than predicted by theoretical analyses.
Open 2604.22710v1
Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-…
2026-04-24 · Computation and Language · arxiv
Abstract
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose $\textbf{Abstract Chain-of-Thought}$, a discrete latent reasoning post-training mechanism in which the language model produces a short sequence of tokens from a reserved vocabulary in lieu of a natural language CoT, before generating a response. To make previously unseen ``abstract'' tokens useful, we introduce a policy iteration-style warm-up loop that alternates between (i.) bottlenecking from a verbal CoT via masking and performing supervised fine-tuning, and (ii.) self-distillation by training the model to generate abstract tokens from the prompt alone via constrained decoding with the codebook. After warm-up, we optimize the generation of abstract sequences with warm-started reinforcement learning under constrained decoding. Abstract-CoT achieves up to $11.6\times$ fewer reasoning tokens while demonstrating comparable performance across mathematical reasoning, instruction-following, and multi-hop reasoning, and generalizes across language model families. We also find an emergent power law distribution over the abstract vocabulary, akin to those seen in natural language, that evolves across the training phases. Our findings highlight the potential for post-training latent reasoning mechanisms that enable efficient inference through a learned abstract reasoning language.
Open 2604.22709v1
Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-b…
2026-04-24 · Multiagent Systems · arxiv
Abstract
Failure attribution, i.e., identifying the responsible agent and decisive step of a failure, is particularly challenging in LLM-based multi-agent systems (MAS) due to their natural-language reasoning, nondeterministic outputs, and intricate interaction dynamics. A reliable benchmark is therefore essential to guide and evaluate attribution techniques. Yet existing benchmarks rely on partially observable traces that capture only agent outputs, omitting the inputs and context that developers actually use when debugging. We argue that failure attribution should be studied under full execution observability, aligning with real-world developer-facing scenarios where complete traces, rather than only outputs, are accessible for diagnosis. To this end, we introduce TraceElephant, a benchmark designed for failure attribution with full execution traces and reproducible environments. We then systematically evaluate failure attribution techniques across various configurations. Specifically, full traces improve attribution accuracy by up to 76% over a partial-observation counterpart, confirming that missing inputs obscure many failure causes. TraceElephant provides a foundation for follow-up failure attribution research, promoting evaluation practices that reflect real-world debugging and supporting the development of more transparent MASs.
Open 2604.22708v1
Generative Modeling of Neurodegenerative Brain Anatomy with 4D Longitud…
2026-04-24 · Computer Vision and Pattern Recognition · arxiv
Abstract
Understanding and predicting the progression of neurodegenerative diseases remains a major challenge in medical AI, with significant implications for early diagnosis, disease monitoring, and treatment planning. However, most available longitudinal neuroimaging datasets are temporally sparse with a few follow-up scans per subject. This scarcity of temporal data limits our ability to model and accurately capture the continuous anatomical changes related to disease progression in individual subjects. To address this problem, we propose a novel 4D (3DxT) diffusion-based generative framework that effectively models and synthesizes longitudinal brain anatomy over time, conditioned on available clinical variables such as health status, age, sex, and other relevant factors. Moreover, while most current approaches focus on manipulating image intensity or texture, our method explicitly learns the data distribution of topology-preserving spatiotemporal deformations to effectively capture the geometric changes of brain structures over time. This design enables the realistic generation of future anatomical states and the reconstruction of anatomically consistent disease trajectories, providing a more faithful representation of longitudinal brain changes. We validate our model through both synthetic sequence generation and downstream longitudinal disease classification, as well as brain segmentation. Experiments on two large-scale longitudinal neuroimage datasets demonstrate that our method outperforms state-of-the-art baselines in generating anatomically accurate, temporally consistent, and clinically meaningful brain trajectories. Our code is available on Github.
Open 2604.22700v1
Entrywise Low-Rank Approximation and Matrix $p \rightarrow q$ Norms via…
2026-04-24 · Data Structures and Algorithms · arxiv
Abstract
Given a matrix $A$, the goal of the entrywise low-rank approximation problem is to find $\operatorname{argmin} \|A-B\|_p$ over all rank-$k$ matrices $B$, where $\| \cdot \|_p$ is the entrywise $\ell_p$ norm. When $p = 2$ this well-studied problem is solved by the singular value decomposition, but for $p \neq 2$ the problem becomes computationally challenging. For every even $p > 2$ and every fixed $k$, we give the first polynomial-time approximation scheme for this problem, improving on the $(3 + \varepsilon)$ approximation of Ban, Bhattiprolu, Bringmann, Kolev, Lee, and Woodruff, the bi-criteria approximation of Woodruff and Yasuda, and the additive approximation scheme of Anderson, Bakshi, and Hopkins. Prior algorithmic approaches based on sketching and column selection, which yielded a polynomial-time approximation scheme in the $p < 2$ setting, face concrete barriers when $p > 2$. Instead, we use the Sherali-Adams hierarchy of convex programs, and in so doing establish a blueprint for how to use convex hierarchies to design polynomial-time approximation schemes for continuous optimization problems. We use the same algorithmic strategy to give a new family of additive approximation algorithms for matrix $p \rightarrow q$ norms, which are intimately related to small-set expansion and quantum information. In particular, we give the first nontrivial additive approximation algorithms in the regime $p < 2 < q$.
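For reference, the two objects in play are, in standard notation,
\[ \min_{\operatorname{rank}(B) \le k} \|A - B\|_p, \qquad \|A\|_p = \Big(\sum_{i,j} |A_{ij}|^p\Big)^{1/p}, \qquad \|A\|_{p \to q} = \max_{x \neq 0} \frac{\|Ax\|_q}{\|x\|_p}, \]
i.e., the best rank-$k$ fit under the entrywise $\ell_p$ norm and the operator norm from $\ell_p$ to $\ell_q$.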
Open 2604.22699v1
RFID-Based Non-Biometric Classroom Attendance System: Proxy Attendance…
2026-04-24 · Computers and Society · Human-Computer Interaction · arxiv
Abstract
Attendance tracking in educational institutions, when conducted through traditional methods, leads to structural problems that consume instruction time and threaten academic integrity. Attendance durations spanning several minutes in primary and secondary education and exceeding ten minutes in higher education, combined with the proxy attendance problem of signing on behalf of someone else, demonstrate the need for electronic systems. Most existing electronic solutions rely on biometric authentication, which raises legal and ethical risks under the European General Data Protection Regulation (GDPR), the Turkish Personal Data Protection Law (KVKK), and the United States Family Educational Rights and Privacy Act (FERPA). Systems using RFID alone provide no built-in safeguard against proxy attendance through card transfer. This study proposes a biometric-free IoT attendance system addressing both deficiencies. The prototype consists of an RFID module, RFID cards, weight sensors, a Bluetooth module, and an Arduino UNO microcontroller. After the student presents their RFID card, the weight sensor measurement is compared against a statistical reference range of 350 individuals (aged 18-22) compiled from three Kaggle datasets; no personal biometric data is recorded. A Python-based GUI performs student management, course tracking, and CSV-based reporting via Bluetooth. Qualitative tests in conditions close to a real classroom have shown that the RFID reading, weight verification, Bluetooth communication, and GUI modules operate in an integrated manner as expected. The proposed system offers a low-cost and reproducible solution that aims to reduce proxy attendance without storing biometric data.
Open 2604.22697v1
Time-Localized Parametric Decomposition of Respiratory Airflow for Sub-…
2026-04-24 · Machine Learning · arxiv
Abstract
Respiratory airflow signals provide critical insight into breathing mechanics, yet conventional analysis methods remain limited in their ability to characterize the internal structure of individual breaths. Traditional approaches treat airflow as a quasi-periodic signal and rely on global descriptors such as tidal volume or peak flow, obscuring sub-breath events that reflect neuromuscular coordination and compensatory breathing strategies. This study introduces a parametric framework for decomposing inspiratory airflow into a small number of time-localized components with explicit amplitude, onset time, and duration parameters. Unlike spectral or data-adaptive methods, the proposed approach employs physiologically grounded basis functions, Half-Sine, Gaussian, and Beta, to represent intrabreath waveform morphology through constrained nonlinear optimization. Evaluation across 8,276 breaths demonstrates high reconstruction accuracy (mean squared error $<$ 0.001 for four-component models) and robust parameter precision under moderate noise. Component-derived features describing sub-breath timing and coordination improved classification of cognitive fatigue states arising from cognitive-respiratory competition by up to 30.7% in Matthews correlation coefficient compared with classical respiratory metrics. These results establish that modeling airflow as a sum of parameterized, time-localized primitives provides an interpretable and precise foundation for quantifying intrabreath organization, compensatory breathing dynamics, and respiratory motor control adaptation under cognitive-respiratory dual-task demands.
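A rough sketch of fitting time-localized parametric components by bounded nonlinear least squares (illustrative only: Gaussian bumps with amplitude, onset, and duration parameters stand in for the paper's Half-Sine, Gaussian, and Beta basis functions and its constrained optimization):

    import numpy as np
    from scipy.optimize import least_squares

    def components(t, params, n_comp):
        # params = [amp_1, onset_1, dur_1, amp_2, ...]; each component is a Gaussian bump.
        flow = np.zeros_like(t)
        for a, onset, dur in params.reshape(n_comp, 3):
            flow += a * np.exp(-0.5 * ((t - onset) / max(dur, 1e-3)) ** 2)
        return flow

    # Synthetic inspiratory airflow: two overlapping sub-breath events plus noise.
    rng = np.random.default_rng(0)
    t = np.linspace(0, 2.0, 400)
    truth = components(t, np.array([1.0, 0.5, 0.15, 0.4, 1.1, 0.25]), 2)
    signal = truth + rng.normal(0, 0.02, t.size)

    n_comp = 2
    x0 = np.array([0.8, 0.4, 0.2, 0.5, 1.2, 0.2])            # initial guess
    lb = np.tile([0.0, 0.0, 0.05], n_comp)                   # keep amplitudes/durations positive
    ub = np.tile([5.0, 2.0, 1.0], n_comp)
    fit = least_squares(lambda p: components(t, p, n_comp) - signal, x0, bounds=(lb, ub))
    mse = np.mean((components(t, fit.x, n_comp) - signal) ** 2)
    print("fitted parameters:", fit.x.round(3), "MSE:", round(mse, 5))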
Open 2604.22695v1
CRAFT: Clustered Regression for Adaptive Filtering of Training data
2026-04-24 · Computation and Language · Artificial Intelligence · arxiv
Abstract
Selecting a small, high-quality subset from a large corpus for fine-tuning is increasingly important as corpora grow to tens of millions of datapoints, making full fine-tuning expensive and often unnecessary. We propose CRAFT (Clustered Regression for Adaptive Filtering of Training data), a vectorization-agnostic selection method for training sequence-to-sequence models. CRAFT decomposes the joint source-target distribution and performs a two-stage selection: (i) match the validation source distribution through proportional budget allocation across k-means clusters, and (ii) within each source cluster, select training pairs whose target embeddings minimize a conditional expected distance derived from the validation target distribution. We prove that proportional cluster allocation bounds the continuous KL divergence between selected and validation distributions, with the residual controlled by cluster diameters. We evaluate CRAFT on English-Hindi translation by selecting training data from 33 million NLLB sentence pairs and fine-tuning mBART via LoRA. CRAFT achieves 43.34 BLEU, outperforming TSDS (41.21) by 2.13 points on the same candidate pool and encoder while completing selection over 40 times faster. With TF-IDF vectorization, the entire pipeline completes in under one minute on CPU. TAROT achieves 45.61 BLEU, but CRAFT completes selection in 26.86 seconds versus TAROT's 75.6 seconds, a 2.8x speedup.
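A minimal sketch of the first, source-side stage as described, proportional budget allocation across k-means clusters of validation embeddings, with a simple nearest-to-centroid pick standing in for the paper's target-side conditional-distance scoring (embeddings, cluster count, and budget are synthetic):

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    train_src = rng.normal(size=(20000, 64))       # stand-in source embeddings of the big corpus
    valid_src = rng.normal(size=(1000, 64))        # validation source embeddings to be matched
    budget = 500                                   # how many training pairs we may select

    k = 20
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(valid_src)
    valid_labels = km.labels_
    train_labels = km.predict(train_src)

    selected = []
    for c in range(k):
        # Allocate the budget in proportion to the validation mass of each cluster.
        quota = int(round(budget * np.mean(valid_labels == c)))
        members = np.where(train_labels == c)[0]
        if quota == 0 or members.size == 0:
            continue
        # Within the cluster, pick the training points closest to its centroid
        # (the paper instead scores target-side embeddings; this is a placeholder).
        d = np.linalg.norm(train_src[members] - km.cluster_centers_[c], axis=1)
        selected.extend(members[np.argsort(d)[:quota]].tolist())

    print(f"selected {len(selected)} of {len(train_src)} training pairs")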
Open 2604.22693v1
COMPASS: A Unified Decision-Intelligence System for Navigating Performa…
2026-04-24 · Performance · arxiv
Abstract
HPC systems expose many configuration parameters that jointly drive competing objectives. Existing tools such as autotuners recommend good configurations but do not identify minimal changes for a near-miss configuration to meet a performance objective, and they often ignore domain-specific constraints. To address this gap, we introduce COMPASS -- a modular, programmable engine that uses operational traces to generate HPC configuration recommendations and guide tuning decisions. This paper: (1) formalizes configuration questions into query patterns; (2) develops an interactive decision-making engine that formulates these queries as Machine Learning (ML) tasks; (3) quantifies the trustworthiness of its recommendations by providing evidence and quantifying uncertainty, and -- when confidence is low -- provides guidance on which configurations to run next. We validate COMPASS using analytical ground truth, reconstruction accuracy, reproduction of published findings, and when possible, running on real hardware. When integrated with an open-source HPC scheduling simulator, COMPASS cuts average job turnaround time by 65.93% and node usage by 80.93% relative to the state-of-the-art. Moreover, COMPASS achieves up to 100x faster training and 80x faster inference than state-of-the-art generative methods, and scales to traces with 1.3B samples and 126GB of data.
Open 2604.22688v1
SS3D: End2End Self-Supervised 3D from Web Videos
2026-04-24 · Computer Vision and Pattern Recognition · arxiv
Abstract
We present SS3D, a web-scale SfM-based self-supervision pretraining pipeline for feed-forward 3D estimation from monocular video. Our model jointly predicts depth, ego-motion, and intrinsics in a single forward pass and is trained/evaluated as a coherent end-to-end 3D estimator. To stabilize joint learning, we use an intrinsics-first two-stage schedule and a unified single-checkpoint evaluation protocol. Scaling SfM self-supervision to unconstrained web video is challenging due to weak multi-view observability and strong corpus heterogeneity; we address these with a multi-view signal proxy (MVS) used for filtering and curriculum sampling, and with expert training distilled into a single student. Pretraining on YouTube-8M (~100M frames after filtering) yields strong cross-domain zero-shot transfer and improved fine-tuning performance over prior self-supervised baselines. We release the pretrained checkpoint and code.
Open 2604.22686v1
CosmicDancePro -- Measuring LEO satellite's orbital decay and network c…
2026-04-24 · Networking and Internet Architecture · Performance · arxiv
Abstract
The May 2024 solar superstorm highlighted the vulnerability of rapidly expanding low Earth orbit (LEO) satellite networks to severe space weather events. To systematically evaluate LEO network resilience, we introduce an open-source tool, CosmicDancePro. It enables a comprehensive analysis of the effects of solar storms in the LEO satellite network. It integrates real-world multimodal datasets, including space weather measurements from several satellites, upper-atmospheric density conditions from data-driven and high-fidelity physics-based models, and LEO satellite trajectory and LEO network measurement traces to quantify orbital decay driven by enhanced atmospheric density and network connectivity degradation. We utilize CosmicDancePro to analyze the Starlink constellation's behavior during two recent major solar storms. First, we identify the specific fleet management strategies Starlink adopts during the May 2024 solar superstorm and how they differ from its regular orbit-correction strategy. Second, we identify the mechanisms driving the previously unexplained 'W'-shaped altitude variation pattern across orbital planes of LEO constellations. Finally, our network-layer analysis quantifies the connectivity degradation during these storms, revealing transient disruptions that include repetitive short-lived outages, reconfiguration latency spikes above 500 ms, up to 60% increase in uplink loss, distorted diurnal latency patterns, and a 10+ Mbps drop in end-user data rates during storm peaks.
Open 2604.22685v1
How Supply Chain Dependencies Complicate Bias Measurement and Accountab…
2026-04-24 · Computers and Society · Artificial Intelligence · arxiv
Abstract
The increasing adoption of AI systems in hiring has raised concerns about algorithmic bias and accountability, prompting regulatory responses including the EU AI Act, NYC Local Law 144, and Colorado's AI Act. While existing research examines bias through technical or regulatory lenses, both perspectives overlook a fundamental challenge: modern AI hiring systems operate within complex supply chains where responsibility fragments across data vendors, model developers, platform providers, and deploying organizations. This paper investigates how these dependency chains complicate bias evaluation and accountability attribution. Drawing on literature review and regulatory analysis, we demonstrate that fragmented responsibilities create two critical problems. First, bias emerges from component interactions rather than isolated elements, yet proprietary configurations prevent integrated evaluation. A resume parser may function without bias independently but contribute to discrimination when integrated with specific ranking algorithms and filtering thresholds. Second, information asymmetries mean deploying organizations bear legal responsibility without technical visibility into vendor-supplied algorithms, while vendors control implementations without meaningful disclosure requirements. Each stakeholder may believe they are compliant; nevertheless, the integrated system may produce biased outcomes. Analysis of implementation ambiguities reveals these challenges in practice. We propose multi-layered interventions including system-level audits, vendor guidelines, continuous monitoring mechanisms, and documentation across dependency chains. Our findings reveal that effective governance requires coordinated action across technical, organizational, and regulatory domains to establish meaningful accountability in distributed development environments.
Open 2604.22679v1
BERAG: Bayesian Ensemble Retrieval-Augmented Generation for Knowledge-b…
2026-04-24 · Computation and Language · arxiv
Abstract
A common approach to question answering with retrieval-augmented generation (RAG) is to concatenate documents into a single context and pass it to a language model to generate an answer. While simple, this strategy can obscure the contribution of individual documents, making attribution difficult and contributing to the ``lost-in-the-middle'' effect, where relevant information in long contexts is overlooked. Concatenation also scales poorly: computational cost grows quadratically with context length, a problem that becomes especially severe when the context includes visual data, as in visual question answering. Attempts to mitigate these issues by limiting context length can further restrict performance by preventing models from benefiting from the improved recall offered by deeper retrieval. We propose Bayesian Ensemble Retrieval-Augmented Generation (BERAG), along with Bayesian Ensemble Fine-Tuning (BEFT), as a RAG framework in which language models are conditioned on individual retrieved documents rather than a single combined context. BERAG treats document posterior probabilities as ensemble weights and updates them token by token using Bayes' rule during generation. This approach enables probabilistic re-ranking, parallel memory usage, and clear attribution of document contribution, making it well-suited for large document collections. We evaluate BERAG and BEFT primarily on knowledge-based visual question answering tasks, where models must reason over long, imperfect retrieval lists. The results show substantial improvements over standard RAG, including strong gains on Document Visual Question Answering and multimodal needle-in-a-haystack benchmarks. We also demonstrate that BERAG mitigates the ``lost-in-the-middle'' effect. The document posterior can be used to detect insufficient grounding and trigger deflection, while document pruning enables faster decoding than standard RAG.
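One natural way to formalize the token-by-token update sketched above (our reading, not necessarily the authors' exact parameterization): with retrieved documents $d$ acting as ensemble members,
\[ p(d \mid x, y_{<t}) \;\propto\; p(d \mid x)\,\prod_{i<t} p(y_i \mid x, d, y_{<i}), \qquad p(y_t \mid x, y_{<t}) \;=\; \sum_{d} p(d \mid x, y_{<t})\, p(y_t \mid x, d, y_{<t}), \]
so retrieval scores supply the prior $p(d \mid x)$ and each per-document decoding run contributes a likelihood that re-weights the ensemble as generation proceeds.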
Open 2604.22678v1
Operational Feature Fingerprints of Graph Datasets via a White-Box Sign…
2026-04-24 · Machine Learning · arxiv
Abstract
Graph neural networks achieve strong node-classification accuracy, but their learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier boundaries in an opaque representation. This obscures why a node is classified and what feature-level graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary of raw features, row-normalized and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so prediction and analysis use explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. As a white-box graph-learning instrument, WG-SRC uses predictive performance to validate its diagnostics: across six node-classification datasets, the scaffold remains competitive with reproduced graph baselines and achieves positive average gain under aligned splits. Its atlas, produced by a predictor, decomposes behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. These operational feature fingerprints distinguish low-pass-dominated Amazon graphs, mixed high-pass and class-geometrically complex Chameleon behavior, and raw- or boundary-sensitive WebKB graphs. As intrinsic classifier outputs rather than post-hoc explanations, these fingerprints provide post-evaluation guidance for later analysis and dataset-specific modification. Aligned mechanistic interventions support this guidance by indicating when high-pass blocks act as removable noise, when raw features should be preserved, and when ridge-type boundary correction matters.
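A small sketch of the kind of fixed, named graph-signal dictionary described (raw features, row-normalized and symmetric-normalized low-pass propagation, and a high-pass difference), built on a toy graph; the exact operators, normalizations, and downstream stages of WG-SRC may differ.

    import numpy as np
    import scipy.sparse as sp

    def graph_signal_dictionary(adj, X):
        """Stack fixed, interpretable graph-signal views of the node features X."""
        adj = sp.csr_matrix(adj, dtype=float)
        deg = np.asarray(adj.sum(axis=1)).ravel()
        d_inv = sp.diags(1.0 / np.maximum(deg, 1e-12))
        d_inv_sqrt = sp.diags(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        P_row = d_inv @ adj                       # row-normalized propagation D^-1 A
        P_sym = d_inv_sqrt @ adj @ d_inv_sqrt     # symmetric normalization D^-1/2 A D^-1/2
        return {
            "raw": X,
            "lowpass_row": P_row @ X,             # neighborhood smoothing
            "lowpass_sym": P_sym @ X,
            "highpass": X - P_row @ X,            # graph differences (ego minus neighborhood)
        }

    # Toy graph: a 5-node path with 3-dimensional node features.
    A = sp.diags([1, 1], offsets=[1, -1], shape=(5, 5)).toarray()
    X = np.random.default_rng(0).normal(size=(5, 3))
    views = graph_signal_dictionary(A, X)
    print({k: v.shape for k, v in views.items()})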
Open 2604.22676v1
Measuring Epistemic Unfairness for Algorithmic Decision-Making
2026-04-24 · Social and Information Networks · arxiv
Abstract
Algorithmic systems increasingly function as epistemic infrastructures that govern the conditions of interpretative access and social belief. Yet, mainstream auditing strategies operationalize fairness primarily in predictive terms - error rates, calibration, or group-level parity - leaving epistemic harms under-theorized and under-measured. We propose a quantitative framework for evaluating forms of epistemic injustice in algorithmic environments. First, we introduce a deficit-based template that models epistemic injustices as gaps between ideal and realized conditions across features such as credibility, uptake, and epistemic agency. We map these deficits to concrete stages of algorithmic mediation, showing how epistemic injustice can persist even when standard fairness constraints are satisfied. Drawing on distributive fairness indices, we distinguish two evaluation stances: resource inequality, where indices are applied to distributions of epistemic goods directly, and capability/rights inequity, where indices are applied to output-induced epistemic opportunity. We provide an epistemic translation of canonical indices, illustrating how they diagnose complementary signatures of unfairness - such as exclusionary tails and hierarchical concentration - and support longitudinal auditing under iterative deployment. We also provide a simulation study of a recommender-mediated opinion dynamics setting, showing how the proposed indices capture the evolution of epistemic unfairness under repeated platform interventions. The result is a measurement framework that makes the epistemic dimension of algorithmic harms explicit for system design and evaluation.
Open 2604.22675v1
Inferring Equivalence Classes from Legacy Undocumented Embedded Binarie…
2026-04-24 · Software Engineering · Symbolic Computation · arxiv
Abstract
Equivalence class partitioning is a well-established test design technique mandated by safety standards such as ISO 26262 for systematic testing of safety software. In industrial practice, however, its application to legacy undocumented embedded firmware is often hindered by incomplete or outdated functional specifications. This paper proposes a binary-level methodology for inferring output-oriented equivalence classes directly from compiled firmware, without relying on source-level annotations or external documentation. The approach combines control-flow reconstruction and guided symbolic execution to analyze individual functions and group execution paths according to indistinguishable observable behavior, including return values and output parameters. An optional post-processing step produces human-readable representations to support comprehension and documentation. The methodology is evaluated in an industrial automotive context through a practitioner-based study assessing correctness and interpretability. Results indicate strong alignment with expert expectations and a positive perception of readability and usefulness for supporting function understanding and test design. These findings demonstrate the feasibility and practical relevance of binary-level equivalence class inference for systematic testing of legacy undocumented embedded safety software.
Open 2604.22673v1
Iterative Model-Learning Scheme via Gaussian Processes for Nonlinear Mo…
2026-04-24 · Machine Learning · arxiv
Abstract
Batch processes are inherently transient and typically nonlinear, motivating nonlinear model predictive control (NMPC). However, adopting NMPC is hindered by the cost and unavailability of dynamic models. Thus, we propose to use Gaussian Processes (GP) in a model-learning NMPC scheme (GP-MLMPC) for batch processes. We initialize the GP-MLMPC using data from a single initial trajectory, e.g., from a PI controller. We iteratively apply the NMPC embedded with GPs to run batches and update the GP with new observations from each iteration, thereby achieving batch-wise improvements. Using uncertainty quantification from the GPs, we formulate chance constraints to enforce safe operation to the required confidence levels. We demonstrate our approach in silico on a semi-batch polymerization reactor for tracking and economic objectives over durations of two hours, with the reactor temperature constrained to a range of $\pm2^\circ C$ around its setpoint. After only four batch iterations, the tracking error of the GP-MLMPC scheme converged to an $83\%$ reduction relative to the initial trajectory. Furthermore, under an economic objective, the GP-MLMPC resulted in a 17-fold increase in final product mass by iteration 8, compared to the initial trajectory. In both cases, the resulting GP-MLMPC performance is on par with the full-model NMPC, which shows that the optimal controller can be learned by the approach. By collecting samples around the optimal trajectory, the GP-MLMPC remains sample-efficient across iterations and achieves quick convergence. Thus, the proposed GP-MLMPC scheme presents a promising data-efficient approach for the control of nonlinear batch processes without mechanistic knowledge.
Open 2604.22672v1
Cuts and Gauges for Submodular Width
2026-04-24 · Data Structures and Algorithms · Databases · Discrete Mathematics · arxiv
Abstract
Submodular width is a central structural measure governing the complexity of conjunctive query evaluation. In this paper we recast submodular width in geometric terms. We show that submodular width can be approximated, up to a factor $3/2$, by a new branchwidth parameter defined in terms of edge separations in the hypergraph and the costs induced on them by admissible submodular functions. This reformulation turns lower bounds on submodular width into the problem of constructing well-balanced edge separations whose induced cost remains small. We then express this connection through a variational characterisation in terms of a convex body. Using these tools, we relate submodular width to more familiar graph-theoretic notions, including line-graph treewidth and multicommodity flow, and obtain general conditions under which submodular width is tightly linked to generalised hypertree width. In particular, under various natural conditions we show that \[ \mathrm{subw}(H) \in \Omega\left(\frac{\mathrm{ghw}(H)}{\log \mathrm{ghw}(H)} \right). \]
Open 2604.22663v1
Rethinking XAI Evaluation: A Human-Centered Audit of Shapley Benchmarks…
2026-04-24 · Machine Learning · Artificial Intelligence · Human-Computer Interaction · arxiv
Abstract
Shapley values are a cornerstone of explainable AI, yet their proliferation into competing formulations has created a fragmented landscape with little consensus on practical deployment. While theoretical differences are well-documented, evaluation remains reliant on quantitative proxies whose alignment with human utility is unverified. In this work, we use a unified amortized framework to isolate semantic differences between eight Shapley variants under the low-latency constraints of operational risk workflows. We conduct a large-scale empirical evaluation across four risk datasets and a realistic fraud-detection environment involving professional analysts and 3,735 case reviews. Our results reveal a fundamental misalignment: standard quantitative metrics, such as sparsity and faithfulness, are decoupled from human-perceived clarity and decision utility. Furthermore, while no formulation improved objective analyst performance, explanations consistently increased decision confidence, signaling a critical risk of automation bias in high-stakes settings. These findings suggest that current evaluation proxies are insufficient for predicting downstream human impact, and we provide evidence-based guidance for selecting formulations and metrics in operational decision systems.
Open 2604.22662v1