This Week In Computer Science Papers

Week beginning 2nd February 2026

Reinforced Attention Learning
2026-02-04 · Computation and Language · Computer Vision and Pattern Recognition · Machine Learning · arxiv
Abstract
Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.
Open 2602.04884v1
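RAL's core move, optimizing where to attend rather than what to generate, can be pictured as a score-function (REINFORCE) update on a categorical attention policy over image regions. Everything below (region rewards, the update rule, the function names) is a toy illustration of the idea, not the paper's algorithm:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_attention_step(logits, region_rewards, rng, lr=0.1):
    """One REINFORCE step on a categorical 'where to attend' policy:
    sample a region, observe its reward, and push probability mass
    toward rewarded regions via the score-function gradient."""
    p = softmax(logits)
    a = rng.choice(len(p), p=p)   # sample where to attend
    grad_log = -p.copy()
    grad_log[a] += 1.0            # d log p(a) / d logits
    return logits + lr * region_rewards[a] * grad_log, a
```

Repeating this step with a reward signal that favors one region concentrates the attention distribution there, which is the intended effect of optimizing attention as a policy.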
Protein Autoregressive Modeling via Multiscale Structure Generation
2026-02-04 · Machine Learning · Artificial Intelligence · arxiv
Abstract
We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Exploiting the hierarchical nature of proteins, PAR generates structures in a manner that mimics sculpting a statue, forming a coarse topology and refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, caused by the mismatch between the training and generation procedures, which substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
Open 2602.04883v1
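The multi-scale downsampling in component (i) can be pictured as a coordinate pyramid. A minimal sketch, assuming simple pairwise average pooling of residue coordinates; the paper's actual operator is not specified in the abstract:

```python
import numpy as np

def multiscale_downsample(coords, num_scales=3):
    """Build a coarse-to-fine pyramid of a backbone by average-pooling
    pairs of consecutive residue coordinates (N, 3). Returned coarsest
    first, matching next-scale (coarse-to-fine) prediction order."""
    scales = [np.asarray(coords, dtype=float)]
    for _ in range(num_scales - 1):
        c = scales[-1]
        n = (len(c) // 2) * 2             # drop a trailing odd residue
        scales.append(c[:n].reshape(-1, 2, 3).mean(axis=1))
    return scales[::-1]
```

An autoregressive model in this setup would condition each scale's prediction on all coarser scales, analogous to next-token conditioning.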
Contrastive Continual Learning for Model Adaptability in Internet of Th…
2026-02-04 · Machine Learning · Artificial Intelligence · arxiv
Abstract
Internet of Things (IoT) deployments operate in nonstationary, dynamic environments where factors such as sensor drift, evolving user behavior, and heterogeneous user privacy requirements can affect application utility. Continual learning (CL) addresses this by adapting models over time without catastrophic forgetting. Meanwhile, contrastive learning has emerged as a powerful representation-learning paradigm that improves robustness and sample efficiency in a self-supervised manner. This paper reviews the use of \emph{contrastive continual learning} (CCL) for IoT, connecting algorithmic design (replay, regularization, distillation, prompts) with IoT system realities (TinyML constraints, intermittent connectivity, privacy). We present a unifying problem formulation, derive common objectives that blend contrastive and distillation losses, propose an IoT-oriented reference architecture for on-device, edge, and cloud-based CCL, and provide guidance on evaluation protocols and metrics. Finally, we highlight open challenges unique to the IoT domain, such as handling tabular and streaming IoT data, concept drift, federated settings, and energy-aware training.
Open 2602.04881v1
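An objective that "blends contrastive and distillation losses" can be sketched as InfoNCE plus a KL distillation term against the previous task's model. The specific losses and the weighting below are illustrative assumptions, not the survey's canonical formulation:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: row i of `positives` is the positive
    for row i of `anchors`; all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def distill_kl(student_logits, teacher_logits):
    """KL(teacher || student): the anti-forgetting distillation term,
    with the previous task's model as teacher."""
    def sm(x):
        e = np.exp(x - x.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)
    t, s = sm(teacher_logits), sm(student_logits)
    return np.mean(np.sum(t * (np.log(t + 1e-12) - np.log(s + 1e-12)), axis=1))

def ccl_objective(anchors, positives, student_logits, teacher_logits, lam=0.5):
    """Combined CCL loss: contrastive representation term + lam * distillation."""
    return info_nce(anchors, positives) + lam * distill_kl(student_logits, teacher_logits)
```

In a replay-based variant, the anchor/positive pairs would mix current-task samples with buffered samples from earlier tasks.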
Capturing Visual Environment Structure Correlates with Control Performa…
2026-02-04 · Robotics · arxiv
Abstract
The choice of visual representation is key to scaling generalist robot policies. However, direct evaluation via policy rollouts is expensive, even in simulation. Existing proxy metrics focus on the representation's capacity to capture narrow aspects of the visual world, like object shape, limiting generalization across environments. In this paper, we take an analytical perspective: we probe pretrained visual encoders by measuring how well they support decoding of environment state -- including geometry, object structure, and physical attributes -- from images. Leveraging simulation environments with access to ground-truth state, we show that this probing accuracy strongly correlates with downstream policy performance across diverse environments and learning settings, significantly outperforming prior metrics and enabling efficient representation selection. More broadly, our study provides insight into the representational properties that support generalizable manipulation, suggesting that learning to encode the latent physical state of the environment is a promising objective for control.
Open 2602.04880v1
Rethinking the Trust Region in LLM Reinforcement Learning
2026-02-04 · Machine Learning · Artificial Intelligence · Computation and Language · arxiv
Abstract
Reinforcement learning (RL) has become a cornerstone for fine-tuning Large Language Models (LLMs), with Proximal Policy Optimization (PPO) serving as the de facto standard algorithm. Despite its ubiquity, we argue that the core ratio clipping mechanism in PPO is structurally ill-suited for the large vocabularies inherent to LLMs. PPO constrains policy updates based on the probability ratio of sampled tokens, which serves as a noisy single-sample Monte Carlo estimate of the true policy divergence. This creates a sub-optimal learning dynamic: updates to low-probability tokens are aggressively over-penalized, while potentially catastrophic shifts in high-probability tokens are under-constrained, leading to training inefficiency and instability. To address this, we propose Divergence Proximal Policy Optimization (DPPO), which substitutes heuristic clipping with a more principled constraint based on a direct estimate of policy divergence (e.g., Total Variation or KL). To avoid a large memory footprint, we introduce efficient Binary and Top-K approximations that capture the essential divergence with negligible overhead. Extensive empirical evaluations demonstrate that DPPO achieves superior training stability and efficiency compared to existing methods, offering a more robust foundation for RL-based LLM fine-tuning.
Open 2602.04879v1
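The contrast between PPO's single-sample ratio and a divergence-based constraint can be made concrete. Below is a hedged numpy sketch: the ratio PPO clips for one sampled token, versus a Top-K restricted total-variation estimate over the whole next-token distribution (a lower bound on the full TV distance). DPPO's actual Binary/Top-K constructions may differ from this sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ppo_ratio(logits_new, logits_old, token):
    """The noisy single-sample estimate PPO clips: the probability
    ratio of one sampled token under new vs. old policy."""
    return softmax(logits_new)[token] / softmax(logits_old)[token]

def topk_tv_lower_bound(logits_new, logits_old, k=8):
    """Top-K approximation of total variation between the two next-token
    distributions: 0.5 * sum |p_new - p_old| restricted to the k most
    probable tokens under the old policy (a lower bound on full TV)."""
    p_new, p_old = softmax(logits_new), softmax(logits_old)
    idx = np.argsort(-p_old)[:k]
    return 0.5 * np.abs(p_new[idx] - p_old[idx]).sum()
```

The divergence estimate looks at the distribution as a whole, so a large shift on a high-probability token registers even when the sampled token's ratio happens to look benign.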
CoWTracker: Tracking by Warping instead of Correlation
2026-02-04 · Computer Vision and Pattern Recognition · arxiv
Abstract
Dense point tracking is a fundamental problem in computer vision, with applications ranging from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volumes to match features across frames, but this approach incurs quadratic complexity in spatial resolution, limiting scalability and efficiency. In this paper, we propose CoWTracker, a novel dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Combined with a transformer architecture that performs joint spatiotemporal reasoning across all tracks, our design establishes long-range correspondences without computing feature correlations. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP. Remarkably, the model also excels at optical flow, sometimes outperforming specialized methods on the Sintel, KITTI, and Spring benchmarks. These results suggest that warping-based architectures can unify dense point tracking and optical flow estimation.
Open 2602.04877v1
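The warping operation at the heart of this design, sampling target-frame features at locations given by the current track estimate, is standard bilinear warping. A minimal numpy version (the paper's implementation details are not in the abstract; the flow layout here, channel 0 = vertical, channel 1 = horizontal, is an assumption):

```python
import numpy as np

def bilinear_warp(feat, flow):
    """Warp a (H, W, C) feature map by a (H, W, 2) flow field: the output
    at (y, x) samples feat at (y + flow[...,0], x + flow[...,1]) with
    bilinear interpolation; samples are clamped to the image border."""
    H, W, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    sy = np.clip(ys + flow[..., 0], 0, H - 1)
    sx = np.clip(xs + flow[..., 1], 0, W - 1)
    y0, x0 = np.floor(sy).astype(int), np.floor(sx).astype(int)
    y1, x1 = np.minimum(y0 + 1, H - 1), np.minimum(x0 + 1, W - 1)
    wy, wx = (sy - y0)[..., None], (sx - x0)[..., None]
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])
```

Because warping is O(HW) per refinement step instead of the O((HW)^2) of a full cost volume, iterating it stays cheap at high resolution.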
PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation
2026-02-04 · Computer Vision and Pattern Recognition · arxiv
Abstract
We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Current works fail at this task because their physical state is decoupled from their visual representation, which prevents generative refinements to update the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It features a novel unified representation that creates a bidirectional link between the physical state and visual primitives, allowing generative refinements to correct both the dynamics and appearance. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments demonstrate that from a single image, PerpetualWonder can successfully simulate complex, multi-step interactions from long-horizon actions, maintaining physical plausibility and visual consistency.
Open 2602.04876v1
Laminating Representation Autoencoders for Efficient Diffusion
2026-02-04 · Computer Vision and Pattern Recognition · arxiv
Abstract
Recent work has shown that diffusion models can generate high-quality images by operating directly on SSL patch features rather than pixel-space latents. However, the dense patch grids from encoders like DINOv2 contain significant redundancy, making diffusion needlessly expensive. We introduce FlatDINO, a variational autoencoder that compresses this representation into a one-dimensional sequence of just 32 continuous tokens, an 8x reduction in sequence length and a 48x compression in total dimensionality. On ImageNet 256x256, a DiT-XL trained on FlatDINO latents achieves a gFID of 1.80 with classifier-free guidance while requiring 8x fewer FLOPs per forward pass and up to 4.5x fewer FLOPs per training step compared to diffusion on uncompressed DINOv2 features. These are preliminary results and this work is in progress.
Open 2602.04873v1
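The two stated compression factors are mutually consistent under a typical DINOv2 setup that the abstract does not spell out. Assuming a 16x16 patch grid of 768-dimensional features for a 256x256 image (an assumption, not a reported detail), the 48x total compression implies 128-dimensional latent tokens:

```python
# All encoder numbers below are assumptions (typical DINOv2-B at 256x256
# with patch size 16), not figures stated in the abstract.
patches = 16 * 16      # 16x16 patch grid -> 256 patch tokens
patch_dim = 768        # assumed DINOv2-B feature dimension
tokens = 32            # FlatDINO latent sequence length (reported)

seq_reduction = patches / tokens                      # reported as 8x
total_compression = 48                                # reported
# Implied latent-token dimensionality under these assumptions:
token_dim = (patches * patch_dim) / (tokens * total_compression)
```

Under these assumptions, `seq_reduction` is 8 and `token_dim` is 128, matching the 8x sequence-length and 48x total-dimensionality claims.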
Multi-layer Cross-Attention is Provably Optimal for Multi-modal In-cont…
2026-02-04 · Artificial Intelligence · Machine Learning · arxiv
Abstract
Recent progress has rapidly advanced our understanding of the mechanisms underlying in-context learning in modern attention-based neural networks. However, existing results focus exclusively on unimodal data; in contrast, the theoretical underpinnings of in-context learning for multi-modal data remain poorly understood. We introduce a mathematically tractable framework for studying multi-modal learning and explore when transformer-like architectures can recover Bayes-optimal performance in-context. To model multi-modal problems, we assume the observed data arises from a latent factor model. Our first result comprises a negative take on expressibility: we prove that single-layer, linear self-attention fails to recover the Bayes-optimal predictor uniformly over the task distribution. To address this limitation, we introduce a novel, linearized cross-attention mechanism, which we study in the regime where both the number of cross-attention layers and the context length are large. We show that this cross-attention mechanism is provably Bayes optimal when optimized using gradient flow. Our results underscore the benefits of depth for in-context learning and establish the provable utility of cross-attention for multi-modal distributions.
Open 2602.04872v1
Evolving scientific collaboration among EU member states, candidate cou…
2026-02-04 · Digital Libraries · arxiv
Abstract
This study explores how EU integration, globalisation, and geopolitical disruptions have influenced scientific collaboration among European countries at different stages of EU membership. Specifically, it distinguishes between the EU-14, the EU-13 (countries that joined the EU in 2004 or later), and EU candidate countries. Using Scopus articles, the study analyses the Relative Intensity of Collaboration (RIC) among EU member states, candidate countries, and China, Latin America, the UK, the USA and Russia. Findings indicate increasing integration within European groups and with global partners, yet persistent hierarchical structures remain. EU-14 countries form the core of the network, exhibiting stable and cohesive collaboration, including with the UK despite Brexit. EU-13 countries occupy an intermediate position, showing moderate collaboration with the EU-14 but stronger collaboration within their own group, with EU candidate countries, and with Russia. EU candidate countries demonstrate even weaker integration with the EU-14, focusing on intra-group ties and links with the EU-13 and Russia. RIC peaks in 2012 and 2018 for the EU-13 and EU candidate countries correspond to the Horizon 2020 and Horizon Europe cycles, highlighting the role of EU Framework Programmes. Collaboration with Russia increased after 2014 and only marginally declined after 2022; for the EU-14, it exceeds collaboration with the USA. Collaboration with China remains limited due to network and cultural constraints, with similar intensity across all three groups. Overall, funding and policy initiatives are critical for stable international collaboration.
Open 2602.04871v1
Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Det…
2026-02-04 · Machine Learning · arxiv
Abstract
Large language models have transformed many applications but remain expensive to train. Sparse Mixture of Experts (MoE) addresses this through conditional computation, with Expert Parallel (EP) as the standard distributed training method. However, EP has three limitations: communication cost grows linearly with the number of activated experts $k$, load imbalance affects latency and memory usage, and data-dependent communication requires metadata exchange. We propose Multi-Head LatentMoE and Head Parallel (HP), a new architecture and parallelism achieving $O(1)$ communication cost regardless of $k$, completely balanced traffic, and deterministic communication, all while remaining compatible with EP. To accelerate Multi-Head LatentMoE, we propose IO-aware routing and expert computation. Compared to MoE with EP, Multi-Head LatentMoE with HP trains up to $1.61\times$ faster while having identical performance. With doubled granularity, it achieves higher overall performance while still being $1.11\times$ faster. Our method makes multi-billion-parameter foundation model research more accessible.
Open 2602.04870v1
CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement…
2026-02-04 · Machine Learning · Artificial Intelligence · arxiv
Abstract
Continual reinforcement learning (CRL) requires agents to learn from a sequence of tasks without forgetting previously acquired policies. In this work, we introduce a novel benchmark suite for CRL based on realistically simulated robots in the Gazebo simulator. Our Continual Robotic Simulation Suite (CRoSS) benchmarks rely on two robotic platforms: a two-wheeled differential-drive robot with lidar, camera and bumper sensors, and a robotic arm with seven joints. The former represents an agent in line-following and object-pushing scenarios, where variation of visual and structural parameters yields a large number of distinct tasks, whereas the latter is used in two goal-reaching scenarios: high-level Cartesian hand-position control (modeled after the Continual World benchmark) and low-level control based on joint angles. For the robotic arm benchmarks, we provide additional kinematics-only variants that bypass the need for physical simulation (as long as no sensor readings are required) and can be run two orders of magnitude faster. CRoSS is designed to be easily extensible and enables controlled studies of continual reinforcement learning in robotic settings with high physical realism; in particular, it allows the use of almost arbitrary simulated sensors. To ensure reproducibility and ease of use, we provide a containerized setup (Apptainer) that runs out of the box, and report the performance of standard RL algorithms, including Deep Q-Networks (DQN) and policy gradient methods. This highlights the suite's suitability as a scalable and reproducible benchmark for CRL research.
Open 2602.04868v1
When LLaVA Meets Objects: Token Composition for Vision-Language-Models
2026-02-04 · Computer Vision and Pattern Recognition · arxiv
Abstract
Current autoregressive Vision Language Models (VLMs) usually rely on a large number of visual tokens to represent images, resulting in a need for more compute, especially at inference time. To address this problem, we propose Mask-LLaVA, a framework that leverages different levels of visual features to create a compact yet information-rich visual representation for autoregressive VLMs. Namely, we combine mask-based object representations with global tokens and local patch tokens. While all tokens are used during training, we show that the resulting model can flexibly drop tokens, especially the mask-based object tokens, at test time, allowing the number of tokens to be adapted during inference without retraining the model and without a significant drop in performance. We evaluate the proposed approach on a suite of standard benchmarks, showing results competitive with current token-efficient methods and comparable to the original LLaVA baseline while using only a fraction of the visual tokens. Our analysis demonstrates that combining multi-level features enables efficient learning with fewer tokens while allowing dynamic token selection at test time for good performance.
Open 2602.04864v1
Subliminal Effects in Your Data: A General Mechanism via Log-Linearity
2026-02-04 · Machine Learning · Artificial Intelligence · Computation and Language · arxiv
Abstract
Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques to understand the effects of datasets on the model's properties. This is exacerbated by recent experiments that show datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting a missing fundamental account of such phenomena. Towards understanding such effects, inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets. We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets so that models trained on them exhibit behaviors ranging from having specific preferences, to responding to prompts in a different language not present in the dataset, to taking on a different persona. Crucially, the effect persists for the selected subset, across models with varying architectures, supporting its generality and universality.
Open 2602.04863v1
Capacity Bounds on Doppler OFDM Channels
2026-02-04 · Information Theory · arxiv
Abstract
Low Earth orbit (LEO) satellite systems experience significant Doppler effects due to high mobility. While Doppler shifts can be largely compensated, residual frequency uncertainty induces a structured form of channel uncertainty that can limit achievable rates. We model this effect using a block-fading channel of the form $ \mathbf{H} = \mathbf{F} + s \mathbf{G} $, where $s$ is an unknown scalar random parameter. We first study this model in a general $N\times N$ MIMO setting. For this channel, we derive achievable rate lower bounds based on explicit transmission schemes and capacity upper bounds using a duality approach. We study Gaussian signaling and propose a practical superposition scheme with subspace alignment (SN) and successive interference cancellation, where a coarse-layer stream serves as an implicit pilot for decoding refined-layer data. We characterize asymptotic capacity in the near-coherent and high-SNR regimes, and show via Doppler-OFDM simulations that the proposed SN scheme achieves near-optimal rates with low complexity.
Open 2602.04862v1
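The block model $\mathbf{H} = \mathbf{F} + s\mathbf{G}$ is easy to experiment with numerically. The sketch below Monte Carlo-averages the coherent-receiver rate over the unknown scalar $s$; this is only an illustrative reference point for the channel model, not one of the paper's bounds, which additionally account for the receiver's uncertainty about $s$:

```python
import numpy as np

def avg_coherent_rate(F, G, snr, s_samples):
    """Monte Carlo average of the coherent-receiver Gaussian-signaling rate
    E_s[log2 det(I + snr * H H^H)] for the block model H = F + s*G.
    A receiver that does not know s cannot generally achieve this rate."""
    N = F.shape[0]
    rates = []
    for s in s_samples:
        H = F + s * G
        _, logdet = np.linalg.slogdet(np.eye(N) + snr * H @ H.conj().T)
        rates.append(logdet / np.log(2))
    return float(np.mean(rates))
```

Setting `G = 0` recovers the ordinary known-MIMO log-det rate, which makes the role of the residual-Doppler term `s * G` easy to isolate in simulation.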
From Evaluation to Design: Using Potential Energy Surface Smoothness Me…
2026-02-04 · Machine Learning · Artificial Intelligence · arxiv
Abstract
Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability while requiring a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we utilize an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce artifacts identified by our metric. By optimizing model design systematically based on BSCT, the resulting MLIP simultaneously achieves a low conventional E/F regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric and as an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that cannot be efficiently evaluated by current MLIP benchmarks.
Open 2602.04861v1
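The idea of probing a PES via controlled bond deformations can be mimicked with a two-step-size finite-difference consistency check: for a smooth potential the two derivative estimates agree to high order, while a kink makes them diverge. The toy potentials and tolerance below are illustrative assumptions, not BSCT itself:

```python
import numpy as np

def bond_scan_flags(energy_fn, r_grid, h=1e-4, tol=1e-2):
    """Scan bond lengths and flag points where central differences at two
    step sizes disagree -- a crude non-smoothness (kink) detector."""
    flags = []
    for r in r_grid:
        d_coarse = (energy_fn(r + h) - energy_fn(r - h)) / (2 * h)
        d_fine = (energy_fn(r + h / 2) - energy_fn(r - h / 2)) / h
        flags.append(abs(d_coarse - d_fine) > tol * (1 + abs(d_fine)))
    return np.array(flags)

def lennard_jones(r, eps=1.0, sigma=1.0):
    """Smooth toy potential standing in for an MLIP energy along one bond."""
    sr6 = (sigma / r) ** 6
    return 4 * eps * (sr6 ** 2 - sr6)

def kinked(r):
    """Toy non-smooth potential: LJ plus an artificial kink at r = 1.5."""
    return lennard_jones(r) + 0.5 * np.abs(r - 1.5)
```

This kind of scan evaluates only a handful of energies per bond, which is why a deformation-based probe can be orders of magnitude cheaper than running MD to expose the same pathology.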
Digital signatures with classical shadows on near-term quantum computers
2026-02-04 · Cryptography and Security · arxiv
Abstract
Quantum mechanics provides cryptographic primitives whose security is grounded in hardness assumptions independent of those underlying classical cryptography. However, existing proposals require low-noise quantum communication and long-lived quantum memory, capabilities which remain challenging to realize in practice. In this work, we introduce a quantum digital signature scheme that operates with only classical communication, using the classical shadows of states produced by random circuits as public keys. We provide theoretical and numerical evidence supporting the conjectured hardness of learning the private key (the circuit) from the public key (the shadow). A key technical ingredient enabling our scheme is an improved state-certification primitive that achieves higher noise tolerance and lower sample complexity than prior methods. We realize this certification by designing a high-rate error-detecting code tailored to our random-circuit ensemble and experimentally generating shadows for 32-qubit states using circuits with $\geq 80$ logical ($\geq 582$ physical) two-qubit gates, attaining 0.90 $\pm$ 0.01 fidelity. With an increased number of measurement samples, our hardware-demonstrated primitives realize a proof-of-principle quantum digital signature, demonstrating the near-term feasibility of our scheme.
Open 2602.04859v1
CoT is Not the Chain of Truth: An Empirical Internal Analysis of Reason…
2026-02-04 · Computation and Language · arxiv
Abstract
From generating headlines to fabricating news, Large Language Models (LLMs) are typically assessed by their final outputs, under the safety assumption that a refusal response signifies safe reasoning throughout the entire process. Challenging this assumption, our study reveals that during fake news generation, even when a model rejects a harmful request, its Chain-of-Thought (CoT) reasoning may still internally contain and propagate unsafe narratives. To analyze this phenomenon, we introduce a unified safety-analysis framework that systematically deconstructs CoT generation across model layers and evaluates the role of individual attention heads through Jacobian-based spectral metrics. Within this framework, we introduce three interpretable measures (stability, geometry, and energy) to quantify how specific attention heads respond to or embed deceptive reasoning patterns. Extensive experiments on multiple reasoning-oriented LLMs show that generation risk rises significantly when the thinking mode is activated, with the critical routing decisions concentrated in only a few contiguous mid-depth layers. By precisely identifying the attention heads responsible for this divergence, our work challenges the assumption that refusal implies safety and provides a new perspective for understanding and mitigating latent reasoning risks.
Open 2602.04856v1
Decomposed Prompting Does Not Fix Knowledge Gaps, But Helps Models Say…
2026-02-04 · Computation and Language · arxiv
Abstract
Large language models often struggle to recognize their knowledge limits in closed-book question answering, leading to confident hallucinations. While decomposed prompting is typically used to improve accuracy, we investigate its impact on reliability. We evaluate three task-equivalent prompting regimes: Direct, Assistive, and Incremental, across different model scales and multi-hop QA benchmarks. We find that although accuracy gains from decomposition diminish in frontier models, disagreements between prompting regimes remain highly indicative of potential errors. Because factual knowledge is stable while hallucinations are stochastic, cross-regime agreement provides a precise signal of internal uncertainty. We leverage this signal to implement a training-free abstention policy that requires no retrieval or fine-tuning. Our results show that disagreement-based abstention outperforms standard uncertainty baselines as an error detector, improving both F1 and AUROC across settings. This demonstrates that decomposition-based prompting can serve as a practical diagnostic probe for model reliability in closed-book QA.
Open 2602.04853v1
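The disagreement-based abstention policy is simple to sketch: answer only when all task-equivalent prompting regimes agree after normalization, since stable factual knowledge tends to reproduce across regimes while stochastic hallucinations rarely coincide. The normalization and the regime argument names below are illustrative assumptions:

```python
def normalize(ans: str) -> str:
    """Crude answer normalization: lowercase, trim, collapse whitespace,
    drop a trailing period. Real matching would be more careful."""
    return " ".join(ans.lower().strip().rstrip(".").split())

def abstain_on_disagreement(direct: str, assistive: str, incremental: str):
    """Training-free abstention: return (answer, abstained). Abstain
    whenever the three regimes' normalized answers disagree."""
    answers = {normalize(a) for a in (direct, assistive, incremental)}
    if len(answers) > 1:
        return None, True
    return direct, False
```

Note this needs no retrieval, fine-tuning, or logit access; it only requires running the same question through the regimes and comparing outputs.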
The Key to State Reduction in Linear Attention: A Rank-based Perspective
2026-02-04 · Machine Learning · arxiv
Abstract
Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the state of trained linear attention models often exhibits a low-rank structure, suggesting that these models underexploit their capacity in practice. To illuminate this phenomenon, we provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise. In addition to these theoretical insights, we conjecture that the low-rank states can be substantially reduced post-training with only minimal performance degradation, yielding faster and more memory-efficient models. To this end, we propose a novel hardware-aware approach that structurally prunes key and query matrices, reducing the state size while retaining compatibility with existing CUDA kernels. We adapt several existing pruning strategies to fit our framework and, building on our theoretical analysis, propose a novel structured pruning method based on a rank-revealing QR decomposition. Our empirical results, evaluated across models of varying sizes and on various downstream tasks, demonstrate the effectiveness of our state reduction framework. We highlight that our framework enables the removal of 50% of the query and key channels at only a marginal increase in perplexity. The code for this project can be found at https://github.com/camail-official/LinearAttentionPruning.
Open 2602.04852v1
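The rank-revealing-QR idea behind the proposed structured pruning can be sketched as greedy column-pivoted selection: repeatedly keep the channel whose residual norm is largest after projecting out the channels already kept. This is a generic sketch of column-pivoted QR selection, not the paper's exact method:

```python
import numpy as np

def pivoted_channel_selection(K, m):
    """Greedy column-pivoted (rank-revealing) selection of m channels from
    a feature matrix K (samples x channels). Near-duplicate channels are
    skipped because projection leaves them with tiny residual norms."""
    R = np.array(K, dtype=float)
    selected = []
    for _ in range(m):
        norms = np.linalg.norm(R, axis=0)
        norms[selected] = -1.0                 # never re-select a kept channel
        j = int(np.argmax(norms))
        selected.append(j)
        q = R[:, j] / (norms[j] + 1e-12)       # unit vector of chosen column
        R = R - np.outer(q, q @ R)             # project all columns off q
    return selected
```

Keeping whole channels (rather than arbitrary low-rank factors) is what preserves compatibility with existing kernels: the pruned matrices are just smaller dense matrices.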
PDF-HR: Pose Distance Fields for Humanoid Robots
2026-02-04 · Robotics · Computer Vision and Pattern Recognition · arxiv
Abstract
Pose and motion priors play a crucial role in humanoid robotics. Although such priors have been widely studied in the human motion recovery (HMR) domain with a range of models, their adoption for humanoid robots remains limited, largely due to the scarcity of high-quality humanoid motion data. In this work, we introduce Pose Distance Fields for Humanoid Robots (PDF-HR), a lightweight prior that represents the robot pose distribution as a continuous and differentiable manifold. Given an arbitrary pose, PDF-HR predicts its distance to a large corpus of retargeted robot poses, yielding a smooth measure of pose plausibility that is well suited for optimization and control. PDF-HR can be integrated as a reward shaping term, a regularizer, or a standalone plausibility scorer across diverse pipelines. We evaluate PDF-HR on various humanoid tasks, including single-trajectory motion tracking, general motion tracking, style-based motion mimicry, and general motion retargeting. Experiments show that this plug-and-play prior consistently and substantially strengthens strong baselines. Code and models will be released.
Open 2602.04851v1
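The quantity such a distance field approximates, and its reward-shaping use, can be illustrated with the nearest-neighbor ground truth. The corpus, weight, and Euclidean metric below are illustrative assumptions (a learned field would replace the explicit corpus lookup with a smooth, differentiable regressor):

```python
import numpy as np

def pose_distance(query, corpus):
    """Distance from a query pose vector (e.g. joint angles) to its
    nearest neighbor in a corpus of retargeted robot poses -- the target
    a pose distance field would regress."""
    return float(np.min(np.linalg.norm(corpus - query, axis=1)))

def shaped_reward(task_reward, query, corpus, w=0.1):
    """Reward shaping: penalize poses by their distance to the corpus,
    nudging the policy toward the plausible-pose manifold."""
    return task_reward - w * pose_distance(query, corpus)
```

Poses lying on the corpus manifold incur no penalty, so the shaping term leaves already-plausible behavior untouched.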
El Agente Quntur: A research collaborator agent for quantum chemistry
2026-02-04 · Artificial Intelligence · Multiagent Systems · arxiv
Abstract
Quantum chemistry is a foundational enabling tool for the fields of chemistry, materials science, computational biology and others. Despite its power, the practical application of quantum chemistry simulations remains in the hands of qualified experts due to methodological complexity, software heterogeneity, and the need for informed interpretation of results. To bridge the accessibility gap for these tools and expand their reach to chemists with broader backgrounds, we introduce El Agente Quntur, a hierarchical, multi-agent AI system designed to operate not merely as an automation tool but as a research collaborator for computational quantum chemistry. Quntur was designed following three main strategies: i) elimination of hard-coded procedural policies in favour of reasoning-driven decisions, ii) construction of general and composable actions that facilitate generalization and efficiency, and iii) implementation of guided deep research to integrate abstract quantum-chemical reasoning across subdisciplines and a detailed understanding of the software's internal logic and syntax. Although instantiated in ORCA, these design principles are applicable to research agents more generally and easily expandable to additional quantum chemistry packages and beyond. Quntur supports the full range of calculations available in ORCA 6.0 and reasons over software documentation and scientific literature to plan, execute, adapt, and analyze in silico chemistry experiments following best practices. We discuss the advances and current bottlenecks in agentic systems operating at the research level in computational chemistry, and outline a roadmap toward a fully autonomous end-to-end computational chemistry research agent.
Open 2602.04850v1
El Agente Estructural: An Artificially Intelligent Molecular Editor
2026-02-04 · Artificial Intelligence · Multiagent Systems · arxiv
Abstract
We present El Agente Estructural, a multimodal, natural-language-driven geometry-generation and manipulation agent for autonomous chemistry and molecular modelling. Unlike molecular generation or editing via generative models, Estructural mimics how human experts directly manipulate molecular systems in three dimensions by integrating a comprehensive set of domain-informed tools and vision-language models. This design enables precise control over atomic or functional group replacements, atomic connectivity, and stereochemistry without the need to rebuild extensive core molecular frameworks. Through a series of representative case studies, we demonstrate that Estructural enables chemically meaningful geometry manipulation across a wide range of real-world scenarios. These include site-selective functionalization, ligand binding, ligand exchange, stereochemically controlled structure construction, isomer interconversion, fragment-level structural analysis, image-guided generation of structures from schematic reaction mechanisms, and mechanism-driven geometry generation and modification. These examples illustrate how multimodal reasoning, when combined with specialized geometry-aware tools, supports interactive and context-aware molecular modelling beyond structure generation. Looking forward, the integration of Estructural into El Agente Quntur, an autonomous multi-agent quantum chemistry platform, enhances its capabilities by adding sophisticated tools for the generation and editing of three-dimensional structures.
Open 2602.04849v1
A-Graph: A Unified Graph Representation for At-Will Simulation across S…
2026-02-04Performancearxiv
Abstract
As computer systems continue to diversify across technologies, architectures, applications, and beyond, the relevant design space has become larger and more complex. Given such trends, design space exploration (DSE) at early stages is critical to ensure agile development towards optimal performance and cost. Industry-grade EDA tools directly take in RTL code and report accurate results, but do not perform DSE. Recent works have attempted to explore the design space via simulation. However, most of these works are domain-specific and constrain the space that users are allowed to explore, offering limited flexibility between technologies, architectures, and applications. Moreover, they often demand high domain expertise to ensure high accuracy. To enable simulation that is agnostic to technology, architecture, and application at any granularity, we introduce Architecture-Graph (Agraph), a graph that unifies the system representation surrounding any arbitrary application, software, architecture, and circuit. Such a unified representation distinguishes Agraph from prior works, which focus on a single stack, allowing users to freely explore the design space across system stacks. To fully unleash the potential of Agraph, we further present Archx, a framework that implements Agraph. Archx is user-friendly in two ways. First, Archx has an easy-to-use programming interface to automatically generate and sweep design points under user constraints, boosting programmability. Second, Archx adopts scope-based metric retrieval to analyze and understand each design point at any user-preferred hierarchy, enhancing explainability. We conduct case studies that demonstrate Agraph's generalization across technologies, architectures, and applications with high simulation accuracy. Overall, we argue that Agraph and Archx serve as a foundation to simulate both performance and cost at will.
Open 2602.04847v1
CSLib: The Lean Computer Science Library
2026-02-04Logic in Computer ScienceProgramming Languagesarxiv
Abstract
We introduce CSLib, an open-source framework for proving computer-science-related theorems and writing formally verified code in the Lean proof assistant. CSLib aims to be for computer science what Lean's Mathlib is for mathematics. Mathlib has been tremendously impactful: it is a key reason for Lean's popularity within the mathematics research community, and it has also played a critical role in the training of AI systems for mathematical reasoning. However, the base of computer science knowledge in Lean is currently quite limited. CSLib will vastly enhance this knowledge base and provide infrastructure for using this knowledge in real-world verification projects. By doing so, CSLib will (1) enable the broad use of Lean in computer science education and research, and (2) facilitate the manual and AI-aided engineering of large-scale formally verified systems.
Open 2602.04846v1
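To give a flavor of the "formally verified code plus theorem" pairing the CSLib abstract describes, here is a tiny illustrative Lean 4 snippet (our own toy example, not actual CSLib code): a list-reversal function together with a proved specification of one of its properties.

```lean
-- Toy example of the verified-code-plus-theorem style (not CSLib code):
-- a list-reversal function and a proof that it preserves length.
def rev {α : Type} : List α → List α
  | []      => []
  | x :: xs => rev xs ++ [x]

theorem rev_length {α : Type} (xs : List α) : (rev xs).length = xs.length := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [rev, ih]
```

A library like CSLib would collect such definitions and lemmas at scale, so that larger verification projects can build on them the way mathematical proofs build on Mathlib.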
Fluid Representations in Reasoning Models
2026-02-04Artificial Intelligencearxiv
Abstract
Reasoning language models, which generate long chains of thought, dramatically outperform non-reasoning language models on abstract problems. However, the internal model mechanisms that allow this superior performance remain poorly understood. We present a mechanistic analysis of how QwQ-32B - a model specifically trained to produce extensive reasoning traces - processes abstract structural information. On Mystery Blocksworld - a semantically obfuscated planning domain - we find that QwQ-32B gradually improves its internal representation of actions and concepts during reasoning. The model develops abstract encodings that focus on structure rather than specific action names. Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces boosts accuracy, while symbolic representations can replace many obfuscated encodings with minimal performance loss. We find that one of the factors driving reasoning model performance is in-context refinement of token representations, which we dub Fluid Reasoning Representations.
Open 2602.04843v1
The matrix-vector complexity of $Ax=b$
2026-02-04Data Structures and Algorithmsarxiv
Abstract
Matrix-vector algorithms, particularly Krylov subspace methods, are widely viewed as the most effective algorithms for solving large systems of linear equations. This paper establishes lower bounds on the worst-case number of matrix-vector products needed by such an algorithm to approximately solve a general linear system. The first main result is that, for a matrix-vector algorithm which can perform products with both a matrix and its transpose, $\Omega(\kappa \log(1/\varepsilon))$ matrix-vector products are necessary to solve a linear system with condition number $\kappa$ to accuracy $\varepsilon$, matching an upper bound for conjugate gradient on the normal equations. The second main result is that one-sided algorithms, which lack access to the transpose, must use $n$ matrix-vector products to solve an $n \times n$ linear system, even when the problem is perfectly conditioned. Both main results include explicit constants that match known upper bounds up to a factor of four. These results rigorously demonstrate the limitations of matrix-vector algorithms and confirm the optimality of widely used Krylov subspace algorithms.
Open 2602.04842v1
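As a rough illustration of the quantity the lower bounds count, the sketch below runs plain conjugate gradient on a synthetic SPD system and tallies matrix-vector products. This is our illustration, not the paper's construction; the two-sided bound concerns CG on the normal equations, which would additionally use products with the transpose.

```python
import numpy as np

def cg(matvec, b, tol=1e-8, maxiter=10_000):
    """Plain conjugate gradient, counting calls to `matvec`."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual b - A@x for x = 0
    p = r.copy()
    rs = r @ r
    n_matvec = 0
    while np.sqrt(rs) > tol * np.linalg.norm(b) and n_matvec < maxiter:
        Ap = matvec(p)
        n_matvec += 1
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, n_matvec

# Synthetic SPD system with condition number kappa = 100 (illustrative).
n, kappa = 200, 100.0
A = np.diag(np.linspace(1.0, kappa, n))   # eigenvalues in [1, kappa]
b = np.ones(n)
x, n_mv = cg(lambda v: A @ v, b)
# For CG on an SPD matrix the count grows like sqrt(kappa)*log(1/eps),
# far below the n products a one-sided method needs in the worst case.
```

The lower bound says no matrix-vector algorithm can beat this kind of scaling in the worst case, which is why the paper reads as a rigorous confirmation of Krylov-method optimality.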
Vivifying LIME: Visual Interactive Testbed for LIME Analysis
2026-02-04Human-Computer Interactionarxiv
Abstract
Explainable Artificial Intelligence (XAI) has gained importance in interpreting model predictions. Among the leading XAI techniques, Local Interpretable Model-agnostic Explanations (LIME) is the most frequently utilized, as it notably aids people's understanding of complex models. However, LIME's analysis is constrained to a single image at a time. Moreover, it lacks interaction mechanisms for observing LIME's results and for directly manipulating the factors affecting them. To address these issues, we introduce an interactive visualization tool, LIMEVis, which improves the LIME analysis workflow by enabling users to explore multiple LIME results simultaneously and modify them directly. With LIMEVis, we could conveniently identify common features in images that a model appears to mainly consider for category classification. Additionally, by interactively modifying the LIME results, we could determine which segments in an image influence the model's classification.
Open 2602.04841v1
LitS: A Novel Neighborhood Descriptor for Point Clouds
2026-02-04Computer Vision and Pattern Recognitionarxiv
Abstract
With the advancement of 3D scanning technologies, point clouds have become fundamental for representing 3D spatial data, with applications that span across various scientific and technological fields. Practical analysis of this data depends crucially on available neighborhood descriptors to accurately characterize the local geometries of the point cloud. This paper introduces LitS, a novel neighborhood descriptor for 2D and 3D point clouds. LitS are piecewise constant functions on the unit circle that allow points to keep track of their surroundings. Each element in LitS' domain represents a direction with respect to a local reference system. Once constructed, evaluating LitS at any given direction gives us information about the number of neighbors in a cone-like region centered around that same direction. Thus, LitS conveys a lot of information about the local neighborhood of a point, which can be leveraged to gain global structural understanding by analyzing how LitS changes between close points. In addition, LitS comes in two versions ('regular' and 'cumulative') and has two parameters, allowing them to adapt to various contexts and types of point clouds. Overall, they are a versatile neighborhood descriptor, capable of capturing the nuances of local point arrangements and resilient to common point cloud data issues such as variable density and noise.
Open 2602.04838v1
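The construction in the abstract can be pictured as an angular histogram around each point. The sketch below is our own simplified 2D reading, not the paper's exact LitS definition (the local reference frame, the cone parameters, and the 'cumulative' variant are abstracted away): for one point, it counts the neighbors falling in each angular sector of the unit circle.

```python
import numpy as np

def directional_histogram(points, idx, radius=1.0, n_bins=16):
    """Count neighbors of points[idx] per angular sector of the unit circle.

    Illustrative sketch of a piecewise-constant neighborhood descriptor;
    the paper's actual LitS construction may differ in details such as
    the choice of local reference system.
    """
    p = points[idx]
    diff = np.delete(points, idx, axis=0) - p       # offsets to other points
    dist = np.linalg.norm(diff, axis=1)
    near = diff[dist <= radius]                     # neighbors within radius
    # Angle of each neighbor, binned into n_bins equal sectors of [0, 2*pi).
    ang = np.arctan2(near[:, 1], near[:, 0]) % (2 * np.pi)
    bins = (ang / (2 * np.pi / n_bins)).astype(int) % n_bins
    return np.bincount(bins, minlength=n_bins)      # hist[k] = count in sector k

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(500, 2))
h = directional_histogram(pts, idx=0, radius=0.5, n_bins=8)
```

Evaluating such a function at a direction answers "how many neighbors lie in a cone around that direction", which is the kind of local information the descriptor is built to expose.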
Group-Evolving Agents: Open-Ended Self-Improvement via Experience Shari…
2026-02-04Artificial Intelligencearxiv
Abstract
Open-ended self-improving agents can autonomously modify their own structural designs to advance their capabilities and overcome the limits of pre-defined architectures, thus reducing reliance on human intervention. We introduce Group-Evolving Agents (GEA), a new paradigm for open-ended self-improvement, which treats a group of agents as the fundamental evolutionary unit, enabling explicit experience sharing and reuse within the group throughout evolution. Unlike existing open-ended self-evolving paradigms that adopt tree-structured evolution, GEA overcomes the inefficient utilization of exploratory diversity caused by isolated evolutionary branches. We evaluate GEA on challenging coding benchmarks, where it significantly outperforms state-of-the-art self-evolving methods (71.0% vs. 56.7% on SWE-bench Verified, 88.3% vs. 68.3% on Polyglot) and matches or exceeds top human-designed agent frameworks (71.8% and 52.0% on the two benchmarks, respectively). Analysis reveals that GEA more effectively converts early-stage exploratory diversity into sustained, long-term progress, achieving stronger performance for the same number of evolved agents. Furthermore, GEA exhibits consistent transferability across different coding models and greater robustness, fixing framework-level bugs in 1.4 iterations on average, versus 5 for self-evolving methods.
Open 2602.04837v1
Are AI Capabilities Increasing Exponentially? A Competing Hypothesis
2026-02-04Artificial Intelligencearxiv
Abstract
Rapidly increasing AI capabilities have substantial real-world consequences, ranging from AI safety concerns to labor market effects. The Model Evaluation & Threat Research (METR) report argues that AI capabilities have exhibited exponential growth since 2019. In this note, we argue that the data does not support exponential growth, even over shorter time horizons. Whereas the METR study claims that fitting sigmoid/logistic curves results in inflection points far in the future, we fit a sigmoid curve to their current data and find that the inflection point has already passed. In addition, we propose a more complex model that decomposes AI capabilities into base and reasoning capabilities, each with its own rate of improvement. We prove that this model supports our hypothesis that AI capabilities will exhibit an inflection point in the near future. Our goal is not to establish a rigorous forecast of our own, but to highlight the fragility of existing forecasts of exponential growth.
Open 2602.04836v1
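The core methodological move (fit a sigmoid, ask where its inflection point lies) is easy to reproduce. The snippet below uses synthetic numbers, NOT the METR series: a logistic curve observed only slightly past its inflection looks near-exponential early on, yet the fit recovers an inflection point inside the observation window.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, L, k, t0):
    """Logistic curve with capacity L, growth rate k, inflection point t0."""
    return L / (1.0 + np.exp(-k * (t - t0)))

# Synthetic capability series (made-up parameters, not METR's data):
# samples from a logistic curve with inflection at t0 = 2024, plus noise.
t = np.linspace(2019.0, 2026.0, 29)
rng = np.random.default_rng(1)
y = sigmoid(t, 10.0, 1.2, 2024.0) + rng.normal(0.0, 0.05, t.size)

# The fitted t0 is what separates the "exponential growth" reading
# from the "inflection point already passed" reading.
(L_hat, k_hat, t0_hat), _ = curve_fit(sigmoid, t, y, p0=[y.max(), 1.0, t.mean()])
```

Whether `t0_hat` lands inside or far beyond the observed range is exactly the disagreement between the note and the METR report; on real data the answer is sensitive to the noise level and the observation window.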
It's not a Lottery, it's a Race: Understanding How Gradient Descent Ada…
2026-02-04Machine LearningArtificial IntelligenceComputer Vision and Pattern Recognitionarxiv
Abstract
Our theoretical understanding of neural networks is lagging behind their empirical success. One of the important unexplained phenomena is why and how, during the process of training with gradient descent, the theoretical capacity of neural networks is reduced to an effective capacity that fits the task. Here we investigate the mechanism by which gradient descent achieves this, analyzing the learning dynamics at the level of individual neurons in single-hidden-layer ReLU networks. We identify three dynamical principles -- mutual alignment, unlocking and racing -- that together explain why we can often successfully reduce capacity after training through the merging of equivalent neurons or the pruning of low-norm weights. We specifically explain the mechanism behind the lottery ticket conjecture, or why the specific, beneficial initial conditions of some neurons lead them to obtain higher weight norms.
Open 2602.04832v1
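The merging-of-equivalent-neurons step rests on a simple identity: because ReLU is positively homogeneous, two hidden neurons whose incoming weights are positive multiples of each other compute proportional activations and can be merged exactly. A minimal numpy check (our illustration, not the paper's code):

```python
import numpy as np

def relu_net(x, W1, b1, w2):
    """Single-hidden-layer ReLU network: x -> w2 . relu(W1 @ x + b1)."""
    return w2 @ np.maximum(W1 @ x + b1, 0.0)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
w2 = rng.normal(size=4)

# Align neurons 0 and 1: same direction, different norms (factor c = 2 > 0).
W1[1] = 2.0 * W1[0]
b1[1] = 2.0 * b1[0]

# relu(c*z) = c*relu(z) for c > 0, so neuron 1's contribution
# w2[1]*relu(2*(W1[0]@x + b1[0])) folds into neuron 0's outgoing weight.
# Merging removes neuron 1 without changing the computed function:
W1m, b1m = W1[[0, 2, 3]], b1[[0, 2, 3]]
w2m = np.array([w2[0] + 2.0 * w2[1], w2[2], w2[3]])

x = rng.normal(size=3)
out_full = relu_net(x, W1, b1, w2)
out_merged = relu_net(x, W1m, b1m, w2m)   # same function, one fewer neuron
```

The paper's contribution is explaining *why* gradient descent drives neurons into such aligned (or low-norm, prunable) configurations; the identity above is only the reason the post-hoc reduction is lossless.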
When Code Becomes Abundant: Redefining Software Engineering Around Orch…
2026-02-04Software Engineeringarxiv
Abstract
Software Engineering (SE) faces simultaneous pressure from AI automation (reducing code production costs) and hardware-energy constraints (amplifying failure costs). We position that SE must redefine itself around human discernment (intent articulation, architectural control, and verification) rather than code construction. This shift introduces accountability collapse as a central risk and requires fundamental changes to research priorities, educational curricula, and industrial practices. We argue that Software Engineering, as traditionally defined around code construction and process management, is no longer sufficient; redefined around intent articulation, architectural control, and systematic verification, the discipline shifts from a production-oriented field to one centered on human judgment under automation, with profound implications for research, practice, and education.
Open 2602.04830v1
Opinion dynamics under electoral shocks in competitive campaigns
2026-02-04Social and Information Networksarxiv
Abstract
We propose a computational framework for modeling opinion dynamics in electoral competitions that combines two realistic features: voter memory and exogenous shocks. The population is represented by a fully-connected network of agents, each holding a binary opinion that reflects support for one of two candidates. First, inspired by the classical voter model, we introduce a memory-dependent opinion update: each agent's probability of adopting a neighbor's stance depends on how many times they agreed with that neighbor in the agent's past $m$ states, promoting inertia and resistance to change. Second, we define an electoral shock as an abrupt external influence acting uniformly over all agents during a finite interval $[t_0, t_0+Δt]$, favoring one candidate by switching opinions with probability $p_s$, representing the impact of extraordinary events such as political scandals, impactful speeches, or sudden news. We explore how the strength and duration of the shock, in conjunction with memory length, influence the transient and stationary properties of the model, as well as the candidates' advantage. Our findings reveal a rich dynamical behavior: memory slows down convergence and enhances system resilience, whereas shocks of sufficient intensity and duration can abruptly realign collective preferences, particularly when occurring close to the election date. Conversely, for long memory lengths or large election horizons, shock effects are dampened or delayed, depending on their timing. These results offer insights into why some sudden political events reshape electoral outcomes while others fade under strong individual inertia. Finally, a qualitative comparison with real electoral shocks reported in opinion polls illustrates how the model captures the competition between voter inertia and abrupt external events observed in actual elections.
Open 2602.04829v1
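The two ingredients (memory-dependent updates and a finite shock window) can be sketched in a few lines. The model below is one plausible reading of the abstract, with illustrative details of our own: an agent adopts a random neighbor's opinion with probability given by a smoothed fraction of its own last $m$ states that matched that opinion, and during $[t_0, t_0+\Delta t]$ every agent is independently switched to candidate $+1$ with probability $p_s$ per step.

```python
import numpy as np

def simulate(n=200, m=5, steps=4000, t0=1500, dt=300, p_s=0.01, seed=0):
    """Memory-dependent voter model with one electoral shock (illustrative)."""
    rng = np.random.default_rng(seed)
    op = rng.choice([-1, 1], size=n)          # binary opinions
    hist = np.tile(op, (m, 1))                # each agent's last m states
    magnet = []
    for t in range(steps):
        i, j = rng.integers(n), rng.integers(n)   # fully connected network
        # Laplace-smoothed agreement fraction: long disagreement -> inertia.
        p_adopt = ((hist[:, i] == op[j]).sum() + 1) / (m + 2)
        if rng.random() < p_adopt:
            op[i] = op[j]
        if t0 <= t < t0 + dt:                 # exogenous shock window
            op[rng.random(n) < p_s] = 1       # uniform push toward candidate +1
        hist = np.roll(hist, 1, axis=0)
        hist[0] = op                          # record the new global state
        magnet.append(op.mean())
    return np.array(magnet)

m_traj = simulate()                           # magnetization in [-1, 1]
```

Varying `m`, `p_s`, and the window placement reproduces the qualitative competition the abstract describes: longer memories damp the shock, while a sufficiently strong and well-timed shock realigns the collective preference.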
On Dual Connectivity in 6G LEO Constellations
2026-02-04Networking and Internet Architecturearxiv
Abstract
Dual connectivity (DC) has garnered significant attention in 5G evolution, as it enhances throughput and reliability by leveraging the channel conditions of two paths. However, when the paths exhibit different delays, such as in terrestrial and non-terrestrial integrated networks with multi-orbit topologies or in networks characterized by frequent topology changes, like Low Earth Orbit (LEO) satellite constellations with different elevation angles, traffic delivery may experience packet reordering or trigger congestion control mechanisms. Additionally, real-time traffic may experience packet drops if packets arrive after a play-out threshold. Different techniques have been proposed to address these issues, such as packet duplication, packet switching, and network coding for traffic scheduling in DC. However, if not carefully designed, these techniques can lead to resource waste, encoding/decoding delays, and computational overhead, undermining DC's intended benefits. This paper provides a mathematical framework for calculating the average end-to-end packet loss when the loss process is modeled with a discrete Markov chain (typical of a wireless channel) and packet duplication is combined with packet switching, or when network coding is employed in DC. Such metrics help derive optimal policies with full knowledge of the underlying loss process, to be compared against empirical models learned through machine learning algorithms.
Open 2602.04825v1
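The benefit of packet duplication under a Markov-modulated loss process can be sketched with a Monte Carlo estimate. The model below is the classic two-state Gilbert channel, our illustrative stand-in for the paper's analytical framework; the path parameters and the independence between the two paths are assumptions.

```python
import numpy as np

def gilbert_trace(n, p_gb, p_bg, loss_b=1.0, seed=0):
    """Two-state (Good/Bad) Markov loss process; returns a boolean loss trace.

    p_gb = P(Good -> Bad), p_bg = P(Bad -> Good); a packet sent while the
    channel is Bad is lost with probability loss_b (classic Gilbert model).
    """
    rng = np.random.default_rng(seed)
    state = 0                                  # 0 = Good, 1 = Bad
    lost = np.zeros(n, dtype=bool)
    for t in range(n):
        if state == 0:
            state = 1 if rng.random() < p_gb else 0
        else:
            state = 0 if rng.random() < p_bg else 1
        lost[t] = state == 1 and rng.random() < loss_b
    return lost

# Duplication over two independent paths: a packet is lost end-to-end only
# if both copies are lost.
n = 100_000
a = gilbert_trace(n, p_gb=0.1, p_bg=0.5, seed=1)
b = gilbert_trace(n, p_gb=0.1, p_bg=0.5, seed=2)
single_loss = a.mean()        # stationary loss -> p_gb / (p_gb + p_bg) = 1/6
dup_loss = (a & b).mean()     # -> single_loss**2 when paths are independent
```

The paper's contribution is to derive such quantities in closed form for duplication, switching, and network coding, so that optimal policies can be compared against learned ones; the simulation here only shows the kind of metric being computed.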
Do Developers Read Type Information? An Eye-Tracking Study on TypeScript
2026-02-04Software Engineeringarxiv
Abstract
Statically annotated types have been shown to aid developers in a number of programming tasks, and this benefit holds even when static type checking is not used. It is hypothesized that this is because developers use type annotations as in-code documentation, and in this study we aim to provide evidence for that hypothesis. Understanding this use will clarify how, and in what contexts, developers use type information; it may also help in designing better development tools and inform educational decisions. To provide this evidence, we conduct an eye-tracking study with 26 undergraduate students to determine whether they read type annotations during code comprehension and bug localization in the TypeScript language. We found that, in both code summarization and bug localization tasks, developers do not look at lines containing type annotations or type declarations more often when annotations are present. The results have implications for tool builders seeking to improve the availability of type information, for the development community in building good standards for the use of type annotations, and for educators in deliberately teaching reading patterns.
Open 2602.04824v1