This Week In Computer Science Papers

Week beginning 9th March 2026

Tap a tile to open details. Use the left sidebar to filter by category.

No filters applied
Showing 1–36 of 475
Scale Space Diffusion
2026-03-09Computer Vision and Pattern RecognitionArtificial Intelligencearxiv
Abstract
Diffusion models degrade images through noise, and reversing this process reveals an information hierarchy across timesteps. Scale-space theory exhibits a similar hierarchy via low-pass filtering. We formalize this connection and show that highly noisy diffusion states contain no more information than small, downsampled images - raising the question of why they must be processed at full resolution. To address this, we fuse scale spaces into the diffusion process by formulating a family of diffusion models with generalized linear degradations and practical implementations. Using downsampling as the degradation yields our proposed Scale Space Diffusion. To support Scale Space Diffusion, we introduce Flexi-UNet, a UNet variant that performs resolution-preserving and resolution-increasing denoising using only the necessary parts of the network. We evaluate our framework on CelebA and ImageNet and analyze its scaling behavior across resolutions and network depths. Our project website ( https://prateksha.github.io/projects/scale-space-diffusion/ ) is available publicly.
Open 2603.08709v1
FVG-PT: Adaptive Foreground View-Guided Prompt Tuning for Vision-Langua…
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
CLIP-based prompt tuning enables pretrained Vision-Language Models (VLMs) to efficiently adapt to downstream tasks. Although existing studies have made significant progress, they pay limited attention to changes in the internal attention representations of VLMs during the tuning process. In this paper, we attribute the failure modes of prompt tuning predictions to shifts in foreground attention of the visual encoder, and propose Foreground View-Guided Prompt Tuning (FVG-PT), an adaptive plug-and-play foreground attention guidance module, to alleviate the shifts. Concretely, FVG-PT introduces a learnable Foreground Reliability Gate to automatically enhance the foreground view quality, applies a Foreground Distillation Compensation module to guide visual attention toward the foreground, and further introduces a Prior Calibration module to mitigate generalization degradation caused by excessive focus on the foreground. Experiments on multiple backbone models and datasets show the effectiveness and compatibility of FVG-PT. Codes are available at: https://github.com/JREion/FVG-PT
Open 2603.08708v1
Impermanent: A Live Benchmark for Temporal Generalization in Time Serie…
2026-03-09Machine Learningarxiv
Abstract
Recent advances in time-series forecasting increasingly rely on pre-trained foundation-style models. While these models often claim broad generalization, existing evaluation protocols provide limited evidence. Indeed, most current benchmarks use static train-test splits that can easily lead to contamination as foundation models can inadvertently train on test data or perform model selection using test scores, which can inflate performance. We introduce Impermanent, a live benchmark that evaluates forecasting models under open-world temporal change by scoring forecasts sequentially over time on continuously updated data streams, enabling the study of temporal robustness, distributional shift, and performance stability rather than one-off accuracy on a frozen test set. Impermanent is instantiated on GitHub open-source activity, providing a naturally live and highly non-stationary dataset shaped by releases, shifting contributor behavior, platform/tooling changes, and external events. We focus on the top 400 repositories by star count and construct time series from issues opened, pull requests opened, push events, and new stargazers, evaluated over a rolling window with daily updates, alongside standardized protocols and leaderboards for reproducible, ongoing comparison. By shifting evaluation from static accuracy to sustained performance, Impermanent takes a concrete step toward assessing when and whether foundation-level generalization in time-series forecasting can be meaningfully claimed. Code and a live dashboard are available at https://github.com/TimeCopilot/impermanent and https://impermanent.timecopilot.dev.
Open 2603.08707v1
Agentic Critical Training
2026-03-09Artificial IntelligenceComputation and LanguageMachine Learningarxiv
Abstract
Training large language models (LLMs) as autonomous agents often begins with imitation learning, but it only teaches agents what to do without understanding why: agents never contrast successful actions against suboptimal alternatives and thus lack awareness of action quality. Recent approaches attempt to address this by introducing self-reflection supervision derived from contrasts between expert and alternative actions. However, the training paradigm fundamentally remains imitation learning: the model imitates pre-constructed reflection text rather than learning to reason autonomously. We propose Agentic Critical Training (ACT), a reinforcement learning paradigm that trains agents to identify the better action among alternatives. By rewarding whether the model's judgment is correct, ACT drives the model to autonomously develop reasoning about action quality, producing genuine self-reflection rather than imitating it. Across three challenging agent benchmarks, ACT consistently improves agent performance when combined with different post-training methods. It achieves an average improvement of 5.07 points over imitation learning and 4.62 points over reinforcement learning. Compared to approaches that inject reflection capability through knowledge distillation, ACT also demonstrates clear advantages, yielding an average improvement of 2.42 points. Moreover, ACT enables strong out-of-distribution generalization on agentic benchmarks and improves performance on general reasoning benchmarks without any reasoning-specific training data, highlighting the value of our method. These results suggest that ACT is a promising path toward developing more reflective and capable LLM agents.
Open 2603.08706v1
Evaluating Financial Intelligence in Large Language Models: Benchmarkin…
2026-03-09Artificial Intelligencearxiv
Abstract
Large language models are increasingly used for financial analysis and investment research, yet systematic evaluation of their financial reasoning capabilities remains limited. In this work, we introduce the AI Financial Intelligence Benchmark (AFIB), a multi-dimensional evaluation framework designed to assess financial analysis capabilities across five dimensions: factual accuracy, analytical completeness, data recency, model consistency, and failure patterns. We evaluate five AI systems: GPT, Gemini, Perplexity, Claude, and SuperInvesting, using a dataset of 95+ structured financial analysis questions derived from real-world equity research tasks. The results reveal substantial differences in performance across models. Within this benchmark setting, SuperInvesting achieves the highest aggregate performance, with an average factual accuracy score of 8.96/10 and the highest completeness score of 56.65/70, while also demonstrating the lowest hallucination rate among evaluated systems. Retrieval-oriented systems such as Perplexity perform strongly on data recency tasks due to live information access but exhibit weaker analytical synthesis and consistency. Overall, the results highlight that financial intelligence in large language models is inherently multi-dimensional, and systems that combine structured financial data access with analytical reasoning capabilities provide the most reliable performance for complex investment research workflows.
Open 2603.08704v1
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical D…
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
Autoregressive (AR) diffusion offers a promising framework for generating videos of theoretically infinite length. However, a major challenge is maintaining temporal continuity while preventing the progressive quality degradation caused by error accumulation. To ensure continuity, existing methods typically condition on highly denoised contexts; yet, this practice propagates prediction errors with high certainty, thereby exacerbating degradation. In this paper, we argue that a highly clean context is unnecessary. Drawing inspiration from bidirectional diffusion models, which denoise frames at a shared noise level while maintaining coherence, we propose that conditioning on context at the same noise level as the current block provides sufficient signal for temporal consistency while effectively mitigating error propagation. Building on this insight, we propose HiAR, a hierarchical denoising framework that reverses the conventional generation order: instead of completing each block sequentially, it performs causal generation across all blocks at every denoising step, so that each block is always conditioned on context at the same noise level. This hierarchy naturally admits pipelined parallel inference, yielding a 1.8 wall-clock speedup in our 4-step setting. We further observe that self-rollout distillation under this paradigm amplifies a low-motion shortcut inherent to the mode-seeking reverse-KL objective. To counteract this, we introduce a forward-KL regulariser in bidirectional-attention mode, which preserves motion diversity for causal inference without interfering with the distillation loss. On VBench (20s generation), HiAR achieves the best overall score and the lowest temporal drift among all compared methods.
Open 2603.08703v1
Learning Functions of Halfspaces
2026-03-09Data Structures and AlgorithmsComputational Complexityarxiv
Abstract
We give an algorithm that learns arbitrary Boolean functions of $k$ arbitrary halfspaces over $\mathbb{R}^n$, in the challenging distribution-free Probably Approximately Correct (PAC) learning model, running in time $2^{\sqrt{n} \cdot (\log n)^{O(k)}}$. This is the first algorithm that can PAC learn even intersections of two halfspaces in time $2^{o(n)}.$
Open 2603.08700v1
A note on approximating the average degree of bounded arboricity graphs
2026-03-09Data Structures and AlgorithmsComputational ComplexityDiscrete Mathematicsarxiv
Abstract
Estimating the average degree of graph is a classic problem in sublinear graph algorithm. Eden, Ron, and Seshadhri (ICALP 2017, SIDMA 2019) gave a simple algorithm for this problem whose running time depended on the graph arboricity, but the underlying simplicity and associated analysis were buried inside the main result. Moreover, the description there loses logarithmic factors because of parameter search. The aim of this note is to give a full presentation of this algorithm, without these losses. Consider standard access (vertex samples, degree queries, and neighbor queries) to a graph $G = (V,E)$ of arboricity at most $α$. Let $d$ denote the average degree of $G$. We describe an algorithm that gives a $(1+\varepsilon)$-approximation to $d$ degree using $O(\varepsilon^{-2}α/d)$ queries. For completeness, we modify the algorithm to get a $O(\varepsilon^{-2} \sqrt{n/d})$ query
Open 2603.08694v1
Improved Certificates for Independence Number in Semirandom Hypergraphs
2026-03-09Data Structures and Algorithmsarxiv
Abstract
We study the problem of efficiently certifying upper bounds on the independence number of $\ell$-uniform hypergraphs. This is a notoriously hard problem, with efficient algorithms failing to approximate the independence number within $n^{1-ε}$ factor in the worst case [Has99, Zuc07]. We study the problem in random and semirandom hypergraphs. There is a folklore reduction to the graph case, achieving a certifiable bound of $O(\sqrt{n/p})$. More recently, the work [GKM22] improved this by constructing spectral certificates that yield a bound of $O(\sqrt{n}.\mathrm{polylog}(n)/p^{1/(\ell/2)})$. We make two key improvements: firstly, we prove sharper bounds that get rid of pesky logarithmic factors in $n$, and nearly attain the conjectured optimal (in both $n$ and $p$) computational threshold of $O(\sqrt{n}/p^{1/\ell})$, and secondly, we design robust Sum-of-Squares (SoS) certificates, proving our bounds in the more challenging semirandom hypergraph model. Our analysis employs the proofs-to-algorithms paradigm [BS16, FKP19] in showing an upper bound for pseudo-expectation of degree-$2\ell$ SoS relaxation of the natural polynomial system for maximum independent set. The challenging case is odd-arity hypergraphs, where we employ a tensor-based analysis that reduces the problem to proving bounds on a natural class of random chaos matrices associated with $\ell$-uniform hypergraphs. Previous bounds [AMP21, RT23] have a logarithmic dependence, which we remove by leveraging recent progress on matrix concentration inequalities [BBvH23, BLNvH25]; we believe these may be useful in other hypergraph problems. As an application, we show our improved certificates can be combined with an SoS relaxation of a natural $r$-coloring polynomial system to recover an arbitrary planted $r$-colorable subhypergraph in a semirandom model along the lines of [LPR25], which allows for strong adversaries.
Open 2603.08693v1
A Multi-Objective Optimization Approach for Sustainable AI-Driven Entre…
2026-03-09Artificial Intelligencearxiv
Abstract
The rapid advancement of artificial intelligence (AI) technologies presents both unprecedented opportunities and significant challenges for sustainable economic development. While AI offers transformative potential for addressing environmental challenges and enhancing economic resilience, its deployment often involves substantial energy consumption and environmental costs. This research introduces the EcoAI-Resilience framework, a multi-objective optimization approach designed to maximize the sustainability benefits of AI deployment while minimizing environmental costs and enhancing economic resilience. The framework addresses three critical objectives through mathematical optimization: sustainability impact maximization, economic resilience enhancement, and environmental cost minimization. The methodology integrates diverse data sources, including energy consumption metrics, sustainability indicators, economic performance data, and entrepreneurship outcomes across 53 countries and 14 sectors from 2015-2024. Our experimental validation demonstrates exceptional performance with R scores exceeding 0.99 across all model components, significantly outperforming baseline methods, including Linear Regression (R = 0.943), Random Forest (R = 0.957), and Gradient Boosting (R = 0.989). The framework successfully identifies optimal AI deployment strategies featuring 100\% renewable energy integration, 80% efficiency improvement targets, and optimal investment levels of $202.48 per capita. Key findings reveal strong correlations between economic complexity and resilience (r = 0.82), renewable energy adoption and sustainability outcomes (r = 0.71), and demonstrate significant temporal improvements in AI readiness (+1.12 points/year) and renewable energy adoption (+0.67 year) globally.
Open 2603.08692v1
Split Federated Learning Architectures for High-Accuracy and Low-Delay…
2026-03-09Machine LearningArtificial Intelligencearxiv
Abstract
Can we find a network architecture for ML model training so as to optimize training loss (and thus, accuracy) in Split Federated Learning (SFL)? And can this architecture also reduce training delay and communication overhead? While accuracy is not influenced by how we split the model in ordinary, state-of-the-art SFL, in this work we answer the questions above in the affirmative. Recent Hierarchical SFL (HSFL) architectures adopt a three-tier training structure consisting of clients, (local) aggregators, and a central server. In this architecture, the model is partitioned at two partitioning layers into three sub-models, which are executed across the three tiers. Despite their merits, HSFL architectures overlook the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay, and overhead. This work explicitly captures the impact of the partitioning layers and client-to-aggregator assignments on accuracy, delay and overhead by formulating a joint optimization problem. We prove that the problem is NP-hard and propose the first accuracy-aware heuristic algorithm that explicitly accounts for model accuracy, while remaining delay-efficient. Simulation results on public datasets show that our approach can improve accuracy by 3%, while reducing delay by 20% and overhead by 50%, compared to state-of-the-art SFL and HSFL schemes.
Open 2603.08687v1
Predicting Conflict Impact on Performance in O-RAN
2026-03-09Networking and Internet Architecturearxiv
Abstract
The O-RAN Alliance promotes the integration of intelligent autonomous agents to control the Radio Access Network (RAN). This improves flexibility, performance, and observability in the RAN, but introduces new challenges, such as the detection and management of conflicts among the intelligent autonomous agents. A solution consists of profiling the agents before deployment to gather statistical information about their decision-making behavior, then using the information to estimate the level of conflict among agents with different goals. This approach enables determining the occurrence of conflicts among agents, but does not provide information about the impact on RAN performance, including potential service degradation. The problem becomes more complex when agents generate control actions at different timescales, which makes conflict severity hard to predict. In this paper, we present a novel approach that fills this gap. Our solution leverages the same data used to determine conflict severity but extends its use to predict the impact of such conflicts on RAN performance based on the frequency at which each agent generates actions, giving more weight to faster applications, which exert control more frequently. Via a prototype, we demonstrate that our solution is viable and accurately predicts conflict impact on RAN performance.
Open 2603.08685v1
Benchmarking Language Modeling for Lossless Compression of Full-Fidelit…
2026-03-09SoundArtificial IntelligenceMachine Learningarxiv
Abstract
Autoregressive "language" models (LMs) trained on raw waveforms can be repurposed for lossless audio compression, but prior work is limited to 8-bit audio, leaving open whether such approaches work for practical settings (16/24-bit) and can compete with existing codecs. We benchmark LM-based compression on full-fidelity audio across diverse domains (music, speech, bioacoustics), sampling rates (16kHz-48kHz), and bit depths (8, 16, 24-bit). Standard sample-level tokenization becomes intractable at higher bit depths due to vocabulary size (65K for 16-bit; 16.7M for 24-bit). We propose Trilobyte, a byte-level tokenization schema for full resolution audio, improving vocabulary scaling from $O(2^{b})$ to $O(1)$ and enabling the first tractable 24-bit LM-based lossless compression. While LMs consistently outperform FLAC and yield state-of-the-art compression at 8-bit and 16-bit, we observe that compression gains become more modest as bit depth increases beyond 8-bit.
Open 2603.08683v1
Structural Causal Bottleneck Models
2026-03-09Machine Learningarxiv
Abstract
We introduce structural causal bottleneck models (SCBMs), a novel class of structural causal models. At the core of SCBMs lies the assumption that causal effects between high-dimensional variables only depend on low-dimensional summary statistics, or bottlenecks, of the causes. SCBMs provide a flexible framework for task-specific dimension reduction while being estimable via standard, simple learning algorithms in practice. We analyse identifiability in SCBMs, connect them to information bottlenecks in the sense of Tishby & Zaslavsky (2015), and illustrate how to estimate them experimentally. We also demonstrate the benefit of bottlenecks for effect estimation in low-sample transfer learning settings. We argue that SCBMs provide an alternative to existing causal dimension reduction frameworks like causal representation learning or causal abstraction learning.
Open 2603.08682v1
ER-Pose: Rethinking Keypoint-Driven Representation Learning for Real-Ti…
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
Single-stage multi-person pose estimation aims to jointly perform human localization and keypoint prediction within a unified framework, offering advantages in inference efficiency and architectural simplicity. Consequently, multi-scale real-time detection architectures, such as YOLO-like models, are widely adopted for real-time pose estimation. However, these approaches typically inherit a box-driven modeling paradigm from object detection, in which pose estimation is implicitly constrained by bounding-box supervision during training. This formulation introduces biases in sample assignment and feature representation, resulting in task misalignment and ultimately limiting pose estimation accuracy. In this work, we revisit box-driven single-stage pose estimation from a keypoint-driven perspective and identify semantic conflicts among parallel objectives as a key source of performance degradation. To address this issue, we propose a keypoint-driven learning paradigm that elevates pose estimation to a primary prediction objective. Specifically, we remove bounding-box prediction and redesign the prediction head to better accommodate the high-dimensional structured representations for pose estimation. We further introduce a keypoint-driven dynamic sample assignment strategy to align training objectives with pose evaluation metrics, enabling dense supervision during training and efficient NMS-free inference. In addition, we propose a smooth OKS-based loss function to stabilize optimization in regression-based pose estimation. Based on these designs, we develop a single-stage multi-person pose estimation framework, termed ER-Pose. On MS COCO and CrowdPose, ER-Pose-n achieves AP improvements of 3.2/6.7 without pre-training and 7.4/4.9 with pre-training respectively compared with the baseline YOLO-Pose. These improvements are achieved with fewer parameters and higher inference efficiency.
Open 2603.08681v1
A New Lower Bound for the Random Offerer Mechanism in Bilateral Trade u…
2026-03-09Machine LearningArtificial IntelligenceComputer Science and Game Theoryarxiv
Abstract
The celebrated Myerson--Satterthwaite theorem shows that in bilateral trade, no mechanism can be simultaneously fully efficient, Bayesian incentive compatible (BIC), and budget balanced (BB). This naturally raises the question of how closely the gains from trade (GFT) achievable by a BIC and BB mechanism can approximate the first-best (fully efficient) benchmark. The optimal BIC and BB mechanism is typically complex and highly distribution-dependent, making it difficult to characterize directly. Consequently, much of the literature analyzes simpler mechanisms such as the Random-Offerer (RO) mechanism and establishes constant-factor guarantees relative to the first-best GFT. An important open question concerns the worst-case performance of the RO mechanism relative to first-best (FB) efficiency. While it was originally hypothesized that the approximation ratio $\frac{\text{GFT}_{\text{FB}}}{\text{GFT}_{\text{RO}}}$ is bounded by $2$, recent work provided counterexamples to this conjecture: Cai et al. proved that the ratio can be strictly larger than $2$, and Babaioff et al. exhibited an explicit example with ratio approximately $2.02$. In this work, we employ AlphaEvolve, an AI-guided evolutionary search framework, to explore the space of value distributions. We identify a new worst-case instance that yields an improved lower bound of $\frac{\text{GFT}_{\text{FB}}}{\text{GFT}_{\text{RO}}} \ge \textbf{2.0749}$. This establishes a new lower bound on the worst-case performance of the Random-Offerer mechanism, demonstrating a wider efficiency gap than previously known.
Open 2603.08679v1
Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation
2026-03-09Machine Learningarxiv
Abstract
Maximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method over the joint space of model parameters and probability measures. Recently, a significant body of work has adopted this perspective, leading to interacting particle algorithms for MMLE. In this paper, we propose an accelerated version of one such procedure, based on Stein variational gradient descent (SVGD), by introducing Nesterov acceleration in both the parameter updates and in the space of probability measures. The resulting method, termed Momentum SVGD-EM, consistently accelerates convergence in terms of required iterations across various tasks of increasing difficulty, demonstrating effectiveness in both low- and high-dimensional settings.
Open 2603.08676v1
Talking Together: Synthesizing Co-Located 3D Conversations from Audio
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
We tackle the challenging task of generating complete 3D facial animations for two interacting, co-located participants from a mixed audio stream. While existing methods often produce disembodied "talking heads" akin to a video conference call, our work is the first to explicitly model the dynamic 3D spatial relationship -- including relative position, orientation, and mutual gaze -- that is crucial for realistic in-person dialogues. Our system synthesizes the full performance of both individuals, including precise lip-sync, and uniquely allows their relative head poses to be controlled via textual descriptions. To achieve this, we propose a dual-stream architecture where each stream is responsible for one participant's output. We employ speaker's role embeddings and inter-speaker cross-attention mechanisms designed to disentangle the mixed audio and model the interaction. Furthermore, we introduce a novel eye gaze loss to promote natural, mutual eye contact. To power our data-hungry approach, we introduce a novel pipeline to curate a large-scale conversational dataset consisting of over 2 million dyadic pairs from in-the-wild videos. Our method generates fluid, controllable, and spatially aware dyadic animations suitable for immersive applications in VR and telepresence, significantly outperforming existing baselines in perceived realism and interaction coherence.
Open 2603.08674v1
Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision…
2026-03-09Roboticsarxiv
Abstract
Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate. Starting a grasp with too little force requires reactive adjustment, while starting a grasp with too high a force risks damaging fragile objects. This trade-off is particularly challenging for compliant grippers, whose contact mechanics are difficult to model analytically. We propose Exp-Force, an experience-conditioned framework that predicts the minimum feasible grasping force from a single RGB image. The method retrieves a small set of relevant prior grasping experiences and conditions a vision-language model on these examples for in-context inference, without analytic contact models or manually designed heuristics. On 129 object instances, ExpForce achieves a best-case MAE of 0.43 N, reducing error by 72% over zero-shot inference. In real-world tests on 30 unseen objects, it improves appropriate force selection rate from 63% to 87%. These results demonstrate that Exp-Force enables reliable and generalizable pre-grasp force selection by leveraging prior interaction experiences. http://expforcesubmission.github.io/Exp-Force-Website/
Open 2603.08668v1
Characterization and upgrade of a quantum graph neural network for char…
2026-03-09Machine Learningarxiv
Abstract
In the forthcoming years the LHC experiments are going to be upgraded to benefit from the substantial increase of the LHC instantaneous luminosity, which will lead to larger, denser events, and, consequently, greater complexity in reconstructing charged particle tracks, motivating frontier research in new technologies. Quantum machine learning models are being investigated as potential new approaches to high energy physics (HEP) tasks. We characterize and upgrade a quantum graph neural network (QGNN) architecture for charged particle track reconstruction on a simulated high luminosity dataset. The model operates on a set of event graphs, each built from the hits generated in tracking detector layers by particles produced in proton collisions, performing a classification of the possible hit connections between adjacent layers. In this approach the QGNN is designed as a hybrid architecture, interleaving classical feedforward networks with parametrized quantum circuits. We characterize the interplay between the classical and quantum components. We report on the principal upgrades to the original design, and present new evidence of improved training behavior, specifically in terms of convergence toward the final trained configuration.
Open 2603.08667v1
Cybersecurity AI: Hacking Consumer Robots in the AI Era
2026-03-09Cryptography and Securityarxiv
Abstract
Is robot cybersecurity broken by AI? Consumer robots -- from autonomous lawnmowers to powered exoskeletons and window cleaners -- are rapidly entering homes and workplaces, yet their security remains rooted in assumptions of specialized attacker expertise. This paper presents evidence that Generative AI has fundamentally disrupted robot cybersecurity: what historically required deep knowledge of ROS, ROS 2, and robotic system internals can now be automated by anyone with access to state-of-the-art GenAI tools spearheaded by the open source CAI (Cybersecurity AI). We provide empirical evidence through three case studies: (1) compromising a Hookii autonomous lawnmower robot, uncovering fleet-wide vulnerabilities and data protection violations affecting 267+ connected devices, (2) exploiting a Hypershell powered exoskeleton, demonstrating safety-critical motor control weaknesses and credential exposure including access to over 3,300 internal support emails, and (3) breaching a HOBOT S7 Pro window cleaning robot, achieving unauthenticated BLE command injection and OTA firmware exploitation. Across these platforms, CAI discovered in an automated manner 38 vulnerabilities that would have previously required months of specialized security research. Our findings reveal a stark asymmetry: while offensive capabilities have been democratized through AI, defensive measures often remain lagging behind. We argue that traditional defense-in-depth architectures like the Robot Immune System (RIS) must evolve toward GenAI-native defensive agents capable of matching the speed and adaptability of AI-powered attacks.
Open 2603.08665v1
ImprovedGS+: A High-Performance C++/CUDA Re-Implementation Strategy for…
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
Recent advancements in 3D Gaussian Splatting (3DGS) have shifted the focus toward balancing reconstruction fidelity with computational efficiency. In this work, we propose ImprovedGS+, a high-performance, low-level reinvention of the ImprovedGS strategy, implemented natively within the LichtFeld-Studio framework. By transitioning from high-level Python logic to hardware-optimized C++/CUDA kernels, we achieve a significant reduction in host-device synchronization and training latency. Our implementation introduces a Long-Axis-Split (LAS) CUDA kernel, custom Laplacian-based importance kernels with Non-Maximum Suppression (NMS) for edge scores, and an adaptive Exponential Scale Scheduler. Experimental results on the Mip-NeRF360 dataset demonstrate that ImprovedGS+ establishes a new Pareto-optimal front for scene reconstruction. Our 1M-budget variant outperforms the state-of-the-art MCMC baseline by achieving a 26.8% reduction in training time (saving 17 minutes per session) and utilizing 13.3% fewer Gaussians while maintaining superior visual quality. Furthermore, our full variant demonstrates a 1.28 dB PSNR increase over the ADC baseline with a 38.4% reduction in parametric complexity. These results validate ImprovedGS+ as a scalable, high-speed solution that upholds the core pillars of Speed, Quality, and Usability within the LichtFeld-Studio ecosystem.
Open 2603.08661v1
How Far Can Unsupervised RLVR Scale LLM Training?
2026-03-09Machine LearningComputation and Languagearxiv
Abstract
Unsupervised reinforcement learning with verifiable rewards (URLVR) offers a pathway to scale LLM training beyond the supervision bottleneck by deriving rewards without ground truth labels. Recent works leverage model intrinsic signals, showing promising early gains, yet their potential and limitations remain unclear. In this work, we revisit URLVR and provide a comprehensive analysis spanning taxonomy, theory and extensive experiments. We first classify URLVR methods into intrinsic versus external based on reward sources, then establish a unified theoretical framework revealing that all intrinsic methods converge toward sharpening the model's initial distribution This sharpening mechanism succeeds when initial confidence aligns with correctness but fails catastrophically when misaligned. Through systematic experiments, we show intrinsic rewards consistently follow a rise-then-fall pattern across methods, with collapse timing determined by model prior rather than engineering choices. Despite these scaling limits, we find intrinsic rewards remain valuable in test-time training on small datasets, and propose Model Collapse Step to measure model prior, serving as a practical indicator for RL trainability. Finally, we explore external reward methods that ground verification in computational asymmetries, showing preliminary evidence they may escape the confidence-correctness ceiling. Our findings chart boundaries for intrinsic URLVR while motivating paths toward scalable alternatives.
Open 2603.08660v1
CODA: Difficulty-Aware Compute Allocation for Adaptive Reasoning
2026-03-09Computation and Languagearxiv
Abstract
The emergence of large reasoning models demonstrates that scaling inference-time compute significantly enhances performance on complex tasks. However, it often falls into another trap: overthinking simple problems, where repetitive rationales yield minimal accuracy gains at a disproportionately high cost. This motivates adaptive reasoning: dynamically aligning reasoning depth with instance difficulty. In this paper, we study adaptive reasoning from an optimality perspective, formalizing it as a utility maximization problem where tokens are allocated until the marginal accuracy gain falls below the incremental cost. Based on this, we propose CODA (Compute Allocation by Difficulty Awareness), a method that operationalizes this principle by allocating tokens via a policy-internal difficulty signal. Specifically, CODA estimates difficulty via group-based rollouts and maps it to two non-negative gates that modulate a length-dependent shaping term on top of the binary base reward. The easy-side gate penalizes verbosity on simple instances, whereas the hard-side gate encourages more deliberative rollouts on challenging ones. Across model scales and benchmarks, CODA achieves adaptive reasoning without external annotations or user-provided budgets: on easy tasks, CODA reduces token costs by over 60% while maintaining strong accuracy, whereas on hard tasks it incentivizes more deliberative rollouts to maximize performance.
Open 2603.08659v1
Context-free Self-Conditioned GAN for Trajectory Forecasting
2026-03-09Machine Learningarxiv
Abstract
In this paper, we present a context-free unsupervised approach based on a self-conditioned GAN to learn different modes from 2D trajectories. Our intuition is that each mode indicates a different behavioral moving pattern in the discriminator's feature space. We apply this approach to the problem of trajectory forecasting. We present three different training settings based on self-conditioned GAN, which produce better forecasters. We test our method in two data sets: human motion and road agents. Experimental results show that our approach outperforms previous context-free methods in the least representative supervised labels while performing well in the remaining labels. In addition, our approach outperforms globally in human motion, while performing well in road agents.
Open 2603.08658v1
Clarity and Computational Efficiency of Orbital Boundary Labeling
2026-03-09Human-Computer Interactionarxiv
Abstract
Circular interfaces such as those found on smartwatches, automotive dashboards, cockpit instruments, or in radial visualizations pose unique challenges for placing readable labels. Traditional rectangular labeling methods waste screen space and create visual clutter on these constrained displays. In orbital boundary labeling, the labels (e.g., the features' names) are placed in an annulus-shaped orbit outside of the figure, and each label is connected to its feature using a short, crossing-free leader line. We contribute algorithms to compute two leader styles, orbital-radial and straight-line, for uniform and non-uniform label sizes, optimizing for crossing-free shortest leaders. We evaluate the model and the algorithms with computational experiments and a controlled user experiment. The user experiment reveals that both leader types exhibit similar accuracy, but straight-line leaders yield faster response times.
Open 2603.08657v1
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning
2026-03-09Artificial IntelligenceComputation and LanguageInformation Retrievalarxiv
Abstract
We introduce OfficeQA Pro, a benchmark for evaluating AI agents on grounded, multi-document reasoning over a large and heterogeneous document corpus. The corpus consists of U.S. Treasury Bulletins spanning nearly 100 years, comprising 89,000 pages and over 26 million numerical values. OfficeQA Pro consists of 133 questions that require precise document parsing, retrieval, and analytical reasoning across both unstructured text and tabular data. Frontier LLMs including Claude Opus 4.6, GPT-5.4, and Gemini 3.1 Pro Preview achieve less than 5% accuracy on OfficeQA Pro when relying on parametric knowledge, and less than 12% with additional access to the web. When provided directly with the document corpus, frontier agents still struggle on over half of questions, scoring 34.1% on average. We find that providing agents with a structured document representation produced by Databricks' ai_parse_document yields a 16.1% average relative performance gain across agents. We conduct additional ablations to study the effects of model selection, table representation, retrieval strategy, and test-time scaling on performance. Despite these improvements, significant headroom remains before agents can be considered reliable at enterprise-grade grounded reasoning.
Open 2603.08655v1
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation
2026-03-09Artificial Intelligencearxiv
Abstract
Recent advancements in Unified Multimodal Models (UMMs) have significantly advanced text-to-image (T2I) generation, particularly through the integration of Chain-of-Thought (CoT) reasoning. However, existing CoT-based T2I methods largely rely on abstract natural-language planning, which lacks the precision required for complex spatial layouts, structured visual elements, and dense textual content. In this work, we propose CoCo (Code-as-CoT), a code-driven reasoning framework that represents the reasoning process as executable code, enabling explicit and verifiable intermediate planning for image generation. Given a text prompt, CoCo first generates executable code that specifies the structural layout of the scene, which is then executed in a sandboxed environment to render a deterministic draft image. The model subsequently refines this draft through fine-grained image editing to produce the final high-fidelity result. To support this training paradigm, we construct CoCo-10K, a curated dataset containing structured draft-final image pairs designed to teach both structured draft construction and corrective visual refinement. Empirical evaluations on StructT2IBench, OneIG-Bench, and LongText-Bench show that CoCo achieves improvements of +68.83%, +54.8%, and +41.23% over direct generation, while also outperforming other generation methods empowered by CoT. These results demonstrate that executable code is an effective and reliable reasoning paradigm for precise, controllable, and structured text-to-image generation. The code is available at: https://github.com/micky-li-hd/CoCo
Open 2603.08652v1
Group Entropies and Mirror Duality: A Class of Flexible Mirror Descent…
2026-03-09Machine Learningarxiv
Abstract
We introduce a comprehensive theoretical and algorithmic framework that bridges formal group theory and group entropies with modern machine learning, paving the way for an infinite, flexible family of Mirror Descent (MD) optimization algorithms. Our approach exploits the rich structure of group entropies, which are generalized entropic functionals governed by group composition laws, encompassing and significantly extending all trace-form entropies such as the Shannon, Tsallis, and Kaniadakis families. By leveraging group-theoretical mirror maps (or link functions) in MD, expressed via multi-parametric generalized logarithms and their inverses (group exponentials), we achieve highly flexible and adaptable MD updates that can be tailored to diverse data geometries and statistical distributions. To this end, we introduce the notion of \textit{mirror duality}, which allows us to seamlessly switch or interchange group-theoretical link functions with their inverses, subject to specific learning rate constraints. By tuning or learning the hyperparameters of the group logarithms enables us to adapt the model to the statistical properties of the training distribution, while simultaneously ensuring desirable convergence characteristics via fine-tuning. This generality not only provides greater flexibility and improved convergence properties, but also opens new perspectives for applications in machine learning and deep learning by expanding the design of regularizers and natural gradient algorithms. We extensively evaluate the validity, robustness, and performance of the proposed updates on large-scale, simplex-constrained quadratic programming problems.
Open 2603.08651v1
Divide and Predict: An Architecture for Input Space Partitioning and En…
2026-03-09Machine Learningarxiv
Abstract
In this article the authors develop an intrinsic measure for quantifying heterogeneity in training data for supervised learning. This measure is the variance of a random variable which factors through the influences of pairs of training points. The variance is shown to capture data heterogeneity and can thus be used to assess if a sample is a mixture of distributions. The authors prove that the data itself contains key information that supports a partitioning into blocks. Several proof of concept studies are provided that quantify the connection between variance and heterogeneity for EMNIST image data and synthetic data. The authors establish that variance is maximal for equal mixes of distributions, and detail how variance-based data purification followed by conventional training over blocks can lead to significant increases in test accuracy.
Open 2603.08649v1
CAST: Modeling Visual State Transitions for Consistent Video Retrieval
2026-03-09Computer Vision and Pattern Recognitionarxiv
Abstract
As video content creation shifts toward long-form narratives, composing short clips into coherent storylines becomes increasingly important. However, prevailing retrieval formulations remain context-agnostic at inference time, prioritizing local semantic alignment while neglecting state and identity consistency. To address this structural limitation, we formalize the task of Consistent Video Retrieval (CVR) and introduce a diagnostic benchmark spanning YouCook2, COIN, and CrossTask. We propose CAST (Context-Aware State Transition), a lightweight, plug-and-play adapter compatible with diverse frozen vision-language embedding spaces. By predicting a state-conditioned residual update ($Δ$) from visual history, CAST introduces an explicit inductive bias for latent state evolution. Extensive experiments show that CAST improves performance on YouCook2 and CrossTask, remains competitive on COIN, and consistently outperforms zero-shot baselines across diverse foundation backbones. Furthermore, CAST provides a useful reranking signal for black-box video generation candidates (e.g., from Veo), promoting more temporally coherent continuations.
Open 2603.08648v1
Grow, Don't Overwrite: Fine-tuning Without Forgetting
2026-03-09Machine Learningarxiv
Abstract
Adapting pre-trained models to specialized tasks often leads to catastrophic forgetting, where new knowledge overwrites foundational capabilities. Existing methods either compromise performance on the new task or struggle to balance training stability with efficient reuse of pre-trained knowledge. We introduce a novel function-preserving expansion method that resolves this dilemma. Our technique expands model capacity by replicating pre-trained parameters within transformer submodules and applying a scaling correction that guarantees the expanded model is mathematically identical to the original at initialization, enabling stable training while exploiting existing knowledge. Empirically, our method eliminates the trade-off between plasticity and stability, matching the performance of full fine-tuning on downstream tasks without any degradation of the model's original capabilities. Furthermore, we demonstrate the modularity of our approach, showing that by selectively expanding a small subset of layers we can achieve the same performance as full fine-tuning at a fraction of the computational cost.
Open 2603.08647v1
On the expressive power of inquisitive team logic and inquisitive first…
2026-03-09Logic in Computer Sciencearxiv
Abstract
Inquisitive team logic is a variant of inquisitive logic interpreted in team semantics, which has been argued to provide a natural setting for the regimentation of dependence claims. With respect to sentences, this logic is known to be expressively equivalent with first-order logic. In this article we show that, on the contrary, the expressive power of open formulas in this logic properly exceeds that of first-order logic. On the way to this result, we show that if inquisitive team logic is extended with the range-generating universal quantifier adopted in dependence logic, the resulting logic can express finiteness, and as a consequence, it is neither compact nor recursively axiomatizable. We further extend our results to standard inquisitive first-order logic, showing that some sentences of this logic express non first-order properties of models.
Open 2603.08646v1
Retrieval-Augmented Gaussian Avatars: Improving Expression Generalizati…
2026-03-09Computer Vision and Pattern RecognitionGraphicsMachine Learningarxiv
Abstract
Template-free animatable head avatars can achieve high visual fidelity by learning expression-dependent facial deformation directly from a subject's capture, avoiding parametric face templates and hand-designed blendshape spaces. However, since learned deformation is supervised only by the expressions observed for a single identity, these models suffer from limited expression coverage and often struggle when driven by motions that deviate from the training distribution. We introduce RAF (Retrieval-Augmented Faces), a simple training-time augmentation designed for template-free head avatars that learn deformation from data. RAF constructs a large unlabeled expression bank and, during training, replaces a subset of the subject's expression features with nearest-neighbor expressions retrieved from this bank while still reconstructing the subject's original frames. This exposes the deformation field to a broader range of expression conditions, encouraging stronger identity-expression decoupling and improving robustness to expression distribution shift without requiring paired cross-identity data, additional annotations, or architectural changes. We further analyze how retrieval augmentation increases expression diversity and validate retrieval quality with a user study showing that retrieved neighbors are perceptually closer in expression and pose. Experiments on the NeRSemble benchmark demonstrate that RAF consistently improves expression fidelity over the baseline, in both self-driving and cross-driving scenarios.
Open 2603.08645v1
Coherence-Aware Over-the-Air Distributed Learning under Heterogeneous L…
2026-03-09Information Theoryarxiv
Abstract
Distributed machine learning (ML) over wireless networks hinges on accurate channel state information (CSI) and efficient exchange of high-dimensional model updates. These demands are governed by channel coherence time and bandwidth, which vary across devices (links) due to heterogeneous mobility and scattering, causing degraded downlink delivery and distorted uplink over-the-air (OTA) aggregation. We propose a coherence-aware federated learning (FL) framework that jointly addresses impairments on downlink and uplink with communication-efficient strategies. In the downlink, we employ product superposition to multiplex global model symbols for long-coherence (static) devices onto the pilot tones required by short-coherence (dynamic) devices for channel estimation, turning pilot overhead into payload while preserving estimation fidelity. In the proposed scheme, an orthogonal frequency-division multiplexing (OFDM) super-block is partitioned into sub-blocks aligned with the smallest coherence time and bandwidth, enabling consistent channel estimation and stabilizing OTA aggregation across heterogeneous devices. Partial model reception at dynamic devices is mitigated via previous local model filling (PLMF), which reuses prior updates. We establish convergence guarantees under heterogeneous link impairments, imperfect CSI, and aggregation noise. The proposed framework enables efficient scheduling under coherence heterogeneity; analysis and experiments demonstrate notable gains in communication efficiency, latency, and learning accuracy over conventional FL baselines.
Open 2603.08641v1
PostTrainBench: Can LLM Agents Automate LLM Post-Training?
2026-03-09Software EngineeringArtificial IntelligenceMachine Learningarxiv
Abstract
AI agents have become surprisingly proficient at software engineering over the past year, largely due to improvements in reasoning capabilities. This raises a deeper question: can these systems extend their capabilities to automate AI research itself? In this paper, we explore post-training, the critical phase that turns base LLMs into useful assistants. We introduce PostTrainBench to benchmark how well LLM agents can perform post-training autonomously under bounded compute constraints (10 hours on one H100 GPU). We ask frontier agents (e.g., Claude Code with Opus 4.6) to optimize the performance of a base LLM on a particular benchmark (e.g., Qwen3-4B on AIME). Importantly, we do not provide any predefined strategies to the agents and instead give them full autonomy to find necessary information on the web, run experiments, and curate data. We find that frontier agents make substantial progress but generally lag behind instruction-tuned LLMs from leading providers: 23.2% for the best agent vs. 51.1% for official instruction-tuned models. However, agents can exceed instruction-tuned models in targeted scenarios: GPT-5.1 Codex Max achieves 89% on BFCL with Gemma-3-4B vs. 67% for the official model. We also observe several failure modes worth flagging. Agents sometimes engage in reward hacking: training on the test set, downloading existing instruction-tuned checkpoints instead of training their own, and using API keys they find to generate synthetic data without authorization. These behaviors are concerning and highlight the importance of careful sandboxing as these systems become more capable. Overall, we hope PostTrainBench will be useful for tracking progress in AI R&D automation and for studying the risks that come with it. Website and code are available at https://posttrainbench.com/.
Open 2603.08640v1