This Week In Computer Science Papers

Week beginning 9th February 2026

Tap a tile to open details. Use the left sidebar to filter by category.

Hover a tile to preview the abstract. Click to open details. Use the sidebar to filter by category.

No filters applied

Showing 1–36 of 581

SurfPhase: 3D Interfacial Dynamics in Two-Phase Flows from Sparse Videos

2026-02-11Computer Vision and Pattern Recognitionarxiv

Interfacial dynamics in two-phase flows govern momentum, heat, and mass transfer, yet remain difficult to measure experimentally. Classical techniques face intrinsic limitations near moving interfaces, while existing neural rendering methods target single-phase flows with diffuse boundaries and cannot handle sharp, deformable liquid-vapor interfaces. We propose SurfPhase, a novel model for reconstructing 3D interfacial dynamics from sparse camera views. Our approach integrates dynamic Gaussian surfels with a signed distance function formulation for geometric consistency, and leverages a video diffusion model to synthesize novel-view videos to refine reconstruction from sparse observations. We evaluate on a new dataset of high-speed pool boiling videos, demonstrating high-quality view synthesis and velocity estimation from only two camera views. Project website: https://yuegao.me/SurfPhase.

Open → 2602.11154v1

Utilitarian Distortion Under Probabilistic Voting

2026-02-11Computer Science and Game Theoryarxiv

Abstract

The utilitarian distortion framework evaluates voting rules by their worst-case efficiency loss when voters have cardinal utilities but express only ordinal rankings. Under the classical model, a longstanding tension exists: Plurality, which suffers from the spoiler effect, achieves optimal $Θ(m^2)$ distortion among deterministic rules, while normatively superior rules like Copeland and Borda have unbounded distortion. We resolve this tension under probabilistic voting with the Plackett-Luce model, where rankings are noisy reflections of utilities governed by an inverse temperature parameter $β$. Copeland and Borda both achieve at most $β\frac{1+e^{-β}}{1-e^{-β}}$ distortion, independent of the number of candidates $m$, and within a factor of 2 of the lower bound for randomized rules satisfying the probabilistic Condorcet loser criterion known from prior work. This improves upon the prior $O(β^2)$ bound for Borda. These upper bounds are nearly tight: prior work establishes a $(1-o(1))β$ lower bound for Borda, and we prove a $(1-ε)β$ lower bound for Copeland for any constant $ε>0$. In contrast, rules that rely only on top-choice information fare worse: Plurality has distortion $Ω(\min(e^β, m))$ and Random Dictator has distortion $Θ(m)$. Additional `veto' information is also insufficient to remove the dependence on $m$; Plurality Veto and Pruned Plurality Veto have distortion $Ω(β\ln m)$. We also prove a lower bound of $(\frac{5}{8}-ε)β$ (for any constant $ε>0$) for all deterministic finite-precision tournament-based rules, a class that includes Copeland and any rule based on pairwise comparison margins rounded to fixed precision. Our results show that the distortion framework aligns with normative intuitions once the probabilistic nature of real-world voting is taken into account.

Open → 2602.11152v1

Diffusion-Pretrained Dense and Contextual Embeddings

2026-02-11Machine LearningComputation and LanguageInformation Retrievalarxiv

Abstract

In this report, we introduce pplx-embed, a family of multilingual embedding models that employ multi-stage contrastive learning on a diffusion-pretrained language model backbone for web-scale retrieval. By leveraging bidirectional attention through diffusion-based pretraining, our models capture comprehensive bidirectional context within passages, enabling the use of mean pooling and a late chunking strategy to better preserve global context across long documents. We release two model types: pplx-embed-v1 for standard retrieval, and pplx-embed-context-v1 for contextualized embeddings that incorporate global document context into passage representations. pplx-embed-v1 achieves competitive performance on the MTEB(Multilingual, v2), MTEB(Code), MIRACL, BERGEN, and ToolRet retrieval benchmarks, while pplx-embed-context-v1 sets new records on the ConTEB benchmark. Beyond public benchmarks, pplx-embed-v1 demonstrates strong performance on our internal evaluation suite, which focuses on real-world, large-scale search scenarios over tens of millions of documents. These results validate the models' effectiveness in production environments where retrieval quality and efficiency are critical at scale.

Open → 2602.11151v1

YOR: Your Own Mobile Manipulator for Generalizable Robotics

2026-02-11RoboticsMachine Learningarxiv

Abstract

Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, and two arms with grippers to achieve whole-body mobility and manipulation. Our design emphasizes modularity, ease of assembly using off-the-shelf components, and affordability, with a bill-of-materials cost under 10,000 USD. We demonstrate YOR's capability by completing tasks that require coordinated whole-body control, bimanual manipulation, and autonomous navigation. Overall, YOR offers competitive functionality for mobile manipulation research at a fraction of the cost of existing platforms. Project website: https://www.yourownrobot.ai/

Open → 2602.11150v1

Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

2026-02-11Computation and Languagearxiv

Abstract

Supervised fine-tuning (SFT) on chain-of-thought data is an essential post-training step for reasoning language models. Standard machine learning intuition suggests that training with more unique training samples yields better generalization. Counterintuitively, we show that SFT benefits from repetition: under a fixed update budget, training for more epochs on smaller datasets outperforms single-epoch training on larger datasets. On AIME'24/25 and GPQA benchmarks, Olmo3-7B trained for 128 epochs on 400 samples outperforms the equivalent 1 epoch on 51200 samples by 12-26 percentage points, with no additional catastrophic forgetting. We find that training token accuracy reliably signals when repetition has saturated; improvements from additional epochs plateau at full memorization, a pattern consistent across all settings. These findings provide a practical approach for reasoning SFT, where scaling epochs with token accuracy as a stopping criterion can replace expensive undirected data scaling. We pose the repetition advantage, where full memorization coincides with improved generalization, as a new open problem for the community in understanding the training dynamics of large language models.

Open → 2602.11149v1

Let Leaders Play Games: Improving Timing in Leader-based Consensus

2026-02-11Computer Science and Game Theoryarxiv

Abstract

Propagation latency is inherent to any distributed network, including blockchains. Typically, blockchain protocols provide a timing buffer for block propagation across the network. In leader-based blockchains, the leader -- block proposer -- is known in advance for each slot. A fast (or low-latency) proposer may delay the block proposal in anticipation of more rewards from the transactions that would otherwise be included in the subsequent block. Deploying such a strategy by manipulating the timing is known as timing games. It increases the risk of missed blocks due to reduced time for other nodes to vote on the block, affecting the overall efficiency of the blockchain. Moreover, proposers who play timing games essentially appropriate MEV (additional rewards over transaction fees and the block reward) that would otherwise accrue to the next block, making it unfair to subsequent block proposers. We propose a double-block proposal mechanism, 2-Prop, to curtail timing games. 2-Prop selects two proposers per slot to propose blocks and confirms one of them. We design a reward-sharing policy for proposers based on how quickly their blocks propagate to avoid strategic deviations. In the induced game, which we call the Latency Game, we show that it is a Nash Equilibrium for the proposers to propose the block without delay under homogeneous network settings. Under heterogeneous network settings, we study many configurations, and our analysis shows that a faster proposer would prefer not to delay unless the other proposer is extremely slow. Thus, we show the efficacy of 2-Prop in mitigating the effect of timing games.

Open → 2602.11147v1

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

2026-02-11Computer Vision and Pattern RecognitionArtificial Intelligencearxiv

Abstract

Preference optimization for diffusion and flow-matching models relies on reward functions that are both discriminatively robust and computationally efficient. Vision-Language Models (VLMs) have emerged as the primary reward provider, leveraging their rich multimodal priors to guide alignment. However, their computation and memory cost can be substantial, and optimizing a latent diffusion generator through a pixel-space reward introduces a domain mismatch that complicates alignment. In this paper, we propose DiNa-LRM, a diffusion-native latent reward model that formulates preference learning directly on noisy diffusion states. Our method introduces a noise-calibrated Thurstone likelihood with diffusion-noise-dependent uncertainty. DiNa-LRM leverages a pretrained latent diffusion backbone with a timestep-conditioned reward head, and supports inference-time noise ensembling, providing a diffusion-native mechanism for test-time scaling and robust rewarding. Across image alignment benchmarks, DiNa-LRM substantially outperforms existing diffusion-based reward baselines and achieves performance competitive with state-of-the-art VLMs at a fraction of the computational cost. In preference optimization, we demonstrate that DiNa-LRM improves preference optimization dynamics, enabling faster and more resource-efficient model alignment.

Open → 2602.11146v1

SCRAPL: Scattering Transform with Random Paths for Machine Learning

2026-02-11SoundMachine Learningarxiv

Abstract

The Euclidean distance between wavelet scattering transform coefficients (known as paths) provides informative gradients for perceptual quality assessment of deep inverse problems in computer vision, speech, and audio processing. However, these transforms are computationally expensive when employed as differentiable loss functions for stochastic gradient descent due to their numerous paths, which significantly limits their use in neural network training. Against this problem, we propose "Scattering transform with Random Paths for machine Learning" (SCRAPL): a stochastic optimization scheme for efficient evaluation of multivariable scattering transforms. We implement SCRAPL for the joint time-frequency scattering transform (JTFS) which demodulates spectrotemporal patterns at multiple scales and rates, allowing a fine characterization of intermittent auditory textures. We apply SCRAPL to differentiable digital signal processing (DDSP), specifically, unsupervised sound matching of a granular synthesizer and the Roland TR-808 drum machine. We also propose an initialization heuristic based on importance sampling, which adapts SCRAPL to the perceptual content of the dataset, improving neural network convergence and evaluation performance. We make our code and audio samples available and provide SCRAPL as a Python package.

Open → 2602.11145v1

GENIUS: Generative Fluid Intelligence Evaluation Suite

2026-02-11Machine LearningArtificial IntelligenceComputer Vision and Pattern Recognitionarxiv

Abstract

Unified Multimodal Models (UMMs) have shown remarkable progress in visual generation. Yet, existing benchmarks predominantly assess $\textit{Crystallized Intelligence}$, which relies on recalling accumulated knowledge and learned schemas. This focus overlooks $\textit{Generative Fluid Intelligence (GFI)}$: the capacity to induce patterns, reason through constraints, and adapt to novel scenarios on the fly. To rigorously assess this capability, we introduce $\textbf{GENIUS}$ ($\textbf{GEN}$ Fluid $\textbf{I}$ntelligence Eval$\textbf{U}$ation $\textbf{S}$uite). We formalize $\textit{GFI}$ as a synthesis of three primitives. These include $\textit{Inducing Implicit Patterns}$ (e.g., inferring personalized visual preferences), $\textit{Executing Ad-hoc Constraints}$ (e.g., visualizing abstract metaphors), and $\textit{Adapting to Contextual Knowledge}$ (e.g., simulating counter-intuitive physics). Collectively, these primitives challenge models to solve problems grounded entirely in the immediate context. Our systematic evaluation of 12 representative models reveals significant performance deficits in these tasks. Crucially, our diagnostic analysis disentangles these failure modes. It demonstrates that deficits stem from limited context comprehension rather than insufficient intrinsic generative capability. To bridge this gap, we propose a training-free attention intervention strategy. Ultimately, $\textbf{GENIUS}$ establishes a rigorous standard for $\textit{GFI}$, guiding the field beyond knowledge utilization toward dynamic, general-purpose reasoning. Our dataset and code will be released at: $\href{https://github.com/arctanxarc/GENIUS}{https://github.com/arctanxarc/GENIUS}$.

Open → 2602.11144v1

APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots

2026-02-11Roboticsarxiv

Abstract

Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust feet-based traversal over uneven terrain. Yet platforms beyond leg length remain largely out of reach because current RL training paradigms often converge to jumping-like solutions that are high-impact, torque-limited, and unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers. It tracks the best-so-far task progress and penalizes non-improving steps, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Based on this formulation, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap through a dual strategy: modeling mapping artifacts during training and applying filtering and inpainting to elevation maps during deployment. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 meter platforms (approximately 114% of leg length), with robust adaptation to platform height and initial pose, as well as smooth and stable multi-skill transitions.

Open → 2602.11143v1

Data-Efficient Hierarchical Goal-Conditioned Reinforcement Learning via…

2026-02-11RoboticsArtificial IntelligenceMachine Learningarxiv

Abstract

Hierarchical goal-conditioned reinforcement learning (H-GCRL) provides a powerful framework for tackling complex, long-horizon tasks by decomposing them into structured subgoals. However, its practical adoption is hindered by poor data efficiency and limited policy expressivity, especially in offline or data-scarce regimes. In this work, Normalizing flow-based hierarchical implicit Q-learning (NF-HIQL), a novel framework that replaces unimodal gaussian policies with expressive normalizing flow policies at both the high- and low-levels of the hierarchy is introduced. This design enables tractable log-likelihood computation, efficient sampling, and the ability to model rich multimodal behaviors. New theoretical guarantees are derived, including explicit KL-divergence bounds for Real-valued non-volume preserving (RealNVP) policies and PAC-style sample efficiency results, showing that NF-HIQL preserves stability while improving generalization. Empirically, NF-HIQL is evaluted across diverse long-horizon tasks in locomotion, ball-dribbling, and multi-step manipulation from OGBench. NF-HIQL consistently outperforms prior goal-conditioned and hierarchical baselines, demonstrating superior robustness under limited data and highlighting the potential of flow-based architectures for scalable, data-efficient hierarchical reinforcement learning.

Open → 2602.11142v1

LCIP: Loss-Controlled Inverse Projection of High-Dimensional Image Data

2026-02-11Human-Computer InteractionMachine Learningarxiv

Abstract

Projections (or dimensionality reduction) methods $P$ aim to map high-dimensional data to typically 2D scatterplots for visual exploration. Inverse projection methods $P^{-1}$ aim to map this 2D space to the data space to support tasks such as data augmentation, classifier analysis, and data imputation. Current $P^{-1}$ methods suffer from a fundamental limitation -- they can only generate a fixed surface-like structure in data space, which poorly covers the richness of this space. We address this by a new method that can `sweep' the data space under user control. Our method works generically for any $P$ technique and dataset, is controlled by two intuitive user-set parameters, and is simple to implement. We demonstrate it by an extensive application involving image manipulation for style transfer.

Open → 2602.11141v1

Reed-Muller Error-Correction Code Encoder for SFQ-to-CMOS Interface Cir…

2026-02-11Hardware Architecturearxiv

Abstract

Data transmission from superconducting digital electronics such as single flux quantum (SFQ) logic to semiconductor (CMOS) circuits is subject to bit errors due to, e.g., flux trapping, process parameter variations (PPV), and fabrication defects. In this paper, a lightweight hardware-efficient error-correction code encoder is designed and analyzed. Particularly, a Reed-Muller code RM(1,3) encoder is implemented with SFQ digital logic. The proposed RM(1,3) encoder converts a 4-bit message into an 8-bit codeword and can detect and correct up to 3- and 1-bit errors, respectively. This encoder circuit is designed using MIT-LL SFQ5ee process and SuperTools/ColdFlux RSFQ cell library. A simulation framework integrating JoSIM simulator and MATLAB script for automated data collection and analysis, is proposed to study the performance of RM(1,3) encoder. The proposed encoder improves the probability of having no bit errors by 6.7% as compared to an encoder-less design under $\pm20\%$ PPV. With $\pm15\%$ and lower PPV, the proposed encoder could correct all errors with at least 99.1% probability. The impact of fabrication defects such as open circuit faults on the encoder circuit is also studied using the proposed framework.

Open → 2602.11140v1

TabICLv2: A better, faster, scalable, and open tabular foundation model

2026-02-11Machine Learningarxiv

Abstract

Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classification built on three pillars: (1) a novel synthetic data generation engine designed for high pretraining diversity; (2) various architectural innovations, including a new scalable softmax in attention improving generalization to larger datasets without prohibitive long-sequence pretraining; and (3) optimized pretraining protocols, notably replacing AdamW with the Muon optimizer. On the TabArena and TALENT benchmarks, TabICLv2 without any tuning surpasses the performance of the current state of the art, RealTabPFN-2.5 (hyperparameter-tuned, ensembled, and fine-tuned on real data). With only moderate pretraining compute, TabICLv2 generalizes effectively to million-scale datasets under 50GB GPU memory while being markedly faster than RealTabPFN-2.5. We provide extensive ablation studies to quantify these contributions and commit to open research by first releasing inference code and model weights at https://github.com/soda-inria/tabicl, with synthetic data engine and pretraining code to follow.

Open → 2602.11139v1

Weight Decay Improves Language Model Plasticity

2026-02-11Machine LearningArtificial IntelligenceComputation and Languagearxiv

Abstract

The prevailing paradigm in large language model (LLM) development is to pretrain a base model, then perform further training to improve performance and model behavior. However, hyperparameter optimization and scaling laws have been studied primarily from the perspective of the base model's validation loss, ignoring downstream adaptability. In this work, we study pretraining from the perspective of model plasticity, that is, the ability of the base model to successfully adapt to downstream tasks through fine-tuning. We focus on the role of weight decay, a key regularization parameter during pretraining. Through systematic experiments, we show that models trained with larger weight decay values are more plastic, meaning they show larger performance gains when fine-tuned on downstream tasks. This phenomenon can lead to counterintuitive trade-offs where base models that perform worse after pretraining can perform better after fine-tuning. Further investigation of weight decay's mechanistic effects on model behavior reveals that it encourages linearly separable representations, regularizes attention matrices, and reduces overfitting on the training data. In conclusion, this work demonstrates the importance of using evaluation metrics beyond cross-entropy loss for hyperparameter optimization and casts light on the multifaceted role of that a single optimization hyperparameter plays in shaping model behavior.

Open → 2602.11137v1

FormalJudge: A Neuro-Symbolic Paradigm for Agentic Oversight

2026-02-11Artificial Intelligencearxiv

Abstract

As LLM-based agents increasingly operate in high-stakes domains with real-world consequences, ensuring their behavioral safety becomes paramount. The dominant oversight paradigm, LLM-as-a-Judge, faces a fundamental dilemma: how can probabilistic systems reliably supervise other probabilistic systems without inheriting their failure modes? We argue that formal verification offers a principled escape from this dilemma, yet its adoption has been hindered by a critical bottleneck: the translation from natural language requirements to formal specifications. This paper bridges this gap by proposing , a neuro-symbolic framework that employs a bidirectional Formal-of-Thought architecture: LLMs serve as specification compilers that top-down decompose high-level human intent into atomic, verifiable constraints, then bottom-up prove compliance using Dafny specifications and Z3 Satisfiability modulo theories solving, which produces mathematical guarantees rather than probabilistic scores. We validate across three benchmarks spanning behavioral safety, multi-domain constraint adherence, and agentic upward deception detection. Experiments on 7 agent models demonstrate that achieves an average improvement of 16.6% over LLM-as-a-Judge baselines, enables weak-to-strong generalization where a 7B judge achieves over 90% accuracy detecting deception from 72B agents, and provides near-linear safety improvement through iterative refinement.

Open → 2602.11136v1

Just on Time: Token-Level Early Stopping for Diffusion Language Models

2026-02-11Machine LearningComputation and Languagearxiv

Abstract

Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level early stopping approach that identifies convergence independently at each position. Our method leverages lightweight signals derived from the model's predictions and local context to dynamically determine when individual tokens can be finalized. This yields adaptive per-token freezing without task-specific fine-tuning, substantially reducing the total number of diffusion steps required. Across diverse benchmarks, spanning mathematical reasoning, general question answering, and scientific understanding, our approach achieves state-of-the-art efficiency gains while preserving generation quality.

Open → 2602.11133v1

From Circuits to Dynamics: Understanding and Stabilizing Failure in 3D…

2026-02-11Machine LearningComputer Vision and Pattern Recognitionarxiv

Abstract

Reliable surface completion from sparse point clouds underpins many applications spanning content creation and robotics. While 3D diffusion transformers attain state-of-the-art results on this task, we uncover that they exhibit a catastrophic mode of failure: arbitrarily small on-surface perturbations to the input point cloud can fracture the output into multiple disconnected pieces -- a phenomenon we call Meltdown. Using activation-patching from mechanistic interpretability, we localize Meltdown to a single early denoising cross-attention activation. We find that the singular-value spectrum of this activation provides a scalar proxy: its spectral entropy rises when fragmentation occurs and returns to baseline when patched. Interpreted through diffusion dynamics, we show that this proxy tracks a symmetry-breaking bifurcation of the reverse process. Guided by this insight, we introduce PowerRemap, a test-time control that stabilizes sparse point-cloud conditioning. We demonstrate that Meltdown persists across state-of-the-art architectures (WaLa, Make-a-Shape), datasets (GSO, SimJEB) and denoising strategies (DDPM, DDIM), and that PowerRemap effectively counters this failure with stabilization rates of up to 98.3%. Overall, this work is a case study on how diffusion model behavior can be understood and guided based on mechanistic analysis, linking a circuit-level cross-attention mechanism to diffusion-dynamics accounts of trajectory bifurcations.

Open → 2602.11130v1

Information-Theoretic Thresholds for Bipartite Latent-Space Graphs Unde…

2026-02-11Information Theoryarxiv

Abstract

We study information-theoretic phase transitions for the detectability of latent geometry in bipartite random geometric graphs RGGs with Gaussian d-dimensional latent vectors while only a subset of edges carries latent information determined by a random mask with i.i.d. Bern(q) entries. For any fixed edge density p in (0,1) we determine essentially tight thresholds for this problem as a function of d and q. Our results show that the detection problem is substantially easier if the mask is known upfront compared to the case where the mask is hidden. Our analysis is built upon a novel Fourier-analytic framework for bounding signed subgraph counts in Gaussian random geometric graphs that exploits cancellations which arise after approximating characteristic functions by an appropriate power series. The resulting bounds are applicable to much larger subgraphs than considered in previous work which enables tight information-theoretic bounds, while the bounds considered in previous works only lead to lower bounds from the lens of low-degree polynomials. As a consequence we identify the optimal information-theoretic thresholds and rule out computational-statistical gaps. Our bounds further improve upon the bounds on Fourier coefficients of random geometric graphs recently given by Bangachev and Bresler [STOC'24] in the dense, bipartite case. The techniques also extend to sparser and non-bipartite settings, at least if the considered subgraphs are sufficiently small. We furhter believe that they might help resolve open questions for related detection problems.

Open → 2602.11129v1

Asymmetric Prompt Weighting for Reinforcement Learning with Verifiable…

2026-02-11Machine Learningarxiv

Abstract

Reinforcement learning with verifiable rewards has driven recent advances in LLM post-training, in particular for reasoning. Policy optimization algorithms generate a number of responses for a given prompt and then effectively weight the corresponding gradients depending on the rewards. The most popular algorithms including GRPO, DAPO, and RLOO focus on ambiguous prompts, i.e., prompts with intermediate success probability, while downgrading gradients with very easy and very hard prompts. In this paper, we consider asymmetric prompt weightings that assign higher weights to prompts with low, or even zero, empirical success probability. We find that asymmetric weighting particularly benefits from-scratch RL (as in R1-Zero), where training traverses a wide accuracy range, and less so in post-SFT RL where the model already starts at high accuracy. We also provide theory that characterizes prompt weights which minimize the time needed to raise success probability from an initial level to a target accuracy under a fixed update budget. In low-success regimes, where informative responses are rare and response cost dominates, these optimal weights become asymmetric, upweighting low success probabilities and thereby accelerating effective-time convergence.

Open → 2602.11128v1

The Offline-Frontier Shift: Diagnosing Distributional Limits in Generat…

2026-02-11Machine Learningarxiv

Abstract

Offline multi-objective optimization (MOO) aims to recover Pareto-optimal designs given a finite, static dataset. Recent generative approaches, including diffusion models, show strong performance under hypervolume, yet their behavior under other established MOO metrics is less understood. We show that generative methods systematically underperform evolutionary alternatives with respect to other metrics, such as generational distance. We relate this failure mode to the offline-frontier shift, i.e., the displacement of the offline dataset from the Pareto front, which acts as a fundamental limitation in offline MOO. We argue that overcoming this limitation requires out-of-distribution sampling in objective space (via an integral probability metric) and empirically observe that generative methods remain conservatively close to the offline objective distribution. Our results position offline MOO as a distribution-shift--limited problem and provide a diagnostic lens for understanding when and why generative optimization methods fail.

Open → 2602.11126v1

Min-Sum Uniform Coverage Problem by Autonomous Mobile Robots

2026-02-11Distributed, Parallel, and Cluster ComputingRoboticsarxiv

Abstract

We study the \textit{min-sum uniform coverage} problem for a swarm of $n$ mobile robots on a given finite line segment and on a circle having finite positive radius, where the circle is given as an input. The robots must coordinate their movements to reach a uniformly spaced configuration that minimizes the total distance traveled by all robots. The robots are autonomous, anonymous, identical, and homogeneous, and operate under the \textit{Look-Compute-Move} (LCM) model with \textit{non-rigid} motion controlled by a fair asynchronous scheduler. They are oblivious and silent, possessing neither persistent memory nor a means of explicit communication. In the \textbf{line-segment setting}, the \textit{min-sum uniform coverage} problem requires placing the robots at uniformly spaced points along the segment so as to minimize the total distance traveled by all robots. In the \textbf{circle setting} for this problem, the robots have to arrange themselves uniformly around the given circle to form a regular $n$-gon. There is no fixed orientation or designated starting vertex, and the goal is to minimize the total distance traveled by all the robots. We present a deterministic distributed algorithm that achieves uniform coverage in the line-segment setting with minimum total movement cost. For the circle setting, we characterize all initial configurations for which the \textit{min-sum uniform coverage} problem is deterministically unsolvable under the considered robot model. For all the other remaining configurations, we provide a deterministic distributed algorithm that achieves uniform coverage while minimizing the total distance traveled. These results characterize the deterministic solvability of min-sum coverage for oblivious robots and achieve optimal cost whenever solvable.

Open → 2602.11125v1

PhyCritic: Multimodal Critic Models for Physical AI

2026-02-11Computer Vision and Pattern Recognitionarxiv

Abstract

With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and physical correctness. Across both physical and general-purpose multimodal judge benchmarks, PhyCritic achieves strong performance gains over open-source baselines and, when applied as a policy model, further improves perception and reasoning in physically grounded tasks.

Open → 2602.11124v1

From Natural Language to Materials Discovery:The Materials Knowledge Na…

2026-02-11Machine Learningarxiv

Abstract

Accelerating the discovery of high-performance materials remains a central challenge across energy, electronics, and aerospace technologies, where traditional workflows depend heavily on expert intuition and computationally expensive simulations. Here we introduce the Materials Knowledge Navigation Agent (MKNA), a language-driven system that translates natural-language scientific intent into executable actions for database retrieval, property prediction, structure generation, and stability evaluation. Beyond automating tool invocation, MKNA autonomously extracts quantitative thresholds and chemically meaningful design motifs from literature and database evidence, enabling data-grounded hypothesis formation. Applied to the search for high-Debye-temperature ceramics, the agent identifies a literature-supported screening criterion (Theta_D > 800 K), rediscovers canonical ultra-stiff materials such as diamond, SiC, SiN, and BeO, and proposes thermodynamically stable, previously unreported Be-C-rich compounds that populate the sparsely explored 1500-1700 K regime. These results demonstrate that MKNA not only finds stable candidates but also reconstructs interpretable design heuristics, establishing a generalizable platform for autonomous, language-guided materials exploration.

Open → 2602.11123v1

HairWeaver: Few-Shot Photorealistic Hair Motion Synthesis with Sim-to-R…

2026-02-11Computer Vision and Pattern Recognitionarxiv

Abstract

We present HairWeaver, a diffusion-based pipeline that animates a single human image with realistic and expressive hair dynamics. While existing methods successfully control body pose, they lack specific control over hair, and as a result, fail to capture the intricate hair motions, resulting in stiff and unrealistic animations. HairWeaver overcomes this limitation using two specialized modules: a Motion-Context-LoRA to integrate motion conditions and a Sim2Real-Domain-LoRA to preserve the subject's photoreal appearance across different data domains. These lightweight components are designed to guide a video diffusion backbone while maintaining its core generative capabilities. By training on a specialized dataset of dynamic human motion generated from a CG simulator, HairWeaver affords fine control over hair motion and ultimately learns to produce highly realistic hair that responds naturally to movement. Comprehensive evaluations demonstrate that our approach sets a new state of the art, producing lifelike human hair animations with dynamic details.

Open → 2602.11117v1

Multi-UAV Trajectory Optimization for Bearing-Only Localization in GPS…

2026-02-11Roboticsarxiv

Abstract

Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments. UAVs equipped with gimballed electro-optical sensors are typically used to localize targets, however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single-point failures, limiting scalability and robustness in multi-UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non-gimballed cameras operating in coordination with a surface vessel. This estimation-aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out-of-frame events. Estimation-aware trajectories outperform heuristic paths by reducing localization error by more than a factor of two, motivating their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non-gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.

Open → 2602.11116v1

Learning to Compose for Cross-domain Agentic Workflow Generation

2026-02-11Multiagent SystemsArtificial IntelligenceMachine Learningarxiv

Abstract

Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow from a large workflow space, incurring high iteration costs and yielding unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capabilities across diverse domains. To recompose, we map each input task to a sparse composition over these bases to generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to counterfactual contributions from learned capabilities, thereby capturing which capabilities actually drive success by their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.

Open → 2602.11114v1

A receding-horizon multi-contact motion planner for legged robots in ch…

2026-02-11Roboticsarxiv

Abstract

We present a novel receding-horizon multi-contact motion planner for legged robots in challenging scenarios, able to plan motions such as chimney climbing, navigating very narrow passages or crossing large gaps. Our approach adds new capabilities to the state of the art, including the ability to reactively re-plan in response to new information, and planning contact locations and whole-body trajectories simultaneously, simplifying the implementation and removing the need for post-processing or complex multi-stage approaches. Our method is more resistant to local minima problems than other potential field based approaches, and our quadratic-program-based posture generator returns nodes more quickly than those of existing algorithms. Rigorous statistical analysis shows that, with short planning horizons (e.g., one step ahead), our planner is faster than the state-of-the-art across all scenarios tested (between 45% and 98% faster on average, depending on the scenario), while planning less efficient motions (requiring 5% fewer to 700% more stance changes on average). In all but one scenario (Chimney Walking), longer planning horizons (e.g., four steps ahead) extended the average planning times (between 73% faster and 400% slower than the state-of-the-art) but resulted in higher quality motion plans (between 8% more and 47% fewer stance changes than the state-of-the-art).

Open → 2602.11113v1

Renet: Principled and Efficient Relaxation for the Elastic Net via Dyna…

2026-02-11Machine Learningarxiv

Abstract

We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. While, on the one hand, $\ell_1$-regularization is a standard tool for variable selection in high-dimensional regimes and, on the other hand, the $\ell_2$ penalty provides stability and solution uniqueness through strict convexity, the standard Elastic Net nevertheless suffers from shrinkage bias that frequently yields suboptimal prediction accuracy. We propose to address this limitation through a framework called \textit{relaxation}. Existing relaxation implementations rely on naive linear interpolations of penalized and unpenalized solutions, which ignore the non-linear geometry that characterizes the entire regularization path and risk violating the Karush-Kuhn-Tucker conditions. Renet addresses these limitations by enforcing sign consistency through an adaptive relaxation procedure that dynamically dispatches between convex blending and efficient sub-path refitting. Furthermore, we identify and formalize a unique synergy between relaxation and the ``One-Standard-Error'' rule: relaxation serves as a robust debiasing mechanism, allowing practitioners to leverage the parsimony of the 1-SE rule without the traditional loss in predictive fidelity. Our theoretical framework incorporates automated stability safeguards for ultra-high dimensional regimes and is supported by a comprehensive benchmarking suite across 20 synthetic and real-world datasets, demonstrating that Renet consistently outperforms the standard Elastic Net and provides a more robust alternative to the Adaptive Elastic Net in high-dimensional, low signal-to-noise ratio and high-multicollinearity regimes. By leveraging an adaptive solver backend, Renet delivers these statistical gains while offering a computational profile that remains competitive with state-of-the-art coordinate descent implementations.

Open → 2602.11107v1

TEGRA: Text Encoding With Graph and Retrieval Augmentation for Misinfor…

2026-02-11Computation and Languagearxiv

Abstract

Misinformation detection is a critical task that can benefit significantly from the integration of external knowledge, much like manual fact-checking. In this work, we propose a novel method for representing textual documents that facilitates the incorporation of information from a knowledge base. Our approach, Text Encoding with Graph (TEG), processes documents by extracting structured information in the form of a graph and encoding both the text and the graph for classification purposes. Through extensive experiments, we demonstrate that this hybrid representation enhances misinformation detection performance compared to using language models alone. Furthermore, we introduce TEGRA, an extension of our framework that integrates domain-specific knowledge, further enhancing classification accuracy in most cases.

Open → 2602.11106v1

FastFlow: Accelerating The Generative Flow Matching Models with Bandit…

2026-02-11Computer Vision and Pattern Recognitionarxiv

Abstract

Flow-matching models deliver state-of-the-art fidelity in image and video generation, but the inherent sequential denoising process renders them slower. Existing acceleration methods like distillation, trajectory truncation, and consistency approaches are static, require retraining, and often fail to generalize across tasks. We propose FastFlow, a plug-and-play adaptive inference framework that accelerates generation in flow matching models. FastFlow identifies denoising steps that produce only minor adjustments to the denoising path and approximates them without using the full neural network models used for velocity predictions. The approximation utilizes finite-difference velocity estimates from prior predictions to efficiently extrapolate future states, enabling faster advancements along the denoising path at zero compute cost. This enables skipping computation at intermediary steps. We model the decision of how many steps to safely skip before requiring a full model computation as a multi-armed bandit problem. The bandit learns the optimal skips to balance speed with performance. FastFlow integrates seamlessly with existing pipelines and generalizes across image generation, video generation, and editing tasks. Experiments demonstrate a speedup of over 2.6x while maintaining high-quality outputs. The source code for this work can be found at https://github.com/Div290/FastFlow.

Open → 2602.11105v1

GameDevBench: Evaluating Agentic Capabilities Through Game Development

2026-02-11Artificial IntelligenceComputation and LanguageSoftware Engineeringarxiv

Abstract

Despite rapid progress on coding agents, progress on their multimodal counterparts has lagged behind. A key challenge is the scarcity of evaluation testbeds that combine the complexity of software development with the need for deep multimodal understanding. Game development provides such a testbed as agents must navigate large, dense codebases while manipulating intrinsically multimodal assets such as shaders, sprites, and animations within a visual game scene. We present GameDevBench, the first benchmark for evaluating agents on game development tasks. GameDevBench consists of 132 tasks derived from web and video tutorials. Tasks require significant multimodal understanding and are complex -- the average solution requires over three times the amount of lines of code and file changes compared to prior software development benchmarks. Agents still struggle with game development, with the best agent solving only 54.5% of tasks. We find a strong correlation between perceived task difficulty and multimodal complexity, with success rates dropping from 46.9% on gameplay-oriented tasks to 31.6% on 2D graphics tasks. To improve multimodal capability, we introduce two simple image and video-based feedback mechanisms for agents. Despite their simplicity, these methods consistently improve performance, with the largest change being an increase in Claude Sonnet 4.5's performance from 33.3% to 47.7%. We release GameDevBench publicly to support further research into agentic game development.

Open → 2602.11103v1

WHEREIS: IP Address Registration Geo-Consistency

2026-02-11Networking and Internet Architecturearxiv

Abstract

The five Regional Internet Registries (RIRs) provide the critical function of IP address resource del egation and registration. The accuracy of registration data directly impacts Internet operation, management, security, and optimization. In addition, the scarcity of IP addresses has brought into focus conflicts between RIR policy and IP registration ownership and use. The tension between a free-market based approach to address allocation versus policies to promote fairness and regional equity has resulted in court litigation that threatens the very existence of the RIR system. We develop WHEREIS, a measurement-based approach to geolocate delegated IPv4 and IPv6 prefixes at an RIR-region granularity and systematically study where addresses are used post-allocation and the extent to which registration information is accurate. We define a taxonomy of registration ``geo-consistency'' that compares a prefix's measured geolocation to the allocating RIR's coverage region as well as the registered organization's location. While in aggregate over 98% of the prefixes we examine are consistent with our geolocation inferences, there is substantial variation across RIRs and we focus on AFRINIC as a case study. IPv6 registrations are no more consistent than IPv4, suggesting that structural, rather than technical, issues play an important role in allocations. We solicit additional information on inconsistent prefixes from network operators, IP leasing providers, and collaborate with three RIRs to obtain validation. We further show that the inconsistencies we discover manifest in three commercial geolocation databases. By improving the transparency around post-allocation prefix use, we hope to improve applications that use IP registration data and inform ongoing discussions over in-region address use and policy.

Open → 2602.11102v1

Enormous Fluid Antenna Systems (E-FAS) for Multiuser MIMO: Channel Mode…

2026-02-11Information Theoryarxiv

Abstract

Enormous fluid antenna systems (E-FAS), the system concept that utilizes position reconfigurability in the large scale, have emerged as a new architectural paradigm where intelligent surfaces are repurposed from passive smart reflectors into multi-functional electromagnetic (EM) interfaces that can route guided surface waves over walls, ceilings, and building facades, as well as emit space waves to target receivers. This expanded functionality introduces a new mode of signal propagation, enabling new forms of wireless communication. In this paper, we provide an analytical performance characterization of an E-FAS-enabled wireless link. We first develop a physics-consistent end-to-end channel model that couples a surface-impedance wave formulation with small-scale fading on both the base station (BS)-surface and launcher-user segments. We illustrate that the resulting effective BS-user channel remains circularly symmetric complex Gaussian, with an enhanced average power that explicitly captures surface-wave attenuation and junction losses. For single-user cases with linear precoding, we derive the outage probability and ergodic capacity in closed forms, together with high signal-to-noise ratio (SNR) asymptotics that quantify the gain of E-FAS over purely space-wave propagation. For the multiuser case with zero-forcing (ZF) precoding, we derive the distribution of the signal-to-interference-plus-noise ratio (SINR) and obtain tractable approximations for the ergodic sum-rate, explicitly revealing how the E-FAS macro-gain interacts with the BS spatial degrees of freedom (DoF). In summary, our analysis shows that E-FAS preserves the diversity order dictated by small-scale fading while improving the coding gain enabled by cylindrical surface-wave propagation.

Open → 2602.11099v1

Statistical Learning Analysis of Physics-Informed Neural Networks

2026-02-11Machine Learningarxiv

Abstract

We study the training and performance of physics-informed learning for initial and boundary value problems (IBVP) with physics-informed neural networks (PINNs) from a statistical learning perspective. Specifically, we restrict ourselves to parameterizations with hard initial and boundary condition constraints and reformulate the problem of estimating PINN parameters as a statistical learning problem. From this perspective, the physics penalty on the IBVP residuals can be better understood not as a regularizing term bus as an infinite source of indirect data, and the learning process as fitting the PINN distribution of residuals $p(y \mid x, t, w) q(x, t) $ to the true data-generating distribution $δ(0) q(x, t)$ by minimizing the Kullback-Leibler divergence between the true and PINN distributions. Furthermore, this analysis show that physics-informed learning with PINNs is a singular learning problem, and we employ singular learning theory tools, namely the so-called Local Learning Coefficient (Lau et al., 2025) to analyze the estimates of PINN parameters obtained via stochastic optimization for a heat equation IBVP. Finally, we discuss implications of this analysis on the quantification of predictive uncertainty of PINNs and the extrapolation capacity of PINNs.

Open → 2602.11097v1

Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps…

2026-02-11Computation and LanguageArtificial Intelligencearxiv

Abstract

Reinforcement learning (RL) based post-training for explicit chain-of-thought (e.g., GRPO) improves the reasoning ability of multimodal large-scale reasoning models (MLRMs). But recent evidence shows that it can simultaneously degrade safety alignment and increase jailbreak success rates. We propose SafeThink, a lightweight inference-time defense that treats safety recovery as a satisficing constraint rather than a maximization objective. SafeThink monitors the evolving reasoning trace with a safety reward model and conditionally injects an optimized short corrective prefix ("Wait, think safely") only when the safety threshold is violated. In our evaluations across six open-source MLRMs and four jailbreak benchmarks (JailbreakV-28K, Hades, FigStep, and MM-SafetyBench), SafeThink reduces attack success rates by 30-60% (e.g., LlamaV-o1: 63.33% to 5.74% on JailbreakV-28K, R1-Onevision: 69.07% to 5.65% on Hades) while preserving reasoning performance (MathVista accuracy: 65.20% to 65.00%). A key empirical finding from our experiments is that safety recovery is often only a few steering steps away: intervening in the first 1-3 reasoning steps typically suffices to redirect the full generation toward safe completions.

Open → 2602.11096v1