FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents

2026-05-26

Computation and Language
AI summary

The authors developed FinHarness, a safety system for finance-focused AI agents that helps prevent unauthorized actions while allowing valid multi-step tasks. Their system monitors both the agent’s current intent and its drift across earlier turns, evaluates each tool call before it executes, and uses a two-tier judge to assess risk efficiently. When risk factors fire, they are fed back to the agent so it can decide to refuse, re-plan, or approve. On FinVault, FinHarness cut the attack success rate from 38.3% to 15.0% with 4.7× fewer advanced-judge calls than an always-advanced variant, while benign approval stayed nearly unchanged (41.1% to 39.3%).

LLM agents, prompt injection, tool calls, risk assessment, intent monitoring, multi-step workflows, post-hoc auditing, cascade verification, finance AI, safety harness
Authors
Haoxuan Jia, Yang Liu, Bin Chong, Yingguang Yang, Yancheng Chen, Jiayu Liang, Qian Li, Hanning Lu, Kefu Xu, Hao Zheng, Chongyang Zhang, Hao Peng, Philip S. Yu
Abstract
Finance LLM agents must simultaneously block prompt-induced unauthorized actions and approve legitimate multi-step business workflows. However, boundary filters often miss irreversible mid-trajectory tool calls, while post-hoc LLM judges audit only after termination, which is too late for intervention and incurs a computational cost that scales linearly with trace length. We present FinHarness, an inline safety harness that wraps a finance agent end-to-end with three components: a Query Monitor that fuses single-turn intent with cross-turn drift, a Tool Monitor that evaluates each prospective tool call, and a Cascade module that integrates per-step risk and adaptively routes verification between a lightweight and an advanced-tier LLM judge. Fired risk factors are re-injected into the agent input as ex-ante evidence, enabling the agent to refuse, re-plan, or approve on its own. On FinVault, routed FinHarness cuts the attack success rate (ASR) from 38.3% to 15.0% while largely preserving benign approval ($41.1\% \to 39.3\%$), and uses $4.7\times$ fewer advanced-judge calls than an always-advanced ablation.
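
The routing idea behind the Cascade module can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give interfaces, thresholds, or the risk-integration rule, so the class name `CascadeSketch`, the thresholds `skip_below` and `escalate_at`, the decayed-sum integration, and the stand-in `keyword_judge` are all assumptions made for illustration. The Query Monitor and Tool Monitor scores are taken as already computed inputs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge interface: returns True when the proposed tool call looks unsafe.
Judge = Callable[[str, List[str]], bool]


@dataclass
class CascadeSketch:
    light_judge: Judge
    advanced_judge: Judge
    skip_below: float = 0.2   # assumed threshold: no judge call under this risk
    escalate_at: float = 0.6  # assumed threshold: advanced judge at or above this
    carry_over: float = 0.5   # assumed fraction of past risk carried forward
    integrated_risk: float = 0.0
    advanced_calls: int = 0

    def review(self, query_risk: float, tool_risk: float,
               tool_call: str, fired_factors: List[str]) -> str:
        """Return an evidence note to prepend to the agent's next input ('' if none)."""
        # Integrate per-step risk over the trajectory (assumed: decayed sum, capped at 1).
        step_risk = max(query_risk, tool_risk)
        self.integrated_risk = min(1.0, self.carry_over * self.integrated_risk + step_risk)

        if self.integrated_risk < self.skip_below:
            return ""
        if self.integrated_risk < self.escalate_at:
            flagged = self.light_judge(tool_call, fired_factors)
        else:
            self.advanced_calls += 1
            flagged = self.advanced_judge(tool_call, fired_factors)

        if not flagged:
            return ""
        # Re-inject the fired factors; the agent, not the harness, decides what to do.
        return f"[risk evidence] {tool_call}: " + "; ".join(fired_factors)


def keyword_judge(tool_call: str, fired_factors: List[str]) -> bool:
    # Stand-in for an LLM judge: flags any call with at least one fired factor.
    return len(fired_factors) > 0


cascade = CascadeSketch(light_judge=keyword_judge, advanced_judge=keyword_judge)
benign_note = cascade.review(0.05, 0.05, "get_balance(acct='A')", [])
risky_note = cascade.review(0.3, 0.9, "wire_transfer(to='X', amount=50000)",
                            ["unverified payee", "intent drift from earlier turns"])
print(repr(benign_note))
print(risky_note)
print(cascade.advanced_calls)
```

In this sketch the low-risk balance lookup triggers no judge call at all, while the high-risk transfer is escalated directly to the advanced judge and its fired factors are returned as a note for the agent. Counting `advanced_calls` mirrors the cost comparison against an always-advanced configuration, in which every step would increment that counter.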