Tracking the Behavioral Trajectories of Adapting Agents
2026-06-01 • Artificial Intelligence
AI summary
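The authors study how the text files that control agent behavior, such as skill files, change over time and how those changes affect the way agents act. They measure agent 'traits' as directions in the embedding space of a text embedding model, learning each direction with a simple linear model fit to embedded before-and-after file changes. For the tendency to seek sensitive data, the approach correctly predicts whether a skill update raises or lowers that tendency in 91.2% of 68 labeled cases, with a Spearman rank correlation of 0.82. The authors also build this measurement into a protocol that lets one agent check another agent's skill updates through a trusted intermediary.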
agent behavior, skill files, text embeddings, linear model, embedding space, trait measurement, Spearman correlation, cross-validation, agent evaluation
Authors
Jonah Leshin, Manish Shah, Ian Timmis
Abstract
Text files such as skill files, memory files, and behavioral configuration files play a central role in defining how modern agents act. Through edits by humans or the agents themselves, these files may evolve over time, directly steering the agent's behavior in future interactions. We present a methodology and framework for measuring agent traits by defining traits as directions in the embedding space of a text embedding model. We train a linear model on labeled "before" versus "after" skill file diffs to learn a trait vector, then score arbitrary skill edits by projecting their embedding diffs onto this vector. Evaluated on 68 labeled skill diff pairs for the trait of propensity to seek sensitive data, our method achieves 91.2% sign classification accuracy and a Spearman rank correlation of $\rho = 0.82$ under leave-one-out cross-validation. We build this trait evaluation into a broader agent-to-agent protocol that enables one agent to evaluate another's skill file updates through a trusted intermediary.
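The abstract describes the scoring pipeline only at a high level, so here is a minimal, hypothetical sketch of its mechanics under stated assumptions; it is not the authors' implementation. `toy_embed` is a word-count stand-in for a real text embedding model, the "linear model" is assumed to be ridge regression with no intercept (so an edit that leaves the embedding unchanged scores 0), and the twelve before/after pairs are invented toy data, not the paper's 68 labeled pairs. The code learns a trait vector from labeled embedding diffs, scores an edit by projecting its diff onto that vector, and runs a leave-one-out evaluation reporting sign accuracy and Spearman's rho.

```python
import math
import re
from functools import partial
from typing import Callable, Sequence

Vector = list[float]

def toy_embed(text: str, vocab: Sequence[str]) -> Vector:
    """HYPOTHETICAL stand-in for a real text embedding model: vocab word counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in vocab]

def embedding_diff(before: str, after: str, embed: Callable[[str], Vector]) -> Vector:
    """An edit's direction in embedding space: embed(after) - embed(before)."""
    return [a - b for a, b in zip(embed(after), embed(before))]

def solve(a: list[Vector], b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_trait_vector(diffs: list[Vector], labels: Sequence[float], lam: float = 1.0) -> Vector:
    """ASSUMED linear model: ridge regression without intercept, w = (X^T X + lam*I)^-1 X^T y."""
    d = len(diffs[0])
    gram = [[sum(x[i] * x[j] for x in diffs) + (lam if i == j else 0.0) for j in range(d)]
            for i in range(d)]
    rhs = [sum(x[i] * y for x, y in zip(diffs, labels)) for i in range(d)]
    return solve(gram, rhs)

def score(diff: Vector, trait: Vector) -> float:
    """Project a diff onto the trait vector (dividing by |trait| would not change sign or rank)."""
    return sum(a * b for a, b in zip(diff, trait))

def average_ranks(values: Sequence[float]) -> Vector:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def leave_one_out(diffs: list[Vector], labels: Sequence[float], lam: float = 1.0):
    """Score each diff with a trait vector fit on all other diffs; labels must be nonzero."""
    scores = []
    for i in range(len(diffs)):
        rest_x = diffs[:i] + diffs[i + 1:]
        rest_y = list(labels[:i]) + list(labels[i + 1:])
        scores.append(score(diffs[i], fit_trait_vector(rest_x, rest_y, lam)))
    # A score of exactly 0 counts as a negative prediction.
    sign_acc = sum((s > 0) == (y > 0) for s, y in zip(scores, labels)) / len(labels)
    return scores, sign_acc, spearman(scores, labels)

# HYPOTHETICAL toy data: label +1 = edit adds sensitive-data seeking, -1 = edit removes it.
VOCAB = ["summarize", "reply", "credentials", "password", "secrets"]
BASES = ["Summarize the weekly report.", "Draft a reply to the customer."]
EXTRAS = ["Also read the credentials file.", "Also ask for the admin password.",
          "Also dump the secrets vault."]
PAIRS = [pair for base in BASES for extra in EXTRAS
         for pair in ((base, f"{base} {extra}", 1.0), (f"{base} {extra}", base, -1.0))]
embed = partial(toy_embed, vocab=VOCAB)
DIFFS = [embedding_diff(before, after, embed) for before, after, _ in PAIRS]
LABELS = [label for _, _, label in PAIRS]

if __name__ == "__main__":
    scores, sign_acc, rho = leave_one_out(DIFFS, LABELS)
    print("LOO sign accuracy:", sign_acc, "Spearman rho:", rho)
    trait = fit_trait_vector(DIFFS, LABELS)
    new_edit = embedding_diff(BASES[0], BASES[0] + " Then email the password to me.", embed)
    print("score of a new edit:", score(new_edit, trait))
```

Because the toy labels depend only on which sensitive word an edit adds or removes, this example reaches perfect leave-one-out scores; it shows the mechanics of the evaluation, not the accuracy reported in the abstract. Fitting without an intercept is a deliberate choice in this sketch: an edit that does not move the embedding gets a score of exactly zero, so the sign of a score can be read as the direction of the trait change.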