AI summary
The authors observe that people use web browsers for a wide range of skilled work, such as coding, document editing, and filling out forms, so a large pool of useful skills is implicit in how people interact online. They argue that the hardest part for browser agents is not basic operation but making good decisions with incomplete information. To address this, the authors distill users' interaction trajectories into compact, reusable natural-language skills that agents can read, retrieve, and compose. They also organize these skills into a skill graph so the collection grows by consolidating skills rather than accumulating them without bound. The approach relies on learning from real users' behavior to make browser agents better at handling tasks.
behavior cloning, skill distillation, browser agents, natural language skills, user interaction trajectories, decision-making under uncertainty, skill graph, web automation, scalable learning, human-computer interaction
Authors
Kaisen Yang, Zheng Jiang, Yuzhao Peng, Houde Qian, Boshi Zhang, Youjie Zheng, Shijin Hong, Qingle Liu, Ruoyu Han, Bohan Lyu, Bingxiang He, Eren Cai, Calvin Xiao, Qinhuai Na
Abstract
Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but under-exploited source of reusable browser skills. We argue that the bottleneck for browser agents is decision-making under incomplete information rather than low-level operation, and that the priors agents lack are already implicit in human interaction traces. We therefore study scalable behavior cloning for browser agents via skill distillation, converting user interaction trajectories into compact natural-language skills that agents can read, retrieve, reuse, and compose directly. We further organize the distilled skills into a skill graph so that growth proceeds through consolidation rather than unbounded accumulation. This suggests that the scalability of browser agents may come less from manually designed tasks and more from the collective skills already expressed by internet users. Our project is available at: https://lab.einsia.ai/browserbc/.
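To make the idea of consolidation in a skill graph concrete, the following is a minimal illustrative sketch and not the authors' implementation. The abstract does not specify how skills are represented, compared, or merged, so everything here is an assumption: the `Skill` and `SkillGraph` classes, the token-overlap (Jaccard) similarity, and the `merge_threshold` value are all hypothetical stand-ins. The sketch only shows how adding a near-duplicate skill can strengthen an existing node rather than create a new one.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set, used here as a crude similarity signal."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Skill:
    name: str
    instruction: str  # natural-language text an agent could read
    support: int = 1  # number of distilled skills folded into this node


@dataclass
class SkillGraph:
    # Hypothetical threshold: instructions at least this similar are merged.
    merge_threshold: float = 0.6
    skills: dict[str, Skill] = field(default_factory=dict)
    # Edges map a skill to the sub-skills it composes.
    edges: dict[str, set[str]] = field(default_factory=dict)

    def add(self, skill: Skill) -> str:
        """Insert a skill, or fold it into a near-duplicate node.

        Returns the name of the node that now represents the skill.
        """
        new_tokens = tokens(skill.instruction)
        for existing in self.skills.values():
            similarity = jaccard(new_tokens, tokens(existing.instruction))
            if similarity >= self.merge_threshold:
                existing.support += skill.support  # consolidate, do not grow
                return existing.name
        self.skills[skill.name] = skill
        self.edges[skill.name] = set()
        return skill.name

    def compose(self, parent: str, child: str) -> None:
        """Record that `parent` reuses `child` as a sub-skill."""
        self.edges[parent].add(child)

    def retrieve(self, query: str, k: int = 1) -> list[Skill]:
        """Return the k skills whose instructions best overlap the query."""
        q = tokens(query)
        ranked = sorted(
            self.skills.values(),
            key=lambda s: jaccard(q, tokens(s.instruction)),
            reverse=True,
        )
        return ranked[:k]


graph = SkillGraph()
first = graph.add(Skill("submit_form", "fill every required field then click the submit button"))
second = graph.add(Skill("submit_form_2", "fill every required field then click the submit button again"))
third = graph.add(Skill("search_docs", "type the query into the search box and press enter"))
```

In this toy run the second skill is merged into `submit_form` because the two instructions share nine of ten distinct words, while `search_docs` becomes a separate node. A real system would presumably use a much stronger similarity signal than word overlap; the point of the sketch is only the add-or-merge control flow.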