Unsafe and Unused? A History of Utility Code in Mature Open Source Projects

2026-04-30

Software Engineering
AI summary

The authors studied files with 'util' in their path names in seven large, long-lived open source projects to understand how these utility files evolve over time. They found that such files are common and often meant to hold reusable code, but that they can be more prone to security vulnerabilities than other files. The study took snapshots at 30-day intervals across each project's history to see who used and maintained these files, and the authors argue that the prevalence and longevity of 'util' files reveal developer intent and the socio-technical structure of software development. Their goal is to help developers avoid creating unnecessary or risky utility files.

utility files, source code, open source, software vulnerabilities, Git repository, longitudinal study, code reuse, developer collaboration
Authors
Brandon Keller, Kaitlin Yandik, Angela Ngo, Andy Meneely
Abstract
Filenames are a concise means of conveying information about source code to fellow developers. One such convention is util. Commonly understood to stand for "utility", the letters util in a filename often indicate that the file contains code that may be broadly useful or reusable. Some projects use this convention heavily; for example, the Apache Tomcat server contains 925 files with util in the path name, which is 17.9% of all source code files in the tree. While the intent of the name may be to prevent duplicate code and reduce workload, what actually happens to util code over time? Do projects move away from util code as they mature? Are util files used by colleagues, or maintained and used only by their authors? The goal of our work is to help developers avoid creating unsafe and unused util files when developing their projects. We conducted a longitudinal mining study of the Git repositories of seven open source projects with long development histories (Linux kernel, Django, FFmpeg, httpd, Struts, systemd, Tomcat). We analyzed how util usage, complexity, developer collaboration, and security may be correlated within these projects. We took snapshots at 30-day intervals throughout the entire history of each project, yielding 1,773 snapshots over 147 project-years of development. At every snapshot we tracked file renames, so that each util file could be followed over its entire lifetime in the codebase. Among our findings, a util file can be as much as 2.75 times more likely to be involved in a vulnerability than a non-util file. While every project can adopt its own naming conventions, the ubiquity and longevity of util files show a broader developer intent that is useful for understanding the socio-technical nature of software development.
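The core measurements described in the abstract (classifying util files by path, computing their share of a tree, comparing vulnerability likelihood, and sampling history at 30-day intervals) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' tooling. It assumes that a file counts as util if the substring util appears anywhere in its lowercased path, that each file has already been labeled as vulnerability-involved or not, and that commits arrive as (timestamp, SHA) pairs sorted oldest first. The function names is_util, util_share, relative_risk, and snapshot_commits are invented for this example.

```python
from datetime import datetime, timedelta


def is_util(path: str) -> bool:
    # Assumption: a file is "util" if "util" appears anywhere in its path,
    # compared case-insensitively (so StringUtils.java also counts).
    return "util" in path.lower()


def util_share(paths: list[str]) -> float:
    # Fraction of files whose path marks them as util (e.g. 0.179 for 17.9%).
    if not paths:
        return 0.0
    return sum(is_util(p) for p in paths) / len(paths)


def relative_risk(files: list[tuple[str, bool]]) -> float:
    # files: (path, involved_in_vulnerability) pairs.
    # Returns P(vulnerable | util) / P(vulnerable | non-util).
    util = [vuln for path, vuln in files if is_util(path)]
    other = [vuln for path, vuln in files if not is_util(path)]
    if not util or not other:
        raise ValueError("need both util and non-util files")
    util_rate = sum(util) / len(util)
    other_rate = sum(other) / len(other)
    if other_rate == 0:
        # Ratio is unbounded (or undefined if neither group has vulnerabilities).
        return float("inf") if util_rate > 0 else float("nan")
    return util_rate / other_rate


def snapshot_commits(commits: list[tuple[datetime, str]],
                     interval_days: int = 30) -> list[str]:
    # commits: (timestamp, sha) pairs sorted oldest first.
    # Picks the latest commit at or before each interval boundary, starting
    # at the first commit. A quiet window repeats the previous SHA, since
    # the tree did not change.
    if not commits:
        return []
    step = timedelta(days=interval_days)
    boundary = commits[0][0]
    last_time = commits[-1][0]
    snapshots = []
    i = 0
    while boundary <= last_time:
        # Advance i to the last commit whose timestamp is <= boundary.
        while i + 1 < len(commits) and commits[i + 1][0] <= boundary:
            i += 1
        snapshots.append(commits[i][1])
        boundary += step
    return snapshots
```

Under this sketch, a finding like "2.75 times more likely" corresponds to relative_risk returning 2.75 for a given snapshot. Rename tracking, which the study performed at every snapshot, is not modeled here; with Git, rename detection is commonly done with `git log --follow` for a single file or `git diff -M` between two snapshot commits.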