ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains
2026-03-24 • Software Engineering • Artificial Intelligence
AI summary
The authors present ReqFusion, a system that uses multiple AI language models to automatically find and organize software requirements from documents. They use a special detailed format called PEGS to help the AI understand the requirements better, which improved accuracy compared to generic prompts. Tests on real-world documents showed that ReqFusion works well at identifying both functional and non-functional requirements and drastically reduces the time needed for manual analysis. Their design also uses several AI models to ensure reliability and thorough coverage of different requirement types.
Requirements Engineering • Large Language Models • PEGS Approach • Functional Requirements • Non-Functional Requirements • Natural Language Processing • Automated Classification • Model Ensemble • Software Development Process • Prompt Engineering
Authors
Muhammad Khalid, Manuel Oriol, Yilmaz Uygun
Abstract
Requirements engineering is a vital, yet labor-intensive, stage in the software development process. This article introduces ReqFusion: an AI-enhanced system that automates the extraction, classification, and analysis of software requirements using multiple Large Language Model (LLM) providers. The architecture of ReqFusion integrates OpenAI GPT, Anthropic Claude, and Groq models to extract functional and non-functional requirements from various documentation formats (PDF, DOCX, and PPTX) in academic, industrial, and tender proposal contexts. The system uses a domain-independent extraction method and generates requirements following the Project, Environment, Goal, and System (PEGS) approach introduced by Bertrand Meyer. The main idea is that the detail of the PEGS format gives LLMs more information and cues about the requirements, producing better results than a simple generic prompt. An ablation study confirms this hypothesis: PEGS-guided prompting achieves an F1 score of 0.88, compared to 0.71 for generic prompting under the same multi-provider configuration. The evaluation used 18 real-world documents to generate 226 requirements through automated classification, with 54.9% functional and 45.1% non-functional across academic, business, and technical domains. An extended evaluation on five projects with 1,050 requirements demonstrated significant improvements in extraction accuracy and a 78% reduction in analysis time compared to manual methods. The multi-provider architecture enhances reliability through model consensus and fallback mechanisms, while the PEGS-based approach ensures comprehensive coverage of all requirement categories.
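The consensus-with-fallback pattern the abstract describes can be sketched in a few lines. This is a minimal illustration, not ReqFusion's actual implementation: the provider names, the `classify_with_consensus` helper, and the stub classifiers standing in for real GPT, Claude, and Groq API calls are all hypothetical.

```python
from collections import Counter

def classify_with_consensus(requirement, providers):
    """Ask each provider to label a requirement and take a majority vote.
    Providers that raise an exception are skipped, so the remaining
    models act as a fallback (assumed behavior, for illustration only)."""
    votes = []
    for name, classify in providers:
        try:
            votes.append(classify(requirement))
        except Exception:
            continue  # fall back to the remaining providers
    if not votes:
        raise RuntimeError("all providers failed")
    label, _ = Counter(votes).most_common(1)[0]
    return label

# Stub classifiers standing in for real LLM calls (hypothetical heuristics).
def gpt_stub(req):
    return "functional" if "shall" in req else "non-functional"

def claude_stub(req):
    return "functional" if "shall" in req else "non-functional"

def groq_stub(req):
    raise TimeoutError("simulated provider outage")  # exercises the fallback path

providers = [("gpt", gpt_stub), ("claude", claude_stub), ("groq", groq_stub)]
print(classify_with_consensus("The system shall export PDF reports.", providers))
# → functional (two votes agree; the failed provider is skipped)
```

In a real deployment the stubs would be replaced by prompt-based calls to each provider's API; the voting step is what gives the ensemble its reliability when one model is unavailable or disagrees.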