MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator
2026-04-10 • Computation and Language
Computation and Language, Artificial Intelligence
AI summary
The authors created MuTSE, a web tool to help compare how different prompts and language models simplify text targeting various language proficiency levels. This tool runs multiple prompt-model combinations at once and shows side-by-side comparisons, making it easier to see differences. MuTSE uses a tiered semantic alignment method to visually map original sentences to their simplified versions, which helps people analyze them more easily and consistently. This addresses problems with existing methods, which are either static scripts or limited to simple chat interfaces.
Large Language Models, text simplification, prompting strategies, CEFR proficiency, semantic alignment, human-in-the-loop, Natural Language Processing, Intelligent Tutoring Systems, annotation, web application
Authors
Rares-Alexandru Roscan, Gabriel Petre, Adrian-Marius Dumitran, Angela-Liliana Dumitran
Abstract
As Large Language Models (LLMs) become increasingly prevalent in text simplification, systematically evaluating their outputs across diverse prompting strategies and architectures remains a critical methodological challenge in both NLP research and Intelligent Tutoring Systems (ITS). Developing robust prompts is often hindered by the absence of structured, visual frameworks for comparative text analysis. Researchers typically rely on static computational scripts, while educators are constrained to standard conversational interfaces; neither paradigm supports systematic multi-dimensional evaluation of prompt-model permutations. To address these limitations, we introduce MuTSE, an interactive human-in-the-loop web application designed to streamline the evaluation of LLM-generated text simplifications across arbitrary CEFR proficiency targets. The system supports concurrent execution of $P \times M$ prompt-model permutations, generating a comprehensive comparison matrix in real time. By integrating a novel tiered semantic alignment engine augmented with a linearity bias heuristic ($\lambda$), MuTSE visually maps source sentences to their simplified counterparts, reducing the cognitive load associated with qualitative analysis and enabling reproducible, structured annotation for downstream NLP dataset construction. The project code and the demo have been made available for peer review at the following anonymized URL: https://osf.io/njs43/overview?view_only=4b4655789f484110a942ebb7788cdf2a
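The concurrent $P \times M$ execution described in the abstract can be illustrated with a minimal sketch. It is not MuTSE's actual implementation: the `simplify` function is a hypothetical stand-in for a real LLM call, the name `comparison_matrix` and the prompt and model labels are invented for illustration, and a thread pool is assumed as the concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simplify(prompt: str, model: str, text: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a labelled placeholder."""
    return f"[{model} | {prompt}] {text}"


def comparison_matrix(prompts, models, text, max_workers=8):
    """Run every (prompt, model) pair concurrently.

    Returns a dict keyed by (prompt, model) holding each simplified output,
    i.e. one cell per entry of the P x M comparison matrix.
    """
    pairs = list(product(prompts, models))  # all P x M permutations
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so outputs line up with pairs
        outputs = list(pool.map(lambda pm: simplify(pm[0], pm[1], text), pairs))
    return dict(zip(pairs, outputs))


# Illustrative inputs only: 2 prompts x 3 models gives 6 cells
prompts = ["Simplify to CEFR A2.", "Simplify to CEFR B1."]
models = ["model-a", "model-b", "model-c"]
matrix = comparison_matrix(prompts, models, "The committee deliberated at length.")
```

Keying the result by the (prompt, model) pair keeps each cell addressable for side-by-side display. A real system would replace `simplify` with network calls to the chosen models, where threads (or async I/O) let the slow requests overlap.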