AI summary
The authors studied how Large Language Models (LLMs), such as those behind chatbots, handle U.S. History content that varies between states because each state sets its own curriculum standards. They built a pipeline to check whether these models reflect the content required by different states. They also tested whether the models change their answers based on stated student details such as location, grade, gender, or race. They found that models adapt well to grade level and show little sensitivity to race or gender, but their state-specific shifts appear to follow the perceived political leanings of states rather than actual curriculum content. The authors warn that this mismatch could affect student learning and suggest that more robust ways to align models with educational standards are needed.
Large Language Models, Curriculum Standards, U.S. History Education, State Education Policy, User Persona, Demographic Bias, Alignment, Educational Technology, Natural Language Processing, Chatbots
Authors
Lisa Korver, Tomo Lazovich, Sherief Reda
Abstract
As Large Language Models (LLMs) become increasingly popular in educational settings, they raise important questions about the ethical implications of their use. Publicly available online chatbots are quickly improving in capability and accuracy, leading to more widespread use, including among students looking for help with their homework. This makes it crucial to consider whether these models are aligned with educational standards. Because curriculum standards in the United States are set at the state level, they differ significantly in required content, emphasis, and narrative focus. In this work, we develop an LLM-based pipeline to identify variations in U.S. History curricula across states and evaluate the extent to which different LLMs reflect these state-specific curricular differences. In addition, we conduct controlled experiments that vary user personas by stating user attributes such as geographic location, grade level, gender, and race to evaluate the sensitivity of LLM responses to user characteristics. We find that while models are able to adjust their presentation of historical topics, these shifts may come from the perceived political leanings of states and do not necessarily reflect actual curriculum content. Additionally, models successfully adapt to a student's grade level while showing minimal sensitivity to race or gender, suggesting they are capable of useful adaptation to student personas with limited demographic bias. Together, these findings highlight the potential risks that open access to LLM chatbots may pose to student learning outcomes when models are misaligned with state curriculum standards, and they underscore the need for more robust alignment techniques.