To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering
2026-02-23 • Computation and Language • Artificial Intelligence
AI summary
The authors propose a method called Selective Chain-of-Thought (Selective CoT) to make large language models answer medical questions faster by reasoning only when necessary. They tested this approach on several medical question benchmarks and found it reduced response time and token use significantly, without losing much accuracy. Sometimes, it even improved accuracy while being more efficient compared to traditional methods. This method helps balance careful thinking with speed, making it easier to use language models in medical settings.
Keywords
Large Language Models · Medical Question Answering · Chain-of-Thought · Inference Time · Token Usage · Reasoning · Accuracy · Biomedical Benchmarks · Llama · Qwen
Authors
Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang
Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy.
Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks: HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time.
Results: Selective CoT reduced inference time by 13-45% and token usage by 8-47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost.
Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability.
Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.
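The two-pass strategy the abstract describes (a cheap gating call, then either a direct answer or a full chain-of-thought rationale) can be sketched as below. This is a minimal illustration, not the authors' implementation: the exact gating prompt, decoding settings, and model interface are not given in the abstract, so `generate` here is a hypothetical stand-in for any LLM completion call, and the prompt wording is invented.

```python
# Sketch of Selective CoT inference. Assumptions (not from the paper):
# the gating prompt wording, and `generate`, a placeholder for any
# text-in/text-out LLM call.

GATE_PROMPT = (
    "Decide whether the following medical question needs step-by-step "
    "reasoning to answer correctly. Reply with exactly 'REASON' or 'DIRECT'.\n"
    "Question: {q}\nDecision:"
)
COT_PROMPT = "Question: {q}\nLet's think step by step."
DIRECT_PROMPT = "Question: {q}\nAnswer:"

def selective_cot(question, generate):
    """Two-pass inference: a short gating call decides whether to spend
    tokens on an explicit rationale or answer directly."""
    decision = generate(GATE_PROMPT.format(q=question)).strip().upper()
    if decision.startswith("REASON"):
        # Reasoning path: generate an explicit chain of thought.
        return generate(COT_PROMPT.format(q=question))
    # Recall-type question: answer directly, saving tokens and latency.
    return generate(DIRECT_PROMPT.format(q=question))

# Toy stand-in model for demonstration only: it routes gate calls by a
# keyword and returns canned answers for the two paths.
def toy_model(prompt):
    if "Decision:" in prompt:
        return "DIRECT" if "capital" in prompt else "REASON"
    return "Step 1: ..." if "step by step" in prompt else "Paris"

print(selective_cot("What is the capital of France?", toy_model))
```

The efficiency gains reported in the Results come from the fact that the gating call is short and, for recall-type questions, the direct-answer path skips the (much longer) rationale entirely.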