Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions

2026-04-09

Cryptography and Security, Artificial Intelligence
AI summary

The authors explain that when large language models use outside information to answer questions, it can create new security risks. They point out that some risks come directly from the models themselves, while others are specifically due to how the models access external knowledge. The authors break down the process of using outside information into six parts and organize related research around different trust zones and types of security problems. They find that current protections mostly respond to problems after they happen and are not well connected. Finally, they suggest ways to improve security by paying attention to the whole process of getting and using outside information.

Retrieval-Augmented Generation, Large Language Models, Security Risks, External Knowledge Access, Threat Modeling, Knowledge-Access Pipeline, Attack Surfaces, Defenses, Evaluation Benchmarks, Trust Boundaries
Authors
Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li, Nicole Hu, Jason Chen Zhang, Qing Li, Lei Chen
Abstract
Retrieval-augmented generation (RAG) significantly enhances large language models (LLMs) but introduces novel security risks through external knowledge access. While existing studies cover various RAG vulnerabilities, they often conflate inherent LLM risks with those specifically introduced by RAG. In this paper, we propose that secure RAG is fundamentally about the security of the external knowledge-access pipeline. We establish an operational boundary to separate inherent LLM flaws from RAG-introduced or RAG-amplified threats. Guided by this perspective, we abstract the RAG workflow into six stages and organize the literature around three trust boundaries and four primary security surfaces: pre-retrieval knowledge corruption, retrieval-time access manipulation, downstream context exploitation, and knowledge exfiltration. By systematically reviewing the corresponding attacks, defenses, remediation mechanisms, and evaluation benchmarks, we reveal that current defenses remain largely reactive and fragmented. Finally, we discuss these gaps and highlight future directions toward layered, boundary-aware protection across the entire knowledge-access lifecycle.