How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at this https URL.
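To make the idea of context attribution more concrete, here is a minimal sketch of one way such a method could be built: split the context into sources, mask random subsets of them, score how strongly the model still supports the original statement under each mask, and fit a sparse linear surrogate whose weights serve as per-source attributions. This is an illustration in the spirit of the abstract, not the paper's exact procedure; `attribute` and `score_response` are hypothetical helpers, and the toy scorer stands in for a real language-model call.

```python
# Sketch of ablation-based context attribution (assumptions noted above).
import numpy as np
from sklearn.linear_model import Lasso

def attribute(sources, score_response, n_ablations=64, alpha=0.01, seed=0):
    """Fit a sparse linear surrogate: which sources drive the response score?"""
    rng = np.random.default_rng(seed)
    n = len(sources)
    # Each row is an ablation mask: 1 = source kept, 0 = source removed.
    masks = rng.integers(0, 2, size=(n_ablations, n))
    scores = np.array([score_response(sources, m) for m in masks])
    surrogate = Lasso(alpha=alpha).fit(masks, scores)
    return surrogate.coef_  # per-source attribution weights

if __name__ == "__main__":
    sources = [
        "The Eiffel Tower is in Paris.",
        "It was completed in 1889.",
        "Paris is the capital of France.",
    ]

    # Toy stand-in scorer: pretends a statement about 1889 is supported by source 1.
    # In practice this would be the model's log-probability of the generated statement
    # given only the unmasked sources.
    def score_response(sources, mask):
        return float(mask[1]) + 0.05 * float(mask.sum())

    for source, weight in zip(sources, attribute(sources, score_response)):
        print(f"{weight:+.3f}  {source}")
```

Running the sketch, the second source receives the largest weight, mirroring how an attribution method would point a reader to the part of the context that actually supports the statement.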