Current AI evaluation paradigms that rely on static, model-only tests fail to capture harms that emerge through sustained human-AI interaction. As interactive AI systems, such as AI companions, proliferate in daily life, this mismatch between evaluation methods and real-world use becomes increasingly consequential. We argue for a paradigm shift toward evaluation centered on \textit{interactional ethics}, which addresses risks like inappropriate human-AI relationships, social manipulation, and cognitive overreliance that develop through repeated interaction rather than single outputs. Drawing on human-computer interaction, natural language processing, and the social sciences, we propose principles for evaluating generative models through interaction scenarios and human impact metrics. We conclude by examining implementation challenges and open research questions for researchers, practitioners, and regulators integrating these approaches into AI governance frameworks.