Natural language is a vital source of evidence for the social sciences. Yet quantifying large volumes of text rigorously and precisely is extremely difficult, and automated methods have struggled to match the “gold standard” of human coding. The present work used GPT-4 to conduct an automated analysis of 1,356 essays, rating the authors’ spirituality on a continuous scale. This is an especially challenging test for automated methods, given the subtlety of the concept and the difficulty of inferring complex personality traits from a person’s writing. Nonetheless, we found that GPT-4’s ratings demonstrated excellent internal reliability, remarkable consistency with a human rater, and strong correlations with self-report measures and behavioral indicators of spirituality. These results suggest that, even on nuanced tasks requiring a high degree of conceptual sophistication, automated text analysis with Generative Pre-trained Transformers can match human-level performance, and they point to the extraordinary potential for such tools to advance social scientific research.
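The evaluation logic the abstract describes, checking internal reliability across repeated model rating passes and agreement between model and human ratings, can be sketched in a few lines. This is a minimal illustration with made-up ratings, not the authors' code or data; the function names and the example numbers are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    # Pearson correlation between two equal-length lists of ratings
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def cronbach_alpha(runs):
    # Internal reliability across k repeated rating passes over the same essays.
    # runs: list of k rating lists, one per pass.
    k = len(runs)
    item_vars = sum(statistics.pvariance(r) for r in runs)
    totals = [sum(vals) for vals in zip(*runs)]
    return (k / (k - 1)) * (1 - item_vars / statistics.pvariance(totals))

# Hypothetical data: three GPT rating passes and one human rater, five essays
gpt_runs = [
    [4.0, 2.0, 5.0, 1.0, 3.0],
    [4.5, 2.5, 4.5, 1.5, 3.0],
    [4.0, 2.0, 5.0, 1.0, 3.5],
]
human = [4.0, 1.5, 5.0, 2.0, 3.0]

mean_gpt = [statistics.fmean(vals) for vals in zip(*gpt_runs)]
print(round(cronbach_alpha(gpt_runs), 3))   # → 0.986
print(round(pearson_r(mean_gpt, human), 3)) # → 0.928
```

In practice the model's ratings would come from prompting GPT-4 once per essay per pass; the metrics above are then computed on the resulting score vectors exactly as for any pair of human coders.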