Reaching the Gold Standard: Automated Text Analysis with Generative Pre-trained Transformers Matches Human-Level Performance

Natural language is a vital source of evidence for the social sciences. Yet quantifying large volumes of text rigorously and precisely is extremely difficult, and automated methods have struggled to match the “gold standard” of human coding. The present work used GPT-4 to conduct an automated analysis of 1,356 essays, rating the authors’ spirituality on a continuous scale. This presents an especially challenging test for automated methods, due to the subtlety of the concept and the difficulty of inferring complex personality traits from a person’s writing. Nonetheless, we found that GPT-4’s ratings demonstrated excellent internal reliability, remarkable consistency with a human rater, and strong correlations with self-report measures and behavioral indicators of spirituality. These results suggest that, even on nuanced tasks requiring a high degree of conceptual sophistication, automated text analysis with Generative Pre-trained Transformers can match human-level performance. Hence, they demonstrate the extraordinary potential of such tools to advance social scientific research.

Ryan Watkins