Reasoning Models Don’t Always Say What They Think

posted in: reading | 0

Chain-of-thought (CoT) offers a potential boon for AI safety as it allows monitoring a model’s CoT to try to understand its intentions and reasoning processes. However, the effectiveness of such monitoring hinges on CoTs faithfully representing models’ actual reasoning processes. … Continued

What We Are Reading: