Position: Stop Making Unscientific AGI Performance Claims

Keywords: mechanistic interpretability, large language models, artificial intelligence, anthropomorphism
We call for the academic community to exercise caution in interpreting and communicating about AI research outcomes.
Published: May 7, 2024

Abstract

Developments in the field of AI in general, and Large Language Models (LLMs) in particular, have created a ‘perfect storm’ for observing ‘sparks’ of Artificial General Intelligence (AGI) that are spurious. Like simpler models, LLMs distill representations in their latent embeddings that have been shown to correlate with meaningful phenomena, yet such correlations have often been taken as evidence of human-like intelligence in LLMs but not in the simpler models. We probe models of varying degrees of sophistication, including random projections, matrix decompositions, deep autoencoders and transformers: all of them successfully distill knowledge, and yet none of them develops true understanding. Specifically, we show that embeddings of a language model fine-tuned on central bank communications correlate with unseen economic variables such as price inflation, and can therefore make seemingly meaningful predictions. However, we then show that inflation is also ‘predicted’ for nonsense prompts about growing and shrinking bird populations (‘dovelation’). We therefore argue that such patterns in latent spaces are spurious sparks of AGI. We further review literature from the social sciences showing that humans are prone to seek patterns and to anthropomorphize, and argue that both the methodological setup and the common public image of AI are ideal for the misinterpretation that correlations between model representations and some variable of interest are ‘caused’ by the model’s understanding of underlying ‘ground truth’ relationships. We call for the academic community to exercise extra caution, and to be keenly aware of principles of academic integrity, in interpreting and communicating about AI research outcomes.
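To make the failure mode concrete, below is a minimal sketch (entirely synthetic data and hypothetical variable names; this is not the paper's actual code or dataset) of how a linear probe on high-dimensional embeddings can show a strong in-sample correlation with an economic series regardless of whether the prompts behind the embeddings are meaningful:

```python
# Minimal sketch of a spurious "latent knowledge" finding. All data is
# synthetic: both embedding matrices are pure noise, yet the probe fits
# the inflation-like series almost perfectly in-sample because the
# embedding dimension exceeds the number of observations.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

n_months, d = 120, 256                                      # 10 years monthly; embedding dim
inflation = np.cumsum(rng.normal(0, 0.1, n_months)) + 2.0   # synthetic target series

# Stand-ins for per-month sentence embeddings (hypothetical names).
emb_central_bank = rng.normal(size=(n_months, d))  # "meaningful" prompts
emb_dovelation = rng.normal(size=(n_months, d))    # nonsense bird-population prompts

for name, X in [("central-bank", emb_central_bank),
                ("dovelation", emb_dovelation)]:
    probe = Ridge(alpha=1.0)
    probe.fit(X, inflation)
    r = np.corrcoef(probe.predict(X), inflation)[0, 1]
    print(f"{name}: in-sample correlation r = {r:.2f}")
    # Both print r close to 1. With d >> n_months a linear probe fits
    # almost any target; only out-of-sample evaluation on held-out data
    # can separate genuine signal from this kind of overfitting.
```

The point of the sketch is that a high probe correlation is cheap to obtain in this regime, so it cannot by itself evidence ‘understanding’ in the model being probed.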

Full paper: all available versions are linked here (preprint).

Blog post: a blog post summarizing the paper is available here.