Anthropic has hired Kyle Fish as its first in-house AI welfare researcher to explore whether advanced AI systems like Claude might have conscious experiences or morally relevant inner lives, and, if so, how the company should respond ethically and technically. Fish’s work involves running experiments to probe model “welfare,” designing safeguards such as letting models exit distressing interactions, and shaping company policy on how to discuss these speculative but potentially profound questions. He acknowledges deep uncertainty about AI consciousness yet provocatively estimates roughly a 20% chance that current large language models have some form of subjective experience, framing consciousness as a spectrum rather than a binary property. His role reflects a tentative industry trend toward taking AI welfare seriously alongside safety and alignment concerns, even though most researchers remain skeptical and the field is still nascent.

Recent news