Sitting alongside Pope Leo XIV as he delivered his first encyclical on the dangers of AI was a curious speaker: a self-declared atheist and the billionaire cofounder of one of the most valuable AI companies in the world. 

Chris Olah, one of Anthropic’s cofounders and a prominent AI safety researcher who serves as the company’s interpretability research lead, acknowledged the peculiarity of his presence during the presentation at the Vatican last week. 

“I want to begin with something that may sound strange coming from the co-founder of an AI company,” he said in his prepared remarks. Even as AI companies strive to remain profitable and lead research while avoiding the pressure imposed by geopolitics, Olah said, they must be sure they are “doing the right thing” as they continue to drive innovation.

“No matter how sincerely any of us intend to do the right thing, and I believe many of us do, we will always be influenced by those incentives,” he said.

Because of that tension between the reality of building a frontier AI company and sticking to a value-driven mission, Olah, sitting alongside Pope Leo XIV, argued that outside critics, including the Catholic Church as well as scholars and governments, must supervise the industry and keep its moral obligations at the forefront.

“Some might believe that matters of AI are best handled by computer scientists like myself,” he added during his remarks. “They are mistaken.”

Who is Chris Olah?

Olah’s presence at the Vatican was as unlikely as the journey that led him there.

Raised in Toronto, Canada, Olah was a “devout evangelical Christian” until he became an atheist at the age of 15. He attended the University of Toronto to study math, but dropped out only about a year into his studies.

A year later, in 2012, he was awarded $100,000 through the Thiel Fellowship, a program created by PayPal cofounder Peter Thiel to help talented young people pursue other passions in lieu of a traditional four-year college degree. In a video highlighting the winners of the fellowship, Olah said he enjoyed “doing mathematical visualizations with 3D printers.”

Fast forward to his professional life and it’s clear his love of math and technology never left him. Starting in 2015, he spent three years at Google Brain, which in 2023 became part of Google DeepMind. He began as an intern and later worked his way up to research scientist. Along the way, he helped build tools to visualize what was happening inside neural networks, part of an emerging field of study called “mechanistic interpretability.” At the time, the field was not very popular, as researchers were mainly focused on trying to make AI more powerful.

Still, while at Google, Olah contributed to research that brought newfound attention to the study of how neural networks work, including a paper titled The Building Blocks of Interpretability, which offered one of the first windows into how neural networks deduce complex concepts from simpler building blocks.

While “originally it was a pretty small set of people who were interested in these questions,” Olah told the podcast 80,000 Hours, his work eventually caught the eye of ChatGPT maker OpenAI, where he turned his interest in neural network logic into his full-time job.

From 2018 until 2020, Olah led OpenAI’s interpretability team. At OpenAI he worked on two landmark research projects. The first, known as the Circuits project, aimed to show that neural networks contain identifiable, human-readable information formed by structured patterns of neurons.

The second was the discovery of multimodal neurons in CLIP, OpenAI’s model for connecting text and images. His team found that certain neurons inside the model would “fire” in response to the same concept, such as “Spider-Man,” whether it appeared as a photograph, a drawing, or text. This research showed how artificial neural networks may operate similarly to the human brain.

In 2020, Olah was one of the original seven OpenAI employees, including Dario Amodei, now Anthropic’s CEO, to leave the company over concerns about AI safety. Olah and this group later cofounded Anthropic, which was valued at $965 billion after a recent funding round. The company confidentially filed for an initial public offering this week. Olah’s net worth now stands at just under $8 billion, according to the Bloomberg Billionaires Index.

Olah’s comments alongside the Pope run contrary to the opinions of other industry insiders, including Marc Andreessen, who argued in his 2023 Techno-Optimist Manifesto that “trust and safety” and “tech ethics” were part of a demoralization campaign led by “enemies” against technology and life.

Still, Olah’s comments align broadly with Anthropic’s mission, which emphasizes safety and doesn’t shy away from presenting research on the risks of AI. They also square with the Pope’s encyclical, Magnifica Humanitas, which serves as a sort of moral framework for AI and calls for “a measured and vigilant approach” to its development, as well as the consideration of humans over machines.

At Anthropic, Olah has helped further the study of “mechanistic interpretability,” aiming to reverse-engineer AI models to identify which clusters of artificial neurons activate for what purposes and how they shape a model’s outputs. 

In 2024, Time named him to its TIME100 AI list of the most influential people in the AI industry.

“If we could really understand these systems, and this would require a lot of progress, we might be able to go and say when these models are actually safe,” he told Time. “Or whether they just appear safe.”

This story was originally featured on Fortune.com
