
Roko on AI risk

By Unknown Author|Source: Marginal Revolution|Read Time: 3 mins

The Less Wrong/Singularity/AI Risk movement, initiated in the 2000s by Yudkowsky and others, has faced criticism for its core claims regarding AI risk. In this post, Roko, an early adherent of the movement, argues that those core claims are wrong, working through them one by one and contrasting each claim with what he sees as the truth.


The Less Wrong/Singularity/AI Risk Movement

The Less Wrong/Singularity/AI Risk movement, started in the 2000s by Yudkowsky and others and of which I was an early adherent, is wrong about all of its core claims around AI risk. It's important to recognize this and appropriately downgrade the credence we give to such claims moving forward.

Core Claims and Truths

Claim: Mindspace is vast, so it’s likely that AIs will be completely alien to us, and therefore dangerous!

Truth: Mindspace is vast, but we picked LLMs as the first viable AI paradigm because the abundance of human-generated data made LLMs the easiest choice. LLMs are models of human language, so they are actually not that alien.

Claim: AI won’t understand human values until it is superintelligent, so it will be impossible to align, because you can only align it when it is weak (but it won’t understand) and it will only understand when it is strong (but it will reject your alignment attempts).

Truth: LLMs learned human values before they became superhumanly competent.

Claim: Recursive self-improvement means that a single instance of a threshold-crossing seed AI could reprogram itself and undergo an intelligence explosion in minutes or hours.

Truth: All ML models have strongly diminishing returns to data and compute, typically logarithmic. Today’s rapid AI progress is only possible because the amount of money spent on AI is increasing exponentially.
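
As a rough illustration of this diminishing-returns point, the sketch below evaluates a Chinchilla-style power-law loss curve as a function of parameter count; the constants are only loosely based on the published Chinchilla fit and should be treated as assumptions for illustration, not measurements:

    # Illustrative only: a Chinchilla-style power-law loss curve, L(N) = E + A / N**alpha,
    # where N is the parameter count. The constants below are assumed for illustration.
    E, A, alpha = 1.69, 406.4, 0.34

    def loss(n_params):
        return E + A / n_params ** alpha

    for n in [1e9, 1e10, 1e11, 1e12]:
        print(f"{n:.0e} params -> loss {loss(n):.3f}")
    # Each 10x jump in scale buys a smaller improvement than the last, which is why
    # keeping progress roughly steady requires exponentially growing budgets.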

Claim: You can’t align an AI because it will fake alignment during training and then be misaligned in deployment!

Truth: The reason machine learning works at all is because regularization methods/complexity penalties select functions that are the simplest generalizations of the training data, not the most perverse ones.
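
As a minimal sketch of that regularization point, assuming only NumPy and a made-up toy dataset, ridge regression (an L2 complexity penalty) steers a flexible polynomial fit toward the smoother generalization of the training points rather than the wildest one that still fits:

    # Minimal sketch: fit the same degree-7 polynomial to a few noisy points,
    # with and without an L2 complexity penalty (ridge).
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 8)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(8)   # noisy samples of a smooth target

    X = np.vander(x, 8)                                        # degree-7 polynomial features
    w_plain = np.linalg.solve(X, y)                            # exact interpolation, no penalty
    lam = 1e-3
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)   # ridge regression

    print("largest coefficient, unpenalized:", np.abs(w_plain).max())
    print("largest coefficient, ridge:      ", np.abs(w_ridge).max())
    # The penalized fit has much smaller coefficients, i.e. it interpolates between
    # the training points smoothly instead of in the most "perverse" way available.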

Claim: AI will be incorrigible, meaning that it will resist creators’ attempts to correct it if something is wrong with the specification.

Truth: AIs based on neural nets might in some sense want to resist changes to their minds, but they can’t resist changes to their weights that happen via backpropagation.
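
As a toy sketch of that mechanical point (assuming PyTorch is available), a single gradient step rewrites the parameters purely as a function of the loss; nothing in the model's forward pass gets to veto the update:

    # Toy sketch: one SGD step on a tiny network. The weight update depends only on
    # the gradient of the loss; nothing the model computes can block opt.step().
    import torch

    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x, target = torch.randn(8, 4), torch.randn(8, 1)
    before = model.weight.detach().clone()

    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()          # gradients flow from the loss back through the weights
    opt.step()               # weights are overwritten in place by the optimizer

    print("max weight change:", (model.weight.detach() - before).abs().max().item())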

Claim: It will get harder and harder to align AIs as they become smarter, so even though things look OK now there will soon be a disaster as AIs outpace their human masters!

Truth: It probably is harder in an absolute sense to align a more powerful AI. But it’s also harder in an absolute sense to build it in the first place.

Claim: We can slow down AI development by holding conferences warning people about AI risk in the twenty-teens, which will delay the development of superintelligent AI so that we have more time to think about how to get things right.

Truth: AI risk conferences in the twenty-teens accelerated the development of AI, directly leading to the creation of OpenAI and the LLM revolution.

Claim: We have to get decision theory and philosophy exactly right before we develop any AI at all or it will freeze half-formed or incorrect ideas forever, dooming us all.

Truth: ( ... pending ... )

Claim: It will be impossible to solve LLM jailbreaks! Adversarial ML is unsolvable!

Truth: ( ... pending ... )

The post Roko on AI risk appeared first on Marginal REVOLUTION.

