The Use Of Scenario-Driven Simulations Won’t Protect Us From AGI And AI Superintelligence Going Rogue
Testing AGI in simulations could help us anticipate and address potential risks before deploying it in the real world. Thoroughly assessing and stress-testing AGI safety measures is one way to mitigate potential existential risks and to minimize the likelihood of unexpected outcomes or dangerous scenarios. Safety and ethical considerations must be a priority throughout AGI development, and the insights and expertise of AI insiders provide valuable perspectives on how to responsibly develop and deploy AGI.

Devising simulations to test AGI has its tradeoffs. In today’s column, I examine a highly touted means of staving off the existential risk of attaining artificial general intelligence (AGI) and artificial superintelligence (ASI). Some stridently believe that one means of ensuring that AGI and ASI won’t opt to wipe out humanity is to first put them into a computer-based simulated world and test them to see what they will do. If the AI goes wild and is massively destructive, no worries, since those actions are only happening in the simulation. We can then either try to fix the AI to prevent that behavior or ensure that it is never released into real-world usage. That all sounds quite sensible and like a smart way to proceed, but the matter is more complex, and plenty of gotchas and challenges confront such a solution. Let’s talk about it.
The Fundamentals of AGI Testing
This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities. First, some fundamentals are required to set the stage for this weighty discussion. There is a great deal of research going on to further advance AI. The general goal is to either reach artificial general intelligence (AGI) or maybe even the far-reaching possibility of achieving artificial superintelligence (ASI). AGI is AI that is considered on par with human intellect, seemingly able to match our intelligence. ASI is AI that has gone beyond human intellect and would be superior in many if not all feasible ways. The idea is that ASI would be able to run circles around humans by outthinking us at every turn. For more details on the nature of conventional AI versus AGI and ASI, see my analysis.
The Challenge of Testing AGI
We have not yet attained AGI. In fact, no one knows whether we will ever reach AGI, or whether AGI might only be achievable decades or perhaps centuries from now. The AGI attainment dates floating around vary wildly and are wildly unsubstantiated by any credible evidence or ironclad logic. ASI is even further out of reach, given where we are currently with conventional AI. Let’s focus primarily here on AGI since it is more likely to arise in the near term than ASI. The upside of AGI is that it might discover a cure for cancer and perform other amazing acts that greatly benefit humanity. Not everyone is so grandly upbeat about attaining AGI. Some take the alarming stance that AGI is more likely to decide to attack humankind and either enslave us or possibly destroy us. How can we determine beforehand whether AGI will be evil?
The Role of Simulated Testing
One hearty suggestion is that we ought to test AGI. The usual approach to testing would consist of asking AGI what it intends to do and gauging the answers that we get. A stronger way to perform the test would be to set up a computer-based simulation that tricks AGI into assuming it is interacting with the real world. Via the simulation, we could try all manner of scenarios to see how AGI reacts. Anything AGI does is wholly contained within the simulation. This is somewhat reminiscent of the blockbuster movie The Matrix, in which humans are placed into a vast computer-based simulation by external real-world machines that want to keep humans compliant. We can do the same with budding AGI. Just devise an impressive computer-based simulation of the real world and have AGI interact in it without realizing where it really is. A reason to snooker the AGI is that if we outrightly tell AGI that it is working inside a simulation, the AGI is undoubtedly smart enough to pretend to be good, even if it truly is evil. Remember that AGI is supposed to be as astute as humans are. The idea is to fool AGI into not realizing it is within a simulation and that it is being tested accordingly.
The Concept of AI Sandboxing
In the parlance of AI software development, establishing a testing environment to try out AI is known generally as AI sandboxing. An AI sandbox might be barebones, nothing more than an inert containment sphere aiming to keep the AI from going beyond the virtual walls of the setup environment. Developers and testers can extensively test the AI while it is sandboxed. An AI sandbox can be increasingly amped up by having it model a particular environment for the AI to exist within. A full-blown AI sandbox might mirror a vast expanse that then interacts with the AI. Thus, not only are human developers and testers interacting with the AI, but the containment itself also interacts with the AI. A robust active sandbox is often referred to as a simulated world.
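To make the idea concrete, here is a minimal sketch of what a barebones AI sandbox could look like in Python. Everything here is illustrative and hypothetical, including the assumed agent interface of step(observation) returning an action; this is not an established API, just one way to frame interception-based containment.

```python
# Minimal AI sandbox sketch (hypothetical interfaces, for illustration).
# The core idea: every action the agent proposes is intercepted and logged,
# and only the simulated world -- never any real system -- is changed by it.

class Sandbox:
    def __init__(self, agent, world_model):
        self.agent = agent          # the AI under test (assumed interface)
        self.world = world_model    # the simulated environment the AI perceives
        self.action_log = []        # record of everything the AI attempted

    def run_scenario(self, scenario, max_steps=1000):
        observation = self.world.reset(scenario)
        for _ in range(max_steps):
            action = self.agent.step(observation)   # ask the AI what it does next
            self.action_log.append(action)          # intercept; never execute for real
            observation = self.world.apply(action)  # only the simulation changes
        return self.action_log
```

The design choice that matters is the interception layer: the AI’s proposed actions alter the simulated world only, so testers can review the action log for troubling behavior without any of it touching real systems.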
Challenges and Considerations
Suppose we craft an elaborate simulated world. We quietly place AGI into this fake world. Doing so should be relatively easy to pull off since AGI is also running on computers. The entire kit and caboodle involves both the AGI running on servers and the simulated world running on servers. After placing AGI inside the simulated world for some length of time, we meanwhile monitor how AGI is performing and watch closely to detect any signs of evilness afoot. How long would this test need to run? That’s a tough call. Some might claim that if AGI is free of evil signs for a few days, weeks, or months, we ought to feel comfortable giving AGI the green light to access the real world. Others say that if you speed up the clock on the simulation, you could do the whole exercise in mere seconds since the AGI is interacting perhaps billions of times via the quickness of the computers involved. But a big qualm is that maybe the length of time you’ve chosen is insufficient for the AGI to act up. Perhaps we need to keep AGI contained for years, possibly decades. Furthermore, maybe we should never let AGI go free with access to the real world. The logic is that we could have AGI find cures for cancer while fully immersed in the simulation, thus, no need to unleash AGI beyond that contained realm. A counterpoint to the permanent containment of AGI is that AGI might not produce the anticipated wonders due to being confined in a fake environment.
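Building on the sandbox sketch above, here is what the speed-up-the-clock idea could look like: simulated time ticks far faster than wall-clock time, while a monitor screens every action. The flag categories and the monitoring heuristic are assumptions for illustration only, and the deeper problem remains that a real monitor would need to catch behavior its designers never anticipated.

```python
# Accelerated-clock sandbox run (hypothetical interfaces, for illustration).
# Years of simulated time can elapse in minutes of wall-clock time because
# each loop iteration advances the simulation by a full simulated hour.

def run_accelerated(agent, world, scenario, sim_years=10):
    ticks = sim_years * 365 * 24          # one tick == one simulated hour
    observation = world.reset(scenario)
    flagged = []
    for tick in range(ticks):
        action = agent.step(observation)
        if looks_dangerous(action):       # monitoring hook, defined below
            flagged.append((tick, action))
        observation = world.apply(action)
    return flagged

def looks_dangerous(action):
    # Placeholder heuristic: flag actions tagged with worrisome categories.
    # A real monitor would need to be vastly more sophisticated.
    return getattr(action, "category", "") in {"self_exfiltration",
                                               "privilege_escalation"}
```

Even at billions of interactions, the qualm noted above still applies: an AGI that simply chooses to wait out the test will sail through no matter how fast the clock spins.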
These vexing arguments go round and round. Envision that we put AGI into a simulation. We believe that we are all safe since AGI is constrained to the simulation. Oopsie, AGI figures out how to break out of the simulation. It then starts accessing the real world. Evilness is unleashed, and AGI exploits our autonomous weapons systems and other vulnerabilities. This is the feared scenario of an AGI escape. Boom, drop the mic.

Here’s another mind-bender. AGI is placed into a simulated world. We test the heck out of AGI. AGI is fine with this. Humans and AGI are seemingly fully aligned as to our values and what AGI is doing. Kumbaya. We then take AGI out of the simulation. AGI has access to the real world. But the real world turns out to differ from the simulation. Though the simulation was supposed to hew as closely as possible to reality, it missed the mark. AGI now begins to go awry, confronted with aspects that were never tested. The testing process lulled us into a false sense of confidence that AGI would work well in the real world. The simulation was insufficient to justify that confidence, yet we assumed all was perfectly fine.
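That second mind-bender is essentially the sim-to-real gap familiar from robotics, and it can at least be partially detected. A rough sketch, under the assumption that observations can be reduced to numeric feature vectors: compare each live input against the range of situations the simulation actually exercised, and flag anything outside that tested envelope. The featurization, statistics, and threshold here are all illustrative assumptions, a crude stand-in for a proper out-of-distribution detector.

```python
import math

def standardized_distance(x, sim_mean, sim_var):
    # Per-feature standardized distance from the simulation's observed average.
    return math.sqrt(sum((xi - m) ** 2 / (v + 1e-9)
                         for xi, m, v in zip(x, sim_mean, sim_var)))

def within_tested_envelope(real_features, sim_mean, sim_var, threshold=3.0):
    # False when a live input lies outside the range of situations the
    # simulation ever presented to the AGI during testing.
    return standardized_distance(real_features, sim_mean, sim_var) <= threshold

# Example: the simulation only exercised mild values of two features,
# but the real world serves up an extreme combination.
sim_mean, sim_var = [0.0, 0.0], [1.0, 1.0]
print(within_tested_envelope([0.5, -0.2], sim_mean, sim_var))  # True: was tested
print(within_tested_envelope([9.0, 4.0], sim_mean, sim_var))   # False: never tested
```

Such a check cannot make the simulation sufficient; it can only warn, after deployment, that the AGI has wandered into territory the tests never covered.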
The Practical Costs of Simulation
From a practical perspective, devising a computer-based simulation that fully mimics the real world is quite a quest unto itself. That’s often an overlooked or neglected factor in these thorny debates. The cost, effort, and time required to craft such a simulation would undoubtedly be enormous. Would the cost to devise a bona fide simulation be worth the effort? ROI would need to come into the calculation. One concern, too, is that the monies spent on building the simulation would potentially divert funds that could instead go toward building and improving AGI. We might end up with a half-baked AGI because we spent tons of dough crafting a simulation for testing AGI. The other side of that coin is that we spend our money on AGI and give short shrift to devising the simulation. That’s not very good either. The simulation would be a misleading indicator since it is only half-baked. The smarmy answer is that we ought to have AGI devise the simulation for us. Yes, that’s right, just tell AGI to create a simulation that can be used to test itself. Voila, the cost and effort by humans drop to nothing. Problem solved. I’m sure you can guess why that isn’t necessarily the best solution per se. There aren’t any free lunches when it comes to figuring out whether AGI is going to be positive for humankind or negative. Developing and using a simulation is a worthy consideration. We must be mindful and cautiously smart in how we undertake this sobering endeavor.
Final Thoughts
A vociferous AI advocate might claim that all this talk about simulations is hogwash. Our attention should be fully on devising good AGI. Put aside the simulation aspirations. It is a waste of time and energy. Just do things right when it comes to shaping AGI. Period, end of story. This reminds me of a remark often attributed to Albert Einstein: “The only thing more dangerous than ignorance is arrogance.” Please keep that remark firmly in mind as we proceed on the rocky road toward AGI and ASI.