
Human-like behaviour key to AI models passing the Turing Test

Source: Hindustan Times | Read time: 3 minutes

The study shows that GPT-4.5 and LLaMa-3.1-405B perform well on the Turing test, with GPT-4.5 achieving the stronger result. The findings point to advances in natural language processing and to the potential of these models to improve human-computer interaction, though further research is needed to explore their capabilities across different contexts and applications.


OpenAI’s GPT-4.5 and Meta’s LLaMa models have passed the Turing Test, a benchmark proposed by Alan Turing in the 1950s to assess whether a machine can exhibit intelligent behaviour indistinguishable from that of a human.

Background on GPT-4.5 and LLaMa Models

It is a pivotal moment for conversational AI, and one easily eclipsed amid a flurry of intriguing developments: ChatGPT’s Ghibli-style image generation, the pursuit of agentic AI (a frontier for which human-like responses are especially relevant), breakthroughs in AI-assisted cancer detection, and Google unlocking a ‘thinking’ Gemini 2.5 model.

Though these are not the first AI models to pass such a test, this is one of the most notable results among recent contenders. GPT-4.5, released in 2025, exhibited the most human-like behaviour in the tests, ahead of Meta’s LLaMa-3.1-405B and its own sibling, GPT-4o (released in 2024).

Researchers Cameron R. Jones and Benjamin K. Bergen of the University of California San Diego, in a study awaiting peer review, noted that when prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time, significantly more often than interrogators selected the real human participant.

LLaMa-3.1-405B, by contrast, was judged to be the human 56% of the time, a rate not significantly different from that of the real human participants, while the baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance.
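The "significantly above chance" framing can be made concrete with an exact binomial test: if interrogators were guessing randomly, each judgement would be a fair coin flip, and we can ask how likely a 73% win rate would be under that assumption. A minimal sketch follows; the sample size of 100 judgements is illustrative only and is not the study's actual figure.

```python
from math import comb

def binomial_p_one_sided(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of the AI being
    picked as 'human' at least k times out of n if judges guessed randomly."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# Hypothetical n = 100 judgements (illustrative, not the paper's actual n).
print(f"GPT-4.5 at 73%: p = {binomial_p_one_sided(73, 100):.2e}")
print(f"LLaMa at 56%:   p = {binomial_p_one_sided(56, 100):.3f}")
```

Under these assumed numbers, 73/100 yields a p-value far below 0.05 (well beyond chance), while 56/100 does not, which mirrors the qualitative pattern the study reports.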

Interpreting the Turing Test Results

The Turing Test measures conversational performance, not comprehension or consciousness. A high success rate shows the ability to mimic human behaviour convincingly, but it does not necessarily indicate genuine intelligence in the sense of reasoning or intent behind the responses.

ELIZA, a chatbot from the 1960s, was included in the test as a reference point to ensure that interrogators could identify human responses. The study confirmed that both GPT-4.5 and LLaMa-3.1-405B passed the Turing test, with GPT-4.5 achieving better scores.

Insights and Reflections on AI Advancements

OpenAI’s relentless refinement of large language models (LLMs) has contributed to the success of GPT-4.5, which showcases enhanced natural language processing and context retention capabilities. The persona prompt has been pivotal in allowing AI models to tailor responses with a human-like flair.

Susan Schneider, Founding Director of the Center for the Future Mind at Florida Atlantic University (FAU), highlights the potential implications of AI advancements, predicting challenges related to alignment, emergent properties, and ethical considerations.

Looking ahead, there is a growing focus on practical utility for AI, emphasizing problem-solving abilities over just conversational skills. The evolving landscape of AI technology calls for new benchmarks that can better assess reasoning and ethical alignment to gauge AI’s progress more effectively.
