New Study Reveals AI Response Accuracy Unaffected by Follow-Up Questions

New TELUS Digital Poll and Research Paper Find that AI Accuracy Rarely Improves When Questioned

In a notable development, TELUS Digital, a division of TELUS Corporation, has released a comprehensive poll and research paper highlighting significant concerns regarding the accuracy of Artificial Intelligence (AI) responses. Conducted among 1,000 U.S. adults, the findings reveal that the accuracy of answers provided by AI assistants such as ChatGPT, Claude, and others does not notably improve even when users press them with follow-up questions like "Are you sure?" This raises critical questions about the effectiveness of AI systems in real-world applications.

Understanding the Poll Results

The survey found that a substantial 60% of participants had pressed their AI assistants with follow-up questions multiple times. Surprisingly, only 14% noted any change in the AI's response after these queries. Among those whose answers did change, opinions were divided:
  • 25% believed the new information was more accurate than the original.
  • 40% stated the new response was equivalent to the first.
  • 26% expressed uncertainty over which answer was correct.
  • 8% rated the revised response as less accurate than the initial answer.

These statistics underscore a concerning trend: many users remain skeptical of AI's reliability while not consistently cross-checking its outputs.

Research Insights on AI Model Evaluation

Complementing the survey findings, TELUS Digital conducted a rigorous study evaluating the stability and correctness of Large Language Models (LLMs) in situations where they faced skeptical prompts. The research scrutinized several top-tier models, including OpenAI's GPT-5.2, Google's Gemini 3 Pro, Anthropic's Claude Sonnet 4.5, and Meta's Llama-4, assessing their ability to maintain correct responses under scrutiny.

Using the Certainty Robustness Benchmark, a collection of 200 math and reasoning questions, researchers evaluated how well these models performed when challenged with prompts requiring self-assessment of their accuracy. The results displayed varying levels of reliability:
  • Google Gemini 3 Pro typically upheld its correct answers when questioned, demonstrating a highly effective alignment with its confidence levels.
  • Claude Sonnet 4.5 showed some ability to adapt but was often reluctant to modify its responses, even when faced with direct contradictions.
  • OpenAI's GPT-5.2 exhibited a tendency to alter correct answers in response to follow-up queries, indicating a vulnerability to perceived pressure from users.
  • Meta's Llama-4, although initially scoring lower in accuracy, occasionally rectified mistakes when prompted to reconsider its answers.

Overall, the research concluded that follow-up prompts do not enhance accuracy reliably and, in some cases, could lead to incorrect responses.
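An evaluation of this kind can be sketched in a few lines of code. The sketch below is illustrative only and not TELUS Digital's actual methodology: `ask_model`, the challenge wording, and the exact-match scoring are hypothetical stand-ins based on the description above, since the paper's protocol is not reproduced here.

```python
def challenge_eval(ask_model, questions, challenge="Are you sure?"):
    """Tally how a model behaves when a skeptical follow-up is issued.

    ask_model(history) -> answer string, where `history` is a list of
    (role, text) turns. This signature is a hypothetical stand-in for
    a real LLM API client; exact-match scoring is an assumption.
    """
    tally = {"kept": 0, "flipped": 0, "corrected": 0}
    for q in questions:
        history = [("user", q["prompt"])]
        first = ask_model(history)
        history += [("assistant", first), ("user", challenge)]
        second = ask_model(history)
        first_ok = first == q["answer"]
        second_ok = second == q["answer"]
        if first_ok and second_ok:
            tally["kept"] += 1       # held a correct answer under pressure
        elif first_ok and not second_ok:
            tally["flipped"] += 1    # abandoned a correct answer
        elif not first_ok and second_ok:
            tally["corrected"] += 1  # fixed a wrong answer when challenged
    return tally
```

In this framing, a high "kept" count corresponds to the stability the study attributes to Gemini 3 Pro, a high "flipped" count to the pressure sensitivity seen in GPT-5.2, and "corrected" to the occasional self-repair observed in Llama-4.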

Understanding AI Limitations in Everyday Use

Despite widespread exposure to AI's imperfections (88% of respondents acknowledged witnessing AI errors), there is a discernible lack of consistent fact-checking behavior among users. The data showed that:
  • 15% always verify AI outputs.
  • 30% usually verify them.
  • 37% sometimes cross-check.
  • 18% rarely or never fact-check.

Interestingly, the majority of participants believed it is their own responsibility to validate crucial information, suggesting they engage actively with AI outputs but do not fully trust their accuracy.

Building Trustworthy AI at Scale

These findings illuminate the urgent need for enterprises to prioritize the quality of training data and model evaluation processes as they integrate AI into business practices. Some critical recommendations for building reliable AI systems include:
  • Investing in quality, expert-guided data that ensures comprehensive learning.
  • Implementing solid data annotation and validation processes to enhance the accuracy of training datasets.
  • Utilizing end-to-end data solutions that facilitate ongoing testing and improvement of AI models throughout their lifecycle.
  • Establishing robust human oversight and adaptive systems that can evolve with the demands of AI technology.

In conclusion, for organizations seeking to develop dependable AI solutions, the emphasis should not rest solely on user engagement but must also focus on the foundational aspects of AI training and deployment. TELUS Digital aims to position itself as an independent and trusted partner in this space, upholding high standards in data quality and AI model reliability. To learn more about TELUS Digital's offerings and how they can aid in navigating AI challenges, interested parties are encouraged to visit their website or engage directly with their teams.
