General Discussion

highplainsdem

(63,222 posts) Tue May 26, 2026, 04:48 PM

AI Just Isn't Right (Wired, 5/26/26 - a human fact-checker FTW over AI)

https://www.wired.com/story/fact-checking-ai/

-snip-

In any article that comes across WIRED’s fact-checking desk, there’s usually a decent amount of “b-matter”: statistics, news events, quotes, anything that helps contextualize the topic. Fact-checkers tend to Google this basic information, and that process, in the form of the search engine’s dreaded AI Overviews, constitutes my main interaction with AI. In my professional opinion, it’s unusable—wrong—about a third of the time.

This might be a generous assessment, though. A March 2025 study from the Tow Center for Digital Journalism found that more than 60 percent of responses from AI-powered search engines were inaccurate. A BBC study puts the wrongness of chatbots closer to 45 percent, the number I see cited more often. Because percentages are distancing, let me put this more plainly: AI could be wrong about half the time.

Does it matter which model? Elon Musk has said Grok is the smartest, but I haven’t seen much research that agrees. Claude led the pack in RealFactBench, a fact-checking-focused benchmark test developed by computer scientists in China and the UK last year. It scored 73 percent accuracy across all metrics. (To be fair, Grok was not assessed.) Another benchmark, SimpleQA, developed by OpenAI in October 2024, posed more than 4,000 single-answer questions to models from OpenAI and Anthropic. None of the models exceeded 50 percent accuracy. Google updated the benchmark earlier this year, winnowing the question set to 1,000. Gemini 2.5 Pro came out on top, with 55.6 percent accuracy.

Then there’s the models’ own assessments. When I asked ChatGPT how accurate the major LLMs are, it told me that most models had 90 to 96 percent accuracy on some professional-style tests. It then offered a link, confusingly, to a paper on a sleep medicine certification exam. On “general real-world questions,” it simply offered me the rate at which models like it have been shown to hallucinate: 1 to 2 percent, apparently, though when I tried to click through to that referenced source, it didn’t exist.

-snip-

8 replies

K&R'D snot 18 hrs ago #1

Wired does some stellar reporting. yellow dahlia 17 hrs ago #2

Including on political issues, despite whining from rightwing readers who want the magazine's editors highplainsdem 13 hrs ago #5

They report on truth. Truth is "left" leaning. yellow dahlia 13 hrs ago #8

Techbros: "THAT'S WHY WE NEED MORE DATA CENTERS!!!!!!!" durablend 16 hrs ago #3

Yes, with more stolen data, and then their flawed tech will FINALLY work. highplainsdem 13 hrs ago #6

AI, who are The Beatles... lame54 16 hrs ago #4

I would not be surprised if chatbots have sometimes gotten their names wrong. highplainsdem 13 hrs ago #7