Science
AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts
https://phys.org/news/2026-02-ai-openscholar-scientific-cites-sources.html
University of Washington
This looks really interesting.
Keeping up with the latest research is vital for scientists, but given that millions of scientific papers are published every year, that can prove difficult. Artificial intelligence systems show promise for quickly synthesizing seas of information, but they still tend to make things up, or "hallucinate."
For instance, when a team led by researchers at the University of Washington and The Allen Institute for AI, or Ai2, studied a recent OpenAI model, GPT-4o, they found it fabricated 78-90% of its research citations. And general-purpose AI models like ChatGPT often can't access papers that were published after their training data was collected.
So the UW and Ai2 research team built OpenScholar, an open-source AI model designed specifically to synthesize current scientific research. The team also created the first large, multi-domain benchmark for evaluating how well models can synthesize and cite scientific research. In tests, OpenScholar cited sources as accurately as human experts, and 16 scientists preferred its responses to those written by subject experts 51% of the time.
The team published its findings in Nature. The project's code, data and a demo are publicly available and free to use.
. . .
2 replies
AI model OpenScholar synthesizes scientific research and cites sources as accurately as human experts (Original Post)
erronis
10 hrs ago
OP
eppur_se_muova (41,260 posts)
1. Preferred 51% of the time? That's a coin toss. nt
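For what it's worth, whether 51% is distinguishable from a coin toss depends on how many pairwise judgments the 16 scientists actually made, which the excerpt doesn't say. A quick exact binomial check (the sample size below is invented purely for illustration):

```python
from math import comb

def two_sided_binomial_p(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes no more likely than observing k successes out of n."""
    pk = comb(n, k) * p**k * (1 - p)**(n - k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1)
               if comb(n, i) * p**i * (1 - p)**(n - i) <= pk * (1 + 1e-9))

# Hypothetical: 51% preference from 510 of 1000 judgments.
# The p-value comes out well above any significance threshold,
# i.e. statistically indistinguishable from a coin toss.
print(two_sided_binomial_p(510, 1000))
```

With a small number of judgments the same 51% would be even less informative; only the paper's actual counts can settle it.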
erronis (23,069 posts)
2. Further reading of the article might help. To me, the major advance is accurate citing of the sources.
Last edited Wed Feb 4, 2026, 04:50 PM - Edit history (1)
"Early on, we experimented with using an AI model with Google's search data, but we found it wasn't very good on its own," said lead author Akari Asai, a research scientist at Ai2 who completed this research as a UW doctoral student in the Allen School.
"It might cite some research papers that weren't the most relevant, or cite just one paper, or pull from a blog post randomly. We realized we needed to ground this in scientific papers. We then made the system flexible so that it could incorporate emerging research through results."
. . .
The team compared OpenScholar against other state-of-the-art AI models, such as OpenAI's GPT-4o and two models from Meta. ScholarQABench automatically evaluated AI models' answers on metrics such as their accuracy, writing quality and relevance.
OpenScholar outperformed all the systems it was tested against. The team had 16 scientists review answers from the models and compare them with human-written responses.
The scientists preferred OpenScholar's answers to human answers 51% of the time, but when the team combined OpenScholar's citation methods and pipelines with GPT-4o (a much bigger model), the scientists preferred the AI-written answers to human answers 70% of the time. They picked answers from GPT-4o on its own only 32% of the time.
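The "ground this in scientific papers" idea quoted above is essentially retrieval-augmented generation: fetch relevant papers first, then restrict the model's citations to that retrieved set. A toy sketch of the retrieve-then-cite step (the papers, scoring, and field names here are all invented for illustration; OpenScholar's actual retriever is far more sophisticated):

```python
# Toy retrieval-grounded citation: score papers by term overlap with
# the question, keep the top-k, and allow citations only from that
# retrieved set. All paper data below is made up.
def tokenize(text):
    return set(text.lower().split())

def retrieve(question, corpus, k=2):
    """Rank papers by overlap between question and abstract terms."""
    q = tokenize(question)
    scored = sorted(corpus,
                    key=lambda p: len(q & tokenize(p["abstract"])),
                    reverse=True)
    return scored[:k]

corpus = [
    {"id": "paper-1", "abstract": "hallucination rates in language model citations"},
    {"id": "paper-2", "abstract": "protein folding with deep learning"},
    {"id": "paper-3", "abstract": "benchmark for citation accuracy of language models"},
]

sources = retrieve("how accurate are language model citations", corpus)
allowed_ids = {p["id"] for p in sources}
print(sorted(allowed_ids))  # the generator may cite only these ids
```

Restricting citations to retrieved documents is what keeps the model from inventing references the way the article says GPT-4o did 78-90% of the time.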