Large language models (LLMs) trained on vast text datasets can distil patterns from the scientific literature, enabling them to predict the outcomes of experiments more accurately than human experts, according to research published in Nature Human Behaviour.
The researchers say this highlights LLMs' potential as powerful tools for accelerating research, going well beyond simple knowledge retrieval.
Since the advent of generative AI like ChatGPT, much research has focused on LLMs’ question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data. However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes.
Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments.
Dr Ken Luo
To assess how well LLMs can forecast neuroscience outcomes, the international research team began by developing a benchmark called BrainBench.
BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is the genuine abstract, briefly describing the study's background, methods, and results. The other version has identical background and methods, but its results have been altered by neuroscience experts into a plausible yet incorrect outcome.
To evaluate whether AI or humans could more reliably identify which of the two paired abstracts was the real one containing the actual study results, the researchers tested 171 human neuroscience experts and 15 different general-purpose LLMs. All of the human experts had passed a screening test to confirm their expertise.
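For readers curious how a language model can be scored on this kind of two-alternative task, below is a minimal sketch of one common approach: measure how "surprised" a causal language model is by each abstract version (its perplexity) and pick the less surprising one. The model name, function names, and the use of the perplexity gap as a confidence signal are illustrative assumptions; the paper's exact evaluation pipeline may differ.

```python
# Minimal sketch: scoring one BrainBench-style item by comparing how "surprised"
# a causal language model is by each abstract version (lower perplexity = preferred).
# Model choice and details are illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # any causal LM would do for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16,
                                             device_map="auto")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average negative log-likelihood per token of the text."""
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

def choose_abstract(original: str, altered: str):
    """Pick whichever version the model finds less surprising; the size of the
    gap serves as a rough confidence signal (larger gap = more confident)."""
    ppl_orig, ppl_alt = perplexity(original), perplexity(altered)
    prediction = "original" if ppl_orig < ppl_alt else "altered"
    confidence = abs(ppl_orig - ppl_alt)
    return prediction, confidence
```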
The LLMs outperformed the neuroscientists across the board, averaging 81% accuracy versus 63% for the humans. Even when the research team restricted the human responses to those who reported the highest expertise in a given area of neuroscience, the neuroscientists' accuracy rose only to 66%, still below that of the LLMs. The researchers also found that the LLMs were more likely to be correct when they were more confident in their choices, a discovery that they say opens the door to future collaboration between human experts and well-calibrated models.
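The calibration finding above can be checked with a simple procedure: sort predictions by the model's confidence, split them into bins, and see whether accuracy rises from the least to the most confident bin. The sketch below is purely illustrative and makes no assumption about how the paper binned or analysed its data.

```python
# Illustrative calibration check: does accuracy increase with model confidence?
import numpy as np

def calibration_curve(confidences, correct, n_bins: int = 5):
    """Return mean accuracy within each confidence quantile bin,
    ordered from least to most confident."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(confidences)          # least -> most confident
    bins = np.array_split(order, n_bins)     # equal-sized quantile bins
    return [correct[idx].mean() for idx in bins]

# For a well-calibrated model, the returned accuracies should rise across bins.
```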
The researchers then adapted an existing open-source LLM (a version of Mistral) by training it on neuroscience-specific literature. The resulting neuroscience-focused model, named BrainGPT, forecast study outcomes with 86% accuracy, outperforming the general-purpose Mistral version, which scored 83%.
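The article does not spell out the training recipe behind BrainGPT, so the sketch below shows one common way to adapt an open-source model such as Mistral to domain text: attaching LoRA adapters with the Hugging Face peft library and continuing language-model training on a neuroscience corpus. The file name neuro_abstracts.txt and all hyperparameters are placeholders, not the paper's actual settings.

```python
# Minimal sketch, assuming a standard LoRA fine-tuning recipe; the actual
# BrainGPT training setup may differ in data, hyperparameters, and tooling.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Attach small trainable LoRA matrices to the attention projections;
# the base weights stay frozen, which keeps fine-tuning cheap.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# "neuro_abstracts.txt" is a placeholder for a corpus of neuroscience text.
data = load_dataset("text", data_files={"train": "neuro_abstracts.txt"})["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="braingpt-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           fp16=True,
                           logging_steps=50),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```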
In light of our results, we suspect it won’t be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was not specific to it and should apply successfully across all of science.
What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory.
Professor Bradley Love
Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design.
Dr Ken Luo
Source: UCL News
Journal Reference: Luo, Xiaoliang, et al. “Large Language Models Surpass Human Experts in Predicting Neuroscience Results.” Nature Human Behaviour, 2024, pp. 1-11, DOI: https://doi.org/10.1038/s41562-024-02046-9.