Two AI-based science assistants succeed with drug-retargeting tasks
May 20, 2026 Development Source: Ars Technica
As the people at FutureHouse put this issue, “By focusing on ‘combinatorial synthesis’ (identifying non-obvious connections between disparate fields), Robin effectively targets ‘low-hanging fruit’ that human experts may overlook due to the compartmentalization of scientific knowledge.”
This is a task that’s well-suited to AI, which can chew through the peer-reviewed literature in the background while researchers do other things. This isn’t really a question of whether an AI could do something better or worse than a human; it’s more of an issue of whether any human would end up doing these sorts of searches at all.
By finding enough connections among disparate research, these tools can make suggestions—hypotheses, really—about the biology. This can include things like what processes underlie biological behaviors and what pathways and networks regulate those processes. And, in the cases explored here, it included suggesting known drugs that might target some of these pathways in diseased cells: acute myeloid leukemia in Google’s case, and a form of macular degeneration for FutureHouse.
As you might imagine, Google’s system is based on the company’s Gemini large language model. The model helps the system interpret a statement of research goals provided by human scientists, then start a literature search to find relevant information and form hypotheses. Those are then evaluated relative to each other in a “tournament,” the results of which are evaluated by a Reflection agent. An Evolution agent can then make improvements to any surviving ideas, which can be sent back through the process.
Key criteria considered throughout this process include plausibility, novelty, testability, and safety. The Reflection agent also has access to external search tools, as access to the scientific literature “prevented the hallucination of seemingly novel but implausible hypotheses,” the company wrote.
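To make the shape of that loop concrete, here is a minimal sketch of a tournament, review, and refinement cycle. It is not Google's implementation: the real system relies on LLM agents and literature searches, while this toy uses a single made-up number per hypothesis as a stand-in for the combined judgment on plausibility, novelty, testability, and safety. Every name in it (Hypothesis, tournament, reflect, evolve, run) and the 0.3 cutoff are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    score: float  # toy stand-in for an agent's overall judgment, 0.0 to 1.0


def tournament(pool, rng):
    """Pair hypotheses at random; the higher-scoring one in each pair survives."""
    shuffled = pool[:]
    rng.shuffle(shuffled)
    survivors = []
    for i in range(0, len(shuffled) - 1, 2):
        a, b = shuffled[i], shuffled[i + 1]
        survivors.append(a if a.score >= b.score else b)
    if len(shuffled) % 2 == 1:
        survivors.append(shuffled[-1])  # the odd one out gets a bye
    return survivors


def reflect(survivors, min_score):
    """Placeholder review step: drop anything below a quality threshold."""
    return [h for h in survivors if h.score >= min_score]


def evolve(hypothesis, rng):
    """Placeholder refinement step: nudge the score up by a small random amount."""
    improved = min(1.0, hypothesis.score + rng.uniform(0.0, 0.1))
    return Hypothesis(hypothesis.text + " (refined)", improved)


def run(pool, rounds, min_score=0.3, seed=0):
    """Repeat tournament -> review -> refinement, feeding survivors back in."""
    rng = random.Random(seed)
    for _ in range(rounds):
        if len(pool) <= 1:
            break
        pool = [evolve(h, rng) for h in reflect(tournament(pool, rng), min_score)]
    return sorted(pool, key=lambda h: h.score, reverse=True)
```

Starting from eight toy hypotheses and running three rounds leaves a single survivor that has been refined three times. The point of the sketch is only the control flow: ideas compete, a reviewer filters the winners, and the survivors are improved and sent back through.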
As the paper puts it, scientists were kept in the loop at all times. In the search for potential drugs targeting leukemia, the suggestions made by the system were prioritized based on a review by a panel of experts, who had access to the literature Co-Scientist used to formulate its suggestions.
The results are what you would expect from cancer therapies. Some of the drugs identified were effective, but only against subsets of a panel of myeloid leukemia cells. That’s not unusual, given that there are multiple routes to unchecked growth, so drugs that block the route followed by one cell type may not be effective in cells that took a different route.
Google also mentioned that the system could do more general hypothesizing that doesn’t involve drugs, using an example of the spread of virulence genes in bacteria. But the details of that work were fairly sparse.
The system is also set up so that it’s model agnostic, allowing it to be switched over to better-performing models as AI systems evolve. But the researchers also warn that “Co-Scientist also inherits the intrinsic limitations of its underlying models, including imperfect factuality and the potential for hallucinations.”
FutureHouse’s system has some similarities but a couple of critical differences that go beyond naming all the agentic tools after birds. The main system, Robin, has access to specialized literature search tools. One, Crow, produces a concise summary of papers, while Falcon gives a deep overview of the information contained in the paper. The paper describing the system provides a clear sense of the advantages here: “Robin analyses 551 papers in 30 minutes compared to an estimated time of 540 hours for a human.”
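For a rough sense of scale, the figures in that quote can be turned into per-paper rates. The snippet below uses only the three numbers from the quote; the variable names are illustrative, and the 540-hour figure is itself the paper's estimate rather than a measurement.

```python
papers = 551         # papers analysed, per the quote
robin_minutes = 30   # time Robin took, per the quote
human_hours = 540    # estimated human time, per the quote

human_minutes = human_hours * 60
speedup = human_minutes / robin_minutes             # how many times faster
human_min_per_paper = human_minutes / papers        # minutes per paper, human
robin_sec_per_paper = robin_minutes * 60 / papers   # seconds per paper, Robin

print(f"speedup: {speedup:.0f}x")
print(f"human: {human_min_per_paper:.1f} min per paper")
print(f"Robin: {robin_sec_per_paper:.2f} s per paper")
```

That works out to a speedup of about 1,080 times: roughly an hour per paper for the human estimate versus a little over three seconds per paper for Robin.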
Taking those summaries, Robin then formed a series of hypotheses about disease mechanisms for macular degeneration and used these tools to provide a detailed report on the evidence for each mechanism. An LLM judge then made pairwise comparisons among the hypotheses, which resulted in relative rankings—a bit like Google’s tournament system.
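One simple way to turn pairwise comparisons into a ranking is to count how many head-to-head matchups each candidate wins. The sketch below does that; it is an assumption about how such a ranking could work, not a description of FutureHouse's method. The judge here is a hypothetical stand-in for the LLM judge, and the hypothesis labels and evidence values are invented for the example.

```python
from collections import Counter
from itertools import combinations


def rank_by_pairwise_wins(items, judge):
    """Compare every pair once and order items by number of wins."""
    wins = Counter({item: 0 for item in items})
    for a, b in combinations(items, 2):
        wins[judge(a, b)] += 1  # judge returns whichever item it prefers
    return sorted(items, key=lambda item: wins[item], reverse=True)


# Toy stand-in for the LLM judge: prefer the hypothesis with more
# (made-up) supporting evidence.
toy_evidence = {"mechanism A": 0.2, "mechanism B": 0.9, "mechanism C": 0.5}


def toy_judge(a, b):
    return a if toy_evidence[a] >= toy_evidence[b] else b


ranking = rank_by_pairwise_wins(list(toy_evidence), toy_judge)
print(ranking)
```

With these toy values, mechanism B wins both of its comparisons, C wins one, and A wins none, so the ranking comes out B, C, A. A real judge would be noisier and could produce cycles (A beats B, B beats C, C beats A), which is one reason such systems often use rating schemes rather than raw win counts.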
For starters, it’s important to note that these successes come in one of the easier parts of drug development (not that any part of it can really be said to be easy). The AIs weren’t being asked to design entirely new molecules, and most drugs fail during the animal and clinical trials phase, rather than during testing in cell culture. That’s not to say repurposing existing drugs is nothing—we already have safety profiles and agency approvals for these molecules, and many are off-patent and therefore cheap. But we’re not at the point where AIs are solving hard problems.
This sort of hypothesis—this mechanism underlies that disease, and the drug over there can target it—is also one of the more concrete forms of hypothesis in biology. In my career as a scientist, I had to develop hypotheses that were meant to address things like “mice with this mutation have a whole lot of defects in very different tissues; is there a single mechanism underlying them?” Or, “What’s going on at the border of this gene’s expression that is changing how cells respond to this signaling molecule?” It’s unclear how these systems could handle these more open-ended scientific problems.
That said, the problem of literature overload is a real one in many fields, and systems meant to address it could help us avoid a situation where all the information we needed was sitting around for a decade, but nobody put it together. Given we’re still working through AI’s growing pains, however, I’m also happy that there are at least two independently developed systems tackling this problem so that we can potentially run both and compare the results.
Nature, 2026. DOI: 10.1038/s41586-026-10652-y, 10.1038/s41586-026-10644-y.