AI in Science. The dawn of LLM-driven discovery.

The emergence of GPT-5 marks a transformative shift in scientific discovery, showing that advanced reasoning models can synthesize novel insights rather than merely retrieve them.

Article written by Jan Lisowski

A Taste of LLM-Driven Scientific Discovery: What GPT-5’s Reasoning Unveils

Something remarkable happened in 2025. A research team exploring the boundaries of machine-driven science challenged GPT-5 Pro to propose a therapeutic intervention for an untreatable food allergy. After just 12 minutes of reasoning (no manual experiments, no literature review, only raw abstraction), the model suggested repurposing a known drug. When the researchers double-checked, they found that GPT-5’s independent reasoning had converged on the exact same molecule as a then-unpublished (but peer-reviewed) biomedical breakthrough.

This was not a statistical fluke or a parlor trick of information retrieval. It was a glimpse of the future promised by large language models (LLMs) as scientific discovery engines[1].

What Changed in GPT-5?

GPT-5’s architecture is not just bigger; it is smarter and differently wired. The Pro variant, in particular, builds on parallel reasoning chains: rather than simply emitting a single next-token prediction, it runs multiple internal debates, converges on the most coherent answer, and, crucially, exposes its reasoning process[5]. This is a departure from brute-force parameter scaling, moving toward models that combine generative reasoning, multimodal understanding, and self-oversight.
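OpenAI has not published GPT-5 Pro’s internals, so the snippet below is only a minimal Python sketch of the general pattern described above, in the spirit of self-consistency decoding: sample several independent reasoning chains, then converge on the answer most of them agree on. The functions `sample_chain` and `parallel_reasoning` are hypothetical stand-ins, not a real API.

```python
from collections import Counter

def sample_chain(question: str, seed: int) -> str:
    """Hypothetical stand-in for one independent reasoning chain.

    A real system would call a reasoning model with a distinct sampling
    seed or temperature; here it is stubbed so the sketch runs on its own.
    """
    # Toy behaviour: most chains agree, a minority dissents.
    return "drug-A" if seed % 4 else "drug-B"

def parallel_reasoning(question: str, n_chains: int = 8) -> tuple[str, list[str]]:
    """Run several chains and return the answer they converge on,
    keeping all chains as an inspectable reasoning trace."""
    chains = [sample_chain(question, seed) for seed in range(n_chains)]
    answer, _count = Counter(chains).most_common(1)[0]
    return answer, chains

if __name__ == "__main__":
    final, trace = parallel_reasoning(
        "Which approved drug could be repurposed for condition X?"
    )
    print("converged answer:", final)
    print("chains sampled:", trace)
```

The point of the sketch is the shape of the computation: many chains run independently, and the final answer is the one they converge on, with the chains themselves available as a trace for human inspection.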

On benchmarks, GPT-5 is not just a better pattern-matcher. It achieves state-of-the-art results in health (46.2% on HealthBench Hard), coding (74.9% on SWE-bench Verified), and multimodal reasoning (84.2% on MMMU, interpreting visual and text modalities together)[3]. These are leaps, not steps, beyond GPT-4.

Model-Driven Discovery: Beyond Serendipity

There’s a myth that AI can only retrieve existing knowledge, never discover new knowledge. This incident shatters that myth: for the first time, an LLM did not just summarize but synthesized a novel therapeutic insight, independently and quickly. The drug in question was not trivially associated with allergies; GPT-5 had to make a non-obvious connection between molecular pathways, clinical evidence, and pathology.

This is not yet autonomous AI discovery (the model was supervised, the task was bounded, and the reasoning was rapid but not exhaustive), but it is a clear signal that AI is shifting from a reading-and-writing aid to a collaborator in the scientific process[1]. The same model has already played a tangible role in mathematics and quantum complexity proofs[4][6].

Technical Cautions: The Jagged Horizon

Not all domains will advance at the same pace. GPT-5 Pro’s brilliance is jagged: it excels at structured, knowledge-dense reasoning tasks but can still stumble on commonsense or creative problems outside its training distribution[5]. “Thinking” models are not uniformly competent; their performance depends on the task’s structure, the quality of the prompt, and the internal reasoning process the system selects[7].

Moreover, hallucinations, though reduced by 65% compared to predecessor models, are not eliminated[2]. The model’s confidence does not guarantee correctness, so final scientific judgment still requires human oversight.

Implications for Research Practice

For AI researchers and domain scientists, this means that LLMs are entering a new phase. They are no longer just for writing and summarizing, but for generative reasoning: proposing hypotheses, identifying anomalies, and even suggesting experimental directions, all within a multimodal, cross-domain framework[1][3].

This does not make human scientists obsolete. Instead, it redefines their workflow: the new scientist will curate prompts, interpret reasoning traces, and validate model outputs, sometimes making the final creative leap and sometimes recognizing the model’s leap and following it up. The combination of machine speed and human judgment is where the next breakthroughs will happen.

Looking Forward: The Road to Autonomous Discovery

If this is what 12 minutes of thinking from a Pro model can do, what happens with 12 hours, or when dozens of such models “think” together, debate, and refine? The architecture—parallel reasoning, self-critique, multimodal grounding—suggests a future where AI does not just assist discovery, but accelerates and diversifies it, connecting concepts across fields that no human could survey alone[1].
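To make that picture concrete, here is a toy Python sketch of the multi-agent debate loop gestured at above: several models answer, read each other’s answers, and revise. `ask_model` and `debate` are hypothetical placeholders, not any vendor’s API, and a production system would add aggregation and a final judge.

```python
def ask_model(name: str, prompt: str) -> str:
    """Hypothetical stand-in for a call to one reasoning model."""
    return f"[{name} answers: {prompt[:40]}...]"

def debate(question: str, models: list[str], rounds: int = 2) -> dict[str, str]:
    """Each model answers, then repeatedly revises after reading its peers."""
    answers = {m: ask_model(m, question) for m in models}
    for _ in range(rounds):
        for m in models:
            peers = "\n".join(a for other, a in answers.items() if other != m)
            answers[m] = ask_model(
                m,
                f"Question: {question}\n"
                f"Peer answers:\n{peers}\n"
                "Critique the peer answers and give your revised answer.",
            )
    return answers

if __name__ == "__main__":
    result = debate(
        "Propose a repurposable drug for condition X.",
        ["model-1", "model-2", "model-3"],
    )
    for model, answer in result.items():
        print(model, "->", answer)
```

Even this stub makes the open question visible: each extra round multiplies the number of model calls, so 12 hours of debate buys depth only if the revision step actually improves answers rather than amplifying shared errors.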

But as Scott Aaronson, Terence Tao, and others have noted, originality in AI is hard to prove. Sometimes, the model’s “discovery” was already present in its training corpus, and only deep human expertise can confirm novelty[4]. This is a fundamental challenge for quantifying and trusting AI-driven breakthroughs.

The lesson for 2025 is clear: LLM-driven scientific discovery is real, it’s here, and it’s improving fast. The best practice—for now—is to use these models as reasoning sparring partners: probe them, challenge them, and most of all, learn from their mistakes as well as their insights. The era of AI as partner in discovery has begun.
