For more than two decades, the life sciences have promised themselves knowledge graphs. The intuition has always been correct: progress depends on understanding relationships and finding order in fundamentally messy data, whether between genes and proteins, compounds and targets, trials and outcomes, or mechanisms and phenotypes. Graphs, in principle, are the natural representation of this world.
And yet, most large-scale biomedical knowledge graph efforts stalled, fragmented, or quietly receded into maintenance mode, often transforming into bespoke enterprise projects that burn millions of dollars annually in manual curation and engineering just to fight data decay¹. For example, the Unified Medical Language System was and remains a landmark effort by the U.S. National Library of Medicine to unify medical vocabularies across clinical and research domains. However, the sheer scale of the problem is staggering: the UMLS Metathesaurus attempts to integrate over 200 source vocabularies, managing approximately 4.4 million distinct concepts and tens of millions of interconnected relationships². While it works, it is slow to reflect emerging biomedical concepts, trial endpoints, or mechanistic hypotheses, because these things are so difficult to uncover, hidden behind press releases, jargon, and subcontractors. It is an infrastructure tool rather than an insights engine. The life sciences are saturated with similar tools and infrastructure that lack a web connecting their hidden and observable parts.
This failure is often misattributed to execution. In reality, it is structural. Knowledge graphs in life sciences did not fail because the idea was wrong, but because the ecosystem lacked two essential capabilities: scalable knowledge extraction and rich semantic representations that capture ambiguity and uncertainty. Without those, graphs became brittle artifacts rather than living systems.
Today, that constraint is changing. Large language models (LLMs) shift the feasibility frontier of knowledge graphs, but only if embedded in architectures that respect the epistemic complexity of science. These shifting constraints give rise to a new generation of systems, and this is where Reliant fits in.
A Brief History of Biomedical Knowledge Graphs
Early biomedical knowledge graphs emerged from a symbolic paradigm built on subject-predicate-object triples. Projects such as Bio2RDF sought to link heterogeneous biological databases using shared identifiers and RDF, a data standard that strictly structures information into these triples (e.g., "Drug A" [subject] -> "targets" [predicate] -> "Protein B" [object]). The aforementioned Unified Medical Language System attempted something even more ambitious: a metathesaurus capable of harmonizing clinical and biomedical vocabularies across institutions, languages, and use cases.
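The triple model that RDF formalizes can be pictured in a few lines of plain Python. The entity and predicate names below are illustrative, not drawn from any real vocabulary, and a real RDF store uses globally unique URIs rather than bare strings:

```python
# A minimal sketch of the subject-predicate-object triple model,
# using plain Python tuples in place of RDF URIs.
triples = {
    ("DrugA", "targets", "ProteinB"),
    ("ProteinB", "participates_in", "PathwayC"),
    ("DrugA", "indicated_for", "DiseaseD"),
}

def objects_of(subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}

print(objects_of("DrugA", "targets"))  # {'ProteinB'}
```

The appeal of this model is exactly its simplicity: every statement has the same shape, so databases built by different groups can be merged by concatenating their triples, provided the identifiers line up.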
These efforts were foundational. They provided standardized identifiers, ontological scaffolding, and a generation of researchers trained to think relationally. But they also revealed the limits of purely symbolic systems in a domain characterized by rapid discovery, contested definitions, and incomplete evidence.
Why Prior Attempts Stalled
1. The Scalability Crisis of Fragile Ontologies
Biomedical ontologies capture expert consensus at a given point in time. But science is not static. New mechanisms are proposed, disease taxonomies shift, endpoints are redefined, and clinical language evolves faster than formal ontologies can be revised. For example, when oncology shifted from classifying tumors purely by their tissue of origin to categorizing them by molecular drivers (such as HER2 or EGFR mutations), legacy ontologies rigidly built on anatomical hierarchies required massive, manual overhauls to remain relevant.
This exposed a massive scalability bottleneck. Because extraction pipelines relied heavily on manual curation or rigid, rule-based systems, adapting to these scientific shifts required sustained human labor. As the marginal cost of updating relationships skyrocketed, institutions couldn't keep up, and the graphs became stale. Consequently, these systems began to actively mislead: researchers missed critical, emerging connections, and downstream applications inherited severe blind spots.
2. Deterministic Semantics in a Probabilistic World
Most critically, early systems treated the representation of life sciences knowledge as binary: relationships either existed or they did not. While researchers could certainly run statistical algorithms over these networks to uncover non-obvious connections, the underlying data structure remained rigid. Life sciences are dominated by uncertainty, conflicting trial results, context-dependent effects, and incomplete causal chains. Deterministic graphs could not express confidence, disagreement, or evolving evidence.
By forcing complex, contested science into definitive subject-predicate-object facts, these systems stripped away the epistemic nuance before the analysis even began, leading to overconfidence or underuse.
What’s Different Now
The emergence of LLMs changes the economics of knowledge graphs, but not automatically. Three advances matter, and only in combination.
Learned Semantic Representations
LLMs enable the extraction of structured relationships from unstructured scientific text at scale. Because of their massive pretraining corpus, they handle the linguistic variation, evolving terminology, and incomplete syntax that dominate biomedical literature. Their generative and few-shot capabilities allow flexible, schema-driven extraction without task-specific retraining.
Crucially, this does not eliminate error. It shifts extraction from a brittle, high-precision regime to a probabilistic one. That tradeoff is acceptable only if downstream systems can reason about uncertainty.
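Schema-driven extraction in this probabilistic regime can be sketched simply: the model proposes candidate relationships, and a validator enforces the schema before anything enters the graph. The schema, field names, and sample model output below are illustrative assumptions, not any particular system's format:

```python
import json

# Hypothetical schema for LLM-extracted relationships. Only
# well-formed candidates with an allowed predicate and a sane
# confidence value survive validation.
ALLOWED_PREDICATES = {"inhibits", "activates", "associated_with"}
REQUIRED_FIELDS = {"subject", "predicate", "object", "confidence"}

def validate(raw_json):
    """Keep well-formed candidate edges; discard the rest."""
    kept = []
    for rec in json.loads(raw_json):
        if (REQUIRED_FIELDS <= rec.keys()
                and rec["predicate"] in ALLOWED_PREDICATES
                and 0.0 <= rec["confidence"] <= 1.0):
            kept.append(rec)
    return kept

# Simulated model output: one valid edge, one with an
# out-of-schema predicate that should be rejected.
model_output = '''[
  {"subject": "DrugX", "predicate": "inhibits",
   "object": "TargetY", "confidence": 0.81},
  {"subject": "DrugX", "predicate": "cures",
   "object": "DiseaseZ", "confidence": 0.4}
]'''
print(len(validate(model_output)))  # 1
```

The point of the sketch is the division of labor: the model supplies recall across messy text, while a deterministic validator keeps the graph's structure intact.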
Probabilistic Inference
Modern knowledge architectures increasingly treat inferred relationships not as facts, but as hypotheses with associated confidence. This allows systems to propagate uncertainty rather than collapse it. In life sciences, this distinction is non-negotiable: error does not disappear simply because it is ignored.
Explicit modeling of uncertainty also enables selective use. Systems can abstain, flag ambiguity, or request human input when confidence falls below a threshold, a capability absent in earlier generations of knowledge graphs.
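A confidence-gated workflow of this kind can be sketched in a few lines. The thresholds, field names, and routing labels here are assumptions for illustration, not a description of any production system:

```python
from dataclasses import dataclass

@dataclass
class InferredEdge:
    subject: str
    predicate: str
    obj: str
    confidence: float  # model-estimated, in [0, 1]

def triage(edge, accept_at=0.9, review_at=0.6):
    """Route an extracted relationship by confidence: accept it,
    flag it for expert review, or abstain entirely."""
    if edge.confidence >= accept_at:
        return "accept"
    if edge.confidence >= review_at:
        return "flag_for_review"
    return "abstain"

edge = InferredEdge("DrugX", "inhibits", "TargetY", confidence=0.72)
print(triage(edge))  # flag_for_review
```

The middle band is what earlier deterministic graphs lacked: a principled place for "we are not sure yet," which is precisely where expert attention is most valuable.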
Human-in-the-Loop Correction
Because LLMs “speak” in natural language, modern AI systems make it easy to place experts back into the loop, not as primary extractors, but as validators and adjudicators. This dramatically lowers the cost of keeping knowledge bases, graphs included, aligned with the scientific frontier while preserving epistemic best practices.
Why Life Sciences Are Uniquely Structure-able
Paradoxically, the same features that make life sciences difficult for generic AI systems make them well-suited to graph-based representations.
Unlike retail or finance, where insights are driven by directly measurable quantities such as inventory or revenue, life sciences reward systems that can represent structure without pretending to certainty. For example, a traditional system might assert as a hard fact that "Drug X inhibits Target Y." But scientific reality is rarely that clean: Drug X might inhibit Target Y strongly in vitro, show conflicting results in Phase II trials, and only work in patients with a specific genetic biomarker. You need a system that maps the structural relationship (Drug X to Target Y) while natively preserving the contradictory data, the clinical context, and the statistical confidence.
This is why we must move past the traditional, deterministic knowledge graph. When a structured system becomes probabilistic, dynamic, and updateable (anchoring every relationship directly to its underlying scientific proof rather than an abstract rule) it evolves into something fundamentally different. It becomes an Evidence Graph.
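One way to picture the difference: instead of a bare triple, each relationship carries its supporting and contradicting evidence, each piece with its own context and confidence. The field names and the toy aggregation below are a hypothetical sketch under those assumptions, not Reliant's actual schema or scoring method:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source: str        # e.g. a publication or trial identifier
    context: str       # experimental or clinical setting
    supports: bool     # supports or contradicts the relationship
    confidence: float  # in [0, 1]

@dataclass
class EvidenceEdge:
    subject: str
    predicate: str
    obj: str
    evidence: list = field(default_factory=list)

    def net_support(self):
        """Aggregate evidence into a signed score in [-1, 1],
        letting contradictions pull the score down instead of
        being silently discarded."""
        signed = [e.confidence if e.supports else -e.confidence
                  for e in self.evidence]
        return sum(signed) / len(signed) if signed else 0.0

edge = EvidenceEdge("DrugX", "inhibits", "TargetY", evidence=[
    Evidence("in-vitro study", "cell line", True, 0.9),
    Evidence("Phase II trial", "unselected patients", False, 0.6),
    Evidence("Phase II subgroup", "biomarker-positive patients", True, 0.7),
])
print(round(edge.net_support(), 2))  # 0.33
```

Even this toy version shows the structural point: the conflicting Phase II result stays attached to the edge, visible to any downstream reasoning, rather than being flattened into a yes-or-no fact.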
Where Reliant Fits in the Stack
At Reliant, we are building the definitive epistemic infrastructure for the life sciences. We view the organization of knowledge not as a static artifact, but as the dynamic foundation required to power ever more capable AI co-scientists.
Because we understand that the organization of knowledge must mirror the complex, probabilistic reality of biological data, we are not attempting to resurrect first-generation knowledge graphs. We sit at a fundamentally different layer of the stack.
Rather than treating our Evidence Graph as an end product, Reliant treats it as a continuously updated engine that supports reasoning, comparison, and exploration across the entire drug development lifecycle. This infrastructure interconnects:
- The Biological Layer: Diseases and indications, targets (proteins, genes, receptors), pathways, biomarkers, phenotypes, patient populations, and epidemiology.
- The Asset Layer: Compounds and drugs, formulations, routes of administration, target product profiles (TPPs), lines of therapy, and combination regimens.
- The Development Layer: Clinical trials, endpoints, trial designs, preclinical data, and safety/adverse event profiles.
What does this look like in practice? When a clinical strategy team is faced with contradictory drug trial results or a sudden shift in the competitive landscape, Reliant empowers them to cut through the noise. Instead of manually synthesizing isolated papers and datasets, decision-makers can leverage Reliant to instantly surface the necessary, context-rich data required to evaluate novel targets, compare trial designs, and move ahead with confidence.
Why Now—and Why This Approach Works
The timing matters. Representation learning, probabilistic reasoning, and human-in-the-loop systems are mature enough to coexist. Organizational expectations have shifted towards transparency and greater awareness of uncertainty. And the volume of biomedical knowledge has exceeded the capacity of linear search and narrative review.
Reliant’s approach works not because it promises omniscience, but because it is designed for life sciences realism. It assumes ambiguity, encodes uncertainty, and keeps humans in the loop where judgment matters most.
The original vision of the knowledge graph didn't fail; its rigid structure simply wasn't fit for purpose in a fundamentally probabilistic domain. By keeping the best of relational mapping and leaving behind the deterministic brittleness of the past, we are no longer just building knowledge bases. Now, with the right architecture and the right humility, we are building Evidence Graphs. And they can finally do the work this industry has always needed.

