The University of York may not be a household name in every kitchen-table health debate, but it sits at the center of a quietly important twist in how we think about cancer screening. A team there, led by Sofia Dias with colleagues Yiwen Liu, Stephen Palmer and Marta O Soares, asked a deceptively simple question: when you’re trying to judge how well a blood test catches many different cancers, what should you do when the data come in unevenly across tumor types and disease stages? Their answer isn’t a dramatic laboratory breakthrough so much as a smarter way to reason about evidence itself.
They study Galleri®, a multi-cancer early detection (MCED) test that looks for traces of circulating tumor DNA (ctDNA) in blood. In principle, if the test reliably detects cancers at early stages across many types, it could shift diagnoses to earlier, more treatable stages and improve outcomes. In practice, sensitivity (the chance the test spots cancer when it’s there) varies a lot by cancer type and by stage. So the big question is how to combine the scattered bits of evidence into something that can guide policy and practice without pretending the data are neater than they are. The York team answers with a Bayesian framework that lets evidence “borrow” strength from related cancers and stages, while carefully guarding against overconfidence when data are sparse or inconsistent.
But the core payoff is not a single number summarizing how good the Galleri test is. It’s a method for thinking about what we know, what we don’t know, and how much we should trust each inference as we plan screening programs in real life. The work is a reminder that statistics isn’t just math; it’s a mapmaker’s job, drawing lines between islands of data so public health decisions don’t stumble over gaps.
A shared map for many cancers
At the heart of the paper is a family of Bayesian information-sharing models. The intuition is simple: if two cancers shed ctDNA into the blood at similar stages, then data from one cancer can help sharpen the estimate for the other—especially when that other cancer has few cases in the dataset. But the authors are careful about how and when sharing happens. They don’t assume all cancers are the same; instead, they test multiple sharing rules, constrained by what we know about biology and tumor behavior.
They start with a base model that treats sensitivity for each cancer type and each stage as independent but ordered: for a given cancer type, sensitivity should not decrease as the cancer advances from stage I to stage IV. This monotonicity constraint is a nod to the biology: a bigger tumor generally sheds more ctDNA, making detection easier. Yet the real world is messier. Some cancers stubbornly refuse to follow the textbook pattern, and data for rarer cancers are sparse. The base model by itself is a useful anchor, but it’s not the whole story.
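To make that base model concrete, here is a minimal sketch in our own notation (the paper’s exact priors and parameterization may differ): observed detections get a binomial likelihood, sensitivity sits on the logit scale, and non-negative stage-to-stage increments enforce the monotonicity constraint.

```latex
% Minimal sketch of a base model with the monotonicity constraint (illustrative notation).
% x_{c,s}: observed detections for cancer type c at stage s; n_{c,s}: cases tested.
% p_{c,s}: sensitivity; non-negative increments delta_{c,s} keep it non-decreasing in stage.
\begin{align*}
x_{c,s} &\sim \mathrm{Binomial}\!\left(n_{c,s},\, p_{c,s}\right) \\
\mathrm{logit}\!\left(p_{c,1}\right) &= \alpha_{c} \\
\mathrm{logit}\!\left(p_{c,s}\right) &= \mathrm{logit}\!\left(p_{c,s-1}\right) + \delta_{c,s},
  \qquad \delta_{c,s} \ge 0, \quad s = 2, 3, 4
\end{align*}
```

With independent priors on each cancer’s parameters, every tumor type is estimated on its own; the sharing models described next change only where those priors come from.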
To move beyond the anchor, the team builds several information-sharing structures. Exchangeability models imagine that the log-odds of detecting cancer at a given stage can be drawn from a common distribution across cancers. In other words, there’s a shared “signal” at that stage, even if individual cancers wiggle a bit differently. They also add mixture and class models: mixtures allow a cancer to contribute to sharing with a certain probability, so “extreme” cancers don’t disproportionately drag the others along; class models group cancers into pre-defined or data-driven clusters, within which sharing is allowed but across groups it isn’t. The big idea is to let the data decide how much borrowing is reasonable, rather than forcing a single, blunt assumption on all cancers at all stages.
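A hedged sketch of how such sharing structures are often written (again our notation, assuming normal random effects on the log-odds scale): exchangeability draws each cancer’s stage-specific log-odds from a common stage-level distribution, a mixture adds an indicator that lets each cancer opt in or out of that common distribution, and a class model fits the exchangeable structure separately within each group of cancers.

```latex
% Illustrative sharing structures on the logit-sensitivity scale (notation is ours).
% mu_s, tau_s: common mean and spread at stage s; z_c: sharing indicator for cancer c;
% g(c): the class to which cancer c is assigned in a class model.
\begin{align*}
\text{Exchangeability:}\quad & \mathrm{logit}(p_{c,s}) \sim \mathcal{N}\!\left(\mu_s,\, \tau_s^2\right) \\[4pt]
\text{Mixture:}\quad & z_c \sim \mathrm{Bernoulli}(\pi), \qquad
  \mathrm{logit}(p_{c,s}) \sim
  \begin{cases}
    \mathcal{N}\!\left(\mu_s,\, \tau_s^2\right) & \text{if } z_c = 1 \text{ (shares)} \\
    \mathcal{N}\!\left(0,\, \sigma_0^2\right) & \text{if } z_c = 0 \text{ (stands alone, vague prior)}
  \end{cases} \\[4pt]
\text{Class:}\quad & \mathrm{logit}(p_{c,s}) \sim \mathcal{N}\!\left(\mu_{g(c),s},\, \tau_{g(c),s}^2\right)
\end{align*}
```

In each case the stage-ordering constraint from the base model can be retained, and the heterogeneity parameters (the tau terms) are what the data use to signal how much borrowing is actually warranted.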
The results aren’t a single verdict but a set of nuanced findings. The evidence most strongly supports the idea that sensitivity can be shared across cancer types for stage IV disease. In plain terms: once a cancer has reached a late, more detectable stage, the test’s performance looks more similar across different cancers. That’s a helpful anchor when setting expectations for late-stage detection across a spectrum of cancers.
There’s also meaningful, but more conditional, support for sharing across cancer types at earlier stages, provided certain low-sensitivity cancers are excluded. In practical terms: if you ignore cancers that consistently show weak early detection signals (in the data at hand), you can borrow strength across the rest to tighten early-stage estimates. It’s a reminder that in evidence synthesis, the quality and context of the data matter as much as the statistical method.
The authors compare seven modeling approaches and assess fit, precision, and interpretability. The base model remains a strong reference point, but it’s the additional sharing models that reveal where information really travels in the data. The top performers—Models 2 and 3, with the latter offering a practical alternative when you want to avoid pulling in cancers that don’t behave like the others—strike a balance between realism and precision. The headline upshot: you can gain sharper estimates without overstepping what the data can support, but heterogeneity remains a stubborn reality.
What drives ctDNA shedding and where the model lands
The paper doesn’t pretend we can explain away all the differences in test sensitivity with one neat biological rule. Instead, the modeling approach is anchored in what the literature suggests about ctDNA shedding: larger, more advanced, or more widespread tumors tend to release more ctDNA into the bloodstream, boosting the chance the Galleri test will pick up a signal. The authors draw on a targeted literature review to justify sharing decisions: ctDNA levels rise with tumor burden and with stage, central nervous system cancers can be harder to detect because of the blood-brain barrier, and certain organ-specific factors can influence ctDNA dynamics.
Yet the data show substantial heterogeneity across cancer types and stages, especially for early-stage disease. Some cancers, such as lung, colon/rectum, and head and neck, show fairly high sensitivity even at earlier stages, while others, such as certain kidney or thyroid cancers, exhibit much lower early-stage detectability. In other words, a single, universal claim about early detection is fragile in a multi-cancer test. The models make this fragility explicit: even with advanced sharing rules, the precision gains depend on how much, and how plausibly, cancers resemble each other in their ctDNA behavior.
The study’s empirical backbone is CCGA3, a sub-study within the Galleri validation program. The authors include all the data available on sensitivity by cancer type and stage, and they openly acknowledge where the data are thin. They also test how the results would change if you reorganize cancers into different groups or if you allow stage-specific sharing probabilities to differ across cancers. Across all configurations, a recurring theme emerges: stage IV sharing works, but the more you try to squeeze early-stage signals into a shared story, the more you run into heterogeneity that the data cannot ignore.
One striking practical note is that the models do not pretend to have solved every uncertainty. They quantify heterogeneity explicitly and test how much precision can be gained under different assumptions. The result is a more honest map of what policy-makers can and cannot rely on when planning population screening with a multi-cancer test. The authors even point to the NHS-Galleri trial as a critical source of future data. The trial’s results will help test whether the real-world UK population experience aligns with what these models anticipate, or whether new patterns of heterogeneity will push researchers to revise sharing assumptions further.
Beyond the numbers, the work also becomes a blueprint for how to reason about cross-indication tests. If a blood test is rolled out to detect dozens of cancers, the question isn’t just “how good is it?” but “how good is it across the tapestry of cancers we care about, and how should we improve the picture as new data arrive?” The paper answers with a pragmatic, principled approach: let biology inform how much you borrow, let data test those assumptions, and stay ready to adapt as evidence evolves.
Why this matters for screening and policy
At stake is a policy question with real-world consequences: if a screening program is to be broad and cost-effective, it must be based on credible estimates of how well the test detects cancers at the moments it matters most. For MCED tests like Galleri, the most valuable moment is often earlier in the disease when treatment works best. That makes early-stage sensitivity a critical lever for value. But early-stage data are sparse and noisy for many cancers. The authors’ Bayesian approach offers a disciplined way to combine disparate evidence without pretending it all lines up neatly.
One clear takeaway is humility about precision. The research shows that, given current data, information sharing yields meaningful but bounded gains. The strongest, most consistent improvement comes from sharing stage IV sensitivity across cancers, because the signal there is more uniform. For early stages, heterogeneity remains a wall that sharing alone cannot scale, unless researchers deliberately exclude outliers—an option that itself requires careful justification and external validation.
Still, the method matters. It provides a transparent framework for decision-makers who need to budget, plan follow-up testing, and estimate the potential impact of a screening program on outcomes and costs. It also highlights where data collection should be steered next: more early-stage data across diverse cancer types, better characterization of ctDNA shedding across tumor biology, and, crucially, data from the real-world NHS-Galleri trial. The authors point out that expert opinion could supplement empirical data to refine sharing assumptions, especially in areas where biology suggests plausible similarities but evidence remains thin. In short: the approach is a map, but it’s a map that improves as you collect better terrain data.
The openness of the work is worth noting. The data and code behind the analyses are publicly available on GitHub, inviting others to re-run the models, test alternative assumptions, or apply the framework to other cross-indication tests. And the study doesn’t pretend to be the final word; it charts a course for how evidence synthesis could evolve as more data pour in, including results from ongoing trials. That curatorial role, guiding interpretation and the decisions that flow from it, is itself a kind of public health service.
In the end, the University of York team doesn’t claim to have solved the riddle of multi-cancer screening. What they do offer is a robust, flexible way to reason about a complex truth: the promise of a blood test that can sense many cancers lives in the interplay between biology, statistics, and the data we have today. Their Bayesian map shows where we can travel with confidence, where we must walk with caution, and where the next, louder answers will come from data that only a broad, real-world trial can supply.
As the researchers note, the NHS-Galleri trial will bring in data from a screening population that matters for policy and practice. It will also test whether the assumptions borrowed from a literature-driven model hold when the test’s real-world performance is observed at scale in a screened population. The study’s careful, evidence-driven, and openly collaborative stance feels well-timed for a moment when medicine is becoming as much about decision science as it is about biology. If the Galleri approach proves viable at scale, the way we evaluate complex tests may itself be transformed by models that learn not just from data, but from the limits of what those data can tell us, and from our willingness to heed those limits with humility and rigor.