What Changes When Machines Read Science and Public Discourse?

Introduction

The internet hums with science: fresh findings, dramatic headlines, and countless threads debating what a study means for our lives. Yet the stream isn’t always faithful to the original work. Dustin Wright, at the University of Copenhagen’s Department of Computer Science, spent years building a toolbox to study that translation layer between papers, news, and social media. He led a body of work that reframes how we think about truth in science: the goal isn’t merely labeling statements as true or false, but understanding how information itself changes as it travels through different kinds of texts. The result is a tapestry of data, models, and ideas that aim to help machines read science with a human sense of judgment.

In Wright’s own words, this work is about machine understanding of scientific language at scale. It ranges from automatic fact checking to learning with limited data, and it builds a bridge from dry abstracts to the messy world of journalism and online discussion. The central bet is that by analyzing how the same finding is described across papers, news, and posts, we can detect exaggeration, hedging, and other information changes that shape public understanding. This isn’t just academic curiosity; it’s a practical step toward better science literacy in a world awash with information—and misinformation.

A New Lens: Information Change in Science

Traditionally, fact-checking asks: is this claim true or false, given what we know? Wright argues that in science communication, the spectrum is subtler. A single sentence can echo a finding, then another article can recast it with more certainty, more nuance, or more hype. So the core question becomes: how does the information actually change when it moves from a scientist’s paper to a journalist’s paraphrase to a tweet? To answer this, the work introduces a paradigm that measures information change across texts rather than forcing a binary verdict on truth. It’s like tracking a musical melody as it travels through different arrangements, each version adding or trimming notes, emphasis, or tempo, while still being recognizably the same tune.

That shift matters because it reframes the problem of misinformation. If we can quantify how findings are restated, exaggerated, or hedged, we gain a more precise map of where misalignment appears and why it matters for public action. It also helps journalists and educators understand how their wording might tilt perception—intentionally or not—without altering the core science. Wright’s approach isn’t about discarding veracity; it’s about recognizing a broader landscape where information can drift in meaningful ways as it circulates. The work behind this new lens comes from the University of Copenhagen, where lead author Dustin Wright and adviser Isabelle Augenstein push the envelope on how machines read scientific text at scale.

From Check-Worthy Claims to Adversarial Robustness

One practical pillar in Wright’s portfolio is a new take on fact checking, one that recognizes how subjective and context-dependent the task can be. In the general sense, check-worthiness is about flagging statements that deserve verification. Wright’s group reframes this as a positive-unlabelled learning problem: they treat the data as a mix of clearly check-worthy positives and unlabelled items whose status is uncertain. This mirrors how real-world judgment works—experts don’t label every sentence, and many judgments hinge on context, background knowledge, and disciplinary norms. By embracing this uncertainty, their models become more robust when facing unfamiliar topics or domains.

The research shows intriguing transfer effects. Pretraining on Wikipedia’s citation-needed data, for example, improves performance when the model later tackles Twitter rumours or political speeches. The key trick is a method dubbed Positive Unlabelled Conversion, or PUC, which identifies highly confident positives among the unlabelled pool and elevates them into a training signal. The upshot: a unified approach to check-worthiness across domains becomes feasible, and the learned representations generalize better to out-of-domain data. This is not merely academic; it suggests a practical path to scalable fact-checking tools that can operate beyond the narrow confines of a single dataset.
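To make the relabelling idea concrete, here is a minimal Python sketch using scikit-learn; the function name puc_relabel, the top_fraction parameter, the toy sentences, and the choice of a TF-IDF logistic-regression classifier are illustrative assumptions, not the thesis’s actual PUC implementation.

    # Sketch of positive-unlabelled relabelling: fit a first classifier that
    # treats unlabelled items as provisional negatives, then promote the most
    # confidently positive unlabelled sentences and retrain.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def puc_relabel(pos_texts, unl_texts, top_fraction=0.1):
        vectorizer = TfidfVectorizer()
        X = vectorizer.fit_transform(pos_texts + unl_texts)
        y = np.array([1] * len(pos_texts) + [0] * len(unl_texts))

        # Step 1: provisional model with unlabelled data treated as negative.
        clf = LogisticRegression(max_iter=1000).fit(X, y)

        # Step 2: score the unlabelled pool and keep the top candidates.
        scores = clf.predict_proba(X[len(pos_texts):])[:, 1]
        k = max(1, int(top_fraction * len(unl_texts)))
        promoted = np.argsort(scores)[-k:]

        # Step 3: convert those candidates into extra positive training signal.
        y_relabelled = y.copy()
        y_relabelled[len(pos_texts) + promoted] = 1
        return LogisticRegression(max_iter=1000).fit(X, y_relabelled)

    checker = puc_relabel(
        ["Global sea levels rose 3.3 millimetres per year over the study period."],
        ["The findings were announced on Tuesday.",
         "The effect was twice as large as previously reported.",
         "I liked the colour of the figures."],
    )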

Behind these efforts lies a crucial insight: check-worthiness is subjective, and labels drift across communities. The work’s experiments show that Wikipedia and Twitter data can align with the broader idea of check-worthiness, while political speech data can resist simple cross-domain transfer, likely reflecting different editorial norms and audience expectations. Still, the broader theme—training with soft signals and learning to triage what’s worth verifying—holds promise for more scalable, humane information quality tools in a noisy information landscape.

Adversarial Claims and the Quest for Coherence

If you want to test a fact-checker, you don’t just want to trip it up with random noise—you want adversarial inputs that are plausible and coherent. Wright’s team develops a method to generate label-cohesive, well-formed adversarial claims. They pair a robust search for universal triggers with a language model to produce new claims that preserve meaning while steering the verdict of the verification system toward a target class. In other words, they aim to craft claims that feel natural, not jarringly robotic, so that a system’s vulnerability is revealed under realistic conditions.

The approach wrestles with a key tension: many simple adversarial tweaks (like negation words) trivially flip a label but destroy the sentence’s naturalness. To counter this, the researchers add a semantic similarity objective when searching for triggers, and they bring in GPT-2 to generate coherent, trigger-laden claims. The result is a suite of adversarial examples that stress-test the model without becoming nonsensical. The broader takeaway is not that systems will be perfect, but that we can systematically discover their blind spots and design better defenses, a vital step for responsible AI in science.
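A much-simplified sketch of that combined objective is shown below; rank_triggers, the alpha trade-off, and the stand-in scoring functions are hypothetical, and the real method searches for universal triggers and uses GPT-2 to generate full claims rather than simply prefixing a phrase.

    # Rank candidate trigger phrases by how well they steer a verifier toward a
    # target label while keeping the modified claim close to the original.
    def rank_triggers(claim, candidate_triggers, target_prob, semantic_sim, alpha=0.5):
        scored = []
        for trigger in candidate_triggers:
            modified = f"{trigger} {claim}"
            attack = target_prob(modified)             # verifier's probability of the target class
            coherence = semantic_sim(claim, modified)  # how much of the meaning survives
            scored.append((alpha * attack + (1 - alpha) * coherence, trigger))
        return sorted(scored, reverse=True)

    # Toy stand-ins for a real verifier and a real sentence encoder.
    best = rank_triggers(
        "The treatment improved outcomes in a small trial.",
        ["reportedly", "studies disprove that", "experts agree that"],
        target_prob=lambda text: 0.9 if "disprove" in text else 0.2,
        semantic_sim=lambda a, b: len(set(a.split()) & set(b.split())) / len(set(a.split())),
    )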

Beyond fortifying fact-checkers, this line of work also hints at a healthier ecosystem for science communication. If journalists and editors know that readers’ eyes will test claims against a modestly robust verifier, there’s a built-in incentive to keep phrasing accurate and precise. It’s a quiet nudge toward transparency—an invitation to explain not just whether a claim is true, but how the claim maps onto the underlying evidence.

Domain Adaptation: Teaching Big Models to Travel Light

Scientists don’t speak with a single accent. Medicine, biology, computer science, and psychology each have their own jargon, their own typical sentence structures, and their own ways of shaping arguments. Wright’s work probes how well large pretrained transformers—models like BERT, RoBERTa, and their descendants—can travel across these domains without retraining from scratch. The central question: can we build robust, domain-agnostic readers of science, or do we need a chorus of domain experts perched inside the model?

The findings are nuanced. In multi-source domain adaptation experiments, a simple averaging of domain-specific experts often beat more elaborate mixing strategies. Domain adversarial training, which forces the model to learn domain-invariant representations, can help in some settings but does not always boost target-domain performance. The message isn’t that these techniques are useless; it’s that the real world is messy, and large language models tend to produce homogenized classifiers when trained on diverse scientific corpora. The best path forward may be to lean on strong pretraining and targeted data scaffolds rather than trying to fine-tune a single giant model to be a chameleon across all fields.
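The baseline that held up is easy to state in code; the sketch below shows uniform averaging of per-domain expert predictions, with hard-coded probabilities standing in for the outputs of real transformer experts.

    # Combine per-domain experts by uniform averaging of their class probabilities.
    import numpy as np

    def average_of_experts(expert_probs):
        # expert_probs: shape (n_experts, n_classes); each row is one domain
        # expert's softmax output for the same target-domain sentence.
        return expert_probs.mean(axis=0)

    expert_probs = np.array([
        [0.70, 0.30],  # expert fine-tuned on medical text
        [0.55, 0.45],  # expert fine-tuned on computer science text
        [0.60, 0.40],  # expert fine-tuned on psychology text
    ])
    prediction = int(np.argmax(average_of_experts(expert_probs)))  # -> class 0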

These results matter because they shape how we deploy AI to understand science across disciplines. If we want machines that can assist researchers and readers alike, we’ll need adaptable, transparently trained tools that respect field-specific nuances rather than pretend to erase them. Wright’s experiments offer a map of what works—and what doesn’t—for cross-domain language understanding in science, a crucial guide as AI becomes more embedded in how we read and evaluate research.

SPICED: Measuring Information Change Across Media

One of Wright’s grand contributions is the SPICED dataset—Scientific Paraphrase and Information Change Dataset. SPICED pairs scientific findings drawn from papers with their counterparts in news and tweets, then asks annotators to rate how much information the two sentences convey about the same finding. The key metric, Information Matching Score (IMS), ranges from completely different to completely the same in terms of the underlying finding. SPICED isn’t just about semantic similarity; it’s about how the essential information changes when it travels through different media and genres.
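As a rough illustration, here is what a SPICED-style pair might look like next to a plain sentence-similarity baseline; the field names, the SentenceTransformer checkpoint, and the example sentences are assumptions for the sketch, and cosine similarity is precisely the kind of proxy that IMS is meant to go beyond.

    # A SPICED-style pair and a crude similarity baseline (not the IMS model itself).
    from dataclasses import dataclass
    from typing import Optional
    from sentence_transformers import SentenceTransformer, util

    @dataclass
    class SpicedPair:
        paper_sentence: str          # finding as stated in the scientific paper
        media_sentence: str          # restatement in a news article or tweet
        ims: Optional[float] = None  # annotated Information Matching Score, if any

    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    def similarity_baseline(pair: SpicedPair) -> float:
        # Cosine similarity of sentence embeddings: captures semantic overlap,
        # but not exaggeration, hedging, or added certainty.
        embeddings = encoder.encode([pair.paper_sentence, pair.media_sentence])
        return float(util.cos_sim(embeddings[0], embeddings[1]))

    pair = SpicedPair(
        paper_sentence="We observed a modest association between diet X and lower LDL cholesterol.",
        media_sentence="Diet X slashes cholesterol, scientists say.",
    )
    score = similarity_baseline(pair)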

What does SPICED reveal? It shows that general news outlets tend to exhibit more information change than press releases or science-and-technology outlets. It also finds that organizational accounts on Twitter tend to keep closer to the original finding, while verified users with large followings often introduce more change. These patterns aren’t judgments about good or bad journalism; they’re diagnostics about how information is communicated in public spaces, with implications for how readers calibrate trust and how platforms surface science content.

Beyond descriptive findings, SPICED proves practical. Models trained on SPICED transfer to tasks like evidence retrieval for scientific fact checking, improving the ability to find relevant supporting sentences across domains. The dataset also enables large-scale analyses of science communication, revealing which sections of papers are more prone to exaggeration and where the message drifts most in media coverage. It’s a new lens on the ecology of science communication, one that invites researchers, journalists, and readers to think more explicitly about the information changes that shape public understanding.

CiteWorth and the Transferability of Scientific Context

Another pillar of Wright’s thesis is CiteWorth, a large, carefully curated dataset for cite-worthiness detection. The goal is to identify sentences in scientific papers that truly demand a citation—an indicator of where external support is essential for trust. CiteWorth spans ten scientific domains and contains over a million sentences, contextually framed at the paragraph level to preserve surrounding meaning. The result is a robust resource for training models to recognize when a statement needs a citation, a seemingly small but consequential skill for reliable science communication.
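One way to frame that task, sketched below, is sentence-pair classification in which the candidate sentence and its surrounding paragraph are encoded together; the SciBERT checkpoint, the two-label head (left untrained here), and the example paragraph are illustrative assumptions rather than the released CiteWorth setup.

    # Minimal sketch: encode the candidate sentence together with its paragraph
    # context and score whether it needs a citation.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL = "allenai/scibert_scivocab_uncased"
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    def needs_citation(sentence: str, paragraph: str) -> float:
        # Returns the (untrained) probability that the sentence is cite-worthy.
        inputs = tokenizer(sentence, paragraph, return_tensors="pt",
                           truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, 1].item()

    paragraph = ("Prior work has linked sleep duration to memory consolidation. "
                 "Our results extend these findings to adolescents.")
    score = needs_citation("Prior work has linked sleep duration to memory consolidation.",
                           paragraph)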

Pretraining on CiteWorth yields meaningful downstream gains. SciBERT-type models trained with CiteWorth data show improved performance on citation-intent classification tasks, and the dataset itself acts as a useful scaffold for cross-domain adaptation. The broader takeaway is a practical one: quality, domain-anchored scaffolds—built from real scientific writing—can dramatically improve performance on downstream tasks that matter to researchers and the public alike. The work behind CiteWorth reinforces the idea that the backbone of good science communication AI is not just vast data, but data that respects the structure and discipline of science itself.

In short, CiteWorth demonstrates that context-rich, domain-aware pretraining can yield better readers of science, capable of determining when a citation is warranted and how to frame it responsibly in downstream tasks such as citation-intent understanding and document comprehension. It’s a reminder that expertise—captured in carefully assembled corpora—can travel with machine models to new tasks, enabling more faithful and useful AI assistants in science.

Zero-Shot Scientific Fact Checking and Claim Generation

The thesis doesn’t stop at understanding; it also experiments with generation as a tool for fact checking. Wright and colleagues develop CLAIMGEN-ENTITY and CLAIMGEN-BART, two claim-generation approaches that distill a sentence into atomic, checkable claims. They augment this with KBIN, a knowledge-base-informed negation technique that crafts convincing, domain-relevant refutations by replacing entities with closely related concepts from UMLS. The goal is to generate high-quality, check-worthy claims and their negations to train and stress-test fact-checking systems, including zero-shot setups that don’t rely on hand-labeled data.
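The negation step can be pictured with a toy sketch like the one below, where a tiny hand-written dictionary stands in for UMLS and the function name is purely illustrative; the actual KBIN pipeline selects replacement concepts from the knowledge base itself.

    # Toy sketch of entity-swap negation: replace one entity with a closely
    # related concept so the new claim is no longer supported by the original
    # evidence. A tiny dictionary stands in for UMLS here.
    RELATED_CONCEPTS = {
        "ibuprofen": "acetaminophen",
        "hypertension": "hypotension",
        "insulin": "glucagon",
    }

    def negate_by_entity_swap(claim: str) -> str:
        lowered = claim.lower()
        for entity, neighbour in RELATED_CONCEPTS.items():
            if entity in lowered:
                start = lowered.index(entity)
                # Keep the claim's wording intact apart from the swapped entity.
                return claim[:start] + neighbour + claim[start + len(entity):]
        return claim  # no known entity found; leave the claim unchanged

    refutation = negate_by_entity_swap("Ibuprofen reduces fever in children.")
    # -> "acetaminophen reduces fever in children."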

In SciFact-style benchmarks, training a model on claims generated by CLAIMGEN-ENTITY or CLAIMGEN-BART achieves around 90 percent of the performance of a model trained on in-domain manually written claims. That’s a striking demonstration that synthetic data, carefully grounded in domain knowledge and linguistics, can meaningfully bridge the gap when labeled data is scarce. The work also deploys a human-in-the-loop evaluation to assess fluency, faithfulness, and atomicity, showing that generated claims can be both convincing and technically accurate when constructed with care.

The implications are twofold. First, it offers a practical path to scaling up scientific fact-checking datasets without bearing the full cost of manual annotation. Second, it prompts a broader conversation about the trustworthiness of AI-generated training data in high-stakes domains like health and science. If we want machines to assist with truth—not replace human judgment—we need to design, validate, and transparently report how synthetic data shapes their reasoning.

Why This Matters: A Public-Driven Vision for Science AI

Taken together, Wright’s thesis sketches a road map for AI that can read science with nuance, help people navigate complex claims, and illuminate how information moves through society. It’s a practical program for the near term: build scalable data and models that identify when scientific statements should be checked, test those systems against adversarial and cross-domain challenges, and develop datasets that reflect the diversity of scientific writing across disciplines and media.

But the work also invites humility. Science itself evolves; what’s considered “truth” today may be revised tomorrow as new evidence arrives. A good AI reader must respect that dynamism, track how information changes, and be transparent about the limits of certainty. Wright’s emphasis on information change—rather than a rigid veracity verdict—acknowledges the living nature of scientific knowledge and seeks to support public understanding rather than rigid policing of it. The researchers’ center of gravity is the University of Copenhagen, where Dustin Wright, guided by Isabelle Augenstein, advances a humane, scientifically grounded approach to machine understanding of language.

In a world flooded with studies and sound bites, this work is a reminder that technology can be a companion for discernment. If machines can map how a claim travels from a lab bench to a newsroom to a tweet, we stand a better chance of keeping the core of science intact while making its insights accessible to more people. It’s not a guarantee of truth, but a disciplined, data-informed method for finding it—and for understanding where, when, and why science gets reshaped along the way.

Conclusion: Toward Machines That Read Science Like People Do

What makes this body of work powerful isn’t a single technique, but a tapestry of methods that together push natural language processing toward a more honest, scalable study of science. From positive-unlabelled learning that respects subjectivity, to coherent adversarial claims that stress-test verifiers, to multi-domain adaptation that respects field-specific language, to large-scale data resources like SPICED and CiteWorth, the thesis pieces together a vision of AI that can help society navigate scientific claims with greater clarity.

As Wright’s research unfolds, we may not get perfect answers about every claim. But we can expect machines that better illuminate how information changes across papers, news, and social media—and, crucially, how to keep that change tethered to the evidence that science actually produced. That’s a meaningful stride toward a more informed public, aided by AI that reads science not just for what it says, but for how it travels—and for how we, as readers, should interpret it.