Fermilab’s computing grid is more than a tech stack. It’s a living artery that pumps data, software, and collaboration across continents, fueling experiments that probe the mysteries of matter, energy, and the cosmos. For decades, the grid trusted users and machines through X.509 certificates—digital passports that could be shared and extended, but eventually grew unwieldy as the grid expanded. A landmark shift is underway: Fermilab has pivoted from those bulky certificates to a token-based system built on JSON Web Tokens, or JWTs, and a constellation of modern security tools. It’s not just a swap of credentials; it’s a reimagining of how a sprawling scientific enterprise proves its legitimacy every day. The work comes from Fermilab, a U.S. Department of Energy national laboratory, and it’s led by scientists including Dave Dykstra and Mine Altunay, with a larger team shaping the transition across experiments, storage, and data workflows.
In practical terms, this is about making a distributed scientific grid safer, faster, and more automated. JWTs are compact, verifiable tokens that confirm who you are and what you’re allowed to do, without the overhead of traditional certificates. They’re designed to be fine‑grained, so a user or a robot account can be granted precisely what it needs, no more and no less. For a grid that runs thousands of jobs a day across dozens of experiments, that precision matters as much as speed. The Fermilab project is not just a theoretical exercise; it’s a full production shift, touching every corner of the grid, from the high-throughput data flows in dCache to the job orchestration in HTCondor and the automated distribution of code via RCDS, the Rapid Code Distribution Service.
And yet this is also a human story. The transition required rethinking workflows that had grown accustomed to a certificate-centric mindset. It demanded new tools, new governance, and new partnerships with token issuers, identity providers, and storage systems. The Fermilab team didn’t simply install a new login system; they rewired the grid’s trust fabric so that tokens are minted, rotated, and refreshed in a way that scales with the research calendar. This is science infrastructure as a living system—secure, adaptable, and built to evolve as experiments demand more compute, more data, and more collaboration across institutions.
From bulky certificates to agile tokens
The shift away from X.509 toward JWTs marks a practical turn in the way a grid proves access. JWTs are verifiable offline, which means they can be checked without always pinging a distant authority—a crucial capability when you’re moving bits and jobs across continents, sometimes with network hiccups. In Fermilab’s world, tokens aren’t a single passport but a suite of credentials that can be tailored to a task. A user might need one level of access to run a job on a local storage node, and a different, more restricted scope to access remote data repositories. JWTs make that granularity not only possible but manageable at scale.
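For the technically curious, here is a minimal sketch of what offline verification looks like in practice, using Python’s PyJWT library. The issuer URL, key-discovery path, and audience below are placeholders, not Fermilab’s actual values; the point is that once the issuer’s public keys are fetched and cached, each token check needs no round-trip to a central authority.

```python
# Sketch: verifying a JWT against the issuer's published public key.
# Requires PyJWT with crypto support (pip install "pyjwt[crypto]").
# Issuer, JWKS path, and audience are hypothetical placeholders.
import jwt
from jwt import PyJWKClient

ISSUER = "https://token.issuer.example/fermilab"  # hypothetical issuer
JWKS_URL = f"{ISSUER}/jwks.json"  # key-discovery path varies by issuer

def verify(token: str) -> dict:
    # The key set can be fetched once and cached locally, so the
    # signature check itself works without contacting the issuer.
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="https://storage.example",  # hypothetical audience
        issuer=ISSUER,
    )

if __name__ == "__main__":
    claims = verify(open("access_token.jwt").read().strip())
    print(claims["sub"], claims["exp"], claims.get("scope"))
```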
Behind this simplification sits a web of moving parts that the paper lays out with a quietly audacious clarity. A central registry, called FERRY, keeps track of who is allowed to do what and translates that knowledge into the token issuer’s language. The token issuer in question is CILogon, which interfaces with identity providers to generate the actual JWTs. The design goal was to decouple “who you are” from “where you’re going,” while keeping everything auditable and revocable. If a collaborator shifts roles or leaves a project, the system can reflect those changes promptly—an essential feature when you’re coordinating thousands of researchers and machines across multiple experiments.
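To make that decoupling concrete, here is an illustrative decoded payload in the style of the WLCG token profile. Every value is invented, but the shape shows how “who you are” (iss, sub), “what you may do” (scope), and “where you’re going” (aud) live in separate, independently adjustable claims:

```python
# Illustrative decoded JWT payload, loosely following the WLCG token
# profile. All values are invented; real Fermilab tokens will differ.
example_claims = {
    "iss": "https://cilogon.org/fermilab",       # who minted the token
    "sub": "robot-dunepro",                      # who it was minted for
    "aud": "https://dcache.example:2880",        # hypothetical recipient
    "iat": 1700000000,                           # issued at
    "exp": 1700010800,                           # expires ~3 hours later
    "scope": "compute.create storage.read:/dune",  # permitted actions
    "wlcg.ver": "1.0",
}
```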
But tokens don’t grant unbounded power. The Fermilab team integrated a layered lifecycle: short-lived access tokens (about three hours) that travel with running jobs, backed by long-lived refresh tokens stored in a vault. The difference between the two token types is the difference between a quick trip and a long stay: access tokens expire quickly and are constantly renewed, while refresh tokens rarely leave their security-first vault. This architectural choice is the heartbeat of the upgrade: short tokens minimize the damage if something goes wrong, while the vault provides a secure, recoverable pathway to fresh tokens whenever they are needed.
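A minimal sketch of that two-tier lifecycle follows, assuming a generic OAuth2 token endpoint (the URL is hypothetical, and in Fermilab’s design the vault performs this exchange on the user’s behalf, so the refresh token never reaches the job itself):

```python
# Sketch of the two-tier lifecycle: a short-lived access token carried by
# a job, renewed from a long-lived refresh token held elsewhere. This is
# the generic OAuth2 refresh grant; endpoint and client_id are invented.
import time
import jwt       # PyJWT, used here only to read the exp claim
import requests

TOKEN_ENDPOINT = "https://token.issuer.example/oauth2/token"  # hypothetical

def needs_refresh(access_token: str, slack: int = 600) -> bool:
    # Decode without signature verification just to inspect expiry.
    claims = jwt.decode(access_token, options={"verify_signature": False})
    return claims["exp"] - time.time() < slack

def refresh(refresh_token: str, client_id: str) -> str:
    resp = requests.post(TOKEN_ENDPOINT, data={
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]
```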
The architecture behind token trust
Central to the transition is the vault-based system that stores refresh credentials and mediates token issuance. HashiCorp Vault becomes the quiet engine that makes token rotation reliable without exposing sensitive secrets. The Fermilab project didn’t just adopt Vault; it extended it with companion components like htvault-config, a set of scripts and configurations that tailor Vault for grid use. This pairing, Vault plus a Fermilab-custom configuration, lets operators manage credentials in a controlled, auditable way while enabling automated workflows to keep running without human intervention.
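As a rough illustration of the plumbing involved, the sketch below reads a stored secret through Vault’s standard HTTP API (the KV version 2 engine). The server address and secret path are placeholders; htvault-config’s grid-specific policies and OIDC integration sit on top of exactly this kind of call:

```python
# Sketch: reading a stored secret via Vault's HTTP API (KV v2 engine).
# Address, mount point, and path are placeholders, not Fermilab's setup.
import os
import requests

VAULT_ADDR = os.environ.get("VAULT_ADDR", "https://vault.example:8200")
VAULT_TOKEN = os.environ["VAULT_TOKEN"]  # Vault's own auth token, not a JWT

def read_secret(path: str) -> dict:
    resp = requests.get(
        f"{VAULT_ADDR}/v1/secret/data/{path}",
        headers={"X-Vault-Token": VAULT_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()["data"]["data"]  # KV v2 nests payload under data.data
```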
To keep tokens usable inside automated workloads, Fermilab built a command-line companion called htgettoken. It automates the web-based authentication flows, fetches high-security refresh tokens once a user signs in, and then translates those into short-lived access tokens that running jobs can consume. The system is designed for everyday lab life: researchers don’t want to babysit credentials every time a job starts; they want seamless, secure access that just works. htgettoken, together with the vault, ensures that even unattended, long-running workflows can stay authenticated without compromising security.
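Once htgettoken has done its work, a running job still has to find the token left for it. A plausible sketch, following the WLCG Bearer Token Discovery convention of checking the BEARER_TOKEN environment variable, then BEARER_TOKEN_FILE, then a per-user file:

```python
# Sketch: locating the access token that htgettoken (or the batch system)
# has placed for a job, per the WLCG Bearer Token Discovery convention.
import os

def find_bearer_token() -> str:
    # 1. Token passed directly in the environment.
    if token := os.environ.get("BEARER_TOKEN"):
        return token.strip()
    # 2. Path to a token file named in the environment.
    if path := os.environ.get("BEARER_TOKEN_FILE"):
        return open(path).read().strip()
    # 3. Conventional per-user location, bt_u<uid>.
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", "/tmp")
    return open(f"{runtime_dir}/bt_u{os.getuid()}").read().strip()
```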
HTCondor, the backbone of Fermilab’s grid scheduling, receives a dedicated integration layer that handles token storage and refresh. A component called condor-credmon-vault becomes the glue: it stores access tokens in the Condor credential daemon and triggers refreshes as needed. There’s a careful dance here: the vault hands out tokens with the correct scopes, but the worker node where a job runs only holds the token needed for that job’s lifetime. Importantly, the system also supports downgrading tokens to weaker scopes or tighter audiences when the situation calls for it, a flexibility that matters when the same infrastructure supports both sensitive experiments and more permissive data-exploration chores.
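As a conceptual sketch of that downgrading, standard OAuth2 lets a refresh request ask for a narrower scope than originally granted, and token-exchange-style requests (RFC 8693) can pin a single audience. The endpoint and parameter values below are illustrative only, since in Fermilab’s flow the vault mediates the real exchange:

```python
# Sketch: requesting a deliberately weaker token. A refresh grant may name
# a narrower scope than the original grant; the audience parameter follows
# the token-exchange pattern. All values here are illustrative.
import requests

TOKEN_ENDPOINT = "https://token.issuer.example/oauth2/token"  # hypothetical

def downscope(refresh_token: str, client_id: str) -> str:
    resp = requests.post(TOKEN_ENDPOINT, data={
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "scope": "storage.read:/dune",         # narrower than issued scope
        "audience": "https://dcache.example",  # single intended recipient
    })
    resp.raise_for_status()
    return resp.json()["access_token"]
```

The issuer can only narrow what the refresh token already allows; it can never widen it, which is what makes handing a downscoped token to a job a safe default.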
The architecture doesn’t stop at substitution. The team redesigned jobsub, Fermilab’s job submission tool, to be a lightweight wrapper around HTCondor so it can piggyback on token handling from the ground up. GlideinWMS, the grid’s pilot job manager, was updated to manage multiple credentials across different experiments and sites. And on the storage and data side, dCache was reconfigured to accept token-based authentication in place of proxy certificates. This isn’t a patch; it’s a re-architecting of how trust flows through the grid, from the user’s terminal to the far corners of the distributed data fabric.
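On the storage side, the practical effect is that a client talks to dCache over plain HTTPS with a bearer token in the Authorization header rather than presenting a proxy certificate. A minimal sketch, with an invented endpoint and path:

```python
# Sketch: fetching a file from a dCache WebDAV/HTTPS door using a bearer
# token instead of a proxy certificate. Endpoint and path are invented.
import requests

def fetch(token: str) -> bytes:
    resp = requests.get(
        "https://dcache.example:2880/pnfs/example.org/data/run42.root",
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()
    return resp.content
```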
What this unlocks for science—and beyond
The shift to token-based authentication isn’t merely a security upgrade; it’s a practical enabler of scale. With JWTs and a managed token ecosystem, Fermilab’s grid can handle more complex, multi-experiment workflows without collapsing under the administrative overhead of certificate management. Automated job submissions—what the paper calls “robot” workflows—become more predictable and auditable. A Managed Tokens service was created precisely so that unattended processes could stay refreshed without exposing credentials across many machines. Operators can add new robots and protocols, and the system can renew, propagate, and revoke credentials in a centralized, controlled fashion. In other words, you get more automation without introducing new attack surfaces.
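What might such a service look like in miniature? The sketch below captures the core loop, refreshing each robot’s token and pushing it to the nodes that need it. Hosts, paths, and the htgettoken invocation are hypothetical; the production service adds auditing, retries, and revocation handling on top:

```python
# Conceptual sketch of a managed-tokens loop: for each registered robot
# account, obtain a fresh short-lived token and distribute it to the
# nodes running that robot's workflows. All names and the htgettoken
# invocation are illustrative, not Fermilab's actual configuration.
import subprocess
import time

ROBOTS = {
    "dunepro": ["batch1.example.org", "batch2.example.org"],
    "novapro": ["batch3.example.org"],
}

def refresh_and_push(robot: str, hosts: list[str]) -> None:
    token_file = f"/var/lib/managed-tokens/{robot}.token"
    # Obtain a fresh access token (illustrative command line; consult
    # htgettoken's documentation for the real options).
    subprocess.run(["htgettoken", "-o", token_file], check=True)
    for host in hosts:
        subprocess.run(["scp", token_file, f"{host}:/tmp/{robot}.token"],
                       check=True)

if __name__ == "__main__":
    while True:
        for robot, hosts in ROBOTS.items():
            refresh_and_push(robot, hosts)
        time.sleep(3600)  # re-run well inside the ~3-hour token lifetime
```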
Security-by-design, in this setup, is not about locking down every path with a digital padlock; it’s about shrinking the surface that matters most: the tokens themselves. Access tokens are short-lived so even if they’re stolen, the window to misuse them is narrow. The refresh tokens live behind Vault, guarded by strong authentication, and the system is designed to revoke or adjust tokens quickly if an experiment’s access policy changes. The end result is a grid that’s both more secure and more adaptable—precisely what a modern science enterprise needs as data volumes skyrocket and collaborations cross borders and time zones.
Another layer of impact is openness and collaboration. Much of the software enabling this transition is open source, and Fermilab’s work has contributed to the broader ecosystem—while recognizing that some commercial components (like Vault) have licensing constraints. The authors acknowledge a path forward through open forks and community-driven improvements, echoing a broader trend in big science: the best ideas often leak out of labs into the wider software world, where they can be refined and repurposed by researchers everywhere. The project demonstrates that security engineering for science isn’t a luxury; it’s a critical driver of reliability, reproducibility, and speed in discovery.
What makes this story especially compelling is not just the technical cleverness but the cultural shift it signals inside large, international research collaborations. Tokens and automated token refresh alter who can submit jobs, what those jobs can access, and how quickly the results can be produced. It’s a governance model as much as a software one: it requires explicit definitions of roles, experiments, and data access scopes, all codified in a machine-readable form. The result isn’t just safer computing; it’s more deliberate collaboration—faster onboarding of new experiments, clearer data stewardship, and a more auditable history of how results were generated. When the grid is the lifeblood of physics—from the raw detector streams to the models that interpret them—this is the kind of modernization that can actually bend the curve of scientific progress.
Looking ahead, Fermilab’s path hints at a broader migration in large-scale research infrastructure. The token-based approach has already inspired discussion about open standards, better cross-site interoperability, and even lessons for industries that run their own distributed compute networks. The paper notes the potential for open-source forks around the Vault ecosystem, signaling a readiness to share the burdens and the breakthroughs. If the scientific community embraces this model, we may see more labs adopting token-based security not as a niche enhancement but as a standard enabling safer, more agile collaboration across institutions, disciplines, and borders. In that sense, Fermilab’s experiment with tokens is less about technology alone and more about building a resilient trust fabric for science in the 21st century.
In sum, Fermilab’s transition to token authentication is a quiet revolution under the hood of modern physics research. It shows how a well-orchestrated constellation of tools—FERRY, CILogon, Vault, htgettoken, HTCondor, GlideinWMS, and beyond—can redesign security for a grid that stretches across continents. It’s a reminder that the most transformative upgrades in science aren’t always about the loudest breakthroughs in the lab, but about the everyday reliability of the systems that run the experiments that change our understanding of the universe. And as the authors—led by Dykstra and Altunay—conclude, the work is ongoing, the lessons are being absorbed, and the grid is only going to get faster, safer, and smarter as token-based authentication becomes the new normal for big science.