Proofs Train Themselves to Refactor with Hidden Tactics

In the realm of formal mathematics, computers prove theorems by following carefully crafted steps called tactics. When proofs grow long and tangled, researchers imagine a smarter way to reuse patterns—the software-library mindset applied to reasoning itself. The idea sounds almost like teaching a machine not just to solve a single puzzle, but to learn how to solve many puzzles by recognizing shared tricks.

A team from The University of Texas at Austin, Stanford University, and New York University has done just that. They introduce a new lens on interactive theorem proving that treats proofs as living networks of decisions, not rigid scripts. Their approach revolves around tactic dependence graphs, or TDGs, which map the logical relationships between proof steps while sweeping away the cosmetic details that often clutter real proofs. The work is led by Yutong Xin and Jimmy Xin at UT Austin, with collaborators Gabriel Poesia and Noah Goodman from Stanford and Qiaochu Chen from NYU, alongside UT Austin’s Isil Dillig. Their goal is twofold: to make proofs more modular through refactoring, and to accelerate automation by discovering reusable, higher-level tactics.

Think of it as moving from line-by-line instruction books to a cookbook of proven cooking tricks. Instead of copying a recipe’s exact order and ingredients for every dish, you identify a handful of high-level techniques—sautéing, deglazing, emulsifying—that crop up across many recipes. TacMiner, the tool built around TDGs, scans large proof collections to identify these recurring patterns, packages them as new tactics, and then tests how well they compress (i.e., simplify) the entire corpus of proofs. The result is not just a shorter proof here and there, but a library of reusable strategies that can be applied across many proofs in the same domain. It’s a perspective that could shift how researchers approach formal verification, making proofs both smaller and more portable across projects.

Tactic dependence graphs reveal proof patterns

To grasp the innovation, imagine a proof as a chain of decisions. Each link is a tactic application—an operation like introducing a hypothesis, rewriting with a fact, or decomposing a goal. But proofs aren’t just sequences of operations; they’re about how one step feeds into the next. The authors capture this essence in a tactic dependence graph (TDG): nodes are tactic invocations, and edges encode proof-state dependencies between them. An edge labeled with a pair such as (α, β) records that something one tactic produces, a subgoal or hypothesis in the evolving proof state, is exactly what a later tactic consumes. In practice, two proofs that look different on the surface—different names for subgoals, a different ordering of steps—can share the same semantic backbone when viewed through their TDGs.
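
To make this concrete, here is a minimal sketch in Python of how such a graph might be represented for a toy four-step proof; the class and field names are ours, chosen for illustration, not the authors' implementation or its API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TacticNode:
    """One tactic invocation in a proof script, e.g. 'intros' or 'split'."""
    node_id: int
    tactic: str

@dataclass(frozen=True)
class DependencyEdge:
    """Records that `producer` creates a proof-state element (a subgoal or a
    hypothesis) that `consumer` then operates on."""
    producer: int
    consumer: int
    label: str  # which piece of the proof state flows along this edge

@dataclass
class TacticDependenceGraph:
    nodes: list[TacticNode] = field(default_factory=list)
    edges: list[DependencyEdge] = field(default_factory=list)

# A toy proof of "P -> Q -> P /\ Q": introduce hypotheses, split the
# conjunction, then close each subgoal by assumption.
tdg = TacticDependenceGraph(
    nodes=[TacticNode(0, "intros"), TacticNode(1, "split"),
           TacticNode(2, "assumption"), TacticNode(3, "assumption")],
    edges=[DependencyEdge(0, 1, "goal"),       # intros hands the goal to split
           DependencyEdge(1, 2, "subgoal_1"),  # split's first subgoal
           DependencyEdge(1, 3, "subgoal_2")], # split's second subgoal
)
```

Renaming subgoals or reordering independent steps in the underlying script would change the proof's text but leave this graph unchanged, which is exactly the point of the abstraction.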

Consider two compact Rocq proofs of related statements. Even though the syntactic details diverge, their TDGs align, exposing the same underlying dependency structure. That alignment is not obvious just by reading the proofs as text. TDGs strip away superficial differences—like how sub-goals are named or the exact sequence of low-level tactics—and highlight the essential choreography of proof reasoning. This is what enables refactoring: if a common subgraph appears across proofs, it suggests a high-level tactic that, when invoked, would replace a sequence of steps with a single, reusable move. The authors formalize this idea through the concept of collapsible embeddings, which guarantee that a common subgraph can be translated into a valid tactic and applied without breaking the proof’s correctness.
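
Continuing the toy representation above, the refactoring step itself can be sketched as a graph rewrite: once a common subgraph has been matched, its nodes collapse into a single invocation of the new tactic, and edges that cross the boundary are rerouted. This is a shape-of-the-idea sketch only; the paper's collapsibility condition supplies the checks that make such a rewrite safe, and those checks are not modeled here.

```python
def collapse(graph: TacticDependenceGraph, matched: set[int],
             new_tactic: str) -> TacticDependenceGraph:
    """Replace the nodes of a matched subgraph with a single node for a newly
    named tactic, rerouting edges that cross the subgraph boundary. Edges that
    are internal to the match are absorbed into the new tactic."""
    new_id = max(n.node_id for n in graph.nodes) + 1
    kept = [n for n in graph.nodes if n.node_id not in matched]
    kept.append(TacticNode(new_id, new_tactic))

    rerouted = []
    for e in graph.edges:
        src_in, dst_in = e.producer in matched, e.consumer in matched
        if src_in and dst_in:
            continue  # internal to the match: now hidden inside the new tactic
        rerouted.append(DependencyEdge(new_id if src_in else e.producer,
                                       new_id if dst_in else e.consumer,
                                       e.label))
    return TacticDependenceGraph(kept, rerouted)

# Collapse the split/assumption/assumption motif into one custom tactic call.
refactored = collapse(tdg, {1, 2, 3}, "split_then_close")
# refactored now has two nodes: intros feeding split_then_close
```

After the call, the graph contains just two nodes, intros feeding a single split_then_close invocation: the graph-level picture of replacing three script steps with one custom tactic.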

Beyond refactoring, the TDG abstraction opens the door to discovery. By scanning a corpus of proofs in the same domain, the system searches for isomorphic subgraphs that recur across proofs. Those recurring motifs become candidate tactics. The more often a motif appears—and the more compact the resulting tactic—the more valuable it is considered, because it promises broad applicability and a payoff in proof length. In short, TDGs turn the quest for reusable proof strategies into a graph-mining problem: find the densest, collapsible subgraphs across many proofs, and you’ve found the seeds of a tactic library.
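
A toy version of that mining step, again over the representation sketched earlier, might simply tally the smallest possible motifs, namely pairs of tactics joined by a dependency edge, and rank them by how often they recur across a corpus. The real search handles larger subgraphs and checks collapsibility; even so, this caricature already ignores the naming and ordering noise that defeats purely textual pattern matching.

```python
from collections import Counter

def edge_motifs(graph: TacticDependenceGraph) -> Counter:
    """Count the smallest structural motifs: (producer tactic, consumer tactic)
    pairs joined by a dependency edge."""
    name = {n.node_id: n.tactic for n in graph.nodes}
    return Counter((name[e.producer], name[e.consumer]) for e in graph.edges)

def recurring_motifs(corpus: list[TacticDependenceGraph]) -> Counter:
    """Sum motif counts over a corpus of TDGs; frequent motifs are the seeds
    from which candidate tactics are grown."""
    total = Counter()
    for graph in corpus:
        total += edge_motifs(graph)
    return total

# Example: recurring_motifs([tdg, refactored]).most_common(3)
```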

TacMiner learns a library by reading proofs

The paper’s centerpiece is TacMiner, a tool that builds TDGs for a large corpus of Rocq proofs and then learns a graph grammar that governs how new tactics can connect into existing proofs. The learning loop is elegant in its rhythm. Start with a language-agnostic grammar of tactics inferred from the data. Create a worklist of simple, single-node TDGs (the most basic building blocks), and then iteratively expand them by attaching new nodes and edges according to the learned grammar. Each candidate tactic is evaluated on how well it compresses the corpus—how many proofs can be rewritten more succinctly using it, and by how much the total proof size shrinks.
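
The sketch below is a deliberately simplified rendition of that rhythm, operating on flat tactic sequences rather than graphs: seed the worklist with single-tactic patterns, grow each candidate one step at a time, score it by how much it would compress the corpus, and discard candidates that can no longer pay off. The real algorithm grows graph-shaped candidates under the learned grammar and verifies collapsibility; none of that machinery appears here.

```python
def mine_one_pattern(proofs: list[list[str]]) -> tuple[tuple[str, ...], int]:
    """Toy worklist search over contiguous tactic sequences.
    Score = (number of occurrences) x (steps saved per occurrence)."""
    alphabet = {t for proof in proofs for t in proof}

    def occurrences(pattern: tuple[str, ...]) -> int:
        return sum(1 for proof in proofs
                   for i in range(len(proof) - len(pattern) + 1)
                   if tuple(proof[i:i + len(pattern)]) == pattern)

    def score(pattern: tuple[str, ...]) -> int:
        return occurrences(pattern) * (len(pattern) - 1)

    worklist = [(t,) for t in alphabet]
    best, best_score = (), 0
    while worklist:
        pattern = worklist.pop()
        if occurrences(pattern) == 0:
            continue  # prune: growing a pattern can never increase its occurrences
        if score(pattern) > best_score:
            best, best_score = pattern, score(pattern)
        worklist.extend(pattern + (t,) for t in alphabet)
    return best, best_score

proofs = [["intros", "split", "assumption", "assumption"],
          ["rewrite", "split", "assumption", "assumption"],
          ["apply", "split", "assumption", "assumption"]]
print(mine_one_pattern(proofs))  # -> (('split', 'assumption', 'assumption'), 6)
```

In this toy corpus the sequence split, assumption, assumption emerges as the most compressive candidate: the miniature analogue of discovering a reusable tactic.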

Two ideas anchor the search. First, a tactic should be general enough to apply widely, but not so large that it becomes unwieldy. The authors measure this balance with an objective called compression power, roughly the product of how many proofs the tactic can influence and how much size it saves when it does. Second, to keep the search tractable, TacMiner derives an upper bound on how effective a candidate could possibly be before spending time exploring it fully. This pruning prevents the algorithm from chasing unlikely leads, which is crucial when the space of possible tactics explodes as proofs and tactics multiply.
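
Spelled out in symbols, and with notation that is ours rather than the paper's, those two ideas might be written roughly as follows.

```latex
% A plausible reading of "compression power" and of the pruning rule
% described above; notation is illustrative, not taken from the paper.
\begin{align*}
\mathrm{power}(\tau) &=
  \underbrace{\bigl|\{\,P \in \mathcal{C} : \tau \text{ applies in } P\,\}\bigr|}_{\text{breadth of applicability}}
  \times
  \underbrace{\mathrm{saved}(\tau)}_{\text{proof size eliminated where } \tau \text{ applies}} \\[4pt]
\text{prune } \tau \;\; &\text{whenever} \;\;
  \mathrm{UB}(\tau) \le \mathrm{power}(\tau^{\ast}),
  \qquad \text{where } \mathrm{UB}(\tau) \ge \max_{\tau' \supseteq \tau} \mathrm{power}(\tau').
\end{align*}
```

Here τ* is the best candidate found so far; because UB(τ) over-approximates what any extension of τ could ever achieve, discarding the whole branch cannot throw away the optimum.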

The implementation targets Rocq proofs and works with Ltac-style custom tactics. The authors apply their method to a broad suite of formal developments, including CompCert (a formally verified compiler), program logics, and arithmetic libraries. They compare TacMiner to a prior anti-unification baseline named Peano. The results are striking: TacMiner learns about three times as many tactics as Peano, and those tactics make proofs significantly shorter—on average, about 26% smaller across benchmarks, with some domains shrinking to roughly two-thirds of their original size. More impressively, the learned tactics boost automated proof success rates in downstream tools. In one case, Copra, an LLM-guided proof agent, increases its theorem-proving success by about 172% when equipped with TacMiner’s library versus the baseline. That’s not just clever engineering; it’s a meaningful acceleration of automated reasoning on real-world formal tasks.

Several concrete observations anchor the value of this approach. First, the TDG abstraction proves robust across syntactic variety: two proofs that diverge in wording and order can still reveal the same semantic spine when viewed through the TDG lens. Second, the learned tactics tend to be substantial in size relative to the original steps they replace, which is important for practical use: larger tactics can skip many steps, accelerating proof discovery and maintenance. Third, TacMiner demonstrates a path toward curriculum-like learning for formal reasoning: once a moderate library of tactics is established, those tactics can underpin more advanced proofs and serve as the scaffolding for future automation. These are not merely incremental gains; they hint at a new pattern for growing formal libraries that mirrors how human programmers accumulate reusable abstractions over time.

Why this matters for automated reasoning

Why should anyone working with formal proofs care about tactics that live as subgraphs? Because the biggest bottleneck in automated theorem proving is not a shortage of facts or theorems, but the slow, painstaking human and machine effort of assembling proofs from low-level steps. High-level tactics—compact, reusable strategies—act like well-tuned functions in a programming language. They encapsulate domain knowledge (a specific way to manage subgoals, reorganize a proof, or perform a domain-specific transformation) and let proof systems operate at a higher level of abstraction. TacMiner gives proof engineers a method to mine and curate these abstractions from existing work, which could dramatically reduce the time and expertise required to produce reliable formal proofs at scale.

In practice, TacMiner’s approach harmonizes with modern proof automation tools that now lean on large language models and data-driven guidance. The study’s Copra experiment—where Copra’s success rate climbs from 22% with only built-in tactics to 60% when helped by TacMiner’s library—offers a compelling proof of concept. It shows that when an AI agent is endowed with domain-specific abstractions mined from prior proofs, its reasoning can be more focused, more efficient, and less brittle. The implication is not that AI will replace human proof engineers overnight, but that collaboration between learned abstractions and AI-driven search can dramatically accelerate formal verification—especially for large, safety-critical systems where proofs must be both correct and maintainable over time.

And the potential reach goes beyond a single project. The CompCert compiler alone is a colossal verification effort; reducing proof size and increasing automation directly translates into shorter verification cycles, faster iteration, and more robust guarantees. If TDG-based tactic libraries scale to broader communities—across Coq, Lean, and other interactive theorem provers—the payoff could extend to industry-grade software verification, formal methods in safety-critical domains, and even education, where students learn proof strategies by reusing high-level tactics instead of reinventing the wheel for every problem.

The surprise: patterns over syntax

One of the most striking themes in the TacMiner work is the shift from syntactic pattern-finding to semantic pattern-finding. Traditional library learning often treats programs or proofs as sequences of concrete steps, emphasizing surface-level repetition. TacMiner, by contrast, looks for deep, structural regularities in the way proofs connect their pieces. The TDG captures how a proof’s subgoals and hypotheses feed into subsequent steps, not the exact wording of each tactic. This distinction matters because two proofs that look unlike each other can still share a powerful, reusable strategy hidden in their dependency graph.

That semantic focus has practical consequences. It explains why TacMiner can discover a tactic that refactors a class of proofs across multiple domains, rather than producing a brittle gadget that only works in one place. The method’s collapsibility condition ensures that when a subgraph is replaced by a single tactic call, the rest of the proof remains valid. In other words, the learned tactic is not just a shortcut for one proof; it’s a semantically safe abstraction that respects the logical dependencies of the entire argument. The result is a form of proof refactoring that mirrors how software engineers refactor code: you replace a repetition with a well-scoped abstraction, and the rest of the system continues to behave exactly as before—only more neatly and predictably.

Another surprise is the data efficiency. The researchers show that even with a handful of proofs, the method can discover meaningful tactics that compress proofs on unseen data. This bodes well for communities where proof corpora may be smaller or more specialized, suggesting that the approach can bootstrap useful abstractions with limited material. It’s a reminder that in reasoning, as in culture, powerful ideas often ride on a few key patterns rather than on mass exposure to a million examples.

Future directions and how this could reshape proof practice

Where does this lead, exactly? The authors sketch a roadmap of practical and methodological extensions. On the practical side, they envision applying the TDG framework to more sophisticated tactic languages and to proof systems beyond Rocq. If researchers can generalize the TDG abstraction to Lean and similar environments, the same core idea could unleash broad cross-pollination of tactics: a shared library of high-level proof strategies that can be discovered, refined, and deployed across communities. That would be a meaningful leap toward true proof portability—the ability to port a tactic library from one project to another with minimal friction, preserving correctness and reducing maintenance costs.

Methodologically, the work invites richer interactions between human experts, search-based synthesis, and AI agents. Imagine proof engineers guiding the search with domain knowledge while AI systems propose novel, high-level tactics inferred from TDGs. The authors’ experiment with Copra shows the promise of such partnerships: AI tools can perform better when they can lean on a curated library of robust abstractions rather than sifting through raw tactics in real time. The future could also see dynamic, curriculum-like growth of tactic libraries, where new abstractions emerge as proofs evolve and the mining process keeps surfacing refactoring opportunities across a project’s entire proof base.

Of course, there are caveats. TacMiner, as described, operates within a particular proof framework and language, using a specific form of tactic representation. Some proof patterns live in conditional or stateful zones of proof scripts that may resist clean abstraction, and the notion of collapsible embeddings may not always be satisfied in every mathematical domain. Yet the core ideas—the TDG abstraction, semantic compression, and graph-guided library synthesis—offer a versatile blueprint for how we might scale formal reasoning together with AI and human insight.

Takeaways: a new craft for formal reasoning

What makes TacMiner compelling is not a single headline result, but a shift in how we think about proofs. By treating tactics as first-class, reusable abstractions discovered from the way proofs actually unfold, the work turns proof engineering into a craft of building and recombining high-level strategies. It reuses ideas from software library learning, top-down synthesis, and graph-based program analysis, but adapts them to the peculiar logic of interactive theorem proving. The payoff is clear in the numbers: roughly three times as many learned tactics as the prior baseline, an average 26% reduction in proof size, and a dramatic jump in automated proving success when these tactics are supplied to a modern proof agent.

Behind the numbers lies a broader promise: formal verification can become more scalable, resilient, and approachable. If researchers can cultivate a thriving TDG-driven ecosystem—where proof engineers contribute and refine tactic libraries just as software teams share and maintain libraries—we may see a future where formal proofs are not the province of a few specialists but a collaborative enterprise. A future where proofs are written not only to be correct, but to be modular, extensible, and reusable across domains—much like well-structured code.

In the end, TacMiner is not just about making proofs shorter. It’s about making the act of proving more human in its pace, more collaborative in its growth, and more confident in its reach. The TDG framework catches the quiet geometry of reasoning—the way ideas depend on one another and build toward a final result—then teaches machines to recognize and reuse that geometry. If the math of tomorrow looks more like a well-crafted library than a lone, heroic deduction, this work may be the kind of spark that helps light the path.