Rule systems have a certain quiet magic. They let computers reason about complex relationships—who trusts whom, who depends on whom, which paths connect a network, or how a policy chain flows through a maze of constraints. At their heart lies a deceptively simple question: can you express the connections as rules and then let a machine figure out the rest? That question becomes fiendishly hard when the relationships twist and loop back on themselves, creating transitive chains that can run infinitely deep. In such cases, the way you ask the question matters as much as the data you feed it.
Last spring, a team from Stony Brook University—Yanhong A. Liu, Scott D. Stoller, John Idogun, and Yi Tong—took a close look at three classic ways to answer these rule-based questions: query-driven, ground-and-solve, and fact-driven inference. They didn’t just compare which was faster on a few toy graphs. They built a precise mathematical map of how fast each method can be, for every major variant of the rules and every shape of input graph. And they tested that map against real-world rule engines that are already famous for performance: XSB for query-driven evaluation, clingo for ground-and-solve, and Soufflé for fact-driven evaluation. The result is both a guidebook and a verdict: a detailed, quantified argument about when to reach for which tool, and why the shape of the data can swing the outcome in surprising ways.
What makes this work feel timely is less the elegance of the theory and more its pragmatic clarity. In a world where data sprawls across systems and teams, where knowledge graphs, access-control policies, and program-analysis pipelines keep getting more intricate, knowing which reasoning engine is best suited to a given task can save time, money, and energy. The authors’ central achievement is to nail down exact optimal time complexities for a broad family of problems—namely, transitive-closure analyses—under a spectrum of rule variants and input graph types. They also run meticulous experiments that show how the theory lines up with real runtimes, from the heavy lifting of grounding to the final act of writing results. It’s the rare moment when a high-precision theoretical study feels genuinely useful for building better software today.
Three doors to reasoning with rules
The paper surveys three dominant families of rule-based inference, each with its own internal logic, strengths, and blind spots. The first is query-driven inference, sometimes called top-down evaluation. It starts from the query you care about and climbs through rules to see if it can derive the answer. The second is ground-and-solve, the hallmark of answer-set programming. Here you first instantiate the rules over all possible values of their variables, producing a fixed, variable-free (ground) program, and then search for solutions that satisfy all constraints. The third is fact-driven inference, or bottom-up evaluation, the classic Datalog approach. It begins with the known facts and repeatedly applies rules to generate new facts until nothing new falls out.
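To make the contrast concrete, here is a minimal Python sketch of how the first and third styles answer the same reachability question over a toy edge relation. The function names and the tiny graph are inventions for this illustration, not the internals of XSB, clingo, or Soufflé.

```python
# Illustrative sketch only: the same reachability question answered in a
# query-driven style and in a fact-driven style, over a toy set of edges.

EDGES = {("a", "b"), ("b", "c"), ("c", "d")}  # toy input: edge facts

# Query-driven (top-down): start from the goal "is y reachable from x?" and
# recurse through the rules, remembering visited nodes to stay finite
# (a crude stand-in for the termination that tabling provides in XSB).
def reachable(x, y, seen=frozenset()):
    if (x, y) in EDGES:
        return True
    return any(reachable(z, y, seen | {x})
               for (w, z) in EDGES
               if w == x and z not in seen)

# Fact-driven (bottom-up): start from the edge facts and keep applying the
# rules until no new path facts appear (a fixpoint).
def all_paths():
    paths = set(EDGES)
    while True:
        new = {(x, y) for (x, z) in EDGES for (z2, y) in paths if z == z2}
        if new <= paths:
            return paths
        paths |= new

# Ground-and-solve would instead instantiate every rule over the node set up
# front, then hand the resulting variable-free program to a solver.
print(reachable("a", "d"))        # True, derived on demand from the query
print(("a", "d") in all_paths())  # True, derived among all path facts
```

A real engine adds tabling, indexing, and far more machinery, but the division of labor is the same: one method starts from the question, the other from the data, and the third turns the whole rule set into a giant constraint problem before solving it.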
These three families aren’t just different knobs on the same machine. They embody distinct ways of organizing work. In query-driven systems like XSB, knowledge is pursued in a goal-oriented fashion, with a natural fit for applications where you want to pull out specific results without materializing everything up front. Ground-and-solve systems such as clingo lean into constraint solving. They can be spectacular when you need a global, optimized solution that respects a web of constraints. Fact-driven systems like Soufflé emphasize throughput and ahead-of-time code generation, often compiling rules into standalone programs that can run outside a solver environment entirely.
Crucially, each method has a different way of handling recursion and joins, which are the engines behind transitive analysis. Transitive closure—determining whether a path exists from one node to another through a chain of edges—serves as the canonical testbed. The rules are deceptively simple: path(X,Y) is true if there is a direct edge X->Y, and path(X,Y) is also true if there is an edge X->Z and a path from Z to Y. It’s a clean recipe for recursion, and it becomes a proving ground for how efficiently each inference style can chase down all possible paths of any length.
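Those two rules can also be written in different recursive shapes, and the paper compares the left-, right-, and double-recursive variants. As a hedged illustration, here is one derivation round of each variant expressed over Python sets of (from, to) pairs; the function names are made up for this sketch.

```python
# One round of derivations for each recursive writing of the path rules,
# over Python sets of (from, to) pairs. Purely illustrative.

def right_rec(edge, path):
    # path(X,Y) :- edge(X,Z), path(Z,Y).   join an edge with a known path
    return {(x, y) for (x, z) in edge for (z2, y) in path if z == z2}

def left_rec(edge, path):
    # path(X,Y) :- path(X,Z), edge(Z,Y).   join a known path with an edge
    return {(x, y) for (x, z) in path for (z2, y) in edge if z == z2}

def double_rec(path):
    # path(X,Y) :- path(X,Z), path(Z,Y).   join two known paths
    return {(x, y) for (x, z) in path for (z2, y) in path if z == z2}

# All three, seeded with path(X,Y) :- edge(X,Y), compute the same closure,
# but the number of joinable combinations they touch can differ sharply.
edge = {(1, 2), (2, 3), (3, 4)}
path = set(edge)
while True:
    new = right_rec(edge, path)   # or left_rec(edge, path), or double_rec(path)
    if new <= path:
        break
    path |= new
print(sorted(path))
```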
Precision that predicts performance
The core of the paper is a principled, exact accounting of how many rule “combinations” each method must consider to compute all possible path facts, under a variety of rule variants and input graph shapes. The authors don’t rely on rough benchmarks or intuition. They derive closed-form formulas for the number of combinations each rule variant would generate, across twelve distinct graph types ranging from complete graphs to cycles, grids, and trees. This level of precision matters because, in rule-based reasoning, the number of combinations is the driver of time complexity. More combinations usually mean more time, and sometimes it means completely different growth rates for different methods.
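The formulas themselves are in the paper. As a rough, hedged illustration of the kind of quantity being counted, the sketch below brute-forces one plausible reading of "combinations", namely the number of (edge fact, path fact) pairs that share the join variable in the right-recursive rule, for a chain versus a complete graph. That reading, the helper names, and the graph sizes are assumptions made for illustration, not the authors' definitions.

```python
# Brute-force count of joinable body instances for the right-recursive rule
# path(X,Y) :- edge(X,Z), path(Z,Y), once the full path relation is known.
# The notion of "combinations" used here is an illustrative assumption.
from itertools import permutations

def closure(edges):
    # compute the full path relation by a simple fixpoint
    path = set(edges)
    while True:
        new = {(x, y) for (x, z) in edges for (z2, y) in path if z == z2}
        if new <= path:
            return path
        path |= new

def combinations(edges):
    path = closure(edges)
    return sum(1 for (_, z) in edges for (z2, _) in path if z == z2)

n = 8
chain = {(i, i + 1) for i in range(n - 1)}                 # 0 -> 1 -> ... -> 7
complete = {(i, j) for i, j in permutations(range(n), 2)}  # all ordered pairs

print(len(chain), combinations(chain))        # grows roughly like n**2 / 2
print(len(complete), combinations(complete))  # grows roughly like n**3
```

Even at this toy scale the counts diverge, roughly quadratic growth for the chain against roughly cubic growth for the complete graph, which is the kind of gap the closed-form analysis pins down precisely.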
Two ideas from the paper stand out for how they reframe the problem. First, incrementalization with minimum increment provides a path to optimal time complexity. Instead of evaluating all potential rule firings in a naive, batch fashion, the fastest approach ensures that each meaningful combination is considered once, in constant time, with indexing to find the next candidate quickly. Second, the authors do not stop at the obvious suspects—left recursion, right recursion, and double recursion. They count, exactly, how many combinations arise under each variant for every graph type, confirming that for many graphs, left and right recursions behave similarly, while double recursion can be more expensive in practice, especially under semi-naive evaluation, where both premises of the recursive rule produce new facts and far more fact pairs must be joined.
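For the incrementalization idea, the sketch below is a generic semi-naive evaluation of the right-recursive variant in Python: each round joins only the newly derived path facts (the "delta") against an index of edges keyed by their target node, so each joinable combination is visited once. It is a plain illustration under those assumptions, not the authors' transformation or any of the three systems' code.

```python
# Generic semi-naive (incremental) evaluation of
#   path(X,Y) :- edge(X,Y).
#   path(X,Y) :- edge(X,Z), path(Z,Y).
# Only the newly derived path facts feed the next round, and an index on the
# target node of each edge makes every joinable combination a constant-time step.
from collections import defaultdict

def transitive_closure(edges):
    edges_into = defaultdict(list)      # for each node z, the edges (x, z)
    for x, z in edges:
        edges_into[z].append(x)

    paths = set(edges)                  # path(X,Y) :- edge(X,Y).
    delta = set(edges)                  # facts not yet joined against the rule
    while delta:
        new = set()
        for z, y in delta:              # a newly derived path(Z,Y) fact...
            for x in edges_into[z]:     # ...meets each matching edge(X,Z) once
                if (x, y) not in paths:
                    new.add((x, y))     # derive path(X,Y)
        paths |= new
        delta = new                     # only fresh facts feed the next round
    return paths

print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```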
To ground their theory, the researchers ran exhaustive experiments with the three leading systems in their respective camps: XSB for query-driven, clingo for ground-and-solve, and Soufflé for fact-driven. They measured not only the core computation time, but also how long it took to load rules, read input data, ground rules, solve the search, and write out results. The numbers come from realistic hardware and well-controlled setups, and they cover a wide swath of graph shapes and sizes. The upshot is not a single “winner” but a map of where each system shines, depending on what you want to optimize.
One striking takeaway is the degree to which the shape of the input graph matters. For some graphs, all three recursion variants may yield the same theoretical complexity, while for others, the difference is dramatic. The authors highlight cases where the same problem yields different practical runtimes simply because of how the problem is encoded. In other words, performance can hinge on rule organization as much as on data size. This isn’t just a cautionary note for researchers. It’s a practical nudge for engineers building analysis pipelines: a small change in how you write the rules can ripple through the entire system’s efficiency.
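To see how much the encoding alone can matter, the sketch below uses the same illustrative reading of "combinations" to compare the right-recursive and doubly recursive writings on one and the same chain-shaped graph. Both derive identical path facts, but the doubly recursive body joins far more pairs. The counting and the names are, again, assumptions for illustration.

```python
# Same graph, same answers, different amounts of work: count the joinable
# body instances for two writings of the recursive path rule.

def closure(edges):
    path = set(edges)
    while True:
        new = {(x, y) for (x, z) in edges for (z2, y) in path if z == z2}
        if new <= path:
            return path
        path |= new

def count_right(edges):
    # body: edge(X,Z), path(Z,Y)
    path = closure(edges)
    return sum(1 for (_, z) in edges for (z2, _) in path if z == z2)

def count_double(edges):
    # body: path(X,Z), path(Z,Y)
    path = closure(edges)
    return sum(1 for (_, z) in path for (z2, _) in path if z == z2)

chain = {(i, i + 1) for i in range(7)}           # 8 nodes in a line
print(count_right(chain), count_double(chain))   # 21 versus 56 on this graph
```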
What it means for real-world AI and knowledge graphs
Beyond the elegance of the math, the paper speaks to a world where data is increasingly interlinked and governed by rules. Think of knowledge graphs that power search and reasoning, or security policies that must be checked against dynamic constraints, or software analysis tools that reason about code dependencies and data flows. In all these domains, you want to reason about transitive relations—reachability, influence, causality, and ordering—at scale and with correctness guarantees. The authors’ exact complexity formulas and systematic comparisons provide a rare form of guidance: they tell you not just what might work well in theory, but what is likely to work best in practice given the kind of graph you’re dealing with and the kind of inference engine you plan to run.
The study makes a few clear, actionable conclusions for practitioners. If you don’t need standalone code or constraint solving, and you want raw speed for reachability queries, query-driven evaluation with tabling (as in XSB) tends to be the fastest across many typical graphs. If you do need strong constraint solving on top of your reasoning, and you’re willing to pay the ground-and-solve overhead, a modern ASP system like clingo can be the right fit—especially when its grounding is highly optimized. If your top priority is generating standalone, high-throughput code from rules—think embedded analysis pipelines or accelerated deployment—then a Datalog-based approach like Soufflé can shine, though certain recursive variants may incur extra cost unless the implementation is finely tuned for the particular pattern of recursion you’re using.
The authors are careful to note that no single system is a universal answer. In some scenarios you may want a hybrid approach, a bespoke transformation that reorders rules to mimic a more efficient recursion, or a targeted incrementalization pass applied before grounding or evaluation. This is more than a pedantic optimization. It’s a reminder that rule-based analysis sits at the intersection of theory and engineering: precise mathematics can illuminate where the practical bottlenecks lie, but realizing those insights often requires careful system design and sometimes custom tooling.
Where this work lands most directly is in the design of future rule engines and in the decision-making process about which tools to adopt for a given project. The authors emphasize a practical mapping: for standalone code without constraint solving, Soufflé can be excellent; for constraint-rich but code-light tasks, clingo shines; and when you want the fastest, most flexible query engine and you’re able to write a bit of scripting around it, XSB is often unbeatable. And if you truly need both standalone code and heavy constraint solving, you may need a custom or hybrid solution that borrows tricks from all three worlds.
The study also serves as a reminder of how nuanced performance can be. It isn’t merely about asymptotic formulas or big-O notation. It’s about constants, data layout, indexing, and the subtle dance of recursion with joins. The researchers’ insistence on precise, closed-form complexity calculations and their validation against real runtimes is a hopeful sign that the field is moving toward a more predictive science of rule-based analysis. When you can predict where the bottlenecks will come from, you can design systems that avoid them, or at least anticipate the cost of different design choices before you ship code to production.
This work was conducted at Stony Brook University, led by Yanhong A. Liu and Scott D. Stoller, with contributions from John Idogun and Yi Tong. Their collaboration brings together decades of experience in logic programming, databases, and program analysis, and it shows in the clarity of the results as well as in the ambition of the questions asked. It’s the kind of research that feels like stepping back from a specific application to understand the skeleton of a family of problems—the transitive closures that undergird so much of modern computation—and then bringing that understanding to bear on the messy, data-heavy reality of today’s software systems.
Lead researchers and institution: The study was conducted by researchers at Stony Brook University—Yanhong A. Liu, Scott D. Stoller, John Idogun, and Yi Tong. The work exemplifies how rigorous theory can translate into practical guidance for building smarter, faster rule-based analyses in a data-rich era.