In the world of computer science, some problems feel abstract until you realize they govern everyday systems—databases, search queries, and even the quirks of music generation. One such problem asks: how many words of a given length does a non-deterministic finite automaton (NFA) accept? It sounds like a tiny, technical corner of theory, but it sits at the crossroads of language, data, and how we reason about uncertainty in computation. This is the heart of #NFA counting, a problem so stubborn that even the best approximate solutions historically felt out of reach for real-world use. The reason is simple and humbling: counting is computationally hard in general, and previous fully polynomial randomized approximation schemes (FPRAS) tended to hide behind monstrous running times that made them impractical outside of a lab.
The new work by Kuldeep S. Meel (University of Toronto and Georgia Tech) and Alexis de Colnet (TU Wien) offers a different path. It doesn’t pretend to circumvent the deep complexity outright; instead, it designs a practical FPRAS with a much friendlier runtime, framing the problem through a fresh lens on randomness, sampling, and dependence. It’s a reminder that in algorithm design, the difference between theoretical possibility and practical viability often hinges on how creatively you reuse information, not just on how many samples you throw at a problem. The research is anchored in the real-world ambitions of probabilistic query evaluation and regular path queries—areas where counting paths through a graph translates to answering crucial questions about data—and the authors explicitly tie their work to those applications.
From theory to practice in counting words of NFAs
NFAs are a flexible way to model computations and patterns. If you imagine a machine that processes a binary string and can follow many potential paths at once, you’re imagining an NFA. The counting problem asks for |L_n(A)|, the number of length-n words accepted by the automaton A. The arithmetic is clean, but the landscape is treacherous: #NFA is #P-hard, and even though there are FPRAS algorithms, their raw runtime is so large that they feel academic rather than usable. The paper surveys the evolution of these ideas, from the early, slower schemes to the more recent improvements that still struggled to scale to real datasets or be implemented efficiently on modern hardware.
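To make |L_n(A)| concrete, here is a minimal brute-force sketch in Python, assuming a hypothetical NFA encoding in which `delta` maps `(state, symbol)` pairs to sets of successor states. It enumerates all 2^n binary words, so it only illustrates the quantity being counted, not a viable algorithm:

```python
from itertools import product

def count_words_brute_force(states, delta, start, accepting, n, alphabet=("0", "1")):
    """Count length-n words accepted by an NFA, by trying every word.

    delta: dict mapping (state, symbol) -> set of successor states.
    Exponential in n -- shown only to make |L_n(A)| concrete.
    """
    count = 0
    for word in product(alphabet, repeat=n):
        # Track the set of states reachable after reading a prefix of the word.
        current = {start}
        for symbol in word:
            current = set().union(*(delta.get((q, symbol), set()) for q in current))
        if current & accepting:  # some run ends in an accepting state
            count += 1
    return count
```

For instance, an NFA that accepts binary words containing at least one `1` accepts 3 of the 4 words of length 2.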
What changes with the new approach is not just a faster shuffle of constants, but a rethinking of how randomness is used. The authors introduce a template for a countNFA algorithm that works by unrolling the automaton into a layered, acyclic graph, A_u, with n+1 levels. This unrolled structure makes the language L_n(A) accessible as a layer-by-layer object, where each layer holds copies of the original states and edges connect adjacent layers. The key insight is to estimate the size of the layer-specific languages |L(q_ℓ)| and to build sample sets S(q_ℓ) that capture words ending at each state q after ℓ steps. By walking bottom-up through the layers, they construct an estimate for the final count |L(A_u)|, where A_u is the unrolled automaton for length n.
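The layer-by-layer view can be sketched as a simple dynamic program over the unrolled DAG, using the same hypothetical `delta` encoding as above. Note that this DP counts accepting runs (paths through A_u), not distinct words: an NFA can accept one word along many runs, and that gap is exactly what makes counting words hard and sampling necessary:

```python
def count_accepting_runs(states, delta, start, accepting, n, alphabet=("0", "1")):
    """Dynamic programming over the unrolled, layered DAG with n+1 levels.

    runs[q] = number of labeled paths from start to q after ell steps.
    WARNING: this counts accepting *runs*, not distinct words; an NFA may
    accept the same word along many runs, which is why |L_n(A)| is hard.
    """
    runs = {q: 0 for q in states}
    runs[start] = 1
    for _ in range(n):  # advance one layer of the unrolled automaton A_u
        nxt = {q: 0 for q in states}
        for q, c in runs.items():
            if c == 0:
                continue
            for symbol in alphabet:
                for r in delta.get((q, symbol), set()):
                    nxt[r] += c
        runs = nxt
    return sum(runs[q] for q in accepting)
```

On the small "contains a 1" NFA used above, the run count (4 at length 2) strictly exceeds the word count (3), because the word `11` is accepted along two runs.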
In practical terms, evaluating whether a word belongs to a layer’s language—i.e., a membership check—takes time proportional to n times the number of states, m. That means a straightforward, literal implementation would still be tethered to a not-so-small polynomial. The breakthrough here is that the authors achieve a running time that is sub-quadratic in this membership-check cost: explicitly, O(n^2 m^3 log(nm) ε^-2 log(δ^-1)). In other words, even though each check is expensive, the overall scheme scales more gracefully than previous FPRAS constructions. It’s not magic; it’s a design that pinpoints where the heavy lifting occurs and reshapes the workflow around that insight so the expensive checks are not the bottleneck they used to be.
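A membership check of the kind described can be sketched under the same hypothetical encoding: each of the n steps touches at most all m states, which is where the roughly n·m cost per check comes from (up to the fan-out of the transition relation):

```python
def member(word, q, delta, start):
    """Does `word` label some run from `start` to state q in the unrolled DAG?

    One set-propagation step per symbol; each step touches at most m states,
    giving a cost on the order of n * m set operations per check.
    """
    current = {start}
    for symbol in word:
        current = set().union(*(delta.get((p, symbol), set()) for p in current))
        if not current:  # no run can read this prefix at all
            return False
    return q in current
```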
What makes the approach tick is a careful balance between sampling and dependence. The method relies on a bottom-up assembly of estimates, where each state q receives an estimate p(q) that approximates |L(q)|^-1. It then builds a family of sample sets S_r(q) and uses a median-of-means strategy to stabilize the final estimate. The surprising twist is that the authors explicitly allow, and importantly control, dependence among samples. Rather than forcing independence—which would force a combinatorial explosion of samples—they embrace a structured form of dependence, guided by the automaton’s derivation paths. The analysis hinges on understanding how runs through the unrolled DAG share prefixes and how this shared history affects the probability that two words end up in the same sample set. This is where a lot of the novelty lives: dependence is not a bug to be avoided but a feature to be bounded and exploited.
The authors tease apart a central technical construct they call derivation runs. Each word w in L(q) has a unique derivation run through A_u, and the last common prefix state lcps(run(w,q), run(w′,q)) for another word w′ serves as a natural way to measure how the two words’ samples might overlap. This idea lets them bound the variance of their estimator and, crucially, shows that reuse of samples across layers can be both safe and advantageous if analyzed with the right tools. It’s a shift that feels almost like discovering that a crowd-sourced approach to problem-solving, if carefully controlled, can outpace a disciplined, “single-use” sampling strategy. The upshot is a practical algorithm whose core ideas are both elegant and surprisingly tangible.
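As a toy illustration of how prefix sharing between runs can be measured, here is a hypothetical helper (not the paper's exact lcps definition) that, given two derivation runs as sequences of states, finds the index of the last state they share before diverging:

```python
def lcps_index(run_a, run_b):
    """Index of the last state where two derivation runs agree as a prefix.

    Runs are sequences of states through the unrolled DAG. Words whose runs
    share a long prefix are more likely to land in the same sample sets,
    which is the overlap the variance analysis has to bound.
    (Hypothetical helper for illustration only.)
    """
    i = 0
    while i < min(len(run_a), len(run_b)) and run_a[i] == run_b[i]:
        i += 1
    return i - 1  # -1 if the runs diverge at the very first state
```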
The heart of the idea: reusing samples and embracing dependence
Historically, FPRAS methods for #NFA relied on generating fresh, independent samples at every step to avoid complicated correlations. The ACJR line of work, and subsequent improvements, showed the path forward was possible, but the price was high: a cascade of sampling and union-of-sets steps that multiplied runtime dramatically. The Meel–de Colnet approach flips the script. It embraces dependence across layers and uses a structured mechanism to propagate information upward through the unrolled graph. In effect, they turn sampling into a collaborative process: a word’s chances of appearing in a sample at layer ℓ depend on the samples at layer ℓ−1, but in a way that can be tightly quantified and bounded.
Why is this a big deal? Because independence typically requires a vast number of independent samples to achieve the same accuracy, especially when the goal is to approximate counts across a combinatorial space as large as the NFA’s. By allowing samples to be reused, the algorithm cuts down on redundant work and opens the door to practical implementations. The cost is a more delicate mathematical accounting for variance and bias, but the authors show a careful path to keeping the estimator reliable. Their framework leverages a self-reducibility property that expresses the language of length-ℓ words ending at a given state as a union of sublanguages corresponding to predecessor states. This recursive structure is exactly where sample reuse shines, and where the worst-case blow-ups in dependence can be tamed with the right probabilistic tools.
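The self-reducibility the paragraph describes can be written down exactly, at exponential cost: the length-ℓ words ending at state q are the union, over predecessor states, of the length-(ℓ−1) sublanguages extended by the connecting symbol. A sketch under the same hypothetical `delta` encoding (the FPRAS replaces the full sets below with reused samples of each sublanguage):

```python
def layer_language(q, ell, states, delta, start, alphabet=("0", "1")):
    """Exact self-reducibility: words of length ell ending at q are the union
    over predecessor states p of (words of length ell-1 ending at p),
    each extended by the symbol connecting p to q.

    Exponential-size output -- illustrates the recursion, not the algorithm.
    """
    if ell == 0:
        return {""} if q == start else set()
    words = set()
    for p in states:
        for symbol in alphabet:
            if q in delta.get((p, symbol), set()):
                words |= {w + symbol
                          for w in layer_language(p, ell - 1, states, delta, start, alphabet)}
    return words
```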
The practical takeaway is both philosophical and pragmatic: sometimes the most powerful way to tame a hard problem is not to chase perfect independence, but to choreograph dependence so that the pieces of the puzzle reinforce each other. The authors back this up with a thorough analysis of how overlap in derivation runs contributes to variance, and how one can keep this variance within bounds while still achieving a sub-quadratic overall runtime with respect to the expensive membership checks. It’s a telling reminder that in algorithms, the geometry of a problem—how many paths, how they share prefixes, how they branch—can be as important as raw counting power.
From theory to practice: caching, matrix tricks, and hardware potential
A critical piece of turning this theoretical advance into something usable is how the algorithm handles the heavy lifting of membership checks across unrolled layers. The authors propose sophisticated caching strategies that turn what would be a repetitive slog into a set of matrix-style computations. They describe two caching schemes that transform the problem of checking whether a word belongs to L(q_ℓ) into a sequence of matrix operations, a trick that plays nicely with modern CPUs and GPUs that excel at linear algebra and batch processing. The upshot is a design that can leverage vectorized computations and high-throughput hardware, which is how a practical implementation could scale to large automata and longer word lengths.
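One way to see the matrix framing: give each alphabet symbol a boolean transition matrix, and a membership check becomes a chain of vector-matrix products whose intermediate results can be cached and batched across many checks. A pure-Python sketch (the paper's actual caching schemes are more elaborate; the names and encoding here are illustrative):

```python
def transition_matrices(states, delta, alphabet=("0", "1")):
    """One boolean matrix per symbol: mats[a][i][j] iff state j in delta(state i, a)."""
    index = {q: i for i, q in enumerate(states)}
    m = len(states)
    mats = {}
    for a in alphabet:
        M = [[False] * m for _ in range(m)]
        for p in states:
            for r in delta.get((p, a), set()):
                M[index[p]][index[r]] = True
        mats[a] = M
    return index, mats

def member_by_matrices(word, q, index, mats, start):
    """Membership check as repeated boolean vector-matrix products.

    Each symbol costs one vec-mat product; these batch well on hardware
    that is good at linear algebra, and intermediates can be cached.
    """
    m = len(index)
    vec = [False] * m
    vec[index[start]] = True
    for a in word:
        M = mats[a]
        vec = [any(vec[i] and M[i][j] for i in range(m)) for j in range(m)]
    return vec[index[q]]
```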
Another practical lever is the use of a median-of-means estimator, a robust statistical trick that helps keep the final estimate tight despite the heavy dependence structure. The authors lay out how to structure the estimation into batches, compute batch averages, and take a robust median, all while acknowledging that their samples are not perfectly independent. The careful probabilistic analysis then shows how these choices translate into rigorous guarantees: with probability at least 1−δ, the returned estimate falls within (1±ε) of the true count, for any desired accuracy ε and failure probability δ > 0.
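A generic median-of-means sketch, with hypothetical batch parameters (the paper derives its own schedule from ε and δ): average within batches, then take the median across batches, so a few bad batches cannot drag the estimate away:

```python
import random
import statistics

def median_of_means(draw, batches=9, batch_size=50, rng=None):
    """Robust estimator: mean within each batch, median across batches.

    The median step damps the influence of a few outlier batches, which is
    what keeps the guarantee alive even under correlated samples.
    Batch counts here are illustrative, not the paper's schedule.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    means = []
    for _ in range(batches):
        means.append(sum(draw(rng) for _ in range(batch_size)) / batch_size)
    return statistics.median(means)
```

For a uniform [0,1] draw, the estimate lands near the true mean 0.5 even though a single batch mean can stray.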
All of this is not just theoretical ornament. The work maps cleanly to applications in probabilistic query evaluation, where evaluating the likelihood of a SQL-like query against a probabilistic database can be reduced to counting paths in an NFA. The authors emphasize that a practical FPRAS for #NFA would have immediate implications there, and for counting answers to regular path queries in graphs—areas that sit at the core of how databases and graph analytics operate in the real world. In that sense, the research doesn’t just advance a theoretical boundary; it puts a potential new tool in the hands of data engineers and researchers wrestling with large-scale, uncertain data.