When people imagine AI thinking, they often picture long, careful chains of thought sprawling across text. In practice, the most capable systems today are data-hungry monoliths shaped by huge corpora and massive compute, and their reasoning tends to surface as diffuse patterns rather than transparent steps. A new study from Sapient Intelligence in Singapore flips that script. It introduces a brain-inspired architecture that can work through difficult problems in depth, all in a single forward pass and with surprisingly little data. The trick is not more data or bigger models, but a two-brain architecture that mirrors how our own cortex handles different tempos of processing.
The project, led by Guan Wang and Meng Lu at Sapient Intelligence, with collaborators across the team and contributing work from Sen Song at Tsinghua University, centers on what the authors call the Hierarchical Reasoning Model (HRM). Think of HRM as two intertwined minds inside a single machine: a high-level, slow, abstract planner that sketches strategy; and a low-level, fast, detail-oriented executor that carries out the actual calculations. The result, in the words of the researchers, is a model that attains remarkable computational depth without the training fragility or data hunger that plagues many large language models today. In other words: a machine that can think deeply, and efficiently, without needing to memorize a mountain of data first.
A two-brain architecture for reasoning
HRM centers on two recurrent modules that cooperate to solve tough tasks. The high-level module (H) governs slow, abstract planning, shaping what the model considers important and how it should approach the problem. The low-level module (L) handles the nitty-gritty details: tight, fast computations that execute the plan in real time. The two are built from the same kind of network blocks, but they play different roles. In each forward pass, HRM runs through N cycles, each consisting of T steps of the L-module. After each cycle, the H-module updates once, using the L-module’s final state, and that refreshed high-level state sets the context for the next cycle. In other words, the network builds a deep chain of reasoning by nesting quick execution inside repeated rounds of careful planning.
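To make that nested loop concrete, here is a minimal sketch in PyTorch of how a slow planner and a fast executor might interleave. The GRU-cell update rules, the dimensions, and the class and variable names are illustrative assumptions; the actual HRM builds its modules from Transformer-style blocks, but the choreography of N cycles of T low-level steps is the same.

```python
import torch
import torch.nn as nn

class TwoTimescaleCore(nn.Module):
    """Toy sketch: a slow H-module steering a fast L-module."""
    def __init__(self, input_dim=128, hidden_dim=256, n_cycles=4, t_steps=4):
        super().__init__()
        self.n_cycles, self.t_steps = n_cycles, t_steps
        # Low-level executor: updated every step, conditioned on the input and the H-state.
        self.l_cell = nn.GRUCell(input_dim + hidden_dim, hidden_dim)
        # High-level planner: updated once per cycle, conditioned on the final L-state.
        self.h_cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, input_dim)

    def forward(self, x, z_h, z_l):
        for _ in range(self.n_cycles):
            # Fast loop: T low-level refinements under a fixed high-level context.
            for _ in range(self.t_steps):
                z_l = self.l_cell(torch.cat([x, z_h], dim=-1), z_l)
            # Slow update: the planner absorbs the result and resets the context.
            z_h = self.h_cell(z_l, z_h)
        return self.readout(z_h), z_h, z_l

# One forward pass performs n_cycles * t_steps low-level updates: that nesting
# is where the effective depth comes from.
model = TwoTimescaleCore()
x = torch.randn(8, 128)
z_h, z_l = torch.zeros(8, 256), torch.zeros(8, 256)
out, z_h, z_l = model(x, z_h, z_l)
```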
The authors call the way this unfolds “hierarchical convergence.” The L-module gradually converges toward a local equilibrium within a cycle, given the current H-state. Only after those steps does the H-module adjust, injecting new guidance and steering the next wave of local refinements. This arrangement sidesteps a common failure mode in recurrent nets: settling into a single fixed point too early, which stalls the computation and caps the effective depth. Instead, HRM is designed to converge slowly and iteratively, with the high-level module periodically restarting the deeper search with a fresh context. It’s a bit like a seasoned chess player who keeps re-evaluating the board from a higher vantage point after each clever sequence of moves.
From a training standpoint, HRM breaks with the standard practice of backpropagation through time (BPTT). Instead, it uses a one-step gradient approximation that leverages fixed points of the recurrent components. The math is anchored in the idea of Deep Equilibrium Models: you can differentiate through a fixed point without unrolling every intermediate state. The payoff is a constant memory footprint and a training story that aligns more closely with how biological networks might learn, via local adjustments rather than sweeping credit assignment across long sequences. The result is a model that can learn deep reasoning with relatively modest data and without the crutch of explicit chain-of-thought transcripts.
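A minimal way to picture that constant-memory property, continuing the toy core above: run every update except the last ones without tracking gradients, then backpropagate only through the final L-step and H-step. Where exactly the gradient boundary sits here is an assumption made for illustration, not the paper’s precise recipe.

```python
import torch
import torch.nn.functional as F

def one_step_grad_segment(model, x, z_h, z_l, target):
    """Run a full forward segment, keeping gradients only for the last updates.

    Assumes `model` is the TwoTimescaleCore sketch above; the placement of the
    no-grad boundary is an illustrative choice.
    """
    with torch.no_grad():
        # Everything up to the last L-step and H-step runs gradient-free,
        # so activation memory stays constant no matter how deep the segment is.
        for _ in range(model.n_cycles - 1):
            for _ in range(model.t_steps):
                z_l = model.l_cell(torch.cat([x, z_h], dim=-1), z_l)
            z_h = model.h_cell(z_l, z_h)
        for _ in range(model.t_steps - 1):
            z_l = model.l_cell(torch.cat([x, z_h], dim=-1), z_l)
    # Only the final L-step and H-step land on the autograd tape.
    z_l = model.l_cell(torch.cat([x, z_h], dim=-1), z_l)
    z_h = model.h_cell(z_l, z_h)
    loss = F.mse_loss(model.readout(z_h), target)
    loss.backward()
    # Detach the carried states so the next segment starts with a clean graph.
    return loss.item(), z_h.detach(), z_l.detach()
```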
Beyond the gradient trick, HRM adds two more ideas that feel almost instinctual in human problem-solving. Deep supervision exposes the model to intermediate targets at multiple segments of a forward pass, reinforcing learning without dragging gradients through every past step. And Adaptive Computation Time (ACT) lets the model decide how many segments to run: some problems demand quick, decisive thinking; others deserve longer, more careful contemplation. A tiny Q-learning head weighs whether to halt or continue, trading off compute for accuracy in a dynamic, task-sensitive way. Put simply, HRM tries to think, fast and slow, just like us, and it adjusts its effort on the fly.
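A sketch of how those two ideas could slot into a training loop appears below, again built on the toy core. The q_head (a hypothetical linear layer scoring halt versus continue), the halting threshold, and the loss weighting are all simplifications; the paper’s ACT uses a proper Q-learning target rather than this crude heuristic, and the optimizer is assumed to cover both the core’s and the head’s parameters.

```python
import torch
import torch.nn.functional as F

def train_on_example(model, q_head, optimizer, x, target, m_max=8, hidden_dim=256):
    """Deep supervision across segments plus a simple halt/continue head."""
    z_h = torch.zeros(x.size(0), hidden_dim)
    z_l = torch.zeros(x.size(0), hidden_dim)
    for _ in range(m_max):
        out, z_h, z_l = model(x, z_h, z_l)               # one full forward segment
        seg_loss = F.mse_loss(out, target)               # supervise every segment
        q_values = q_head(z_h)                           # per-item [halt, continue] scores
        halt_label = 0 if seg_loss.item() < 0.01 else 1  # crude target: halt once loss is small
        q_target = torch.full((x.size(0),), halt_label, dtype=torch.long)
        loss = seg_loss + F.cross_entropy(q_values, q_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Deep supervision: carry the states into the next segment, but cut the gradient path.
        z_h, z_l = z_h.detach(), z_l.detach()
        if q_values.argmax(dim=-1).eq(0).all():          # every item in the batch votes to halt
            break
```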
All of this comes in a relatively small package: about 27 million parameters, trained from scratch with roughly 1,000 examples, and without any pretraining or external chain-of-thought supervision. The architecture is implemented with encoder blocks drawn from modern Transformer variants, but the magic lies in how the two modules interact, not in raw size. The team also constructs a simple, robust inference-time scaling story: you can push for more depth at test time by increasing the maximum computation budget, Mmax, without changing the training setup.
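Inference-time scaling then amounts to handing that same halt-or-continue loop a bigger budget. A tiny sketch, reusing the hypothetical pieces above, with m_max standing in for the paper’s Mmax:

```python
import torch

@torch.no_grad()
def solve(model, q_head, x, m_max=16, hidden_dim=256):
    """Grant more reasoning segments at test time without retraining anything."""
    z_h = torch.zeros(x.size(0), hidden_dim)
    z_l = torch.zeros(x.size(0), hidden_dim)
    out = None
    for _ in range(m_max):
        out, z_h, z_l = model(x, z_h, z_l)
        if q_head(z_h).argmax(dim=-1).eq(0).all():  # the halting head votes to stop
            break
    return out
```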
Why data efficiency matters in AI
The HRM story isn’t just about clever tricks; it’s about rethinking what “deep reasoning” might look like inside a machine. In an era dominated by enormous models trained on vast data, HRM asks whether you can achieve robust, long-horizon reasoning with far less. The results speak to that possibility. On a battery of challenging tasks that stress symbolic search, planning, and structured reasoning, HRM shines with a fraction of the data and a fraction of the parameters of many state-of-the-art models that rely on chain-of-thought (CoT) prompting. The authors emphasize that HRM achieves near-perfect performance on complex Sudoku puzzles and optimal pathfinding in large mazes, all without relying on pretraining or chain-of-thought data. In the language of the paper, a 27M-parameter HRM, trained on about 1,000 examples, outperforms much larger models that use longer context windows and CoT strategies on difficult benchmarks.
To show what counts as “difficult,” the authors lean on three benchmarks that stress different flavors of reasoning. Sudoku-Extreme consists of 9×9 puzzles with unique solutions whose hardest instances demand extensive backtracking; Maze-Hard tests optimal pathfinding on a 30×30 grid; ARC-AGI is a broader, inductive-reasoning challenge that seeks to measure flexible intelligence across tasks, beyond language. In all three cases, HRM delivers results that feel almost paradoxical: tiny, fast components orchestrating a surprisingly deep, principled search. In the ARC-AGI suite, the model trained from scratch with 1,000 examples and a modest parameter count achieves competitive performance against large, pre-trained systems that rely on CoT prompts and long context windows. In Sudoku-Extreme, HRM reaches accuracy levels that dwarf those of baselines that depend on chain-of-thought strategies, sometimes solving problems that stump much larger models.
The practical upshot is modest but meaningful: you don’t need to pile on more data to get deeper reasoning. If you can marshal your computational budget into a brain-like structure that separates planning from low-level execution, a smaller model can do a lot more with less—especially on tasks that require search, backtracking, and symbolic manipulation. The study’s own narrative calls this a step toward universal computation—an architecture that could, in principle, learn and execute a broader class of algorithms without bespoke training for each new task.
To ground this claim in numbers, consider the ARC-AGI results. HRM, trained from scratch with about 1,000 examples on grids no larger than 30×30, reaches around 40% accuracy on ARC-AGI, surpassing several larger CoT-based models with far longer contexts. On Sudoku-Extreme Full, the complete, rigorously compiled benchmark set, HRM achieves substantial accuracy, and on Maze-Hard its performance remains robust where many baselines crumble. The paper emphasizes the contrast with a direct-prediction baseline that uses a Transformer of comparable size but without HRM’s hierarchical structure; the difference is not subtle. The two-level thinking apparatus tilts the balance toward actually solving problems rather than merely guessing from patterns in the data.
What the experiments reveal about thinking machines
Beyond outcome numbers, the authors provide a striking look at what HRM’s internal dynamics resemble. A central finding is the emergence of a dimensionality hierarchy between the two modules. When the network is trained on diverse tasks, the high-level state zH settles into a much higher-dimensional space (a participation ratio, or PR, of around 90) than the low-level state zL (PR around 30). In the same sweep, increasing the variety of tasks causes zH to expand further while zL stays roughly steady. The picture is reminiscent of the brain: higher-order regions like the prefrontal cortex operate with richer, more flexible representations, while lower areas keep computations grounded in the immediate sensory-motor domain.
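The participation ratio itself is a standard measure of how many directions a set of states effectively occupies: with λi the eigenvalues of the states’ covariance matrix, PR = (Σ λi)² / Σ λi². Here is a small NumPy sketch on stand-in data rather than the paper’s actual hidden states:

```python
import numpy as np

def participation_ratio(states):
    """PR = (sum of covariance eigenvalues)^2 / sum of squared eigenvalues.

    `states` is an (n_samples, n_features) array. A PR near n_features means
    variance is spread over many directions; a small PR means it is
    concentrated in a few.
    """
    centered = states - states.mean(axis=0, keepdims=True)
    eigvals = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    eigvals = np.clip(eigvals, 0.0, None)      # clear tiny negative numerical noise
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

# Stand-in data: an isotropic cloud has a high PR, a low-rank cloud a low one.
rng = np.random.default_rng(0)
broad = rng.normal(size=(2000, 256))                             # variance in many directions
narrow = rng.normal(size=(2000, 8)) @ rng.normal(size=(8, 256))  # confined to ~8 directions
print(participation_ratio(broad), participation_ratio(narrow))
```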
The authors push the metaphor further with a brain-facing comparison. In the mouse cortex, the PR tends to rise from sensory to higher association areas, signaling a similar dimensionality hierarchy linked to cognitive flexibility. HRM’s emergent hierarchy—zH being more high-dimensional and zL staying compact—appears to mirror this biological principle. If you’re tempted to anthropomorphize, you could say HRM’s high-level module acts like a head-in-the-sky planner, roaming a high-dimensional landscape of possibilities, while the low-level module stays nimble and task-focused, wiring up the concrete steps that realize the plan.
Crucially, this separation isn’t baked into the architecture alone. The authors show a convincing control: an identical network with random weights (an untrained HRM) doesn’t develop the same hierarchical dimensionality. The high- and low-level modules in the untrained model stay similar in their effective dimensionality. That suggests the hierarchy is an emergent property of training on diverse tasks, not a superficial feature of the design. The high-to-low PR ratio of roughly 3:1 is evocative of real cortex organization and hints at why HRM can adapt its strategy across different kinds of problems—something a single, monolithic transformer has a hard time doing.
The results carry a broader philosophical message: if you want machines to reason like humans, you may need to organize computation the way the brain does, across layers of processing and across timescales, rather than simply cranking up depth or width in a single monolithic stack. HRM isn’t claiming to replicate the brain, but it is showing a practical path where the architecture itself fosters the kind of flexible, multi-stage reasoning that has long defined human intelligence.
There are caveats, of course. The authors are careful to label their findings as correlational at this stage: the dimensionality hierarchy aligns with better performance, but proving causation, by linking the high-level dimensionality directly to improved reasoning, will require careful intervention experiments that are not trivial to run in deep networks. And while the one-step gradient and ACT mechanism deliver stronger training stability and inference-time scaling, they are still areas of active study. HRM’s prowess on curated benchmarks is compelling, but it doesn’t yet prove that a small, brain-like architecture can conquer every kind of real-world reasoning task. Still, the trajectory is provocative: a machine that can learn deep reasoning from a modest data diet, guided by a structure that mirrors the brain’s own multi-timescale choreography.
The authors note that the broader significance goes beyond proving a single trick works. HRM challenges a long-standing pattern in AI research—the emphasis on ever-larger, ever-data-hungry models trained with heavy supervision. If the promise holds, HRM-like designs could become a new category of AI that pairs computational depth with data efficiency, making powerful reasoning available on smaller scales and, potentially, on devices closer to the edge. It’s a reminder that the brain’s blueprint—hierarchical processing, temporal separation, and recurrent refinement—may still be the most durable blueprint for robust, flexible intelligence in machines. And if we can bring a little more of that brain-like depth into everyday AI systems, the future may bring reasoning that feels less brittle, more adaptive, and ultimately more useful in the messy, real world.
Lead researchers and institutions: The study is conducted by Sapient Intelligence in Singapore, led by Guan Wang and Meng Lu, with key contributions from Jin Li, Yuhao Sun, Xing Chen, Changling Liu, Yue Wu, and Yasin Abbasi Yadkori; Sen Song is affiliated with Tsinghua University. The team notes that the work exemplifies how brain-inspired design can unlock deep reasoning with modest data, offering a new direction for future AI systems that aim to think like humans without requiring human-like volumes of training data.
In the end, HRM isn’t a finished product, but a bold demonstration that you can architect learning and thought in ways that feel closer to biology—where depth is not just a matter of stacking more layers, but of layering thinking across timescales and contexts. If the brain is the ultimate reasoning machine, HRM is a principled step toward building machines that can use that lesson to reason with depth, adaptivity, and elegance, even when the training data is scarce.