In a quiet corner of the AI world, a team from KU Leuven, IBM Research Zurich, and Università della Svizzera Italiana in Lugano has built something that reads like a mind-map for machines. The study, led by David Debot, Pietro Barbiero, Gabriele Dominici, and Giuseppe Marra, proposes a neural model that not only predicts answers but also shows its working in human-friendly terms. It matters because it moves interpretability from being a postscript to being part of the plot, visible from the first moment a prediction is made.
Traditional interpretable AI often explains decisions after the fact, or it peels back one layer of the onion at a time. The Hierarchical Concept Memory Reasoner, or H-CMR, treats concepts and tasks as parts of a shared web of ideas. It builds a tiny, learnable graph where each node is a concept, and edges are rules that describe how concepts relate to one another. The model uses neural attention to pick the most relevant rules for a given input, then applies those rules in a transparent, hierarchical way. The result is a system that can be evaluated not just by its final answer, but by the logical steps it used to get there.
What makes this particularly striking is the way the authors foreground human interaction. H-CMR is designed for three kinds of human-AI collaboration: (1) concept interventions at inference time, where a human can correct a concept and watch how its correction propagates through the graph; (2) model interventions at training time, where experts can inject or prune rules to align the model with domain knowledge; and (3) global interpretability, which means the entire rule memory can be inspected and verified against constraints. It’s a shift from “explainability on demand” to “reasoning that can be read and edited like a small, symbolic notebook.”
The paper is a collaborative product of prominent research ecosystems. KU Leuven’s computer science department anchors the work, with IBM Research Zurich and Università della Svizzera Italiana providing the cross-institutional perspective. The authors emphasize that H-CMR remains a universal binary classifier, able to match the accuracy of deep neural nets while delivering interpretable reasoning for both concepts and the downstream task. In other words, you don’t have to surrender performance to get transparency, and you don’t have to surrender interpretability to chase accuracy.
Two quick anchors help orient the novelty. First, H-CMR encodes a directed acyclic graph over concepts, learned from data, where each non-source concept is predicted via symbolic logic rules that reference other concepts. Second, the model’s memory stores multiple rules for every concept, and a neural attention mechanism selects which rule to apply for a given input. Once a rule is selected, the rest of the inference is a theoretically clean, human-readable sequence of logical checks. That combination—learned graph structure plus human-interpretable, rule-based execution—aims to fuse the strengths of deep learning with the clarity of symbolic reasoning.
A new kind of thinking machine
Imagine a machine that builds its own family tree as it learns. The nodes are concepts like “bird color,” “wing pattern,” or “digit appearing on a screen,” and the edges encode how those concepts relate. H-CMR doesn’t just guess a concept or a task; it tells you, input by input, which rules it used to decide that concept. It uses an encoder to predict a handful of source concepts directly from the input, plus a compact embedding that captures extra context. Then a decoder steps through a hierarchy: for each non-source concept, it selects a rule from a learned memory and applies it to the parent concepts to predict the child concept. The rules themselves are symbolic logic, such as “C3 is true if C1 and C2 are true” or “C3 is true if C1 is false.”
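To make that flow concrete, here is a minimal Python sketch of the decode step. The concept names and rules are invented, the encoder is reduced to a hard-coded guess, and the attention-based selector is replaced by a trivial placeholder, so this shows the shape of the computation rather than the paper’s actual implementation.

```python
# Minimal sketch of H-CMR-style hierarchical inference, assuming rules are
# conjunctions over (possibly negated) parent concepts. Concept names, rules,
# and the selection step are invented for illustration.

# An encoder would predict the source concepts (and a context embedding)
# directly from the input; here we simply hard-code an example prediction.
source_concepts = {"C1": True, "C2": False}

# Rule memory: each non-source concept stores several candidate rules.
# A rule is a list of (parent_concept, required_truth_value) literals,
# read as a conjunction.
rule_memory = {
    "C3": [
        [("C1", True), ("C2", True)],   # C3 if C1 and C2
        [("C1", False)],                # C3 if not C1
    ],
    "C4": [
        [("C3", True)],                 # C4 if C3
        [("C2", True)],                 # C4 if C2
    ],
}

def select_rule(concept, candidates, context_embedding=None):
    """Stand-in for the neural attention selector: the real model scores every
    candidate rule given the input's embedding; here we just pick the first."""
    return candidates[0]

def evaluate(rule, known):
    """A rule fires iff every literal matches the already-predicted concepts."""
    return all(known[parent] == value for parent, value in rule)

# Decode the non-source concepts in a topological order of the learned DAG.
known = dict(source_concepts)
for concept in ["C3", "C4"]:
    rule = select_rule(concept, rule_memory[concept])
    known[concept] = evaluate(rule, known)

print(known)  # {'C1': True, 'C2': False, 'C3': False, 'C4': False}
```

Once a rule has been selected, everything downstream of it is ordinary propositional evaluation, which is exactly why the chain of reasoning can be read off directly.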
The trick lies in the rule memory. Each concept has its own set of candidate rules, stored as rule embeddings. A neural network serves as the selector, deciding which rule to apply for the current input. But there’s a crucial guardrail: the graph must be acyclic. The researchers enforce a topological order using a learnable node-priority vector, so the model never ends up predicting a concept from itself or creating cycles that would undermine interpretability. In practice, this means the model can evolve a DAG that reflects plausible, hierarchical dependencies among concepts, and the graph can be pruned or reshaped if a domain expert decides that some concept should be a source rather than a child.
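One simple way to picture that guardrail, assuming a single learnable scalar priority per concept as a stand-in for the paper’s node-priority mechanism, is sketched below; the concept names and priority values are invented.

```python
# Sketch of the acyclicity guardrail: each concept carries a learnable scalar
# priority, and a concept may only take parents with strictly higher priority.
# Names and values are illustrative, not the paper's actual parameterization.

concepts = ["C1", "C2", "C3", "C4"]
priority = [3.0, 2.5, 1.0, 0.2]  # would be trained jointly with the model

def allowed_parents(child):
    """Every edge points from higher to lower priority, so no path can ever
    return to its starting node and the concept graph is guaranteed to be a DAG."""
    i = concepts.index(child)
    return [c for j, c in enumerate(concepts) if priority[j] > priority[i]]

for c in concepts:
    print(c, "may depend on", allowed_parents(c))
# C1 has no allowed parents (it acts as a source), while C4 may depend on
# any of C1, C2, and C3.
```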
As with many neurosymbolic approaches, the payoff isn’t just a pretty explanation. It’s a framework where you can watch the model reason. The paper’s diagrams show simple but telling examples: a high-level concept is inferred from two parents; another rule might infer a concept from the negation of a parent. The decoder’s rule selector, powered by attention, balances competing rules and contextual information encoded in the embedding. The result is a layered reasoning chain that feels almost mechanical, yet remains transparent. And because the memory stores multiple rules, the same concept can be explained in more than one way depending on the input, a feature that supports robust explanations and potential corrections from human users.
Highlight: the model’s interpretability is not a veneer but a core mechanism. For each input, the logic rules that produced each concept and the final decision are visible and inspectable, not buried in a hidden layer.
Why interpretability matters
The practical appeal of H-CMR isn’t just scientific polish; it’s a pathway to safer, more collaborative AI. When a model can show its reasoning, a user can challenge it, test its assumptions, or inject domain expertise directly into the reasoning process. The paper emphasizes three forms of interaction that could reshape human-AI collaboration.
First, concept interventions at inference time let a human correct a mistaken concept and observe how the correction ripples through the hierarchy. Because the graph encodes dependencies, fixing one node can nudge many downstream predictions, sometimes dramatically. This isn’t merely a patch; it’s a way to debug a model by tracing the chain from a simple fix to its network of consequences. The authors compare this to how a clinician might revise a diagnosis by revisiting underlying indicators rather than just the final verdict, and show that H-CMR’s corrections propagate in meaningful, sometimes far-reaching, ways.
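As a toy illustration of how such a correction might propagate, the sketch below fixes one concept by hand and re-runs the downstream rule evaluations in topological order. It reuses the invented rule format from the earlier sketch, and the attention-based selector is again omitted, so it should be read as a cartoon of the mechanism rather than the authors’ code.

```python
# Illustrative sketch of an inference-time concept intervention: a human fixes
# one concept's value, and the correction propagates through the hierarchy
# when the downstream rules are re-evaluated (rules and names are invented).

rule_memory = {
    "C3": [[("C1", True), ("C2", True)]],   # C3 if C1 and C2
    "C4": [[("C3", True)]],                 # C4 if C3
}

def decode(sources, interventions=None):
    known = dict(sources)
    known.update(interventions or {})       # human corrections take precedence
    for concept in ["C3", "C4"]:            # topological order of the DAG
        if concept in (interventions or {}):
            continue                        # keep the human-fixed value
        rule = rule_memory[concept][0]      # rule selector omitted for brevity
        known[concept] = all(known[p] == v for p, v in rule)
    return known

before = decode({"C1": True, "C2": False})
after = decode({"C1": True, "C2": False}, interventions={"C2": True})
print(before)  # {'C1': True, 'C2': False, 'C3': False, 'C4': False}
print(after)   # {'C1': True, 'C2': True, 'C3': True, 'C4': True}
```

Flipping the single concept C2 changes both C3 and C4, which is the ripple effect the authors describe.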
Second, model interventions at training time invite experts to edit the graph and its rules. The human can insert, delete or constrain relationships, potentially injecting substantial background knowledge into the model. In the MNIST-Addition experiments, for example, background rules about how digits relate to sums could be supplied to improve data efficiency when supervision is partial. In real-world terms, a domain expert—say, a radiologist, a bird-watcher, or a quality-control engineer—could imprint helpful priors into the model’s reasoning, reducing the data burden and guiding learning toward more credible structures.
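A training-time intervention can be pictured as writing directly into the rule memory. The sketch below injects MNIST-Addition-style background rules about how two digits determine their sum; the concept names, rule format, and “frozen” flag are assumptions made for illustration, not the paper’s API.

```python
# Sketch of a training-time model intervention: an expert injects background
# rules so the model does not have to rediscover them from labeled data.
# Everything here (names, rule format, the "frozen" flag) is illustrative.

rule_memory = {}

def inject_rule(memory, concept, literals):
    """Append an expert-supplied rule and mark it as fixed so training keeps it."""
    memory.setdefault(concept, []).append({"literals": literals, "frozen": True})

# One rule per digit pair: "sum_is_s if first_digit_is_d1 and second_digit_is_d2".
for d1 in range(10):
    for d2 in range(10):
        inject_rule(rule_memory, f"sum_is_{d1 + d2}",
                    [(f"first_digit_is_{d1}", True), (f"second_digit_is_{d2}", True)])

print(len(rule_memory["sum_is_9"]))  # 10 rules: (0, 9), (1, 8), ..., (9, 0)
```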
Third, the interpretability of the rule memory opens up formal verification opportunities. Since the reasoning is expressed as propositional logic over a learned DAG, it can be checked against constraints such as “if black wings then not white wings” or other domain-specific invariants. That kind of verifiability is a rare luxury in neural models, where hidden-layer interactions often evade straightforward checks. In a world where AI systems touch more critical decisions, being able to audit the logic behind a prediction is not a luxury—it’s a prerequisite for public trust.
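Because the memory is a finite set of propositional rules over a DAG, a constraint like “black wings implies not white wings” can be checked mechanically. The sketch below does this by brute force over the source concepts of a tiny invented rule memory; a real verifier could hand the same question to a SAT solver, and the concept names and rules here are purely illustrative.

```python
# Sketch of verifying a domain constraint ("black_wings implies not white_wings")
# against a small, invented rule memory by enumerating every assignment of the
# source concepts and decoding the derived concepts for each one.
from itertools import product

rule_memory = {
    # white_wings if the bird is pale and not black-winged
    "white_wings": [[("pale", True), ("black_wings", False)]],
}
sources = ["pale", "black_wings"]

def decode(assignment):
    known = dict(assignment)
    for concept, rules in rule_memory.items():
        rule = rules[0]                       # rule selector omitted for brevity
        known[concept] = all(known[p] == v for p, v in rule)
    return known

violations = []
for values in product([False, True], repeat=len(sources)):
    world = decode(dict(zip(sources, values)))
    if world["black_wings"] and world["white_wings"]:  # constraint violated?
        violations.append(world)

print("constraint holds" if not violations else f"violations: {violations}")
```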
Highlight: H-CMR’s design makes human intervention practical, scalable, and meaningful, not as an afterthought but as part of the model’s fabric.
What this could mean for the future
The paper’s authors frame H-CMR as a universal binary classifier with strong interpretability for both concepts and tasks, provided each concept has at least two rules in memory. That’s a technical condition, but the practical takeaway is broader: you can tailor a model to a domain by choosing the right set of rules and letting the graph organize itself around them. This gives researchers and practitioners a flexible tool to combine data-driven learning with human knowledge, without sacrificing accuracy.
One of the most compelling implications is data efficiency. In their experiments, model interventions—where experts supply background rules—improve data efficiency in low-supervision regimes. That suggests a future in which AI systems can be deployed with less annotated data when domain expertise is available, a big win for fields where labeling is expensive or scarce. It also hints at a new kind of “interactive AI,” where experts shape a model’s reasoning, not just its outputs. The authors even show that H-CMR can operate in a setting closer to neurosymbolic AI, where some concepts are fully supervised and others learn through background rules, with gradients flowing back to the sources to improve learning signals.
The universality claim—H-CMR’s ability to achieve high task accuracy across different concept sets—addresses a long-standing criticism of many concept bottleneck models (CBMs): final task performance can tank if the chosen concepts don’t capture the right signals. By tying task predictions to a manipulable graph of concepts, H-CMR maintains robust performance even as the concept set shifts. That’s a meaningful leap toward adaptable AI that can reconfigure itself to new domains, new data collections, or shifting human priorities without a costly redesign.
And what about the social and ethical stakes? The authors’ emphasis on transparency and verifiability aligns with growing calls for accountable AI. If a system’s reasoning can be inspected, challenged, and adjusted by people who understand the context, it becomes harder for AI to operate as a black box hiding behind opaque guarantees. Of course, transparency is not a panacea; the quality of the rules and the data matters, and there are always trade-offs between interpretability, latency, and scalability. But the study offers a concrete blueprint for moving toward AI that is both capable and comprehensible, a combination many researchers have pursued for years.
Highlight: the work hints at a future where AI can be guided by human expertise with minimal friction, making models more data-efficient and more trustworthy without sacrificing performance.
Ultimately, H-CMR doesn’t pretend to solve every AI mystery. It doesn’t claim to infer causality or to replace all black-box methods with symbolic logic. What it does do is present a coherent, auditable, and scalable way for machines to reason in terms humans can follow. It offers a roadmap for turning the often opaque inner life of a neural net into a navigable, adjustable, human-friendly narrative. If the roadmap holds up in broader, messier real-world settings, we could be looking at AI systems that explain themselves not as a sidebar, but as the main text—the kind of thinking partner that can be read, questioned, and improved in collaboration with people who know the domain intimately.
As the paper’s authors note, their goal is not to claim causality but to reveal the chain of reasoning the model uses. That distinction matters. It’s about cognitive transparency—the sense that the model’s mind is not an inscrutable black box but a map you can read, critique, and, when needed, edit. In an era where AI decisions touch more facets of daily life, that clarity could be the difference between a tool we trust and a tool we fear.
In the end, H-CMR is a bold experiment in making the invisible mechanics of AI visible and usable. It’s a reminder that the best kind of intelligence—whether human or machine—is the kind that can be explained, examined, and improved through conversation, not just performance metrics. The study’s authors—David Debot, Pietro Barbiero, Gabriele Dominici, and Giuseppe Marra—have given us a new model of collaboration between minds and machines, one that treats thinking as a shared process rather than a secret kept behind a curtain.
Institutions behind this exploration—the Department of Computer Science at KU Leuven, IBM Research in Zurich, and Università della Svizzera Italiana—are betting that the future of AI lies not just in deeper networks but in clearer minds. If H-CMR is any guide, that future could be one where AI not only acts smart but thinks in a way humans can follow, question, and refine together.