Supercomputers’ Secret Lives: Visualizing the Data That Runs Them

The Challenge of Visualizing Supercomputer Data

Imagine a bustling city, its streets teeming with vehicles representing computing jobs, each vying for access to limited resources. That is the landscape of supercomputer queue data: a rich source of information, but also a tangled web of variables and processes. Scientists, machine learning researchers, and system maintainers all need to understand this data, yet each group brings different questions and analytical approaches. This poses a real challenge for visualization designers: how can a single tool serve such diverse needs without becoming an overly complex, overwhelming interface?

A Persona-Based Approach

Researchers at the University of Utah’s Scientific Computing and Imaging Institute (SCI) and the National Renewable Energy Laboratory tackled this problem with persona-based design. Instead of focusing solely on discrete tasks, they crafted detailed profiles, or personas, representing the different types of users who work with supercomputer queue data. Through extensive interviews and observations, the researchers (Connor Scully-Allison, Kevin Menear, Kristin Potter, Andrew McNutt, Katherine E. Isaacs, and Dmitry Duplyakin) sought to understand not just what users do, but *why* they do it: their underlying goals and their preferred ways of working. They identified three key personas: the HPC User (domain scientists running simulations), the Jobs Data Analyst (researchers studying system behavior), and the ML Researcher (those using machine learning to predict queue times).

Guidepost: A Multi-Persona Visualization

Building on this understanding of their users, the researchers designed Guidepost, an interactive visualization embedded directly in Jupyter notebooks, an environment many data scientists already work in. Guidepost’s key strength is that it offers a high-level overview of the data while fitting into users’ existing workflows. At its core is a configurable set of “pez plots,” which provide a flexible way to explore relationships between variables in the data, such as queue wait times, resource usage, and job characteristics.

For tasks shared across all user groups, such as comparing wait times across different queues, Guidepost provides intuitive point-and-click interactions. For tasks unique to a specific persona, such as in-depth statistical analysis of model predictions, users can select subsets of data and export them directly to the Python environment of their notebook. Users are therefore not locked into a single tool; they can apply their favorite scripting tools and libraries alongside the visualization. In this sense the system acts as a ‘guidepost,’ pointing each kind of user toward the insights that matter to them.
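To make the export step more concrete, here is a minimal Python sketch of the kind of follow-up analysis a user might run in the notebook after exporting a selection. It does not use Guidepost’s actual export interface, which is not described here: the exported_jobs records, their queue and wait_minutes fields, and the median_wait_by_queue helper are all hypothetical stand-ins. The sketch computes the median wait time per queue, a scripted version of the cross-queue comparison mentioned above.

```python
from collections import defaultdict
from statistics import median

# Hypothetical stand-in for a subset exported from the visualization.
# Field names and values are illustrative only, not Guidepost's format.
exported_jobs = [
    {"queue": "short", "wait_minutes": 4.0},
    {"queue": "short", "wait_minutes": 10.0},
    {"queue": "short", "wait_minutes": 6.0},
    {"queue": "long", "wait_minutes": 120.0},
    {"queue": "long", "wait_minutes": 45.0},
]


def median_wait_by_queue(jobs):
    """Group jobs by queue name and return the median wait time per queue."""
    waits = defaultdict(list)
    for job in jobs:
        waits[job["queue"]].append(job["wait_minutes"])
    return {queue: median(values) for queue, values in waits.items()}


summary = median_wait_by_queue(exported_jobs)
# Print queues in alphabetical order for stable output.
for queue, wait in sorted(summary.items()):
    print(f"{queue}: median wait {wait} minutes")
```

With these sample records, the script reports a median wait of 82.5 minutes for the long queue and 6.0 minutes for the short queue. In practice an analyst might reach for a library such as pandas instead; the standard library is used here only to keep the sketch self-contained.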

Evaluation and Impact

The researchers evaluated Guidepost with nine expert analysts from various research institutions, and the results were strongly positive. Participants completed a variety of tasks using diverse strategies, demonstrating the tool’s flexibility. They made use of all parts of the visualization, moving freely between interactive exploration and code-based analysis. Many participants also chose, unprompted, to export data for deeper analysis, which suggests that the notebook integration lets the tool support work beyond its built-in features.

Beyond the Supercomputer: A Broader Vision

The success of Guidepost is more than a technical achievement. It showcases an approach to visualization design with wide-reaching implications. By understanding the needs of different users, the researchers created a tool that is not only functional but also intuitive and empowering. This ‘persona-driven’ design methodology can be applied in many other domains where data visualization is critical. It reminds us that designing for people, not just for tasks, is key to creating truly impactful tools.

A New Standard for Multi-User Visualization

This research advances how we design visualizations for complex datasets, especially in environments with multiple user groups. Guidepost shows how to build a tool that is powerful yet accessible, serving diverse skill levels and analytical goals without sacrificing clarity. The methodology challenges a purely task-focused approach and encourages designers to consider the broader context of user needs and workflows, leading to more effective and useful data visualization tools.