Workshops and Tutorials will be held on Monday, July 21
Workshops and tutorials are free of charge and included in your registration. During registration, we ask that you select up to two half-day sessions OR one full-day session. Note that some sessions may have limited space, and preference will be given to those who preregister.
Tracks, Panels, Plenaries, and BoFs will be held on Tuesday – Thursday, July 22 – 24.
PEARC25 Full Day Tutorials (9:00 am - 5:00 pm)
Title & Abstract | Authors |
---|---|
Managing HPC Software Complexity with Spack Modern scientific software stacks consist of thousands of packages and rely on a wide range of technologies, including low-level libraries written in C, C++, and Fortran, as well as higher-level packages developed in interpreted languages like Python and R. Scientists often need to deploy these software stacks across diverse environments, from personal laptops to the world’s largest supercomputers, while adapting workflows to suit specific tasks. For example, developing a new feature in an application may require frequent rebuilds, quick execution of unit tests, or small-scale end-to-end tests. Debugging features are often enabled in such scenarios to facilitate rapid iteration. On the other hand, preparing an application for large-scale production runs on an HPC cluster demands a different approach. This typically involves leveraging complex, performance-critical libraries such as MPI, BLAS, and LAPACK, or integrating with vendor-provided libraries to maximize performance. In these cases, applications are often built with machine-specific optimizations to ensure efficiency. Managing these diverse requirements makes building and maintaining software stacks a significant challenge. The complexity of configuring software, resolving dependencies, and ensuring compatibility can hinder both development and deployment efforts. Spack is an open-source package manager designed to simplify the building, installation, customization, and sharing of HPC software stacks. It features a powerful and flexible dependency model, an intuitive Python-based syntax for writing package recipes, and a repository of over 8,300 packages maintained by a community of more than 1,400 contributors. Spack is widely used by individual researchers, developers, cloud platforms, and leading HPC centers worldwide. This tutorial introduces Spack’s core capabilities, including installing and authoring packages (a minimal example recipe appears after this table), integrating Spack into development workflows, and deploying optimized software on HPC systems. Attendees will gain foundational skills for automating routine tasks and acquire advanced knowledge to address complex use cases with Spack. This tutorial targets a broad audience, including users who simply want to install and run packages, developers who plan to author their own packages and automate their dependency management, and HPC facility staff who want to deploy large software stacks with packages and custom modules. Audience prerequisites: Attendees should have basic familiarity with compiling and running programs. Basic Python and shell skills are a plus. No knowledge of package managers or build systems is required. | Todd Gamblin, Gregory Becker and Alec Scott |
Programming and Profiling Modern Multicore Processors Modern processors, such as Intel’s Xeon Scalable line, AMD’s Genoa architecture, and ARM’s Grace design, are scaling out rather than up and increasing in complexity. Because the base frequencies for the large core count chips hover somewhere between 2 and 3 GHz, researchers can no longer rely on frequency scaling to increase the performance of their applications. Instead, developers must learn to take advantage of the increasing core count per processor and learn how to extract more performance per core. To achieve good performance on modern processors, developers must write code amenable to vectorization, be aware of memory access patterns to optimize cache usage, and understand how to balance multi-process programming (MPI) with multi-threaded programming (OpenMP). This tutorial will cover serial and thread-parallel optimization, including introductory and intermediate concepts of vectorization and multi-threaded programming principles. We will address CPU as well as GPU profiling techniques and tools and give a brief overview of modern HPC architectures. The tutorial will include hands-on exercises in parallel optimization, and profiling tools will be demonstrated on TACC systems. This tutorial is designed for intermediate programmers, familiar with OpenMP and MPI, who wish to learn how to program for performance on modern architectures. This tutorial is open to all attendees of the PEARC25 conference. In particular, this tutorial will target developers who are developing applications for mainstream HPC architectures but lack knowledge of all the deeper issues influencing performance, their interactions, and how to assess them. This tutorial will provide users with guidance in optimal usage of vector units, tasks, threads, and memory bandwidth. | Amit Ruhela, Matthew Cawood, Yinzhi Wang, Hanning Chen, and Zuzanna Jedlinska |
Reproducible ML Workflows and Deployments with Tapis This tutorial aims to provide researchers with a comprehensive introduction to the latest reproducible Machine Learning (ML) workflows and tools available through the NSF-funded Tapis v3 Application Programming Interface (API) and User Interface (UI). Through hands-on exercises, participants will gain experience in developing ML workflows and deploying them on Jetstream resources. Throughout the tutorial, we will focus on utilizing various Tapis core APIs, as well as more specialized APIs: the Tapis Workflows API and Tapis Pods API, all within the user-friendly TapisUI (a short Python client sketch appears after this table). These production-grade services are designed to simplify the creation and facilitation of trustworthy, reproducible scientific workflows. This tutorial aims to empower researchers to efficiently develop, deploy, and maintain their own ML workflows. Additionally, it will introduce advanced topics in Tapis and ML, such as ML-Ops and benchmarks, by discussing real-world applications and use cases. Target Audience: The audience for this workshop fits into three categories: • Researchers that utilize national, campus and local cyberinfrastructure resources and wish to do so in a reproducible, scalable and programmable manner. • Cyberinfrastructure specialists such as research software engineers (RSE), gateway providers/developers and infrastructure administrators. People in these roles can utilize open source technologies and state-of-the-art techniques to enable portable, reproducible computation. • Cyberinfrastructure directors, managers and facilitators that are looking for solutions to aid and educate their institutional researchers in order to better leverage local and distributed computational and cyberinfrastructure resources. Audience Prerequisites: Attendees must use their own laptop for the hands-on part of the tutorial. Attendees should have TACC accounts or use day-of training credentials. | Joe Stubbs, Anagha Jamthe, Nathan Freeman, Christian Garcia, Steve Black and Sean Cleveland |
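For readers of the Spack tutorial listing above who have not seen Spack's Python-based recipe syntax, the sketch below shows the general shape of a package recipe. It is a hypothetical example: the package name, homepage, URL, checksum, and variant are placeholders, not a real recipe from the Spack repository.

```python
# Hypothetical Spack recipe (package.py); all names, URLs, and the checksum are placeholders.
from spack.package import *


class Mylib(CMakePackage):
    """Illustrative example library, used only to show the shape of a recipe."""

    homepage = "https://example.org/mylib"
    url = "https://example.org/mylib-1.0.0.tar.gz"

    version("1.0.0", sha256="0000000000000000000000000000000000000000000000000000000000000000")

    variant("shared", default=True, description="Build shared libraries")

    depends_on("mpi")                          # satisfied by whichever MPI the site provides
    depends_on("cmake@3.20:", type="build")    # build-only dependency with a version constraint

    def cmake_args(self):
        # Translate the Spack variant into the corresponding CMake option.
        return [self.define_from_variant("BUILD_SHARED_LIBS", "shared")]
```

A user would then typically build and install the package and its full dependency tree with a command along the lines of `spack install mylib+shared`.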
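As a companion to the Tapis tutorial listing above, the following is a minimal sketch of calling a Tapis core API from Python using the tapipy client. The tenant base URL and credentials are placeholders, and the specific calls covered in the tutorial may differ.

```python
# Minimal tapipy sketch; base_url, username, and password are placeholders.
from tapipy.tapis import Tapis

# Authenticate against a Tapis tenant and obtain an access token.
t = Tapis(base_url="https://tacc.tapis.io",
          username="your_tacc_username",
          password="your_password")
t.get_tokens()

# Example core-API call: list the Tapis systems visible to this user.
for system in t.systems.getSystems():
    print(system.id)
```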
PEARC25 Full Day Workshops (9:00 am - 5:00 pm)
Title & Abstract | Authors |
---|---|
2nd Workshop on Broadly Accessible Quantum Computing Building on last year’s success, the 2nd Workshop on Broadly Accessible Quantum Computing at PEARC25 will explore the latest advancements in quantum computing (QC) and its integration with high-performance computing (HPC) and related applications. This year’s edition expands discussions on practical applications, hybrid quantum-classical strategies, and funding opportunities. Through invited talks, panels, and community contributions, we will address workforce development, policy considerations, and strategies for making quantum resources more accessible. Designed for participants of all backgrounds, this workshop fosters collaboration and knowledge exchange to advance QC adoption in the broader research computing community. This workshop is designed for researchers, practitioners, decision-makers, and advanced cyberinfrastructure professionals who are generally interested in understanding the potential of quantum computing beyond its technical intricacies. No prior quantum computing experience is required, making it accessible to a diverse audience. Participants with a basic understanding of classical computing concepts—particularly in HPC, AI, and advanced cyberinfrastructure—will benefit from discussions on the latest advancements, applications, and integration strategies in quantum computing. | Bruno Abreu, Tommaso Macri, Santiago Nunez-Corrales and Yipeng Huang |
Building and Selling a Strategic Plan for your Research Computing and Data Program This workshop will bring together Research Computing and Data professionals to explore strategic planning practices and challenges, to learn how to identify the stakeholders crucial to realizing a strategic plan and how to use proven tools for influence mapping and crafting effective value propositions to win support among these stakeholders. The workshop will foster the establishment of peer mentoring relationships, and an active practice of leveraging these relationships to share leading practices around strategic planning. The workshop is open to RCD professionals who are familiar with issues around supporting Research Computing and Data and who want to build their skills for contributing to strategic planning that advances their program. The target audience for this workshop is campus Research Computing and Data professionals and leaders who are involved in or are exploring strategic planning for their programs, and are seeking more effective approaches to building support for their programs among key stakeholders. PEARC conferences have traditionally been an ideal venue for these sorts of discussions and engagement. | Patrick Schmitz, Dana Brunson, Lauren Michael, John Hicks and Timothy Middelkoop |
Collaborating Your Way to Sustainability (Focus Week@PEARC25) Digital projects—science gateways, data repositories, educational websites—deliver a great deal of value to users by widely sharing sophisticated tools, large data sets, or access to computing capabilities among those in the academic sector who really need them. However, they also share a common challenge: sustaining and scaling these projects in a way that ensures long-term growth and impact is notoriously difficult. This full-day, dynamic and exercise-based workshop offers training on sustainability strategies and practical tools to help those creating and maintaining gateways and other innovative projects, with a focus on understanding your audience and identifying useful partnerships. This workshop is ideally suited for participants who have built or are directly involved in creating or supporting innovative digital projects, such as science gateways, cyberinfrastructure, or other products and services. Teams joining us often find it very valuable to work through these exercises together. While our participants are tech-savvy, the content is not highly technical in nature. We welcome academic researchers, developers, students, administrators - anyone eager to learn more about how they can use collaboration and partnerships to strengthen the long-term prospects of their work. Even those without projects are welcome! We will be using hypotheticals for several exercises, so all can participate. | Claire Stirm, Nancy Maron, Juliana Casavan and Maytal Dahan |
PEARC25 Half Day AM Tutorials (9:00 am - 12:30 pm)
Title & Abstract | Authors |
---|---|
A Guideline to Writing a Successful Proposal for ACCESS and Other National Compute Resources Navigating the ever-changing landscape of national computing resources is difficult for any researcher, whether they are a beginner or a seasoned long-time user. During our long involvement in the review and allocation process of resources of the NAIRR, ACCESS, leadership-class programs and National Labs (Bridges-2, Expanse, Anvil, Delta, Frontera, Vista, etc.), we have helped many scientists to succeed. Our submission to the PEARC 2025 conference will systematically address the two most persistent problems that researchers face during the application process: selecting the appropriate resource among the variety of choices offered, and writing a successful application that translates a solid science project into a strong proposal ready to take on the competition. | Lars Koesterke and Ken Hackworth |
ChronoLog: Extreme-scale storage for activity and log workloads Data is generated at incredible rates (exceeding TB/s) that exceed the capacity of even the largest computing systems. This data explosion stems from the proliferation of modern sensors, scientific instruments (from microscopes to telescopes), Internet-of-Things (IoT) devices, and human activities such as web, mobile and edge computing, and others. Beyond simply storing data, one increasingly common trend is the need to store activity data, also known as log data, which describes things that happen rather than things that are. Activity data are generated not only by human actions, but also due to computer-generated actions, such as system monitoring, service call stacks, error debugging, fault tolerance, replication techniques, and more. Many domains, including scientific applications, Internet companies, financial applications, and IoT, are dependent on processing log data efficiently. This trend is further seen in modern architectures such as microservices, containers, and task-based computing. The volume, velocity, and variety of modern activity data require new high-performance logging methods. This tutorial will teach participants to deploy and use ChronoLog, a new scalable, high-performance distributed shared log store designed to handle the ever-growing volume, velocity, and variety of modern activity data. It is tailored for applications ranging from edge computing to high-performance computing (HPC) systems, offering a versatile solution for managing log data across diverse domains. ChronoLog allows users to organize data in Stories—a time-series data set that is composed of individual log events that are produced by clients running on various HPC nodes. ChronoLog’s highly scalable architecture is composed of ChronoKeepers running on HPC nodes to collect events, ChronoGraphers that merge events from many ChronoKeepers and store data using hierarchical storage, and the ChronoVisor that orchestrates and manages the ChronoLog deployment. ChronoLog provides many capabilities that differentiate it from other data streaming platforms. For example, it can exploit multiple storage tiers (e.g., persistent memory, flash storage) to scale log capacity and optimize performance; supports multiple writers and multiple readers (MWMR) for efficient concurrent access to the log; enables efficient range queries for partial log processing, enhancing data exploration capabilities; guarantees strict ordering of log entries across distributed environments, eliminating the need for costly synchronization; employs physical time for log ordering, avoiding expensive synchronization operations; automatically and transparently moves log data across storage tiers based on age and access patterns; and adapts to varying I/O workloads, ensuring efficient resource utilization and performance. This tutorial will introduce participants to ChronoLog, teaching them how to develop applications that leverage log storage. It will specifically teach participants how to set up and configure ChronoLog on their resources as a system administrator, interact with ChronoLog as a user, and integrate it into their applications through a real-world example integration with a workflow application. The tutorial is aimed at the users and administrators of HPC systems. It will be suitable for beginners and experts alike. We orient the material so that beginners can get ramped up on the core concepts and usage patterns before digging into more advanced materials. | Anthony Kougkas and Kyle Chard |
Collaborative Cloud Science - Deploying The Littlest JupyterHub on Jetstream2 In this 3-hour hands-on tutorial, participants will set up an instance (aka virtual machine) on the Jetstream2 research cloud, and install The Littlest JupyterHub (TLJH) to create a shared computing system. Designed for researchers and educators with basic Linux skills, the session focuses on a simple, practical setup that they can repeat at their institutions. | Julian Pistorius and Stephen Bird |
Intelligible, Powerful Tools for Supercomputer Users Powerful supercomputers have played an important role in both large and small scientific research projects. However, the complexity of these systems can be overwhelming and can hide application underperformance and inefficient job and resource management. An inordinate amount of time and effort can be unnecessarily spent managing user environments, reproducing standard workflows, handling large-scale I/O work, profiling and monitoring jobs, as well as realizing, resolving, and balancing resource usage, evaluating and understanding GPU performance, etc. To help supercomputer users focus on the science of their research work and to minimize the workload for the consulting team, TACC has designed, developed, and maintains a collection of powerful tools for supercomputer users. These tools are portable and effective on almost all supercomputers and are now serving thousands of supercomputer users of TACC, ACCESS, and other institutions every day. These tools were developed by experienced HPC group members to address the daily needs of HPC users. Most of these tools are publicly available on the TACC GitHub and can be conveniently installed in user space on other HPC systems. We now include a couple of non-TACC tools we believe deserve attention. In this tutorial, we will present (and users will practice with) supercomputer tools specifically designed for complex user environments (Lmod, mkmod), tools for workflow management (ibrun, launcher, launcher-GPU, Pylauncher), tools for job monitoring and profiling (Remora, Peak, amask, etc.), GPU tools (Nsight Systems and Compute), and several other convenient tools. Attendees will learn the function and operation of these tools to make their supercomputer processing more comprehensible and understandable—to benefit from these intelligible, easy-to-use, powerful tools. Detailed do-on-your-own and/or follow-along exercises, developed from years of feedback, provide use-case scenarios for using these tools. Exercises will be performed on Vista and/or Frontera supercomputers at the Texas Advanced Computing Center (TACC). This tutorial is open to all attendees of the PEARC25 conference. In particular, this tutorial will help researchers and students who are interested in how to productively utilize modern supercomputers. Mastering these supercomputer tools will help them perform and evaluate their computational research work more efficiently and conveniently. This tutorial will also interest those who support users, including support staff, educators, and system administrators. We will present the basic design of these powerful tools and illustrate usage that will facilitate supercomputer users. Audience prerequisites: Tutorial attendees are expected to have some basic Linux experience, have some experience with multiprocessing (MPI/OpenMP), and be familiar with basic HPC CPU and GPU architectures. They are expected to bring a laptop with an SSH client to access TACC supercomputers to participate in the hands-on sessions. Training accounts on TACC supercomputers will be prepared before the conference and presented to attendees in the Introduction. All demonstrations and hands-on exercises will be carried out on compute nodes using TACC’s idev interactive interface. | Chun-Yaung Lu, Kent Milfeld, Yinzhi Wang and Wenyang Zhang |
Data Everywhere: Using and Sharing Scientific Data with Pelican While there are perhaps hundreds of petabytes of datasets available to researchers, instead of swimming in seas of data there is often a feeling of sitting in a data desert: there’s a mismatch between what sits in carefully curated repositories around the world versus what’s accessible at the computational resources locally available. The Pelican Project (https://pelicanplatform.org/) aims to bridge the gap between repositories and compute by providing a software platform to connect the two sides. Pelican’s flagship instance, the Open Science Data Federation (OSDF), serves billions of objects and more than a hundred petabytes a year to national-scale resources. This tutorial, targeted at end-user data consumers and data providers, will cover the data access model of Pelican, guide participants to access and share data through an existing data federation, and consider how data movement via Pelican and the OSDF can enable their research computing. We propose a 3-hour tutorial. The target audience for this tutorial is domain users, CI professionals (e.g., research computing facilitators, project managers), and data curators/librarians: • Domain users will learn how to use Pelican clients to access public and authenticated data within an existing data federation, with emphasis on using the OSDF and within HTC computing environments. • CI professionals will learn basic use cases for sharing data within a data federation and how users access said data using Pelican clients. • Data librarians/curators will learn how to make data accessible via a data federation, with emphasis on the advantages of using Pelican and the OSDF. The OSDF is part of the coordinated services in NSF's national cyberinfrastructure; this tutorial may be of interest to PIs planning proposal submissions to the CC* program as joining the OSDF is mentioned as one way to fulfill the solicitation's [1] resource sharing requirements. This is an introductory-level tutorial. It is recommended that participants have a beginner's-level understanding of the Linux command line, Python coding, and submitting batch jobs to a computing cluster. To participate in the hands-on portion, they should bring their own laptop. | Christina Koch, Brian Bockelman and Andrew Owen |
From First Byte to Publication: Instrument Science Enabled by Globus The proliferation of instruments, e.g., cryogenic electron microscopes and nanopore sequencers, is driving the need for automated solutions to manage generated data throughout its lifecycle—especially as resolutions and datasets continue to grow. The Globus platform is used in diverse scenarios to build solutions that increase instrument throughput and researcher productivity, and ensure these expensive devices remain highly utilized. This tutorial focuses on integrating multiple services in the Globus platform to build a scalable solution for automating instrument data management and computation, from the time the first byte of data is captured, to distribution and publication of final data and findings. Building on the success of our tutorial “Scaling Instrument Science in the FAIR Age” at PEARC24, we will present an overview of the relevant services and engage participants in a series of hands-on exercises to create and run automated flows that process data coming from an instrument. The material is geared primarily towards research data and computing professionals, including system administrators and research software engineers. Participants will develop an understanding of the various solution components and leave with a fully working, small-scale system—including an actual instrument(!)—that can serve as a starting point for their own development efforts. The tutorial is intended for research computing and data (RCD) professionals. The material is mostly at the intermediate and advanced levels, suitable for those with an understanding of data- and compute-intensive research tasks, with some exposure to development of tools and applications to support research. | Vas Vasiliadis and Rachana Ananthakrishnan |
Introduction to FABRIC FABRIC is an advanced, programmable global network testbed for research and education that enables experimentation, rapid prototyping, and validation of new network and distributed computing applications and services that are impossible or impractical in the current Internet. This tutorial will introduce and onboard attendees to the FABRIC network and then take them through introductory and intermediate hands-on example use cases. Example topics will include: 1) Creating and deploying basic experiments (a minimal slice-creation sketch in Python follows this table), 2) Running intelligent big data computations across FABRIC, and 3) Using FABRIC's integrated measurement framework. This tutorial is designed to provide RCD professionals and research scientists with experience using FABRIC for their own research needs. The tutorial will be of particular value to RCD facilitators who assist researchers in dealing with the challenges of managing, accessing, and processing large data sets. This tutorial will be presented at the "Introductory" level. This tutorial is designed for users who have little or no experience with FABRIC. Basic experience with the Linux command line and a minimal programming background (preferably Python) are desired. Users should also have some experience with remote login (e.g., ssh) and the use of (remote) virtual machines. | James Griffioen, Charles Carpenter and Mami Hayashida |
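The slice-creation sketch referenced in the FABRIC tutorial listing above is shown here. It is a minimal, illustrative example using FABRIC's fablib Python library and assumes the user's FABRIC credentials and configuration are already in place (for example, in the FABRIC JupyterHub environment); the slice, node, and site names are placeholders.

```python
# Minimal FABRIC experiment sketch; assumes fablib credentials/configuration are set up.
from fabrictestbed_extensions.fablib.fablib import FablibManager

fablib = FablibManager()

slice = fablib.new_slice(name="MyFirstSlice")      # define an experiment ("slice")
node = slice.add_node(name="node1", site="TACC")   # request one VM at a FABRIC site
slice.submit()                                     # instantiate the slice on the testbed

stdout, stderr = node.execute("uname -a")          # run a command on the provisioned node
print(stdout)

slice.delete()                                     # release the resources when finished
```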
PEARC25 Half Day AM Workshops (9:00 am - 12:30 pm)
Title & Abstract | Authors |
---|---|
Building Agentic Workflows on AWS This workshop is designed for builders ready to create Agentic Workflows on AWS. Agentic Workflows enable large language models (LLMs) to act autonomously, performing tasks or providing assistance on behalf of users. In this workshop, you'll learn to: • Leverage capabilities in Amazon Bedrock to create both chatbot and non-chatbot workflows • Incorporate features such as code interpreter and long-term memory • Use Amazon Q Developer to assist in coding and debugging | Scott Friedman, Princ. BDM, Research Computing; Luke Coady, Solutions Architect; Abhilash Thallapally, Solutions Architect |
Cyberinfrastructure Community-wide Mentorship Network (CCMNet) Workshop This workshop is proposed for the PEARC Conference due to the strong alignment between its attendees and the goals of Cyberinfrastructure Community-wide Mentorship Network (CCMNet). PEARC attracts a diverse audience of cyberinfrastructure (CI) professionals, researchers, and educators who are well-positioned to contribute to and benefit from mentorship. Given the ongoing demand for skilled CI professionals, a structured mentorship network is vital to sustaining and growing the workforce. The CCMNet program offers a structured approach to recruiting, supporting, and incentivizing mentors while fostering a culture of continuous learning and professional development. By participating in this workshop, attendees will gain insight into mentorship best practices, establish valuable professional connections, and contribute to strengthening the broader CI community through mentorship. We welcome all interested individuals to engage, collaborate, and help shape the future of mentorship in CI. The workshop is designed for: • Cyberinfrastructure professionals (CIPs) interested in mentoring or seeking mentorship. • Managers supporting CI professional development through mentorship. • Representatives of new or existing mentorship programs looking to collaborate with CCMNet. • Researchers and technologists engaged in CI workforce development. This workshop is open to all backgrounds and skill levels. | Marisa Brazil, Kevin Brandt, Torey Battelle, Vikram Gazula, Laura Christopherson and Julie Ma |
Eighth Workshop on Strategies for Enhancing HPC Education and Training (SEHET25) Computing facilities face a common challenge of supporting a varied user base with varied skills and needs. There is a growing base of users who are familiar with GUI platforms, have little or no Linux experience, and are new to the utilization of HPC resources. There is also a need for continuing education for those familiar with HPC systems, as application development and system hardware are constantly evolving. Most training groups report that they have limited staff and resources, which results in increased community interest to utilize conference gatherings as a platform to share resources and materials, to identify opportunities for collaboration on content development and to discuss effective strategies for enhancing the breadth and depth of high-quality training and education that can be offered. We are proposing a half-day Workshop on Challenges to HPC Education and Training at the PEARC25 conference to highlight the collaborative efforts that are underway to develop and deploy HPC training and education, to identify new challenges and opportunities, and to foster new, enhanced and expanded collaborations to respond to the demands for a larger and more interdisciplinary HPC workforce in all sectors of society (e.g. academia, government agencies, business and industry). The workshop aims to bring together the community members engaged in education and training, so anyone involved or interested in education and training of broadly understood HPC skills is welcome. | Nitin Sukhija, Scott Lathrop, Susan Mehringer, Kate Cahill, Julia Mullen and Weronika Filinger |
Fine-Tuning and Deploying Domain-Specific AI Models with NVIDIA NeMo, BioNeMo, and NIM This workshop introduces participants to the NVIDIA NeMo framework for fine-tuning large language models (LLMs) and multimodal models. The session will focus on domain-specific applications through BioNeMo, with additional coverage on deployment strategies using NVIDIA NIM - a set of easy-to-use microservices designed to accelerate the deployment of generative AI models across the cloud, data center, and workstations. By the end of the workshop, attendees will: • Gain a solid understanding of the NeMo framework’s capabilities for fine-tuning models, leveraging techniques such as LoRA and p-tuning. • Explore BioNeMo as a domain-specific case study in biology and chemistry for fine-tuning AI models. • Learn how to deploy fine-tuned models efficiently, using NVIDIA NIM for high-performance inference. | Mahsa Lotfollahi, NVIDIA; Kaleb Smith, NVIDIA; Kristopher Keipert, NVIDIA |
Opportunities, benefits and challenges of sharing memory between CPUs and GPUs In recent years, most performance increases in large scale computing have come from GPU technologies. However, many scientific applications cannot be fully ported to run completely on GPUs, requiring frequent data exchanges between the CPU and GPU components. The original, discrete GPU systems have separate memories for the CPU and the GPUs, requiring explicit data movement over a system bus, which is both expensive and tedious to program. This has effectively prevented a large fraction of scientific codes from making good use of such systems. Recently, both NVIDIA and AMD have begun to offer datacenter-class systems that allow for a unified view of the memory address space between the CPU and GPU cores. From an application point of view, this promises to make using both CPU and GPU resources in a single application drastically more effective and much easier to program. With several such systems becoming available to the scientific community, it is now a great time to learn about such systems and discuss their benefits, challenges, and drawbacks compared to discrete GPU systems. The architectural approach of NVIDIA and AMD is also significantly different, so understanding the advantages and disadvantages of the two will be useful in driving future scientific computing systems design and procurements. This workshop aims to achieve several key objectives: (1) Offer a comprehensive overview of next-generation platforms focusing on unified shared memory between CPU and GPU cores. (2) Highlight application-driven performance analysis across diverse HPC systems. (3) Share early insights and optimization techniques for workloads on distinct platforms from various vendors. The main goal of this workshop is to bring together resource providers, researchers, software engineers and computational science users that are interested in using and supporting applications that benefit from the concurrent use of CPU and GPU resources. The PEARC audience represents such a mix, making it an ideal venue for this workshop. The workshop material will be mildly advanced, so the participants are expected to have at least some experience with either developing or supporting software for GPU-based systems, or operating GPU-based HPC systems. | Igor Sfiligoi, Mahidhar Tatineni, Dan Stanzione, John Cazes and Amit Ruhela |
The ACM SIGHPC SYSPROS Symposium 2025 In order to meet the demands of researchers requiring high-performance computing (HPC) resources, large-scale computational and storage machines must be built and maintained. The HPC systems professionals who tend these systems include system engineers, system administrators, network administrators, storage administrators, and operations staff who face problems that are unique to HPC systems. While many separate conferences exist for the HPC field and for the system administration field, none exist that focus specifically on the needs of HPC systems professionals. Support resources can be difficult to find to help with the issues encountered in this specialized field. Often, systems staff turn to the community as a support resource and opportunities to strengthen and grow those relationships are highly beneficial. This Workshop is designed to share solutions to common problems, provide a platform to discuss upcoming technologies, and to present the state-of-the-practice techniques so that HPC centers will get a better return on their investment, increase performance and reliability of systems, and increase the productivity of researchers. Additionally, this Workshop is affiliated with the systems professionals’ chapter of the ACM SIGHPC (SIGHPC SYSPROS Virtual ACM Chapter). This session would serve as an opportunity for chapter members to meet face-to-face, discuss the chapter’s yearly workshop held at SC, and continue building our community’s shared knowledge base. This Workshop is targeted at HPC systems professionals. These are personnel who are directly or indirectly involved in the design, implementation, and operations of HPC and AI focused infrastructure. This includes large- and small-scale compute clusters, specialty resources (Jetstream, OpenStack, etc.), high-speed and/or low-latency network infrastructure, and massive storage resources. | Jay Blair and Mike Hartman |
Workshop: National Cyberinfrastructure Resources in the Classroom The "National Cyberinfrastructure Resources in the Classroom" workshop aims to demonstrate the value of leveraging NSF-funded shared cyberinfrastructure resources to enhance the educational experience for both instructors and students. By centralizing computational resources, software, and data, the workshop seeks to lower the technical burden on students and faculty, ensuring a level playing field for all students regardless of their background. Through a mix of informational sessions and demonstrations, the workshop will highlight the benefits of using high-performance computing (HPC) and cloud resources, such as: uniform and consistent setups; centralized system and software maintenance; multitenancy; enhanced computational power; zero costs; and data protection. Co-led by the Pittsburgh Supercomputing Center and Indiana University, the workshop will provide participants with practical insights into using these resources for educational purposes, showcasing methods for two distinct platforms with differing capabilities, and provide a venue for rich discussion and recommendations. The workshop is primarily designed for university faculty, teaching assistants, instructors, and IT support staff involved in research computing and data-intensive courses. Participants are expected to have a basic understanding of popular computational tools utilized in classroom settings and an interest in learning how utilizing cyberinfrastructure in their teaching practices will benefit not only themselves but the lives of their students. While prior experience with high-performance computing is beneficial, it is certainly not required, as the workshop will cover foundational concepts and illuminate practical applications and popular use cases. | Stephen Deems, Jeremy Fischer, Tom Maiden, Julian Pistorius, Zachary Graber and Lena Duplechin Seymour |
PEARC25 Half Day PM Tutorials (1:30 pm - 5:00 pm)
Title & Abstract | Authors |
---|---|
ACES Tutorial for using Graphcore Intelligence Processing Units (IPUs) for AI/ML workflows The Accelerating Computing for Emerging Sciences (ACES) computing platform, funded by the National Science Foundation (NSF) and hosted at Texas A&M University, has been made available to the national cyberinfrastructure (CI) community through ACCESS (Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support) and the NAIRR (National Artificial Intelligence Research Resource) Pilot. This computing platform features various innovative accelerators, including the Graphcore Intelligence Processing Unit (IPU). The Graphcore IPU offers a model zoo and other utilities to help researchers speed up their AI/ML computing workflows. Researchers participating in this tutorial will learn how to port their TensorFlow (Keras) and PyTorch models for use with the Graphcore IPUs on ACES, as well as model replication and pipelining techniques to distribute workloads. Prerequisites: Basic Python programming skills, knowledge of deep learning frameworks, such as TensorFlow and PyTorch, and an NSF ACCESS ID are required (application available via https://identity.access-ci.org/new-user). Registrants with a valid ACCESS account will then be added to an ACES Educational allocation. Target Audience: The tutorial is mainly targeted towards AI/ML researchers interested in using novel hardware accelerators or seeking to speed up their computing workflows. Those who attended our PEARC24 tutorial will learn more about model replication and pipelining on IPUs. | Zhenhua He, Joshua Winchell, Richard Lawrence, Dhruva Chakravorty, Lisa Perez and Honggao Liu |
AI Workflows on ACCESS Resources Workflows are a key technology for enabling complex scientific computations. They capture the interdependencies between processing steps in data analysis and simulation pipelines as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes, promote sharing and reuse, and also provide provenance information necessary for the verification of scientific results and scientific reproducibility. Pegasus is a workflow system, and is now an integral part of the ACCESS Support offerings (https://support.access-ci.org/pegasus). ACCESS Pegasus provides a hosted workflow environment, based on Open OnDemand and Jupyter, which enables users to develop, submit and debug workflows using just a web browser. A provisioning system, HTCondor Annex, is used to execute the workflows on a set of ACCESS resources: PSC Bridges2, SDSC Expanse, Purdue Anvil, NCSA Delta, and IU Jetstream2. In general, Pegasus (https://pegasus.isi.edu) is being used in a number of scientific domains doing production-grade science. In 2016 the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first detection of a gravitational wave. The Southern California Earthquake Center (SCEC), based at USC, uses a Pegasus-managed workflow infrastructure called CyberShake to generate hazard maps for the Southern California region. In its most recent simulation campaign, SCEC CyberShake Study 24.8, which ran from September-November 2024, used Pegasus to execute approximately 28,000 jobs, and used about 180,000 node-hours on the Frontier system at Oak Ridge Leadership Computing Facility and Frontera at the Texas Advanced Computing Center. At peak, we utilized 44% of Frontier, which was the #1 system on the Top500 at the time. Our workflow tools managed about 1 PB of total data, including transferring 330 TB automatically from Frontier to Frontera, and staged 9 million output files totaling 36 TB back to archival storage on the University of Southern California’s Center for Advanced Research Computing systems. Pegasus orchestrated this execution using a combination of rvGAHP and HTCondor Glideins. Pegasus is also being used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics and other complex analyses. Target Audience: Scientific Domain Application Developers, Application Scientists, System Architects doing large-scale scientific analysis. Attendee prerequisites: The participants will be expected to bring in their own laptops with the following software installed: Web Browser, PDF reader. We assume familiarity with working in a Linux environment, and some basic Python skills. | Karan Vahi and Mats Rynge |
Bootstrapping and Cluster DevOps with OpenCHAMI This tutorial will introduce the fundamentals of cloud-like system provisioning with OpenCHAMI and enable attendees to build on what they’ve learned by applying DevOps principles to manage OpenCHAMI clusters. OpenCHAMI is a relatively new open source, open governance project from partners: HPE, the University of Bristol, CSCS, NERSC, and LANL. It securely provisions and manages on-premise HPC nodes at any scale. With a cloud-like composable microservice architecture, and an emphasis on infrastructure as code, the tools to install and manage OpenCHAMI may not be familiar to many traditional HPC administrators. Having helped our own teams at LANL to make this transition, the presenters want to bring the same training to a broader audience. In this half-day tutorial, attendees will learn how each of the components in an OpenCHAMI system can be used to bootstrap their own virtual clusters in the first hour and then build on what they’ve learned to leverage DevOps workflows to automate the management of multiple clusters. | David Allen, Devon Bautista and Alex Lovell-Troy |
CI Usage and Performance Data Analysis with XDMoD and NetSage for Resource Providers In this interactive, hands-on tutorial, attendees will learn how to analyze the usage and performance of the NSF ACCESS-allocated cyberinfrastructure (CI) using the visualization and reporting capabilities of the ACCESS XDMoD and NetSage tools. ACCESS XDMoD provides system support personnel and center leadership with a wide variety of data on usage and job-level and system-level performance. These data can be reported on and analyzed directly through the portal in real-time or through custom reporting that is set up once and then sent to the interested parties as a PDF file at a cadence of their choosing. NetSage is an open privacy-aware network measurement, analysis, and visualization service that provides near real-time monitoring and visualization of data transfers to help ensure that scientific workflows are operating at maximum efficiency. This tutorial will instruct attendees how to use these tools and the wide variety of metrics available to facilitate CI system management, support, and planning. Target Audience: Resource Provider Support Staff, System Administrators, and Center Management; Campus Champions | Aaron Weeden, Joseph P. White and Jennifer M. Schopf |
Deploy & Manage Kubernetes on Jetstream2 using OpenStack Magnum In this 3-hour hands-on tutorial, participants will learn how to use OpenStack Magnum to create and manage Kubernetes clusters on the Jetstream2 research cloud. Designed for research software engineers and IT support staff with intermediate Linux skills and a basic understanding of containers and container orchestration, this session provides a repeatable process to build a scalable, container-based research system for their institutions. Target Audience and Expected Backgrounds/Skill Levels Research software engineers and IT support staff interested in container orchestration systems and cloud computing, especially from institutions with limited computing resources. Intermediate skill in Linux system administration and container technologies. | Julian Pistorius and Stephen Bird |
Globus Compute: Federated Function as a Service for the Computing Continuum Growing data volumes, new computing paradigms, and increasing hardware heterogeneity are driving the need to execute computational tasks across a continuum of distributed computing resources. Such needs are motivated by the desire to compute closer to data acquisition sources, exploit specialized computing resources (e.g., hardware accelerators), provide real-time processing of data, reduce energy consumption (e.g., by matching workload with hardware), and scale simulations beyond the limits of a single computer. Globus Compute addresses these needs by delivering a hybrid cloud platform implementing the Function-as-a-Service (FaaS) paradigm. Researchers first register their desired function with the cloud-hosted Globus Compute service; they can then request invocation of that function with arbitrary input arguments to be executed on remote cyberinfrastructure (a short usage sketch appears after this table). Globus Compute manages the reliable and secure execution of the function, provisioning resources, staging function code and inputs, managing safe and secure execution (optionally using containers), monitoring execution, and asynchronously returning results to users via the cloud platform. Functions are executed by the Globus Compute endpoint software—an agent that may be installed by administrators and offered to user communities or installed by users anywhere they have access. The endpoint effectively turns any existing resource (e.g., laptop, cloud, cluster, supercomputer, or container orchestration cluster) into a FaaS endpoint. Over the last three years, Globus Compute has been used by thousands of researchers around the world to execute more than 50M tasks across more than 15,000 distributed computing endpoints. This tutorial builds upon the success of a similar tutorial hosted at PEARC 2024. That tutorial was attended by 20-25 people and was the first tutorial to close registration due to all slots being filled. The tutorial will discuss opportunities for FaaS in research computing, approaches for portable execution across endpoints, and the benefits of this approach (e.g., performance, energy efficiency). Further, it will directly relate to modern approaches in CI, for example enabling fine-grained and portable allocations in NSF ACCESS and as a common interface for remote computing in DOE’s integrated research infrastructure. The tutorial will extend existing tutorial materials that have been delivered at many international venues. We target both users and administrators of HPC and cloud resources, for example, domain scientists, Research Software Engineers, and HPC administrators. We expect that anyone who uses remote computing resources will benefit from this tutorial. Audience Prerequisites: Basic programming experience (ideally with Python). Basic Linux experience to install endpoint software. The tutorial can be conducted entirely in a hosted Jupyter notebook and using a cloud-hosted tutorial endpoint, requiring no local installation. | Kyle Chard and Reid Mello |
Implementing Opensource LLMs in Research Computing: From Model Selection to On-Premises Deployment This comprehensive tutorial will guide participants through the landscape of open-source large language models (LLMs), providing both theoretical knowledge and practical implementation strategies for research computing environments. The session is designed to bridge the gap between understanding LLM capabilities and successfully deploying them in high-performance computing infrastructures. | Dr. Christopher S. Simmons, Cambridge Computer |
Open OnDemand Overview, Customization, and App Development Developed by the Ohio Supercomputer Center and funded by the U.S. National Science Foundation, Open OnDemand (openondemand.org) is an open-source portal that enables web-based access to HPC services. Clients manage files and jobs, create and share apps, run GUI applications and connect via SSH, all from any device with a web browser. Open OnDemand empowers students, researchers, and industry professionals with remote web access to supercomputers. From a client perspective, key features are: requires zero installation (since it runs entirely in a browser); easy to use (via a simple interface); compatible with any device (even a mobile phone or tablet). From a system administrator perspective, key features are: provides a low barrier to entry for users of all skill levels; is open source and has a large community behind it; is configurable and flexible for users’ unique needs. The session leaders, all part of the Open OnDemand development team, will begin the tutorial with a short overview of Open OnDemand. The first part demos the features of Open OnDemand. The next part gives examples of customizing Open OnDemand and configuring interactive apps. The final part is an overview of the development roadmap for Open OnDemand and a discussion regarding community needs. Target audience: HPC system administrators and user support personnel, Intermediate level. | Alan Chalker, Julie Ma, Emily Moffat Sadeghi, Travis Ravert, Dhruva Chakravorty and Marinus Pennings |
The Streetwise Guide to Jupyter Security Jupyter is software that i) permits arbitrary code to be run interactively; ii) provides an accessible web interface to powerful shared computational resources; and iii) facilitates code sharing for ease of use and scientific reproducibility. Jupyter notebooks and multi-user “JupyterHub” installations have become ubiquitous in scientific, research, and educational communities. However, the Jupyter paradigm alters the threat landscape and presents unique challenges for infrastructure operators. Security concerns are even more pronounced in the heterogeneous environments of modern, complex, data-driven science and workflows. Configuration errors, identity and access management, isolation issues, auditing and logging, and software dependency bugs can all affect the security and operation of JupyterHub deployments or conflict with organizational security requirements and policies. This tutorial – presented by a member of the Jupyter Security Subproject and a professor at a state university with a large JupyterHub deployment – will provide an overview of the Jupyter ecosystem and how it is most effectively used before diving into hands-on exercises covering the current best practices for securing and operating a JupyterHub installation in different environments. Along the way, we will explain the security risks and tradeoffs in deploying Jupyter and sharing notebooks, and how to manage those risks. The content of this tutorial is targeted to people looking to understand security in deploying and running Jupyter, with an emphasis on multi-user JupyterHub servers. Examples include: • Researchers looking for guidelines on sharing Jupyter notebooks; • Research software engineers facilitating a JupyterHub deployment at their campus research computing center; • System administrators and user support staff that want to improve the security of their existing or planned JupyterHub installation; • Security engineers who have been asked to review Jupyter deployments; • People interested in how the JupyterHub architecture could be used as a template for the secure deployments of other interactive computational tools. Audience Prerequisites Attendees should bring a laptop with WiFi, a web browser, and an installed SSH client. Attendees will be provided with a virtual machine running on Amazon Web Services for hands-on exercises in running and securing Jupyter and JupyterHub. The instructors will provide guided steps to help attendees who are less familiar with Jupyter and SSH usage. If Internet connectivity is unavailable, instructors will use local screen recordings to demonstrate the interactive steps. | Rick Wagner and Robert Beverly |
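The usage sketch referenced in the Globus Compute tutorial listing above is shown here. It uses the Globus Compute SDK's Executor interface; the endpoint UUID is a placeholder for an endpoint the user has access to, and the first run triggers a Globus login.

```python
# Minimal Globus Compute sketch; the endpoint UUID is a placeholder.
from globus_compute_sdk import Executor


def double(x):
    """Trivial function to run on the remote endpoint."""
    return 2 * x


# Submitting through the Executor registers the function with the Globus Compute
# service and requests its invocation on the chosen endpoint; results return asynchronously.
with Executor(endpoint_id="00000000-0000-0000-0000-000000000000") as gce:
    future = gce.submit(double, 21)
    print(future.result())  # blocks until the remote execution completes
```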
PEARC25 Half Day PM Workshops (1:30 pm - 5:00 pm)
Title & Abstract | Authors |
---|---|
Campus Champions and NAIRR: Empowering AI Research Facilitation Through Collaboration The Campus Champions (CC) community has been functioning as an independent entity while partnering with other entities in the Research Computing and Data (RCD) ecosystem since the end of the XSEDE era. These partnerships foster a dynamic and connected community of advanced research computing professionals that promote leading practices at the frontiers of research, scholarship, teaching, and industry application. A recent EAGER grant from the National Science Foundation enables Campus Champions to partner with the National Artificial Intelligence Research Resource (NAIRR) Pilot, including organizing this NAIRR-focused workshop at PEARC25. We propose a half-day workshop that empowers attendees to facilitate AI research by sharing opportunities through the Campus Champions and spreading awareness of the NAIRR Pilot’s resources. The target audience includes active Campus Champions, new Campus Champions, those interested in the Campus Champions and our collaborative community, and those interested in learning how to facilitate AI research through NAIRR, all of which are relevant to PEARC’s research computing and data professional attendees. | Michael D. Weiner, Forough Ghahramani, Cyd Burrows-Schilling, Marina Kraeva, Nitin Sukhija, Chuck Pavloski, Mike Renfro, Jason Simms and Juan Jose Garcia Mesa |
Collaborating with K12 Schools: Supporting Secondary Students and Teachers in Computing Collaborations between institutions of higher education and P12 schools foster a deeper understanding of the work done by teachers, professors, and administrators at both institutions. The students in P12 will become college students and part of the workforce. Thus collaboration between instructors and administrators in all levels of the education process benefits all of society. The collaboration between instructors can enhance the curriculum and instructional processes and increase access to resources. One way to reach out to students in PK12 and their parents, and to provide learning experiences beyond the scope of the PK12 curriculum, is to offer summer camps. In this session, presenters and the audience will discuss these opportunities and share their successful experiences so that we can learn from each other. Target Audience: Campus computing professionals interested in outreach to local middle and high school teachers and students to cultivate an interest in artificial intelligence, computing, and/or cybersecurity. | Bruno Abreu, Tommaso Macri, Santiago Nunez-Corrales and Yipeng Huang |
Expanding ACCESS: Tools and Innovations for the Broader Cyberinfrastructure Community In this session, members of the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) will highlight current tools, services, and initiatives designed to benefit the broader advanced cyberinfrastructure community—extending beyond the direct scope of ACCESS resources. The session will feature a series of brief presentations focused on tools and resources that may be of interest to the wider community, with the goals of: a) sparking interest and encouraging further engagement with ACCESS, and/or b) offering solutions that can be adopted and implemented at your home institution. These discussion areas are: • Speaker: Vipin Chaudhary, Case Western Reserve University: System logs are a vital source of diagnostic information in large-scale computing environments, enabling automated anomaly detection (AD) for early fault identification and root-cause analysis. However, existing log AD methods face limitations: brittle reliance on log parsers, rigid embedding-specific pipelines, limited interpretability, and poor early detection capabilities. These challenges are especially pronounced in dynamic, heterogeneous systems such as HPC clusters and cloud platforms. We present Anomaly Nexus, a unified, parser-free, and embedding-agnostic framework for unsupervised log anomaly detection. Anomaly Nexus integrates lightweight preprocessing with a flexible embedding interface and leverages representation-level typicality estimation for point-wise anomaly scoring—extending recent advances in out-of-distribution detection to the log domain. • Speaker: Vikram Gazula, University of Kentucky: Attendees will learn about the ACCESS recommender system and the ACCESS Software Documentation Service and the ways in which potential and existing users can leverage these services to make effective use of ACCESS resources. In addition, attendees will learn about the ACCESS Question and Answer (Q&A) service including its range of capabilities and ways it can be incorporated into other web pages and services. • Speaker: David Hart, National Center for Atmospheric Research (NCAR): Experimenting with the Variable Marketplace. This work will present an experiment we're planning with collaborators from Harvard Business School and willing Resource Providers using the features recently deployed as part of our Variable Marketplace innovative pilot activity. Over a period of several months, HBS researchers will be changing exchange rates on a daily basis with randomized values to statistically validate the impacts that different discounts or surcharges have on researchers' decisions to choose from among the available resources. • Speaker: David Hart, National Center for Atmospheric Research (NCAR): Building On-Ramps to the ACCESS Ecosystem. The Allocations team will describe their On-Ramps product, a JavaScript-based embeddable component that any institution or organization can deploy on their website to share information about the ACCESS ecosystem of resources to their local community. Designed to empower Campus Champions and other cyberinfrastructure facilitators, On-Ramps can augment institutional websites with dynamically updated descriptions of the national-scale resources currently available in the ACCESS ecosystem. • Speaker: Joseph White, University at Buffalo, State University of New York: Members of the ACCESS Infrastructure Portfolio Expansion Standing Committee will share information about what infrastructure is currently part of the ACCESS ecosystem as well as the kinds of infrastructure the ACCESS team is working to expand into. This discussion is intended for both researchers who use the ACCESS infrastructure and are wondering which resources might best meet their needs as well as organizations who are considering integrating their infrastructure into the federation. | Vipin Chaudhary, Vikram Gazula, David Hart, and Joseph White |
How computational infrastructures can support scalable AI-Readiness of data to power collaboration The goal of this workshop is to discuss the synergy between making data AI-ready and the implementation of FAIR principles. Engaging in a deep dialogue with the PEARC25 attendees should lead to recommendations for practitioners who develop and utilize scientific and commercial digital ecosystems to advance the creation of trustworthy and productive AI and reusable data infrastructure. These recommendations will be integrated into the stakeholder outreach strategies of the organizing institutions and shared with the advanced computing and data science communities served by the PEARC conference series. Target audience and expected background and/or skill levels: The target audience for this workshop consists of disciplinary researchers and educators, as well as infrastructure developers with an active interest in machine learning and data science. No specific skills or background are expected. | Sergiu Sanielevici, Laurette Dubé, Christine Kirkpatrick, Raghu Mahiraju, Erik Schultes and Amitava Majumdar |
HPC-as-a-Service: Enabling Self-Service Research Compute for the Masses As research organizations increasingly require on-demand, scalable, and user-friendly high-performance computing (HPC), traditional manual provisioning can slow innovation. This workshop introduces Ganana Cluster Manager, a platform that simplifies deploying and managing HPC clusters on-premises and in the cloud. Ganana empowers users with self-service provisioning, secure multi-user access, and unified job management—enabling true HPC-as-a-Service. The workshop also covers best practices for selecting the right cloud providers and compute resources using robust multi-criteria decision-making (MCDM) methodologies. Real-world case studies showcase how Ganana accelerates breakthroughs in fields like aerodynamics, AI-driven drug discovery, genomics, and medical imaging. Participants will gain practical strategies for deploying hybrid HPC clusters, integrating HPC with AI and data engineering, and leveraging managed services to ensure scalable, production-ready research computing. This workshop is ideal for HPC professionals, research computing facilitators, system architects, and institutional leaders aiming to deliver next-generation, self-service research infrastructure that democratizes access to HPC and accelerates scientific discovery. | Saurabh Mittal, Michael Hicks, Asit Sahoop, and Alok Pandey SHI |
WHPC: Collaboration, Community, Careers The WHPC Workshop at PEARC25 (Columbus, OH, USA) has the primary objective of nurturing a diverse and inclusive Research Computing and Data (RCD) and High-Performance Computing (HPC) community. Our overarching goal is to cultivate competencies geared towards appreciating the value of a diverse workforce and establishing an inclusive environment that welcomes all Humans. This workshop builds upon the successful event held during PEARC24. Our agenda focuses on bringing the communities together through shared values of diversity and inclusion, advocating for all people from underrepresented groups. The focal points of the WHPC at PEARC25 workshop include: • Enhancing diversity and inclusion across the entire RCD and HPC workforce. • Facilitating a deeper comprehension of the nuances of diversity, equity, and inclusion across various demographic groups. • Strategies for recruitment, retention, and success. • Promoting community building through interactive networking opportunities. • Emphasizing the importance of learning from and valuing diverse experiences and career trajectories. Target Audience: In general, WHPC workshops traditionally focus on attracting attendees such as: • those who identify as an individual from an underrepresented group in the RCD and/or HPC community • those who are allied with underrepresented groups within RCD and HPC • early-career and mid-career individuals seeking guidance • individuals established in their career wanting to contribute their experience • folks who are open to meeting and supporting a diverse and inclusive RCD and HPC community | Elsa Gonsiorowski, Gladys Andino, Amanda Black, Subhashini Sivagnanam, Claire Stirm and Janna Nugent |