I am a Ph.D. student in Computer Science advised by Dr. Michela Taufer at the University of Tennessee, Knoxville. I received my Bachelor's in Computer Science in Spring 2020 from UTK. My research interests include developing novel techniques for HPC performance data analysis and building tools that enable and enhance scientific computing workflows. Outside of HPC, I enjoy discovering new cultures, places, and people.
May 2019 - Present, Knoxville, TN
May 2020 - Aug 2023, Livermore, CA
May 2023 - Aug 2023
May 2022 - Aug 2022
Jun 2021 - Aug 2021
May 2020 - Aug 2020
May 2017 - Dec 2018, Oak Ridge, TN
University of Tennessee | 2020-Present | PhD in Computer Science (High-Performance Computing Concentration)
University of Tennessee | 2016-2020 | BSc in Computer Science
DYAD (part of the Flux project) is a tool that helps computational science workflows transition from sequential, parallel file system (PFS)-based data movement to more state-of-the-art in situ and in transit data movement. It does so using the tooling provided by LLNL's Flux resource manager. My work on DYAD includes examining its performance impact on workflows and improving its data movement with networking libraries such as UCX.
Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet and inherits the capabilities Hatchet provides. My work on Thicket centers on the Call Path Query Language I created for Hatchet, as well as general software engineering work.
Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it ingests data from different profilers into a common representation, allowing users to apply the same analysis code to performance data from different sources. My work on Hatchet centers on the Call Path Query Language, which enables users to filter profiles based on caller-callee relationships.
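As a simplified illustration of the idea behind call path pattern matching (a standalone sketch, not Hatchet's actual query syntax; the pattern format and function names here are invented for exposition), a query can be seen as a sequence of frame predicates with wildcards, matched against root-to-leaf call paths:

```python
import re

# Illustrative sketch (NOT Hatchet's real API): match a call-path
# pattern against a root-to-leaf call path. A pattern is a list of
# (quantifier, name_regex) pairs; "*" matches zero or more frames,
# "." matches exactly one frame.
def match_path(pattern, path):
    if not pattern:
        return not path
    quant, regex = pattern[0]
    if quant == "*":
        # zero-or-more: either skip the wildcard, or consume one
        # matching frame and keep the wildcard active
        if match_path(pattern[1:], path):
            return True
        if path and re.fullmatch(regex, path[0]):
            return match_path(pattern, path[1:])
        return False
    # exactly one frame must match this predicate
    return bool(path) and bool(re.fullmatch(regex, path[0])) \
        and match_path(pattern[1:], path[1:])

# Keep only call paths where an MPI routine appears somewhere below "solve".
paths = [
    ["main", "solve", "MPI_Allreduce"],
    ["main", "solve", "kernel", "MPI_Waitall"],
    ["main", "io", "write_restart"],
]
pattern = [("*", r".*"), (".", r"solve"), ("*", r".*"), (".", r"MPI_.*")]
matched = [p for p in paths if match_path(pattern, p)]
```

In this sketch, keeping only the paths where an MPI routine is called below `solve` reduces the profile to the subtree relevant to communication analysis, which is the kind of caller-callee filtering the Query Language enables.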
McVineGPU is a proof-of-concept GPU-powered implementation of MCViNE, a Monte Carlo neutron ray-tracing simulation package.
SCADGen is a tool for converting the XML files that MCViNE uses to represent constructive solid geometry into OpenSCAD code for visualization and STL generation.
ipywe is a library that provides a set of widgets/GUIs for performing neutron scattering data analysis within Jupyter notebooks.
Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data. Thicket has a modular structure composed of three components. The first component is a data structure for multi-dimensional performance data, which is composed automatically on the portable basis of call trees, and accommodates any subset of dimensions present in the dataset. The second is the metadata, enabling distinction and sub-selection of dimensions in performance data. The third is a dimensionality reduction mechanism, enabling analysis such as computing aggregated statistics on a given data dimension. Extensible mechanisms are available for applying analyses (e.g., top-down on Intel CPUs), data science techniques (e.g., K-means clustering from scikit-learn), modeling performance (e.g., Extra-P), and interactive visualization. We demonstrate the power and flexibility of Thicket through two case studies, first with the open-source RAJA Performance Suite on CPU and GPU clusters and another with a large physics simulation run on both a traditional HPC cluster and an AWS Parallel Cluster instance.
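The dimensionality-reduction component described above can be sketched in plain Python (a minimal illustration of the idea, not Thicket's API; the run data and function below are invented for exposition): collapse the per-run dimension of an ensemble into aggregated statistics for each call-tree node:

```python
import statistics

# Illustrative sketch (NOT Thicket's real API): reduce the "run"
# dimension of a multi-run ensemble to per-node aggregated statistics.
# Each run maps call-tree node names to a measured time (hypothetical data).
runs = [
    {"main": 10.2, "solve": 7.9, "io": 1.1},  # e.g., HPC cluster, run 1
    {"main": 10.8, "solve": 8.3, "io": 1.0},  # e.g., HPC cluster, run 2
    {"main": 11.0, "solve": 8.1, "io": 1.3},  # e.g., AWS Parallel Cluster
]

def aggregate(runs):
    # For every node, compute summary statistics across all runs,
    # collapsing the run dimension into mean and population stdev.
    nodes = runs[0].keys()
    return {
        node: {
            "mean": statistics.mean(r[node] for r in runs),
            "stdev": statistics.pstdev(r[node] for r in runs),
        }
        for node in nodes
    }

stats = aggregate(runs)
```

The resulting per-node statistics are the kind of reduced view on which further analyses (clustering, modeling, visualization) can then operate.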
Exploiting task parallelism is getting increasingly difficult for diverse and complex scientific workflows running on High Performance Computing (HPC) systems. In this paper, we argue that the difficulty arises from a void in the spectrum of existing data-transfer models for resolving inter-task data dependence within a workflow, and we propose a novel model to fill that gap: Ubique. The Ubique model combines the best of the in-transit and in situ models so that loosely coupled producer and consumer tasks can run concurrently and resolve their data dependencies efficiently with little or no modification to their codes, striking a balance between transparent optimization, productivity, and performance. Our preliminary evaluation suggests that Ubique can significantly outperform the parallel file system (PFS)-based model while offering automatic data transfer and synchronization, features that many traditional models lack. It also identifies the performance characteristics of Ubique's key underlying subsystems, which must be understood to further broaden its benefits.
As computational science applications benefit from larger-scale, more heterogeneous high performance computing (HPC) systems, the process of studying their performance becomes increasingly complex. The performance data analysis library Hatchet provides some insights into this complexity, but is currently limited in its analysis capabilities. Missing capabilities include the handling of relational caller-callee data captured by HPC profilers. To address this shortcoming, we augment Hatchet with a Call Path Query Language that leverages relational data in the performance analysis of scientific applications. Specifically, our Query Language enables data reduction using call path pattern matching. We demonstrate the effectiveness of our Query Language in identifying performance bottlenecks and enhancing Hatchet’s analysis capabilities through three case studies. In the first case study, we compare the performance of sequential and multi-threaded versions of the graph alignment application Fido. In doing so, we identify the existence of large memory inefficiencies in both versions. In the second case study, we examine the performance of MPI calls in the linear algebra mini-application AMG2013 when using MVAPICH and Spectrum-MPI. In doing so, we identify hidden performance losses in specific MPI functions. In the third case study, we illustrate the use of our Query Language in Hatchet’s interactive visualization. In doing so, we show that our Query Language enables a simple and intuitive way to massively reduce profiling data.
Performance analysis is critical for pinpointing bottlenecks in parallel applications. Several profilers exist to instrument parallel programs on HPC systems and gather performance data. Hatchet is an open-source Python library that can read profiling output of several tools, and enables the user to perform a variety of programmatic analyses on hierarchical performance profiles. In this paper, we augment Hatchet to support new features: a query language for representing call path patterns that can be used to filter a calling context tree, visualization support for displaying and interacting with performance profiles, and new operations for performing analyses on multiple datasets. Additionally, we present performance optimizations in Hatchet’s HPCToolkit reader and the unify operation to enable scalable analysis of large datasets.
Independently of the image modality (x-rays, neutrons, etc.), image data analysis requires normalization, a preprocessing step. While normalization can sometimes be easily generalized, the analysis is, in most cases, specific to an experiment and a sample. Although many tools (MATLAB, ImageJ, VG Studio, etc.) offer a large collection of pre-programmed image analysis tools, they usually require a learning step that can be lengthy depending on the skills of the end user. We have implemented Jupyter Python notebooks to allow easy and straightforward data analysis, along with live interaction with the data. Jupyter notebooks require little programming knowledge, bypassing the steep learning curve. Most importantly, each notebook can be tailored to a specific experiment and sample with minimal effort. Here, we present the pros and cons of the main methods for analyzing data and show why we have found that Jupyter Python notebooks are well suited for imaging data processing, visualization, and analysis.
MCViNE is an open source, object-oriented Monte Carlo neutron ray-tracing simulation software package. Its design allows for flexible, hierarchical representations of sophisticated instrument components such as detector systems, and samples with a variety of shapes and scattering kernels. Recently this flexible design has enabled several applications of MCViNE simulations at the Spallation Neutron Source (SNS) at Oak Ridge National Lab, including assisting design of neutron instruments at the second target station and design of novel sample environments, as well as studying effects of instrument resolution and multiple scattering. Here we provide an overview of the recent developments and new features of MCViNE since its initial introduction (Jiao et al 2016 Nucl. Instrum. Methods Phys. Res., Sect. A 810, 86–99), and some example applications.
A volume integral algorithm for the non-homogeneous elasticity traction boundary integral equation is presented. The body force volume integral is exactly split into a relatively simple boundary integral, together with a remainder volume integral that can be evaluated using a regular grid of cuboid cells covering the problem domain. Of particular importance for (inelastic) fracture analysis is that the volume integral over the regular grid is computed without explicit knowledge of the domain boundary, including the fracture surface. A Galerkin approximation is employed, and the numerical implementation is validated by solving body force elasticity problems with known solutions.
Serving as a Lead Student Volunteer at SC24. Prior to the conference, I am working with the Posters Committee to help organize the Posters track of the conference.
Served as a Student Volunteer at SC23. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.
Served as a Lead Student Volunteer at SC22. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server, which was used for most communication (excluding volunteer shifts) in the Students@SC program.
Served as a Lead Student Volunteer at SC21. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server and conceptualizing a virtual board game tournament to help in-person and remote Students@SC participants interact socially.
Served as a Student Volunteer at (the virtual) SC20. In this role, I helped ensure the virtual sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.
Served as a Student Volunteer at SC19. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.