Ian Lumsden

Graduate Research Assistant at Global Computing Lab

I am a Ph.D. student in Computer Science advised by Dr. Michela Taufer at the University of Tennessee, Knoxville. I received my Bachelor’s degree in Computer Science from UTK in Spring 2020. My research interests include developing novel techniques for HPC performance data analysis and building tools that enable and enhance scientific computing workflows. Outside of HPC, I enjoy discovering new cultures, places, and people.

Experiences

University of Tennessee

May 2019 - Present

Knoxville, TN

Research Assistant

May 2019 - Present

Responsibilities:
  • Developing a benchmark to study data movement motifs in scientific workflows and assess the performance of data movement tools in these motifs
  • Developing novel techniques and tools based on Flux to assist in data movement for scientific workflows
  • Developing novel techniques to identify causes of interesting performance phenomena in High-Performance Computing applications using Hatchet
  • Examining the performance and implications of in-situ and in-transit data analysis of molecular dynamics simulations through the Analytics4MD project

Lawrence Livermore National Laboratory

May 2020 - Aug 2024

Livermore, CA

Graduate Computing Summer Intern

May 2024 - Aug 2024

Responsibilities:
  • Developed a benchmark to study data movement motifs in scientific workflows and assess the performance of data movement tools in these motifs
  • Conducted a performance study of different CPUs and GPUs with different types of memory using the RAJA Performance Suite with other members of the Thicket team
  • Refactored the topdown analysis service in LLNL’s Caliper performance monitoring tool and added support for Intel Sapphire Rapids CPUs
  • Developed Python bindings for Caliper and used them to develop performance annotations for LLNL’s Hatchet and Thicket tools

Graduate Computing Summer Intern

May 2023 - Aug 2023

Responsibilities:
  • Developed an approach to automatically detect all accessible levels of the storage hierarchy and model it as a bipartite graph
  • Integrated this approach into LLNL’s Dynamic and Asynchronous Data Streamliner (DYAD)
  • Improved the structure and performance of DYAD and the test suite developed during the previous summer

Graduate Computing Student Intern

May 2022 - Aug 2022

Responsibilities:
  • Augmented the workflow framework from the Analytics4MD project to make use of LLNL’s Dynamic and Asynchronous Data Streamliner (DYAD)
  • Created a test suite for DYAD based on workflows from the Analytics4MD project
  • Examined the performance of DYAD using our test suite and using the performance data collection tool PerfFlow Aspect

Graduate Computing Student Intern

Jun 2021 - Aug 2021

Responsibilities:
  • Developed a data analysis workflow that uses profiles of LLNL’s MARBL multi-physics simulation tool to predict what compiler MARBL should be built with to get the best performance for a particular simulation workload

Undergraduate Computing Student Intern

May 2020 - Aug 2020

Responsibilities:
  • Designed a new graph-based filtering query language for the Hatchet data analysis tool to enable relationship-based analysis of profiling data
  • Implemented the query language and integrated it into Hatchet’s data analysis capabilities
  • Used the query language to perform novel analysis of the performance of different MPI calls in HPC benchmark applications
  • Presented the work associated with this internship at the ACM Student Research Competition at the annual Supercomputing (SC) conference, where it won the 1st place award in the Undergraduate category
  • Presented an expanded version of this work at the ACM Student Research Competition Grand Finals

Oak Ridge National Laboratory

May 2017 - Dec 2018

Oak Ridge, TN

HERE Intern

May 2017 - Dec 2018

Responsibilities:
  • Developed GUIs using Python and JavaScript for user data analysis
  • Updated code to support both Python 2.7 and 3
  • Developed a Python package to convert XML representations of constructive solid geometry into OpenSCAD code for visualization and 3D-printing purposes
  • Parallelized a Monte-Carlo neutron ray-tracing software package with CUDA

Education

University of Tennessee
2020-Present
PhD in Computer Science (High-Performance Computing Concentration)

University of Tennessee
2016-2020
BSc in Computer Science

Projects

DYnamic and Asynchronous Data Streamliner (DYAD)
Developer and Researcher May 2022 - Present

DYAD (part of the Flux project) is a tool that helps computational science workflows move from sequential, parallel file system (PFS)-based data movement to more state-of-the-art in situ and in transit data movement. It does so using the tooling provided by LLNL’s Flux resource manager. My work on DYAD includes examining its performance impact on workflows and improving DYAD’s data movement using networking libraries like UCX.

Thicket
Developer and Researcher May 2022 - Present

Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet, so it offers the same benefits that Hatchet provides. My work on Thicket centers around the Call Path Query Language I created for Hatchet, as well as general software engineering work.

Hatchet
Developer and Researcher May 2020 - Present

Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it is capable of ingesting data from different profilers into a common representation, allowing users to use the same code to analyze performance data from different sources. My work on Hatchet centers around the Call Path Query Language, which enables users to filter profiles based on caller-callee relationships.

McVineGPU
Developer May 2018 - December 2018

McVineGPU is a proof-of-concept implementation of a GPU-powered version of MCViNE (a ray-tracing software package for simulating neutron scattering experiments).

SCADGen
Developer January 2018 - May 2018

SCADGen is a tool for converting the XML files that MCViNE uses to represent constructive solid geometry into OpenSCAD code for visualization and STL generation purposes.

ipywe
Developer May 2017 - August 2017

ipywe is a library that provides a set of widgets/GUIs for performing neutron scattering data analysis within Jupyter notebooks.

Papers

Maintaining performant code in a world of fast-evolving computer architectures and programming models poses a significant challenge to scientists. Typically, benchmark codes are used to model some aspects of a large application code’s performance, and are easier to build and run. Such benchmarks can help assess the effects of code or algorithm changes, system updates, and new hardware. However, most performance benchmarks are not written using a wide range of GPU programming models. The RAJA Performance Suite provides a comprehensive set of computational kernels implemented in a variety of programming models. We integrated the performance measurement and analysis tools Caliper and Thicket into the RAJAPerf to facilitate performance comparison across kernel implementations and architectures. This paper describes the RAJAPerf, performance metrics that can be collected, and experimental analysis with case studies.

Deep Learning (DL) is increasingly applied across various fields to solve complex scientific challenges in modern high-performance computing (HPC) systems that are beyond the reach of traditional algorithms. Training DL models for scientific applications involves processing multi-terabyte datasets in each epoch. The data access behavior during DL training exposes optimization opportunities to cache these datasets in near-compute storage accelerators in HPC systems, enhancing I/O throughput. However, current middleware solutions employ near-compute storage accelerators primarily as exclusive caches, which limits the effectiveness of cache access locality. To address this problem, we introduce DYAD, a system designed to maximize sample locality in the cache, thereby significantly increasing I/O throughput in HPC systems. DYAD optimizes I/O for DL training based on three key features. First, DYAD boosts inter-node access speeds by using a novel streaming RPC with RDMA protocol, achieving a 1.25x performance gain over state-of-the-art solutions. Second, DYAD further enhances inter-node access by coordinating data movement, which mitigates network congestion and increases throughput for inter-node accesses by up to 8.78x. Last, DYAD uses smart metadata caching that outperforms traditional global metadata access methods by several orders of magnitude in terms of lookup throughput. We demonstrate how DYAD accelerates large-scale DL training on a high-end HPC cluster with 512 GPUs by up to 10.82x faster epochs compared to UnifyFS by performing locality-aware caching on near-compute storage accelerators.

This experimental work examines data movement in molecular dynamics (MD) workflows, comparing the Dynamic and Asynchronous Data Streamliner (DYAD) middleware with traditional, industry-standard I/O systems such as XFS and Lustre. DYAD moves MD simulation frames to analytics processes, providing enhanced flexibility and efficiency for dynamic data transfers and in situ analytics. At the same time, traditional I/O storage systems provide durability and scalability for high-performance computing (HPC) systems. The study integrates MD workflows with common simulation codes, facilitating immediate capture and transfer of MD frames to a staging area. It explores various molecular models, from simple to complex, assessing data management performance and scalability. Different producer-consumer pairs, molecular models, and data transaction frequency enable testing across small to large-scale HPC scenarios, from single-node configurations to large, distributed environments. The findings reveal that adaptive mechanisms for minimizing synchronization, direct network communication between producer and consumer processes, and optimizations of both data movement and synchronization are crucial for performance and scalability in MD workflows.

Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data. Thicket has a modular structure composed of three components. The first component is a data structure for multi-dimensional performance data, which is composed automatically on the portable basis of call trees, and accommodates any subset of dimensions present in the dataset. The second is the metadata, enabling distinction and sub-selection of dimensions in performance data. The third is a dimensionality reduction mechanism, enabling analysis such as computing aggregated statistics on a given data dimension. Extensible mechanisms are available for applying analyses (e.g., top-down on Intel CPUs), data science techniques (e.g., K-means clustering from scikit-learn), modeling performance (e.g., Extra-P), and interactive visualization. We demonstrate the power and flexibility of Thicket through two case studies, first with the open-source RAJA Performance Suite on CPU and GPU clusters and another with a large physics simulation run on both a traditional HPC cluster and an AWS Parallel Cluster instance.

Ubique: A New Model for Untangling Inter-task Data Dependence in Complex HPC Workflows

Exploiting task parallelism is getting increasingly difficult for diverse and complex scientific workflows running on High Performance Computing (HPC) systems. In this paper, we argue that the difficulty rises from a void in the spectrum of existing data-transfer models for resolving inter-task data dependence within a workflow and propose a novel model to fill that gap: Ubique. The Ubique model combines the best from in-transit and in situ models in order for loosely coupled producer and consumer tasks to run concurrently and to resolve their data dependencies efficiently with little or no modifications to their codes, striking a balance between transparent optimization, productivity, and performance. Our preliminary evaluation suggests that Ubique can significantly outperform the parallel file system (PFS)-based model while offering automatic data transfer and synchronization which are the features lacking in many traditional models. It also identifies the performance characteristics of its key depending subsystems, which must be understood for further broadening its benefits.

Enabling Call Path Querying in Hatchet to Identify Performance Bottlenecks in Scientific Applications

As computational science applications benefit from larger-scale, more heterogeneous high performance computing (HPC) systems, the process of studying their performance becomes increasingly complex. The performance data analysis library Hatchet provides some insights into this complexity, but is currently limited in its analysis capabilities. Missing capabilities include the handling of relational caller-callee data captured by HPC profilers. To address this shortcoming, we augment Hatchet with a Call Path Query Language that leverages relational data in the performance analysis of scientific applications. Specifically, our Query Language enables data reduction using call path pattern matching. We demonstrate the effectiveness of our Query Language in identifying performance bottlenecks and enhancing Hatchet’s analysis capabilities through three case studies. In the first case study, we compare the performance of sequential and multi-threaded versions of the graph alignment application Fido. In doing so, we identify the existence of large memory inefficiencies in both versions. In the second case study, we examine the performance of MPI calls in the linear algebra mini-application AMG2013 when using MVAPICH and Spectrum-MPI. In doing so, we identify hidden performance losses in specific MPI functions. In the third case study, we illustrate the use of our Query Language in Hatchet’s interactive visualization. In doing so, we show that our Query Language enables a simple and intuitive way to massively reduce profiling data.

Performance analysis is critical for pinpointing bottlenecks in parallel applications. Several profilers exist to instrument parallel programs on HPC systems and gather performance data. Hatchet is an open-source Python library that can read profiling output of several tools, and enables the user to perform a variety of programmatic analyses on hierarchical performance profiles. In this paper, we augment Hatchet to support new features: a query language for representing call path patterns that can be used to filter a calling context tree, visualization support for displaying and interacting with performance profiles, and new operations for performing analyses on multiple datasets. Additionally, we present performance optimizations in Hatchet’s HPCToolkit reader and the unify operation to enable scalable analysis of large datasets.

Neutron Imaging Analysis using Jupyter Python Notebook

Independently of the image modality (x-rays, neutrons, etc), image data analysis requires normalization, a preprocessing step. While the normalization can sometimes easily be generalized, the analysis is, in most cases, specific to an experiment and a sample. Although many tools (MATLAB, ImageJ, VG Studio…) offer a large collection of pre-programmed image analysis tools, they usually require a learning step that can be lengthy depending on the skills of the end user. We have implemented Jupyter Python notebooks to allow easy and straightforward data analysis, along with live interaction with the data. Jupyter notebooks require little programming knowledge and the steep learning curve is bypassed. Most importantly, each notebook can be tailored to a specific experiment and sample with minimized effort. Here, we present the pros and cons of the main methods to analyse data and show the reason why we have found that Jupyter Python notebooks are well suited for imaging data processing, visualization and analysis.

MCViNE is an open source, object-oriented Monte Carlo neutron ray-tracing simulation software package. Its design allows for flexible, hierarchical representations of sophisticated instrument components such as detector systems, and samples with a variety of shapes and scattering kernels. Recently this flexible design has enabled several applications of MCViNE simulations at the Spallation Neutron Source (SNS) at Oak Ridge National Lab, including assisting design of neutron instruments at the second target station and design of novel sample environments, as well as studying effects of instrument resolution and multiple scattering. Here we provide an overview of the recent developments and new features of MCViNE since its initial introduction (Jiao et al 2016 Nucl. Instrum. Methods Phys. Res., Sect. A 810, 86–99), and some example applications.

Grid-Based Volume Integration for Elasticity: Traction Boundary Integral Equation

A volume integral algorithm for the non-homogeneous elasticity traction boundary integral equation is presented. The body force volume integral is exactly split into a relatively simple boundary integral, together with a remainder volume integral that can be evaluated using a regular grid of cuboid cells covering the problem domain. Of particular importance for (inelastic) fracture analysis is that the volume integral over the regular grid is computed without explicit knowledge of the domain boundary, including the fracture surface. A Galerkin approximation is employed, and the numerical implementation is validated by solving body force elasticity problems with known solutions.

Professional Services

Serving as a Lead Student Volunteer at SC24. Prior to the conference, I am working with the Posters Committee to help organize the Posters track of the conference.

Served as a Student Volunteer at SC23. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.

Served as a Lead Student Volunteer at SC22. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server, which was used for most communication (excluding volunteer shifts) in the Students@SC program.

Served as a Lead Student Volunteer at SC21. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server and conceptualizing a virtual board game tournament to help in-person and remote Students@SC participants interact socially.

Served as a Student Volunteer at (the virtual) SC20. In this role, I helped ensure the virtual sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.

Served as a Student Volunteer at SC19. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.

Achievements, Honors, and Scholarships

Participated in the Graduate Student track of the ACM Student Research Competition, presenting my poster “Benchmarking and Modeling of Producer-Consumer Data Movement Performance in Scientific Workflows”

ACM Travel Grant

Received a travel grant from ACM to participate in the Graduate Student track of the ACM Student Research Competition at SC24

Participated in the PhD Forum at IPDPS 2024 and presented my poster “Empirical Study of Molecular Dynamics Workflow Data Movement: DYAD vs. Traditional I/O Systems”

Participated in the Graduate Student track of the ACM Student Research Competition, presenting my poster “Enabling Transparent, High-Throughput Data Movement for Scientific Workflows on HPC Systems”

ACM Travel Grant

Received a travel grant from ACM to participate in the Graduate Student track of the ACM Student Research Competition at SC23

NSF Travel Scholarship

Received a travel scholarship from NSF to attend the 2022 eScience Conference and present my paper “Enabling Call Path Querying in Hatchet to Identify Performance Bottlenecks in Scientific Applications”

Won first place in the Undergraduate Student track of the ACM Student Research Competition, presenting my poster “Enabling Graph-Based Profiling Analysis Using Hatchet”

Tennessee Fellowship for Graduate Excellence

Awarded the Tennessee Fellowship for Graduate Excellence at the University of Tennessee, Knoxville