I am a Ph.D. student in Computer Science advised by Dr. Michela Taufer at the University of Tennessee, Knoxville. I received my Bachelor's in Computer Science in Spring 2020 from UTK. My research interests include developing novel techniques for HPC performance data analysis and building tools that enable and enhance scientific computing workflows. Outside of HPC, I enjoy discovering new cultures, places, and people.
May 2019 - Present, Knoxville, TN
May 2020 - Aug 2023, Livermore, CA
May 2023 - Aug 2023
May 2022 - Aug 2022
Jun 2021 - Aug 2021
May 2020 - Aug 2020
May 2017 - Dec 2018, Oak Ridge, TN
University of Tennessee | 2020-Present | PhD in Computer Science (High-Performance Computing Concentration)
University of Tennessee | 2016-2020 | BSc in Computer Science
DYAD (part of the Flux project) is a tool that helps computational science workflows transition from sequential, parallel file system (PFS)-based data movement to more state-of-the-art in situ and in transit data movement. It does so using the tooling provided by LLNL's Flux resource manager. My work on DYAD includes examining its performance impact on workflows and improving its data movement with networking libraries such as UCX.
Thicket is a Python-based toolkit for analyzing ensemble performance data. It is built on top of Hatchet and inherits the capabilities Hatchet provides. My work on Thicket centers on the Call Path Query Language I created for Hatchet, as well as general software engineering work.
Hatchet is a Python library that enables users to analyze performance data generated by different HPC profilers. Its main advantage over other tools is that it ingests data from different profilers into a common representation, allowing users to apply the same analysis code to performance data from different sources. My work on Hatchet centers on the Call Path Query Language, which enables users to filter profiles based on caller-callee relationships.
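As a simplified illustration of the idea behind call path pattern matching (a standalone sketch, not Hatchet's actual query syntax; the pattern format and function names here are invented for exposition), a query can be seen as a sequence of frame predicates with wildcards, matched against root-to-leaf call paths:

```python
import re

# Illustrative sketch (NOT Hatchet's real API): match a call-path
# pattern against a root-to-leaf call path. A pattern is a list of
# (quantifier, name_regex) pairs; "*" matches zero or more frames,
# "." matches exactly one frame.
def match_path(pattern, path):
    if not pattern:
        return not path
    quant, regex = pattern[0]
    if quant == "*":
        # zero-or-more: either skip the wildcard, or consume one
        # matching frame and keep the wildcard active
        if match_path(pattern[1:], path):
            return True
        if path and re.fullmatch(regex, path[0]):
            return match_path(pattern, path[1:])
        return False
    # exactly one frame must match this predicate
    return bool(path) and bool(re.fullmatch(regex, path[0])) \
        and match_path(pattern[1:], path[1:])

# Keep only call paths where an MPI routine appears somewhere below "solve".
paths = [
    ["main", "solve", "MPI_Allreduce"],
    ["main", "solve", "kernel", "MPI_Waitall"],
    ["main", "io", "write_restart"],
]
pattern = [("*", r".*"), (".", r"solve"), ("*", r".*"), (".", r"MPI_.*")]
matched = [p for p in paths if match_path(pattern, p)]
```

In this sketch, keeping only the paths where an MPI routine is called below `solve` reduces the profile to the subtree relevant to communication analysis, which is the kind of caller-callee filtering the Query Language enables.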
McVineGPU is a proof-of-concept GPU-powered implementation of MCViNE, a Monte Carlo neutron ray-tracing simulation package.
SCADGen is a tool for converting the XML files that MCViNE uses to represent constructive solid geometry into OpenSCAD code for visualization and STL generation.
ipywe is a library that provides a set of widgets/GUIs for performing neutron scattering data analysis within Jupyter notebooks.
Thicket is an open-source Python toolkit for Exploratory Data Analysis (EDA) of multi-run performance experiments. It enables an understanding of optimal performance configuration for large-scale application codes. Most performance tools focus on a single execution (e.g., single platform, single measurement tool, single scale). Thicket bridges the gap to convenient analysis in multi-dimensional, multi-scale, multi-architecture, and multi-tool performance datasets by providing an interface for interacting with the performance data. Thicket has a modular structure composed of three components. The first component is a data structure for multi-dimensional performance data, which is composed automatically on the portable basis of call trees, and accommodates any subset of dimensions present in the dataset. The second is the metadata, enabling distinction and sub-selection of dimensions in performance data. The third is a dimensionality reduction mechanism, enabling analysis such as computing aggregated statistics on a given data dimension. Extensible mechanisms are available for applying analyses (e.g., top-down on Intel CPUs), data science techniques (e.g., K-means clustering from scikit-learn), modeling performance (e.g., Extra-P), and interactive visualization. We demonstrate the power and flexibility of Thicket through two case studies, first with the open-source RAJA Performance Suite on CPU and GPU clusters and another with a large physics simulation run on both a traditional HPC cluster and an AWS Parallel Cluster instance.
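The dimensionality-reduction component described above can be sketched in plain Python (a minimal illustration of the idea, not Thicket's API; the run data and function below are invented for exposition): collapse the per-run dimension of an ensemble into aggregated statistics for each call-tree node:

```python
import statistics

# Illustrative sketch (NOT Thicket's real API): reduce the "run"
# dimension of a multi-run ensemble to per-node aggregated statistics.
# Each run maps call-tree node names to a measured time (hypothetical data).
runs = [
    {"main": 10.2, "solve": 7.9, "io": 1.1},  # e.g., HPC cluster, run 1
    {"main": 10.8, "solve": 8.3, "io": 1.0},  # e.g., HPC cluster, run 2
    {"main": 11.0, "solve": 8.1, "io": 1.3},  # e.g., AWS Parallel Cluster
]

def aggregate(runs):
    # For every node, compute summary statistics across all runs,
    # collapsing the run dimension into mean and population stdev.
    nodes = runs[0].keys()
    return {
        node: {
            "mean": statistics.mean(r[node] for r in runs),
            "stdev": statistics.pstdev(r[node] for r in runs),
        }
        for node in nodes
    }

stats = aggregate(runs)
```

The resulting per-node statistics are the kind of reduced view on which further analyses (clustering, modeling, visualization) can then operate.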
Exploiting task parallelism is getting increasingly difficult for diverse and complex scientific workflows running on High Performance Computing (HPC) systems. In this paper, we argue that the difficulty arises from a void in the spectrum of existing data-transfer models for resolving inter-task data dependence within a workflow, and we propose a novel model to fill that gap: Ubique. The Ubique model combines the best of the in-transit and in situ models so that loosely coupled producer and consumer tasks can run concurrently and resolve their data dependencies efficiently with little or no modification to their codes, striking a balance between transparent optimization, productivity, and performance. Our preliminary evaluation suggests that Ubique can significantly outperform the parallel file system (PFS)-based model while offering automatic data transfer and synchronization, features that many traditional models lack. It also identifies the performance characteristics of Ubique's key underlying subsystems, which must be understood to further broaden its benefits.
As computational science applications benefit from larger-scale, more heterogeneous high performance computing (HPC) systems, the process of studying their performance becomes increasingly complex. The performance data analysis library Hatchet provides some insights into this complexity, but is currently limited in its analysis capabilities. Missing capabilities include the handling of relational caller-callee data captured by HPC profilers. To address this shortcoming, we augment Hatchet with a Call Path Query Language that leverages relational data in the performance analysis of scientific applications. Specifically, our Query Language enables data reduction using call path pattern matching. We demonstrate the effectiveness of our Query Language in identifying performance bottlenecks and enhancing Hatchet’s analysis capabilities through three case studies. In the first case study, we compare the performance of sequential and multi-threaded versions of the graph alignment application Fido. In doing so, we identify the existence of large memory inefficiencies in both versions. In the second case study, we examine the performance of MPI calls in the linear algebra mini-application AMG2013 when using MVAPICH and Spectrum-MPI. In doing so, we identify hidden performance losses in specific MPI functions. In the third case study, we illustrate the use of our Query Language in Hatchet’s interactive visualization. In doing so, we show that our Query Language enables a simple and intuitive way to massively reduce profiling data.
Performance analysis is critical for pinpointing bottlenecks in parallel applications. Several profilers exist to instrument parallel programs on HPC systems and gather performance data. Hatchet is an open-source Python library that can read profiling output of several tools, and enables the user to perform a variety of programmatic analyses on hierarchical performance profiles. In this paper, we augment Hatchet to support new features: a query language for representing call path patterns that can be used to filter a calling context tree, visualization support for displaying and interacting with performance profiles, and new operations for performing analyses on multiple datasets. Additionally, we present performance optimizations in Hatchet’s HPCToolkit reader and the unify operation to enable scalable analysis of large datasets.
Independently of the image modality (x-rays, neutrons, etc.), image data analysis requires normalization, a preprocessing step. While normalization can sometimes be easily generalized, the analysis is, in most cases, specific to an experiment and a sample. Although many tools (MATLAB, ImageJ, VG Studio, etc.) offer a large collection of pre-programmed image analysis tools, they usually require a learning step that can be lengthy depending on the skills of the end user. We have implemented Jupyter Python notebooks to allow easy and straightforward data analysis, along with live interaction with the data. Jupyter notebooks require little programming knowledge, bypassing the steep learning curve. Most importantly, each notebook can be tailored to a specific experiment and sample with minimal effort. Here, we present the pros and cons of the main methods for analyzing data and show why we have found that Jupyter Python notebooks are well suited for imaging data processing, visualization, and analysis.
MCViNE is an open source, object-oriented Monte Carlo neutron ray-tracing simulation software package. Its design allows for flexible, hierarchical representations of sophisticated instrument components such as detector systems, and samples with a variety of shapes and scattering kernels. Recently this flexible design has enabled several applications of MCViNE simulations at the Spallation Neutron Source (SNS) at Oak Ridge National Lab, including assisting design of neutron instruments at the second target station and design of novel sample environments, as well as studying effects of instrument resolution and multiple scattering. Here we provide an overview of the recent developments and new features of MCViNE since its initial introduction (Jiao et al 2016 Nucl. Instrum. Methods Phys. Res., Sect. A 810, 86–99), and some example applications.
A volume integral algorithm for the non-homogeneous elasticity traction boundary integral equation is presented. The body force volume integral is exactly split into a relatively simple boundary integral, together with a remainder volume integral that can be evaluated using a regular grid of cuboid cells covering the problem domain. Of particular importance for (inelastic) fracture analysis is that the volume integral over the regular grid is computed without explicit knowledge of the domain boundary, including the fracture surface. A Galerkin approximation is employed, and the numerical implementation is validated by solving body force elasticity problems with known solutions.
Serving as a Lead Student Volunteer at SC24. Prior to the conference, I am working with the Posters Committee to help organize the Posters track of the conference.
Served as a Student Volunteer at SC23. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.
Served as a Lead Student Volunteer at SC22. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server, which was used for most communication (excluding volunteer shifts) in the Students@SC program.
Served as a Lead Student Volunteer at SC21. Prior to the conference, I was placed in charge of creating and managing the Students@SC Discord server and conceptualizing a virtual board game tournament to help in-person and remote Students@SC participants interact socially.
Served as a Student Volunteer at (the virtual) SC20. In this role, I helped ensure the virtual sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.
Served as a Student Volunteer at SC19. In this role, I helped ensure the sessions of the conference ran smoothly. Additionally, I performed other miscellaneous tasks, such as keeping track of the number of attendees in the sessions at which I was working.