About Me

I am a fifth-year Ph.D. candidate in Computer Science at the Georgia Institute of Technology, working at SSLAB, co-advised by Taesoo Kim and Anand Iyer. Before Georgia Tech, I graduated from The Chinese University of Hong Kong in 2019 with a Bachelor's degree in Computer Science with first-class honours. My research interests include systems for deep graph learning and general machine learning. I am currently exploring the systems aspects of training and serving dynamic GNNs and GraphLLMs.


Selected Projects

System for Dynamic Graph Neural Networks at Scale

Existing systems for processing static GNNs either do not support dynamic GNNs or are inefficient in doing so. In this project, we are building a system that supports dynamic GNNs efficiently. Based on the observation that existing proposals for dynamic GNN architectures combine techniques for structural and temporal information encoding independently, we propose novel techniques that enable cross optimizations between the two encoding stages, benefiting tasks such as traffic forecasting, anomaly detection, and epidemiological forecasting.
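
As a rough illustration of the design space, the sketch below shows the pattern most dynamic GNN architectures follow: a structural encoder applied to each snapshot, followed by a temporal encoder that carries node state across snapshots. This is an illustrative sketch only, not the implementation of the system described here; the specific layers (a GCNConv and a GRUCell) and dimensions are assumptions.

  import torch
  from torch_geometric.nn import GCNConv

  class SnapshotGNN(torch.nn.Module):
      """Minimal dynamic GNN sketch: structural encoding per snapshot,
      temporal encoding across snapshots (assumes a fixed node set)."""

      def __init__(self, in_dim, hid_dim):
          super().__init__()
          self.gcn = GCNConv(in_dim, hid_dim)            # structural encoding
          self.gru = torch.nn.GRUCell(hid_dim, hid_dim)  # temporal encoding

      def forward(self, snapshots):
          # snapshots: list of (node_features, edge_index) pairs, one per time step
          h = None
          for x, edge_index in snapshots:
              z = torch.relu(self.gcn(x, edge_index))  # encode this snapshot's structure
              h = self.gru(z, h)                       # evolve node states over time
          return h  # node embeddings after the last snapshot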

May 2021-present


Cytom: Processing Billion-scale Dynamic Graphs on a Single Machine

Current systems for processing dynamic graphs suffer from several problems: limited throughput for updates to the graph structure, high latency in reflecting new graph updates in the algorithmic result, and large storage space needed for the dynamic graph. To tackle these problems, we proposed Cytom, a cell-based streaming graph processing engine for dynamic graphs built on a subgraph-centric graph representation with a cell-based abstraction. This approach effectively reduces the storage overhead of state-of-the-art systems and allows the graph structure to be updated with a high degree of parallelism.
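
The core idea can be sketched in a few lines. The snippet below is a simplified illustration of a cell-based layout, not Cytom's actual data structures; the partition count and the hash-based cell assignment are assumptions. Because an edge insertion touches exactly one small cell, disjoint cells can be updated in parallel, and empty cells need not be stored.

  from collections import defaultdict

  NUM_PARTITIONS = 64  # illustrative; a real system would size cells adaptively

  def cell_of(src, dst):
      # map an edge to a (source-partition, destination-partition) cell
      return (src % NUM_PARTITIONS, dst % NUM_PARTITIONS)

  class CellGraph:
      def __init__(self):
          self.cells = defaultdict(list)  # cell id -> edges of that subgraph

      def insert_edge(self, src, dst):
          # an update touches exactly one cell, so different cells can be
          # updated by different threads without coordination
          self.cells[cell_of(src, dst)].append((src, dst))

      def out_neighbors(self, v):
          # scan only the cells in v's source-partition row
          row = v % NUM_PARTITIONS
          for col in range(NUM_PARTITIONS):
              for s, d in self.cells.get((row, col), ()):
                  if s == v:
                      yield d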

Jan. 2020-Jul. 2021


Automating Massively Parallel Heterogeneous Computing

In this project, we modeled the input program as a hierarchical data flow graph (HDFG) and performed a set of graph-based operations and transformations for automatic optimization and parallelization. Automatic type inference was enabled through a combination of static and dynamic analysis. We also extended these techniques to data analytics applications built on the Pandas library. Moreover, we designed and implemented a set of optimization rules that rewrite input code snippets to be more efficient in terms of I/O performance, memory footprint, and computation workload.
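
The flavor of such graph rewrites is shown below with a deliberately tiny example: two consecutive element-wise map nodes in a dataflow graph are fused into one, removing an intermediate materialization. This is a toy sketch, not the project's HDFG implementation; the Node fields and the fusion rule are illustrative assumptions.

  from dataclasses import dataclass, field

  @dataclass
  class Node:
      op: str                      # e.g. "read", "map", "reduce"
      fn: callable = None          # payload for "map" nodes
      inputs: list = field(default_factory=list)

  def fuse_maps(node):
      # rewrite rule: map(g, map(f, x)) -> map(g . f, x)
      if node.op == "map" and node.inputs and node.inputs[0].op == "map":
          inner = node.inputs[0]
          f, g = inner.fn, node.fn
          return Node("map", fn=lambda v: g(f(v)), inputs=inner.inputs)
      return node

  # usage: a +1 map over a *2 map over a read node collapses into a single map
  source = Node("read")
  fused = fuse_maps(Node("map", fn=lambda v: v + 1,
                         inputs=[Node("map", fn=lambda v: v * 2, inputs=[source])]))
  assert fused.inputs == [source] and fused.fn(3) == 7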

Jan. 2020-May 2021


System for Serving ML Inference Pipelines with Heterogeneous Models

We proposed a system for serving complex ML pipelines within end-to-end latency constraints. My contribution to this project was container-based model development, which provides environment and resource isolation across models.
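
As an illustration of the isolation idea only (not the system's actual code), the sketch below starts each model of a pipeline in its own container with its own CPU and memory limits using the Docker Python SDK; the image names and resource limits are made up.

  import docker

  client = docker.from_env()

  # hypothetical three-stage pipeline; each stage gets its own image and limits
  PIPELINE = [
      {"name": "preprocess", "image": "pipeline/preprocess:latest", "cpus": "0,1", "mem": "2g"},
      {"name": "detector",   "image": "pipeline/detector:latest",   "cpus": "2,3", "mem": "8g"},
      {"name": "classifier", "image": "pipeline/classifier:latest", "cpus": "4-7", "mem": "8g"},
  ]

  containers = [
      client.containers.run(
          stage["image"],
          name=stage["name"],
          detach=True,
          cpuset_cpus=stage["cpus"],  # pin each model to its own cores
          mem_limit=stage["mem"],     # cap memory per model
      )
      for stage in PIPELINE
  ]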

Sep. 2019-Dec. 2019


Distributed Online Analytical Processing (OLAP)

We added support for distributed online analytical processing to Husky, a general-purpose distributed computing system developed by the systems lab at CUHK. Specifically, we used the Husky platform to implement the "By-layer" cubing algorithm from Apache Kylin to build data cubes in a distributed manner. We also sped up query processing by optimizing the query parsing and execution process. Code repo: https://github.com/husky-team/husky-kylin.
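
The by-layer idea itself is compact: compute the full-dimensional base cuboid once, then derive each lower-dimensional cuboid from a parent cuboid one layer above rather than from the raw records. The sketch below is a condensed single-machine illustration, not the distributed Husky implementation; the row format with a "measure" field and the function names are assumptions.

  from itertools import combinations
  from collections import defaultdict

  def aggregate(parent_cuboid, parent_dims, keep):
      # roll a cuboid keyed on parent_dims up to the sub-dimensions in keep
      idx = [parent_dims.index(d) for d in keep]
      out = defaultdict(int)
      for key, measure in parent_cuboid.items():
          out[tuple(key[i] for i in idx)] += measure
      return dict(out)

  def build_cube(rows, dims):
      # layer N: base cuboid computed straight from the input rows
      base = defaultdict(int)
      for row in rows:
          base[tuple(row[d] for d in dims)] += row["measure"]
      cube = {tuple(dims): dict(base)}
      # layers N-1 .. 0: each cuboid is derived from a parent one layer up
      for size in range(len(dims) - 1, -1, -1):
          for keep in combinations(dims, size):
              parent = next(p for p in cube
                            if len(p) == size + 1 and set(keep) <= set(p))
              cube[keep] = aggregate(cube[parent], list(parent), keep)
      return cube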

May 2018-Apr. 2019


Publications and Preprints

  1. HetTree: Heterogeneous Tree Graph Neural Network. [paper]
    Mingyu Guan, Jack W. Stokes, Qinlong Luo, Fuchen Liu, Purvanshi Mehta, Elnaz Nouri, Taesoo Kim
    arXiv preprint arXiv:2402.13496, Feb 2024

  2. DynaGraph: Dynamic Graph Neural Networks at Scale. [paper]
    Mingyu Guan, Anand Padmanabha Iyer, Taesoo Kim
    GRADE-NDA'22: Proceedings of the 5th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
    Philadelphia, PA, USA, June 2022


Services

  • External Review Committee, 2024 USENIX Annual Technical Conference (ATC '24).