I am a fifth-year Ph.D. candidate in Computer Science at the Georgia Institute of Technology, working at SSLAB, co-advised by Taesoo Kim and Anand Iyer. Before Georgia Tech, I graduated from The Chinese University of Hong Kong in 2019 with a Bachelor's degree in Computer Science with first-class honours. My research interests include systems for deep graph learning and general machine learning. I am currently exploring systems aspects of training and serving dynamic GNNs and GraphLLMs.
Existing systems for processing static GNNs either do not support dynamic GNNs, or are inefficient in doing so. In this project, we are building a system that supports dynamic GNNs efficiently. Based on the observation that existing proposals for dynamic GNN architectures combine techniques for structural and temporal information encoding independently, we propose novel techniques that enable cross optimizations across various tasks such as traffic forecasting, anomaly detection, and epidemiological forecasting.
May 2021-present
Current systems for processing dynamic graphs have problems with the throughput of updates to the graph structure, the latency to reflect new graph updates in the algorithmic result, and the storage space needed for the dynamic graph. To tackle these problems, we proposed Cytom, a cell-based streaming graph processing engine for dynamic graphs, which is built on a subgraph-centric graph representation using a cell-based abstraction. This approach effectively reduces storage overhead relative to state-of-the-art systems and allows highly parallel updates to the graph structure.
Jan. 2020-Jul. 2021
In this project, we modeled the input program as a hierarchical data flow graph (HDFG) to perform a set of graph-based operations and transformations for automatic optimization and parallelization. Automatic type inference was enabled based on both static and dynamic analysis. We also extended these techniques to data analytics applications that use the Pandas library. Moreover, we designed and implemented a set of optimization rules that rewrite input code snippets to be more efficient in terms of I/O performance, memory footprint, and computation workload.
Jan. 2020-May 2021
We proposed a system for serving complex ML pipelines within end-to-end latency constraints. My contribution to this project was container-based model development for providing environment and resource isolation across models.
Sep. 2019-Dec. 2019
We supported distributed online analytical processing on Husky, a general-purpose distributed computing system developed by the system lab at CUHK. Specifically, we used the Husky platform to implement the "By-layer" cubing algorithm from Apache Kylin for building data cubes in a distributed manner. We also sped up query processing by optimizing the query parsing and execution process. Code repo: https://github.com/husky-team/husky-kylin.
May 2018-Apr. 2019
Mingyu Guan, Jack W. Stokes, Qinlong Luo, Fuchen Liu, Purvanshi Mehta, Elnaz Nouri, Taesoo Kim
arXiv preprint
arXiv:2402.13496, Feb 2024
Mingyu Guan, Anand Padmanabha Iyer, Taesoo Kim
GRADES-NDA'22: Proceedings of the 5th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
Philadelphia, PA, USA, June 2022