Posts by Collection



What is the state of memory saving for model training?

Published in Sixth Conference on Machine Learning and Systems (under submission), 2023

Large neural networks can improve accuracy and generalization on tasks across many domains. However, this trend cannot continue indefinitely due to limited hardware memory. As a result, researchers have devised a number of memory optimization methods (MOMs) to alleviate the memory bottleneck, such as gradient checkpointing, quantization, and swapping. In this work, we study memory optimization methods and show that, although these strategies indeed lower peak memory usage, they can decrease training throughput by up to 9.3×. To provide practical guidelines for practitioners, we propose a simple but effective performance model, PAPAYA, to quantitatively explain the trade-off between memory and training time. PAPAYA can be used to determine when to apply the various memory optimization methods when training different models. Based on implications derived from PAPAYA, we outline the circumstances in which memory optimization techniques are more advantageous. We assess the accuracy of PAPAYA and the derived implications on a variety of machine learning models, showing that it achieves an R² score above 0.97 when predicting peak memory and throughput, and that it accurately predicts the effectiveness of MOMs across five evaluated models on vision and NLP tasks.

Recommended citation: Xiaoxuan Liu, Chuyan Zhu, Jialun Lyu, Zhuohan Li, Xiaoyang Liu, Daniel Kang, Alvin Cheung. "What is the state of memory saving for model training?" Sixth Conference on Machine Learning and Systems (under submission).
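The memory/throughput trade-off that the abstract describes can be illustrated with a toy cost model of gradient checkpointing. This is an illustrative sketch, not the PAPAYA model itself; the function name and the uniform-activation-size assumption are mine:

```python
import math

def activation_memory_and_recompute(n_layers, act_size, segment):
    """Toy cost model for gradient checkpointing.

    Assumes every layer produces one activation of `act_size` bytes.
    Returns (peak_memory_plain, peak_memory_checkpointed, extra_forward_layers).
    """
    # Plain training stores every layer's activation for the backward pass.
    plain = n_layers * act_size
    # With checkpointing, only one activation per segment boundary is kept;
    # during the backward pass, one segment is recomputed at a time, so the
    # peak is the checkpoints plus one segment's worth of activations.
    n_checkpoints = math.ceil(n_layers / segment)
    checkpointed = (n_checkpoints + segment) * act_size
    # The throughput price: every non-checkpoint layer runs forward twice.
    extra_forward = n_layers - n_checkpoints
    return plain, checkpointed, extra_forward
```

For example, with 64 layers and segments of 8 layers, peak activation memory drops from 64 to 16 activation-sized buffers, at the cost of 56 extra forward-layer computations; choosing segment ≈ √n recovers the classic O(√n) checkpointing memory bound, which is why saving memory can cost throughput.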



CSC3150 Operating Systems

Undergraduate course, SDS, The Chinese University of Hong Kong, 2022

I was part of the teaching team of CSC3150 Operating Systems, instructed by Professor Yeh-Ching Chung, as an undergraduate teaching fellow at CUHK, Shenzhen, China. The course introduces the architectures and functions of operating systems. By examining the overall architecture and individual components of different types of operating systems, major design issues, algorithms, and design trade-offs are discussed. I was responsible for teaching tutorials, holding office hours, and grading projects.

14-741 Information Security

Graduate course, Carnegie Mellon University, 2024

I am part of the teaching team of 14-741 Information Security at CMU as a teaching assistant. The course covers the technical and policy foundations of information security, with the main objective of enabling students to reason about information systems from a security engineering perspective. Topics include elementary cryptography; access control; common software vulnerabilities; common network vulnerabilities; digital rights management; policy and export control law; privacy; management and assurance; and special topics. I am responsible for office hours, grading, and recitations.