Princeton University
Computer Science Department

Computer Science 598D

Systems and Machine Learning

Spring 2021



In this graduate seminar, we will read and discuss recent research papers at the intersection of systems and machine learning (ML). We plan to focus on papers in two categories: systems for ML and ML for systems. The systems-for-ML category includes papers on building efficient hardware, such as accelerators, and software systems, such as parallel, distributed, secure, and privacy-preserving systems for ML. The ML-for-systems category includes papers on applying ML approaches to software and hardware design, including new data structures and optimization methods.


This course is open to graduate students. Seniors who are interested in taking this course need permission from the instructors.  This course has no P/D/F option. All students are required to present papers, participate in discussions, and complete three programming assignments and a final project.   For the final project, students have the option to work in a team of two and produce a project report and a project presentation.

Administrative Information

Computing Resources

We suggest that students use Microsoft Azure computational resources to complete assignments and projects, as Microsoft provides educational credits for this course. The course projects require training deep neural networks, which may consume a substantial amount of GPU hours and may not be feasible on free cloud services.

·      Microsoft Azure. We will provide each student a certain amount of Azure credits. If you need more credits, you can send a request to Dr. Xiaoxiao Li (xl32@princeton.edu). Please manage your Azure instances carefully and stop any instance that is not in use.  Here is the Azure tutorial: https://azure.microsoft.com/en-us/get-started/. Additional tips on using Azure will be provided in class.


Reading and Presentations

During each class meeting, we will have either a lecture by the instructors or an invited speaker, or presentations and discussions of two selected papers.


Each student will write a very brief review of each paper (one or two sentences summarizing the paper, and one or two sentences summarizing its strengths, weaknesses, and future directions). Please download the template here.


Each paper will take a 40-minute time slot. To motivate students to read and discuss papers in depth, we ask 2-3 students to be responsible for the three components of the presentation and discussion, each handled by an individual student with a few slides:

·         Summary: discuss the problem statement, a brief overview of the state-of-the-art, and the key idea of the paper.

·         Strengths: a summary of the strengths of the paper.

·         Weaknesses and future directions: a summary of the weaknesses and future directions of the paper.

We suggest that the summary presentation take about 20 minutes and that the discussions of strengths, weaknesses, and future directions take about 20 minutes.  The three students should manage their time well and serve as a “panel” for the discussion of the paper.


In the tentative schedule below, we have planned the topics for each week and suggested papers.  The 2-3 students who sign up for a paper time slot may, after discussing with the instructors, select a different paper on the same topic and announce it in advance.


Warmup exercise

To get familiar with ML projects, we require each student to complete a small warmup project reproducing the LeNet-5 results from the following paper:


Gradient-Based Learning Applied to Document Recognition. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proceedings of the IEEE, 1998.


Feel free to use any deep learning framework you are familiar with. A warmup project on MNIST classification is available.  Our assignment examples are all in PyTorch.


The MNIST dataset will be downloaded automatically if you use the MNIST classification project. If you use another platform and need to download the data separately, you can download the MNIST dataset (by Yann LeCun, Corinna Cortes, and Christopher Burges).
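To make the warmup target concrete, here is a minimal PyTorch sketch of the LeNet-5 architecture adapted to 28×28 MNIST inputs. This is our own illustrative sketch, not the course starter code: the original paper uses 32×32 inputs (padding the first convolution, as below, is one common adjustment), and the tanh activations with average pooling are a common modern approximation of the paper's scaled-tanh and subsampling layers.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """Illustrative LeNet-5-style network for 28x28 MNIST digits."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 28x28 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                            # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),                 # class logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))
```

A standard training loop with torchvision.datasets.MNIST (which handles the automatic download), cross-entropy loss, and SGD should get close to the error rates reported in the paper; treat this as a starting sketch to adapt, not a definitive implementation.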


You are welcome to use online resources or to get help from other students.

You are encouraged to launch the job on Microsoft Azure to get familiar with the platform.

Programming Assignment 1 (Systems for ML)

There are two options for this programming assignment.  To maximize learning and minimize programming effort, each option is related to a recently published paper that will be presented and discussed in class, and each has open-source code.  You need to select one option and complete the assignment based on its requirements.  The two options are:

·      System for ML: Network pruning  (please click to see details)

·      System for ML: Binary ConvNet (please click to see details)

The detailed requirements for each option will be provided before the assignment starts.
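As background for the network-pruning option: most pruning papers build on the same core operation, masking out the weights with the smallest magnitudes. Below is a framework-free sketch of one-shot magnitude pruning; the function name and interface are our own illustration, not part of the assignment.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Returns (pruned_weights, mask), where mask[i] == 1 for kept weights.
    This is the one-shot global magnitude-pruning baseline used in many
    pruning papers (e.g. the Lottery Ticket Hypothesis experiments).
    """
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest |w| values.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    mask = [0 if i in pruned else 1 for i in range(len(weights))]
    return [w * m for w, m in zip(weights, mask)], mask

# Example: pruning 50% of four weights removes the two smallest magnitudes.
pruned, mask = magnitude_prune([0.5, -0.1, 0.9, 0.05], 0.5)
```

In practice the mask is applied per layer or globally to a trained network, which is then fine-tuned; the pruning papers in the schedule differ mainly in how and when the mask is chosen (before, during, or after training).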

Programming Assignment 2 (ML for Systems)

Similar to programming assignment 1, there are two options for this programming assignment.  Each option is also related to a recently published paper that will be presented and discussed in class, and each has open-source code.  You need to select one option and complete the assignment based on its requirements.  The two options are:


·      ML for System: Auto Neural Arch Search (please click to see details)

·      ML for System: Adaptive learned Bloom filter (please click to see details)

The detailed requirements for each option will be provided before the assignment starts.
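As background for the learned-Bloom-filter option: the basic construction in this line of work pairs a learned score function with a small backup Bloom filter that stores only the keys the model rejects, so the combined structure has no false negatives. Below is a self-contained sketch of that composition; the class names, bit-array size, and threshold are illustrative choices of ours, not the assignment's code.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter over a fixed bit array with k hash functions."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.bits = [0] * n_bits
        self.n_hashes = n_hashes

    def _indexes(self, key: str):
        # Derive k indexes by salting a cryptographic hash (simple, not fast).
        for i in range(self.n_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % len(self.bits)

    def add(self, key: str) -> None:
        for idx in self._indexes(key):
            self.bits[idx] = 1

    def query(self, key: str) -> bool:
        return all(self.bits[idx] for idx in self._indexes(key))

class LearnedBloomFilter:
    """The model screens queries first; the backup filter stores the model's
    false negatives, so membership queries never miss a true key."""

    def __init__(self, score_fn, threshold, keys, n_bits=1024, n_hashes=3):
        self.score_fn = score_fn
        self.threshold = threshold
        self.backup = BloomFilter(n_bits, n_hashes)
        for k in keys:
            if score_fn(k) < threshold:  # model would reject this true key
                self.backup.add(k)

    def query(self, key: str) -> bool:
        return self.score_fn(key) >= self.threshold or self.backup.query(key)
```

False positives can still occur, from either the model or the backup filter; analyzing and reducing them is what the sandwiching and adaptive variants discussed in class are about.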


Final Project

For the final project, students will improve upon or investigate future directions of one of the two programming assignments above.  The final project can be done by either one or two students.   Each student or team should submit a brief project proposal, submit a final report, and give a 10-minute final presentation.  The final reports are due at 11:59pm on Dean's Date, the university deadline.  The final presentations will be scheduled soon after Dean's Date.


For a two-student team, we suggest that you propose something more significant.  We expect you to state clearly who did what in the final report.


This graduate seminar will be graded roughly as follows:

Schedule (Tentative)




Suggested papers




Prof. Kai Li

Dr. Xiaoxiao Li


A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution.
Jeff Dean, David Patterson, and Cliff Young, IEEE Micro, 38(2), 21-29.


Systems and Machine Learning Symbiosis (invited talk).

Jeff Dean.  SysML Conference. 2018



Prof. Jia Deng, Princeton
(Guest lecture)

Introduction to deep learning


Deep Learning.
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Nature 521, May 2015.


Online tutorial




Dr. Xiaoxiao Li

Introduction to deep learning framework

Video (you may need to log in)


Introduction to Pytorch and Tensorflow

Azure Tutorial/Demo

Start Warmup


Prof. Karthik Narasimhan, Princeton

(Guest lecture)

Introduction to reinforcement learning



Human-level control through deep reinforcement learning.

Mnih, V. et al, Nature, 2015.  (Earlier version).


Reinforcement Learning: An Introduction. (book)
Richard S. Sutton and Andrew G. Barto. MIT Press 2018.




Prof. Kai Li

Introduction to Systems for ML:

DistBelief, TensorFlow, XLA


Large Scale Distributed Deep Networks.

Jeffrey Dean, et al. NIPS 2012.

TensorFlow: A System for Large-Scale Machine Learning.

Martín Abadi, et al. OSDI 2016

XLA: Optimizing Compiler for Machine Learning.




Dr. Xiaoxiao Li


Network pruning


The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Jonathan Frankle, Michael Carbin. ICLR 2019. 


SNIP: Single-shot Network Pruning based on Connection Sensitivity, Lee et al. ICLR 2019


Submit Warmup


Start assignment (Systems for ML)


Felix, Alexander,


Josh, Grace


Network Pruning

Recording unavailable

Picking winning tickets before training by preserving gradient flow. Wang et al. 2020


Pruning neural networks without any data by iteratively conserving synaptic flow. Tanaka et al. 2020.

Submit Review


Dr. Zhao Song, Princeton

(guest lecture)

Learned Data Structures


Learning Space Partitions for Nearest Neighbor Search

Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner, ICLR 2020


A Model for Learned Bloom Filters and Optimizing by Sandwiching. 

Michael Mitzenmacher, et al. NIPS 2018.


Kaiqi, Dongsheng,




Yi, Juan

Learned Data Structures


Learning Multi-dimensional Indexes.

Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska

SIGMOD 2020.


ALEX: An Updatable Adaptive Learned Index

Jialin Ding et al. SIGMOD 2020.

Submit Review (click here; the deadline is before the class)


Samyak, Juan,


Josh, Felix



Neural Architecture Search with Reinforcement Learning

Barret Zoph and Quoc V. Le. ICLR 2017.


Darts: Differentiable architecture search. Liu, H., Simonyan, K. and Yang, Y., 2018.  

Submit Review


Dr. Safeen Huda,

Google Brain

(Guest lecture)

Google TPU


In-Datacenter Performance Analysis of a Tensor Processing Unit.  N. Jouppi et al. ISCA 2017


A domain-specific supercomputer for training deep neural networks. N. Jouppi et al. CACM 2020


The Design Process for Google's Training Chips: TPUv2 and TPUv3.  T. Norrie, et al. IEEE Micro 2021.


GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

D. Lepikhin, et al. 2021 (under review)

Submit assignment (Systems for ML)

Start assignment (ML for systems)


Samyak, Dongsheng



Yue, Yi

Computer Architecture


High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs

Glenn Henry, et al. 2020.


The Architectural Implications of Facebook’s DNN-based Personalized Recommendation

Gupta et al. 2020.

Submit Review


Kelvin Zou, ByteDance

& Princeton Alumnus
(Guest lecture)

Systems at ByteDance


A Generic Communication Scheduler for Distributed DNN Training Acceleration.
Yanghua Peng (The University of Hong Kong), Yibo Zhu (ByteDance Inc.), Yangrui Chen (The University of Hong Kong), Yixin Bao (The University of Hong Kong), Bairen Yi (ByteDance Inc.), Chang Lan (ByteDance Inc.), Chuan Wu (The University of Hong Kong), Chuanxiong Guo (ByteDance Inc.) SOSP 2019.

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism.
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen.
NeurIPS 2019.


Mesh-TensorFlow: Deep Learning for Supercomputers.

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman. NeurIPS 2018.


IPS: Unified Profile Management for Ubiquitous Online Recommendations.
Rui Shi, Yang Liu, Jianjun Chen, Xuan Zou, Yanbin Chen, Minghua Fan, Zhihao Cai
Guanghui Zhang, Zhiwen Li, Yuming Liang. ICDE 2021.



Dr. Xiaoxiao Li

Introduction to Federated Learning


Federated learning: Collaborative machine learning without centralized training data.
H. Brendan McMahan and Daniel Ramage., 2017.


Inverting Gradients - How easy is it to break privacy in federated learning? (Privacy)

Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller, NeurIPS 2020




Juan, Kaiqi,


Samyak, Alexander,

Privacy Preservation


Membership inference attacks against machine learning models. Shokri, Reza, et al.

2017 IEEE Symposium on Security and Privacy (SP). IEEE, 2017.


The secret sharer: Evaluating and testing unintended memorization in neural networks.

Carlini, Nicholas, et al. USENIX, 2019.


Submit Review

Submit assignment (ML for systems)


Check suggested final project


Prof. Danqi Chen, Princeton

(Guest lecture)


NLP and Transformers


- Devlin et al., 2018: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 

- Liu et al., 2019: RoBERTa: A Robustly Optimized BERT Pretraining Approach

- Joshi & Chen et al., 2019: SpanBERT: Improving Pre-training by Representing and Predicting Spans

- Karpukhin et al., 2020: Dense Passage Retrieval for Open-Domain Question Answering

- Lee et al., 2020: Learning Dense Representations of Phrases at Scale



Felix, Alexander,



Josh, Kaiqi




Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, William Fedus et al. 2020.


Big Bird: Transformers for Longer Sequences

Manzil Zaheer, Guru Prashanth Guruganesh, Avi Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Minh Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Mahmoud El Houssieny Ahmed. NeurIPS 2020.


Submit Review

Submit final project proposal

(finish discussing with instructors)


Zhenyu Song, Princeton

(Guest lecture)



Learning Cache Replacement with CACHEUS
Liana Valdes, et al. FAST21.


Learning Relaxed Belady for Content Distribution Network Caching

Zhenyu Song, et al. NSDI 2020.





Yue, Dongsheng,




Yi, Grace


Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification

Assaf Eisenman, Asaf Cidon, Evgenya Pergament, Or Haimovich, Ryan Stutsman, Mohammad Alizadeh, Sachin Katti. NSDI 2019.


An Imitation Learning Approach for Cache Replacement

Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn.

ICML 2020.

Submit Review



Prof. Song Han, MIT

(Guest lecture)



MCUNet: Tiny Deep Learning on IoT Devices
J. Lin, W. Chen, Y. Lin, J. Cohn, C. Gan, S. Han.  NeurIPS’20.


Tiny Transfer Learning: Reduce Memory, not Parameters for Efficient On-Device Learning,

Cai, H., Gan, C., Zhu, L. and Han, S. NeurIPS’20


Differentiable Augmentation for Data-Efficient GAN Training

Zhao, S., Liu, Z., Lin, J., Zhu, J.Y. and Han, S., NeurIPS’20





Yue, Josh,


Felix, Yi

Parallel and distributed training


Memory-Efficient Pipeline-Parallel DNN Training

Narayanan et al. 2020.


PyTorch Distributed: Experiences on Accelerating Data Parallel Training.
Shen Li, et al.
VLDB 2020.

 Submit Review



Cerebras, Inc.

Dr. Mike Ringenburg



Title: Accelerating Deep Learning with a Purpose-built Solution: The Cerebras Approach

Abstract:  The new era of chip specialization for deep learning is here. Traditional approaches to computing can no longer meet the computational and power requirements of this important workload. What is the right processor for deep learning? To answer this question, this talk will discuss computational requirements of deep learning models and the limitations of existing hardware architectures and scale-out approaches. Then we will discuss Cerebras' approach to meet computational requirements of deep learning with the Cerebras Wafer Scale Engine (WSE) – the largest computer chip in the world, and the Cerebras Software Platform, co-designed with the WSE. The WSE provides cluster-scale resources on a single chip with full utilization for tensors of any shape – fat, square and thin, dense and sparse – enabling researchers to explore novel network architectures and optimization techniques at any batch size.





Yue, Kaiqi,

Dongsheng, Grace




Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning

Jaehong Kim, et al, SIGCOMM 2020


Server-Driven Video Streaming for Deep Learning Inference

Kuntai Du, et al., SIGCOMM 2020


Submit Review


Prof. Dawn Song, UC Berkeley

Prof. Ruoxi Jia, Virginia Tech

(guest lecture)

Data Valuation

Part 1

Part 2

Efficient Task-Specific Data Valuation for Nearest Neighbor Algorithms.

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Bo Li, Ce Zhang, Costas J. Spanos, Dawn Song, PVLDB 2019


Towards Efficient Data Valuation Based on the Shapley Value

Ruoxi Jia*, David Dao*, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos, International Conference on Artificial Intelligence and Statistics (AISTATS), 2019


 Wang, Tianhao, et al. "A Principled Approach to Data Valuation for Federated Learning." Federated Learning. Springer, Cham, 2020. 153-167.




Alexander, Juan,



Samyak, Grace

Data Valuation

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang, ICML 2017.



Machine Unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot, IEEE S&P 2021


Submit Review


(Dean's Date)



Submit Report (deadline: by the end of the day)




(9:30am –11am)

All students

Final project presentations

Pre 1: (Alexander and Felix) Partition learned bloom filter

Pre 2: (Yue) Sandwich bloom filter

Pre 3: (Josh) Comparing DARTS vs. Progressive DARTs

Pre 4: (Juan) Value Motivated Exploration

Pre 5: (Grace) Improved implementation of Binarized Neural Network

Pre 6: (Kaiqi) GraSP pruning for Binarized Neural Network

Pre 7: (Dongsheng) Linear Regression for FastLRB

Pre 8: (Samyak) NAS in Binarized Neural Network

Pre 9: (Yi) Data Parallelism in Unpruned and Pruned Neural Nets Training

The order of presentations will follow volunteers first; otherwise we will generate it with a random name picker.

Please submit your comments by the end of May 16th here.

The template can be downloaded here.