Princeton University
Computer Science Department

Computer Science 598D

Systems and Machine Learning

Spring 2021

 

Suggested Readings

Introductory Papers

A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution.
Jeff Dean, David Patterson, and Cliff Young.
IEEE Micro, 38(2), 21-29, 2018.

Deep Learning.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, 

Nature 521, May 2015.

 

Human-level control through deep reinforcement learning.

Mnih, V., et al. Nature, 2015. (Earlier version.)

 

Reinforcement Learning: An Introduction. (book)
Richard S. Sutton and Andrew G. Barto.

MIT Press 2018.

Systems for ML

Libraries and Frameworks

 

PyTorch: An Imperative Style, High-Performance Deep Learning Library.

Adam Paszke, et al. NeurIPS 2019.

TensorFlow: A System for Large-Scale Machine Learning.

Martín Abadi, et al. OSDI 2016.

 

Mesh-TensorFlow: Deep Learning for Supercomputers.

Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman. NeurIPS 2018.

 

Production Systems

A Brief Guide to Running ML Systems in Production

Carlos Villavieja, Salim Virji, 2020

 

Security and Machine Learning in the Real World

Ivan Evtimov et al. 2020

 

Architecture

In-Datacenter Performance Analysis Of A Tensor Processing Unit.
Norman P. Jouppi, et al. ISCA 2017.

 

A domain-specific supercomputer for training deep neural networks. N. Jouppi, et al. CACM 2020.

 

The Design Process for Google's Training Chips: TPUv2 and TPUv3. T. Norrie, et al. IEEE Micro 2021.

 

A configurable cloud-scale DNN processor for real-time AI.

Jeremy Fowers, et al., ISCA, 2018.

 

High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs. Glenn Henry, et al. ISCA 2020.

 

The Architectural Implications of Facebook’s DNN-based Personalized Recommendation.

U. Gupta, et al. HPCA 2020.

 

Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers.

Maroun Tork, Lina Maudlej, Mark Silberstein. ASPLOS 2020.

 

Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators.

Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz. ASPLOS 2020.

 

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing.

Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, Xuan Zhang. ISCA 2020.

 

Network Pruning

 

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Jonathan Frankle, Michael Carbin. ICLR 2019.

Proving the Lottery Ticket Hypothesis: Pruning is All You Need.

Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, and Ohad Shamir. ICML 2020.

Picking winning tickets before training by preserving gradient flow. Wang, et al. ICLR 2020.

 

Pruning neural networks without any data by iteratively conserving synaptic flow. Tanaka, et al. NeurIPS 2020.

 

Comparing Rewinding and Fine-tuning in Neural Network Pruning

Alex Renda, Jonathan Frankle, Michael Carbin. ICLR 2020.

Multi-Dimensional Pruning: A Unified Framework for Model Compression.
Jinyang Guo, Wanli Ouyang, Dong Xu. CVPR 2020.

Systems and Optimizations

Memory-Efficient Pipeline-Parallel DNN Training.

Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia.

PyTorch Distributed: Experiences on Accelerating Data Parallel Training.
Shen Li, et al.
VLDB 2020.

A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters.

Yimin Jiang, et al. OSDI 2020.

AntMan: Dynamic Scaling on GPU Clusters for Deep Learning

Wencong Xiao, et al. OSDI 2020

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

Deepak Narayanan, et al. OSDI 2020.

DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference.
Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu.
ISCA 2020.

Capuchin: Tensor-based GPU Memory Management for Deep Learning.
Xuan Peng, Xuanhua Shi, Hulin Dai, Hai Jin, Weiliang Ma, Qian Xiong, Fan Yang, Xuehai Qian. ASPLOS 2020.

 

A Generic Communication Scheduler for Distributed DNN Training Acceleration.
Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo. SOSP 2019.

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism.
Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen.
NeurIPS 2019.

 

IPS: Unified Profile Management for Ubiquitous Online Recommendations.
Rui Shi, Yang Liu, Jianjun Chen, Xuan Zou, Yanbin Chen, Minghua Fan, Zhihao Cai, Guanghui Zhang, Zhiwen Li, Yuming Liang. ICDE 2021.

FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary.

Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang. ICLR 2020.

 

PipeDream: Generalized Pipeline Parallelism for DNN Training.

Deepak Narayanan, et al. SOSP 2019.

 

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

D. Lepikhin, et al. 2021 (under review).

 

Transformers

Generating long sequences with sparse transformers.

Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. 2019.

Efficient transformers: A survey.
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 2020.

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.

William Fedus, Barret Zoph, and Noam Shazeer. 2021.

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing.

Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han.

ACL 2020.

 

Reformer: The efficient transformer.

Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. ICLR 2020.

 

Big Bird: Transformers for Longer Sequences

Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.

NeurIPS 2020.

 

Security

 

Deep neural networks are easily fooled: High confidence predictions for unrecognizable images.

Anh Nguyen, Jason Yosinski, and Jeff Clune.

CVPR 2015, pages 427–436.

 

Explaining and harnessing adversarial examples.

Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy.

In International Conference on Learning Representations (ICLR). 2015.

 

Towards deep learning models resistant to adversarial attacks.

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu.

ICLR 2018.

 

Adversarial training for free!

Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein.

NeurIPS 2019.

 

Attacking Binarized Neural Networks. Angus Galloway, Graham W. Taylor, and Medhat Moussa. ICLR 2018.

 

Multi-task Learning Increases Adversarial Robustness

Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick, ECCV 2020

 

DeepFool: a simple and accurate method to fool deep neural networks. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. CVPR 2016.

 

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. Dan Hendrycks and Thomas Dietterich. ICLR 2019.

 

 

Privacy

 

Practical Secure Aggregation for Privacy-Preserving Machine Learning.

Keith Bonawitz, et al. CCS 2017.

 

Differential Privacy: A Survey of Results

Cynthia Dwork.  International Conference on Theory and Applications of Models of Computation. 2008.

 

Deep Learning with Differential Privacy.

Martín Abadi, et al. CCS 2016.

 

InstaHide: Instance-hiding Schemes for Private Distributed Learning

Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora, ICML 2020

 

Auditing Data Provenance in Text-Generation Models 

Song and Shmatikov, KDD 2019

 

A Survey of Privacy Attacks in Machine Learning

Maria Rigaki, Sebastian Garcia. arXiv, 2020.

 

Membership inference attacks against machine learning models. Reza Shokri, et al.

IEEE Symposium on Security and Privacy (S&P), 2017.

 

The secret sharer: Evaluating and testing unintended memorization in neural networks.

Nicholas Carlini, et al. USENIX Security 2019.

 

Shredder: Learning Noise Distributions to Protect Inference Privacy

Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh. ASPLOS 2020.

 

 

Federated Learning

Federated learning: Collaborative machine learning without centralized training data.
H. Brendan McMahan and Daniel Ramage. Google AI Blog, 2017.

 

Towards Federated Learning at Scale: System Design

Keith Bonawitz et al. SysML 2019.

 

Federated Optimization In Heterogeneous Networks.

Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith.

MLSys 2020.

 

Advances and Open Problems in Federated Learning

Peter Kairouz, et al. 2019.

 

FedSplit: an algorithmic framework for fast federated optimization (Acceleration)

Reese Pathak, Martin J. Wainwright, NeurIPS 2020

 

Attack of the Tails: Yes, You Really Can Backdoor Federated Learning (Privacy)

Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos, NeurIPS 2020

 

Inverting Gradients - How easy is it to break privacy in federated learning? (Privacy)

Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller, NeurIPS 2020

 

FetchSGD: Communication-Efficient Federated Learning with Sketching (Communication)

Daniel Rothchild, et al. ICML 2020

 

Data Valuation and Machine Unlearning

 

Understanding Black-box Predictions via Influence Functions

Pang Wei Koh, Percy Liang, ICML 2017

 

Data Shapley: Equitable Valuation of Data for Machine Learning

Amirata Ghorbani, James Zou.

ICML 2019

 

Towards Efficient Data Valuation Based on the Shapley Value

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos.

AISTATS 2019.

 

Machine Unlearning

Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot, IEEE S&P 2021

 

Certified Data Removal from Machine Learning Models

Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten. ICML 2020.

 

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations

Aditya Golatkar, Alessandro Achille, Stefano Soatto. ECCV 2020.

 

ML for Systems

Learned Data Structures

 

A Model for Learned Bloom Filters and Optimizing by Sandwiching

Michael Mitzenmacher. NeurIPS 2018.

 

Learning Multi-dimensional Indexes.

Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska

SIGMOD 2020

 

Learning Space Partitions for Nearest Neighbor Search

Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner

ICLR 2020

 

ALEX: An Updatable Adaptive Learned Index

Jialin Ding et al. SIGMOD 2020.

 

The Case for a Learned Sorting Algorithm

Ani Kristo et al. SIGMOD 2020

 

The Case for Learned Index Structures.

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis.

SIGMOD 2018, pages 489-504.

 

Learned Compilation and Execution

Compiler Auto-Vectorization with Imitation Learning.
Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin.
NeurIPS 2019.

 

NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning.

A. Haj-Ali, N. K. Ahmed, T. Willke, S. Shao, K. Asanovic, and I. Stoica. CGO 2020.

 

Learning Execution through Neural Code Fusion.

Zhan Shi, Kevin Swersky, Danny Tarlow, Parthasarathy Ranganathan, Milad Hashemi.

ICLR 2020

 

Neural Execution Engines: Learning to Execute Subroutines.

Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi. NeurIPS 2020.

 

Compiler-Based Graph Representations for Deep Learning Models of Code.

Alexander Brauckmann, Andrés Goens, Sebastian Ertel, Jeronimo Castrillon.

CC 2020.

 

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.

Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather.

2020.

 

Networking

Neural Packet Classification

Eric Liang, Hang Zhu, Xin Jin, Ion Stoica.

SIGCOMM 2019.

 

Learning in situ: a randomized experiment in video streaming.

Francis Y. Yan, et al. NSDI 2020.

 

Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning

Jaehong Kim, et al. SIGCOMM 2020.

 

Server-Driven Video Streaming for Deep Learning Inference

Kuntai Du, et al., SIGCOMM 2020

 

Caching and Access Patterns

 

Learning Cache Replacement with CACHEUS
Liana Valdes, et al. FAST 2021.

 

Learning Relaxed Belady for Content Distribution Network Caching

Zhenyu Song, et al. NSDI 2020.

 

Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification

Assaf Eisenman, Asaf Cidon, Evgenya Pergament, Or Haimovich, Ryan Stutsman, Mohammad Alizadeh, Sachin Katti. NSDI 2019.

 

An Imitation Learning Approach for Cache Replacement

Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn. ICML 2020.

 

Applying Deep Learning to the Cache Replacement Problem
Z. Shi, X. Huang, A. Jain, and C. Lin.
MICRO 2019.

 

Scheduling and Resource Allocation

Autopilot: Workload Autoscaling at Google Scale.

Krzysztof Rzadca, et al. EuroSys 2020.

 

AutoML

Neural Architecture Search with Reinforcement Learning

Barret Zoph, Quoc V. Le.

ICLR 2017.

 

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware.
Han Cai, Ligeng Zhu, Song Han.  ICLR 2019.

 

Once-for-All: Train One Network and Specialize It for Efficient Deployment.
H. Cai, C. Gan, T. Wang, Z. Zhang, S. Han. ICLR 2020.

 

MCUNet: Tiny Deep Learning on IoT Devices
J. Lin, W. Chen, Y. Lin, J. Cohn, C. Gan, S. Han. NeurIPS 2020.

 

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Mingxing Tan, Quoc V. Le, ICML 2019

 

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch

Esteban Real, Chen Liang, David R. So, Quoc V. Le, ICML 2020

 

DARTS: Differentiable architecture search. Liu, H., Simonyan, K., and Yang, Y., 2018.


Efficient neural architecture search via parameter sharing. Pham, H., Guan, M.Y., Zoph, B., Le, Q.V., and Dean, J. ICML 2018.

 

ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks.

Ahmed T. Elthakeb, et al. IEEE Micro, 2020.

 

Deep Transformers with Latent Depth.

Xian Li, Asa Cooper Stickland, Yuqing Tang, and Xiang Kong.

NeurIPS 2020.