Computer Science 598D
Systems and Machine Learning
Spring 2021
A New Golden Age in Computer Architecture: Empowering the Machine-Learning Revolution. Jeff Dean, David Patterson, and Cliff Young. IEEE Micro, 38(2), 21-29, 2018.
Deep Learning. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Nature 521, May 2015.
Human-level control through deep reinforcement learning. Volodymyr Mnih, et al. Nature, 2015. (Earlier version.)
Reinforcement Learning: An Introduction (book). Richard S. Sutton and Andrew G. Barto. MIT Press, 2018.
PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adam Paszke, et al. NeurIPS 2019.
TensorFlow: A System for Large-Scale Machine Learning. Martín Abadi, et al. OSDI 2016.
Mesh-TensorFlow: Deep Learning for Supercomputers. Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman. NeurIPS 2018.
A Brief Guide to Running ML Systems in Production. Carlos Villavieja, Salim Virji. 2020.
Security and Machine Learning in the Real World. Ivan Evtimov, et al. 2020.
In-Datacenter Performance Analysis of a Tensor Processing Unit. Norman P. Jouppi, et al. ISCA 2017.
A domain-specific supercomputer for training deep neural networks. Norman P. Jouppi, et al. CACM 2020.
The Design Process for Google's Training Chips: TPUv2 and TPUv3. Thomas Norrie, et al. IEEE Micro 2021.
A configurable cloud-scale DNN processor for real-time AI. Jeremy Fowers, et al. ISCA 2018.
High-Performance Deep-Learning Coprocessor Integrated into x86 SoC with Server-Class CPUs. Glenn Henry, et al. ISCA 2020.
The Architectural Implications of Facebook's DNN-based Personalized Recommendation. Udit Gupta, et al. HPCA 2020.
Lynx: A SmartNIC-driven Accelerator-centric Architecture for Network Servers. Maroun Tork, Lina Maudlej, Mark Silberstein. ASPLOS 2020.
Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators. Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Setter, Jing Pu, Ankita Nayak, Steven Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz. ASPLOS 2020.
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, Xuan Zhang. ISCA 2020.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Jonathan Frankle, Michael Carbin. ICLR 2019.
Proving the Lottery Ticket Hypothesis: Pruning is All You Need. Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, and Ohad Shamir. ICML 2020.
Picking winning tickets before training by preserving gradient flow. Wang et al. 2020.
Pruning neural networks without any data by iteratively conserving synaptic flow. Tanaka et al. 2020.
Comparing Rewinding and Fine-tuning in Neural Network Pruning. Alex Renda, Jonathan Frankle, Michael Carbin. ICLR 2020.
Multi-Dimensional Pruning: A Unified Framework for Model Compression. Jinyang Guo, Wanli Ouyang, Dong Xu. CVPR 2020.
Memory-Efficient Pipeline-Parallel DNN Training. Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia.
PyTorch Distributed: Experiences on Accelerating Data Parallel Training. Shen Li, et al. VLDB 2020.
A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters. Yimin Jiang, et al. OSDI 2020.
AntMan: Dynamic Scaling on GPU Clusters for Deep Learning. Wencong Xiao, et al. OSDI 2020.
Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads. Deepak Narayanan, et al. OSDI 2020.
DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference. Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu. ISCA 2020.
Capuchin: Tensor-based GPU Memory Management for Deep Learning. Xuan Peng, Xuanhua Shi, Hulin Dai, Hai Jin, Weiliang Ma, Qian Xiong, Fan Yang, Xuehai Qian. ASPLOS 2020.
A Generic Communication Scheduler for Distributed DNN Training Acceleration. Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, Chuanxiong Guo. SOSP 2019.
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, Zhifeng Chen. NeurIPS 2019.
IPS: Unified Profile Management for Ubiquitous Online Recommendations. Rui Shi, Yang Liu, Jianjun Chen, Xuan Zou, Yanbin Chen, Minghua Fan, Zhihao Cai, Guanghui Zhang, Zhiwen Li, Yuming Liang. ICDE 2021.
FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary. Yingzhen Yang, Jiahui Yu, Nebojsa Jojic, Jun Huan, Thomas S. Huang. ICLR 2020.
PipeDream: Generalized Pipeline Parallelism for DNN Training. Deepak Narayanan, et al. SOSP 2019.
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. Dmitry Lepikhin, et al. 2021 (under review).
Generating long sequences with sparse transformers. Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. 2019.
Efficient transformers: A survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 2020.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. William Fedus, Barret Zoph, and Noam Shazeer. 2021.
HAT: Hardware-aware transformers for efficient natural language processing. Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han. ACL 2020.
Reformer: The efficient transformer. Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya. ICLR 2020.
Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. NeurIPS 2020.
Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. Anh Nguyen, Jason Yosinski, and Jeff Clune. CVPR 2015, pages 427-436.
Explaining and harnessing adversarial examples. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. ICLR 2015.
Towards deep learning models resistant to adversarial attacks. Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. ICLR 2018.
Adversarial training for free! Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein. NeurIPS 2019.
Attacking Binarized Neural Networks. Angus Galloway, Graham W. Taylor, and Medhat Moussa. ICLR 2018.
Multi-task Learning Increases Adversarial Robustness. Chengzhi Mao, Amogh Gupta, Vikram Nitin, Baishakhi Ray, Shuran Song, Junfeng Yang, Carl Vondrick. ECCV 2020.
DeepFool: a simple and accurate method to fool deep neural networks. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. CVPR 2016.
Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. Dan Hendrycks and Thomas Dietterich. ICLR 2019.
Practical Secure Aggregation for Privacy-Preserving Machine Learning. Aaron Segal, et al. CCS 2017.
Differential Privacy: A Survey of Results. Cynthia Dwork. International Conference on Theory and Applications of Models of Computation, 2008.
Deep Learning with Differential Privacy. Martín Abadi, et al. CCS 2016.
InstaHide: Instance-hiding Schemes for Private Distributed Learning. Yangsibo Huang, Zhao Song, Kai Li, Sanjeev Arora. ICML 2020.
Auditing Data Provenance in Text-Generation Models. Congzheng Song and Vitaly Shmatikov. KDD 2019.
A Survey of Privacy Attacks in Machine Learning. Maria Rigaki, Sebastian Garcia. arXiv 2020.
Membership inference attacks against machine learning models. Reza Shokri, et al. IEEE Symposium on Security and Privacy (S&P) 2017.
The secret sharer: Evaluating and testing unintended memorization in neural networks. Nicholas Carlini, et al. USENIX Security 2019.
Shredder: Learning Noise Distributions to Protect Inference Privacy. Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh. ASPLOS 2020.
Federated learning: Collaborative machine learning without centralized training data. H. Brendan McMahan and Daniel Ramage. 2017.
Towards Federated Learning at Scale: System Design. Keith Bonawitz, et al. SysML 2019.
Federated Optimization in Heterogeneous Networks. Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith. MLSys 2020.
Advances and Open Problems in Federated Learning. Peter Kairouz, et al.
FedSplit: an algorithmic framework for fast federated optimization (Acceleration). Reese Pathak, Martin J. Wainwright. NeurIPS 2020.
Attack of the Tails: Yes, You Really Can Backdoor Federated Learning (Privacy). Hongyi Wang, Kartik Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos. NeurIPS 2020.
Inverting Gradients - How easy is it to break privacy in federated learning? (Privacy). Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, Michael Moeller. NeurIPS 2020.
FetchSGD: Communication-Efficient Federated Learning with Sketching (Communication). Daniel Rothchild, et al. ICML 2020.
Understanding Black-box Predictions via Influence Functions. Pang Wei Koh, Percy Liang. ICML 2017.
Data Shapley: Equitable Valuation of Data for Machine Learning. Amirata Ghorbani, James Zou. ICML 2019.
Towards Efficient Data Valuation Based on the Shapley Value. Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos. AISTATS 2019.
Machine Unlearning. Lucas Bourtoule, Varun Chandrasekaran, Christopher A. Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, Nicolas Papernot. IEEE S&P 2021.
Certified Data Removal from Machine Learning Models. Chuan Guo, Tom Goldstein, Awni Hannun, Laurens van der Maaten. ICML 2020.
Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations. Aditya Golatkar, Alessandro Achille, Stefano Soatto. ECCV 2020.
A Model for Learned Bloom Filters and Optimizing by Sandwiching. Michael Mitzenmacher. NeurIPS 2018.
Learning Multi-dimensional Indexes. Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska. SIGMOD 2020.
Learning Space Partitions for Nearest Neighbor Search. Yihe Dong, Piotr Indyk, Ilya Razenshteyn, Tal Wagner. ICLR 2020.
ALEX: An Updatable Adaptive Learned Index. Jialin Ding, et al. SIGMOD 2020.
The Case for a Learned Sorting Algorithm. Ani Kristo, et al. SIGMOD 2020.
The Case for Learned Index Structures. T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. SIGMOD 2018, pages 489-504.
Compiler Auto-Vectorization with Imitation Learning. Charith Mendis, Cambridge Yang, Yewen Pu, Saman Amarasinghe, Michael Carbin. NeurIPS 2019.
NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning.
A. Haj-Ali, N. K. Ahmed, T. Willke, S. Shao, K. Asanovic, and I. Stoica. CGO 2020.
Learning Execution through Neural Code Fusion. Zhan Shi, Kevin Swersky, Danny Tarlow, Parthasarathy Ranganathan, Milad Hashemi. ICLR 2020.
Neural Execution Engines: Learning to Execute Subroutines. Yujun Yan, Kevin Swersky, Danai Koutra, Parthasarathy Ranganathan, Milad Hashemi. NeurIPS 2020.
Compiler-Based Graph Representations for Deep Learning Models of Code.
Alexander Brauckmann, Andrés Goens, Sebastian Ertel, Jeronimo Castrillon.
CC 2020.
ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.
Chris Cummins, Zacharias V. Fisches, Tal Ben-Nun, Torsten Hoefler, Hugh Leather.
2020.
Neural Packet Classification. Eric Liang, Hang Zhu, Xin Jin, Ion Stoica. ACM SIGCOMM 2019.
Learning in situ: a randomized experiment in video streaming. Francis Y. Yan, et al. NSDI 2020.
Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning. Jaehong Kim, et al. SIGCOMM 2020.
Server-Driven Video Streaming for Deep Learning Inference. Kuntai Du, et al. SIGCOMM 2020.
Learning Cache Replacement with CACHEUS. Liana Valdes, et al. FAST 2021.
Learning Relaxed Belady for Content Distribution Network Caching. Zhenyu Song, et al. NSDI 2020.
Flashield: a Hybrid Key-value Cache that Controls Flash Write Amplification. Assaf Eisenman, Asaf Cidon, Evgenya Pergament, Or Haimovich, Ryan Stutsman, Mohammad Alizadeh, Sachin Katti. NSDI 2019.
An Imitation Learning Approach for Cache Replacement. Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn. ICML 2020.
Applying Deep Learning to the Cache Replacement Problem. Z. Shi, X. Huang, A. Jain, and C. Lin. MICRO 2019.
Autopilot: Workload Autoscaling at Google Scale. Krzysztof Rzadca, et al. EuroSys 2020.
Neural Architecture Search with Reinforcement Learning. Barret Zoph, Quoc V. Le. ICLR 2017.
ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware. Han Cai, Ligeng Zhu, Song Han. ICLR 2019.
Once For All: Train One Network and Specialize It for Efficient Deployment. H. Cai, C. Gan, T. Wang, Z. Zhang, S. Han. ICLR 2020.
MCUNet: Tiny Deep Learning on IoT Devices. J. Lin, W. Chen, Y. Lin, J. Cohn, C. Gan, S. Han. NeurIPS 2020.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Mingxing Tan, Quoc V. Le. ICML 2019.
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch. Esteban Real, Chen Liang, David R. So, Quoc V. Le. ICML 2020.
DARTS: Differentiable architecture search. Hanxiao Liu, Karen Simonyan, Yiming Yang. 2018.
Efficient neural architecture search via parameter sharing. Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. 2018.
ReLeQ: A Reinforcement Learning Approach for Automatic Deep Quantization of Neural Networks. Ahmed T. Elthakeb, et al. IEEE Micro 2020.
Deep Transformers with Latent Depth. Xian Li, Asa Cooper Stickland, Yuqing Tang, and Xiang Kong. NeurIPS 2020.