Week 1: Course Overview




Week 2: Introduction to ultra-large-scale systems

Guest lecture from SAIL researchers: Tse-Hsun (Peter) Chen and Mark Syer
Subsuming Methods: Finding New Optimisation Opportunities in Object-Oriented Software
David Maplesden, Ewan Tempero, John Hosking, John C. Grundy
[ASSIGNMENT]
Week 3: Log Analysis
Automatic Identification of Load Testing Problems
Zhen Ming Jiang, Ahmed E. Hassan, Parminder Flora, and Gilbert Hamann
Detecting Large-Scale System Problems by Mining Console Logs
Wei Xu, Ling Huang, Armando Fox, David Patterson, Michael Jordan
Analyzing Log Analysis: An Empirical Study of User Log Mining
S. Alspaugh, Beidi Chen, Jessica Lin, Archana Ganapathi, Marti A. Hearst, and Randy Katz
Leveraging Existing Instrumentation to Automatically Infer Invariant-Constrained Models
Ivan Beschastnikh, Yuriy Brun, Sigurd Schneider, Michael Sloan, Michael D. Ernst
[READING]
Mining Invariants from Console Logs for System Problem Detection
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li
[READING]
Characterizing Logging Practices in Open-Source Software
Ding Yuan, Soyeon Park, and Yuanyuan Zhou
[READING]
Where Do Developers Log? An Empirical Study on Logging Practices in Industry
Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, and Tao Xie
[READING]
Improving Software Diagnosability via Log Enhancement
Ding Yuan, Jing Zheng, Soyeon Park, Yuanyuan Zhou, and Stefan Savage
[READING]
Week 4: Performance Counters and Measurements
Correlating instrumentation data to system states: a building block for automated diagnosis and control
Ira Cohen, Moises Goldszmidt, Terence Kelly, Julie Symons, Jeffrey S. Chase
Producing Wrong Data Without Doing Anything Obviously Wrong!
Todd Mytkowicz, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney
Automatic Detection of Performance Deviations during Load Testing of Large Scale Systems
Haroon Malik, Hadi Hemmati, Ahmed E. Hassan
The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services
Michael Chow, David Meisner, Jason Flinn, Daniel Peek, Thomas F. Wenisch
[READING]
Statistical Debugging for Performance Problems
Linhai Song and Shan Lu
[READING]
Catch Me if You Can: Performance Bug Detection in the Wild
Milan Jovic, Andrea Adamoli, Matthias Hauswirth
[READING]
X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software
Mona Attariyan, Michael Chow and Jason Flinn
[READING]
Week 5: System monitoring
Assignment update (10 min presentation)
Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining
Robbert van Renesse, Kenneth P. Birman, Werner Vogels
Detecting failures in distributed systems with the FALCON spy network
Joshua B. Leners, Hao Wu, Wei-Lun Hung, Marcos K. Aguilera, Michael Walfish
Understanding the behavior of database operations under program control
Juan M. Tamayo, Alex Aiken, Nathan Bronson, Mooly Sagiv
AjaxScope: A Platform for Remotely Monitoring the Client-side Behavior of Web 2.0 Applications
Emre Kiciman and Benjamin Livshits
[READING]
Lightweight, High-Resolution Monitoring for Troubleshooting Production Systems
Sapan Bhatia, Abhishek Kumar, Marc E. Fiuczynski and Larry Peterson
[READING]
Week 6: Assignment presentation
Assignment DUE -- (30 mins presentation + 10 page IEEE report)
Week 7: Configuration
Do Not Blame Users for Misconfigurations
Tianyin Xu, Jiaqi Zhang, Peng Huang, Jing Zheng, Tianwei Sheng, Ding Yuan, Yuanyuan Zhou, Shankar Pasupathy
Automated Diagnosis of Software Configuration Errors
Sai Zhang and Michael D. Ernst
An Empirical Study on Configuration Errors in Commercial and Open Source Systems
Zuoning Yin, Xiao Ma, Jing Zheng, Yuanyuan Zhou, Lakshmi N. Bairavasundaram, Shankar Pasupathy
AutoBash: Improving Configuration Management with Operating System Causality Analysis
Ya-Yunn Su, Mona Attariyan, and Jason Flinn
[READING]
Project Proposal DUE (2 pages IEEE format)
Week 8: Project Proposal Presentations
Project Proposal Presentation (15 mins + 10 mins questions)
Week 9: Debugging ultra-large-scale systems
Debugging in the (Very) Large: Ten Years of Implementation and Experience
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg, Gabriel Aul, Vince Orgovan, Greg Nichols, David Grant, Gretchen Loihle, and Galen Hunt
Extrinsic Influence Factors in Software Reliability: A Study of 200,000 Windows Machines
Christian Bird, Venkatesh-Prasad Ranganath, Thomas Zimmermann, Nachiappan Nagappan, Andreas Zeller
Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems
Ding Yuan, Yu Luo, Xin Zhuang, Guilherme Renna Rodrigues, Xu Zhao, Yongle Zhang, Pranay U. Jain, and Michael Stumm
Performance Debugging in the Large via Mining Millions of Stack Traces
Shi Han, Yingnong Dang, Song Ge, Dongmei Zhang, and Tao Xie
[READING]
Week 10: Power
Refactoring Android Java code for on-demand computation offloading
Ying Zhang, Gang Huang, Xuanzhe Liu, Wei Zhang, Hong Mei, Shunxiang Yang
Carat: Collaborative Energy Diagnosis for Mobile Devices
Adam J. Oliner, Anand P. Iyer, Ion Stoica, Eemil Lagerspetz, Sasu Tarkoma
Evaluating the Effectiveness of Model-Based Power Characterization
John C. McCullough, Yuvraj Agarwal, Jaideep Chandrashekar, Sathyanarayan Kuppuswamy, Alex C. Snoeren, and Rajesh K. Gupta
Green mining: A methodology of relating software change to power consumption
Abram Hindle
[READING]
Week 11: Project Presentations
Project Presentation DUE (20 mins presentation)


Project Report DUE (10 page IEEE report)