COS Independent Work Seminar:
Using Publicly Available Data to Learn, Explain, Evaluate and Improve

COS IW02
Spring 2018


General Information:

Feel free to email any of us with questions or to set up other meetings.

Meeting time and place: Wednesday 1:30-2:50PM, CS 301
Links: Description, Schedule, Resources, FAQ, Piazza
 


Description:

The so-called big data revolution has led to the creation of datasets of various sizes that provide information about real-world situations. Datasets of significant size are available in a variety of domains. These domains range from information about the operations of cities (including, for example, housing data, transportation data and police data in New York City, among other urban centers) to health data (including epidemiological data on the spread of diseases and genomic data from thousands of individuals) to sports data (including information about virtually every pitch thrown in a baseball game since 1987). Given this wide availability of data, a challenge for the data scientist is to find effective ways to use the data to extend our knowledge of the situations the data represent. This task involves exploring datasets, cleaning data, asking good questions, and presenting results in the most compelling fashion. The typical project will begin either with a question or with a dataset. In the former case, the goal will be to find and explore datasets that help to answer the question. In the latter case, the goal will be to explore the dataset to learn new and interesting things.
 

Schedule:

Date Topic
Feb 7 Course Introduction -- exploring public data and potential project ideas
Feb 8 Information meeting for all IW students (Convocation Room -- Friend Center, 12:30-1:30PM)
Feb 14 Develop project plans
Feb 21 Present initial project proposals -- bring 4 PowerPoint slides which define your project, tell what you've done and give a roadmap for the future
Feb 25 Written project proposals due
Feb 28 Proposal talks
Mar 7 Describe initial experiments with the data sets you will be using
Mar 11 Checkpoint form due
Mar 14 Discussion and feedback
Mar 21 Spring break
Mar 28 Description of Project Progress
Apr 3 (tentative) Attend "How to Give an IW Talk"
Apr 4 Discussion and feedback
Apr 4-9 Sign up to give an oral presentation
Apr 10 (tentative) Attend "How to Write an IW Paper"
Apr 11 Discussion and feedback
Apr 18 Demo Day
Apr 22 Submit slides for your oral presentation
Apr 23-27 Give an oral presentation
May 7 Written final report due
May 13 Submit a Poster
May 14 Present at the Poster session

 

Potential Data Sets

This link points you to many potential data sets. Some of the data sets are meant to be one-time downloads, and others are sites that you would scrape with some frequency to gather updated data.

Before downloading a large data set, check with me to see if I already have it.
 


Work of others

This link points you to blog posts, newspaper articles and research papers in the spirit of this course.
 

Frequently Asked Questions: