Independent Work Seminar Offerings - Spring 2022
COS IW 02 & 03: Machine Learning and Data Science
Instructor: Xiaoyan Li
Meeting Time: Tuesdays, 1:30pm-2:50pm (COS IW 02) | Tuesdays, 3:00-4:20pm (COS IW 03)
Location: CS 402
In this seminar, you will learn or review many machine learning methods, such as the naïve Bayes classifier, Support Vector Machines, Decision Trees, AdaBoost, and Random Forest. Students will choose at least one data set of interest and propose questions that can be answered from it by applying two or more machine learning models and performing data analysis. A complete data analysis process includes raw data collection, feature extraction, missing data imputation, feature selection, model fitting, prediction on unseen data, performance evaluation, and error analysis. While machine learning is a broad and growing field, this seminar focuses on traditional machine learning methods: how to apply them correctly to real-world data sets, evaluate them with different metrics, interpret the results, and identify important features.
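As a rough illustration of several of the steps listed above (missing data imputation, model fitting, prediction on unseen data, and performance evaluation), here is a minimal sketch in plain Python. It is not seminar material: the data is synthetic, the nearest-centroid classifier is only a simple stand-in for the methods named above, and all function names are hypothetical. In practice students would more likely use an existing Python machine learning package.

```python
import random
from statistics import mean

def column_means(rows):
    """Mean of each column, ignoring missing (None) entries."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]

def impute(rows, means):
    """Replace each missing entry with the supplied column mean."""
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

# Synthetic toy data (an assumption for illustration): two well-separated classes.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50
X[3][0] = None    # simulate missing values
X[70][1] = None

# Hold out unseen data BEFORE imputing, so test rows do not leak into the means.
idx = list(range(len(X)))
rng.shuffle(idx)
train_idx, test_idx = idx[:80], idx[80:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

means = column_means(X_train)          # imputation statistics from training data only
X_train = impute(X_train, means)
X_test = impute(X_test, means)

centroids = fit_centroids(X_train, y_train)
test_accuracy = accuracy(centroids, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Computing the imputation means from the training split only is one small example of applying a method correctly: statistics taken from the full data set would let information from the held-out rows influence the model.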
There are no prerequisites beyond COS 217 and COS 226. Prior knowledge of Python is helpful, because students will use existing machine learning packages in Python, but it is not required. Online Python tutorials are available, and students can usually learn Python by themselves in a week.
This seminar will meet once a week. Class time is used to present machine learning methods and data analysis techniques, and to discuss students’ project progress. Each student will report their weekly progress on their project either in class or during a one-on-one meeting with the instructor. Each student will also be assigned one topic on machine learning methods or data analysis techniques to present in class with some guidance from the instructor.
We will help students find a topic during the first two classes. Each student should develop an individual project that is suitable for one semester of work and may have the potential to extend to a senior thesis. A thorough, solid project may also lead to publication at a conference or workshop in the field.
The two sections of the seminar are independent but will generally be quite similar.
COS IW 04: Hands-on Deep Learning for Language Understanding
Instructor: Danqi Chen
Meeting Time: Mondays, 11:00am-12:20pm
Location: CS 402
Recent advances in deep learning have ushered in extremely exciting developments in natural language understanding (NLU), which deals with computers’ ability to understand and comprehend human languages, one of the most challenging problems in artificial intelligence. This seminar aims to give students the opportunity to learn the basics and acquire hands-on experience in building and studying state-of-the-art NLU systems through a semester-long project. We will particularly focus on cutting-edge techniques including pre-trained language models (e.g., BERT, RoBERTa, T5/BART), fine-tuning/prompting, and recently developed NLU benchmarks. Types of projects include developing better NLU models (more accurate, less compute, less storage, or more data-efficient), adapting existing approaches to new tasks of interest, analyzing and understanding the limitations of current systems, or developing better benchmarks and metrics.
Students are not required to have prior experience with NLP (although it is definitely useful), but they are strongly encouraged to have taken COS 324. Students may work in pairs if the project can be split so that each student has their own semester-size piece of the project. We will spend the first 2-3 meetings of the semester surveying the state of the art in the field, brainstorming ideas, and developing project proposals. The remaining meetings will be used for project updates, formal student presentations, and discussions on how to perform background research, prepare a presentation, and write a final paper.
COS IW 05: Technology Policy
Instructor: Mihir Kshirsagar
Meeting Time: Fridays, 11:00 am-12:20 pm
Location: 315 Sherrard Hall
Abstract: In this IW seminar students get to work on crafting concrete policy responses to challenges posed by emerging computer and network technologies. There is a renewed sense of urgency to understand the implications of how these technologies are transforming public life and to craft practical solutions that address the difficult tradeoffs we need to make. Students in past seminars have worked on a variety of different projects, including those related to machine learning, social media, video game design, communication policy, competition, privacy, and cryptocurrencies, among other issues.
The first half of the seminar will focus on introducing students to policy challenges in different domains to help them explore potential topics for their final project. The second half of the seminar is devoted to workshopping the final projects and helping students develop workable proposals.
The final project will be student-driven, with the opportunity to create a real-world policy work product. Policymakers need thoughtful, technically sophisticated voices to help them develop evidence-based policies. This seminar helps students prepare to play that vital role. All students are expected to attend all weekly meetings and work collaboratively on shared projects. There are no prerequisites for taking this seminar.
COS IW 01 & 06: Digital Humanities
Instructor: Brian Kernighan
Meeting Time: Fridays, 11:00am-12:20pm (COS IW 06) | Fridays, 1:30-2:50pm (COS IW 01)
"Digital humanities" covers a wide variety of ways in which scholars in the humanities -- literature, languages, history, music, art, religion, and many other disciplines -- collect, curate, analyze and present information about their fields, using digital representations and technology.
Digital humanities data is intrinsically messy, and there is always a considerable effort devoted to cleaning it up even before study can begin. There is also much effort devoted to figuring out how to present it effectively and make it accessible to others.
This seminar is aimed at building tools and developing techniques that will help humanities scholars work more effectively with their data. This might include machine learning, natural language processing, data visualization, data cleaning, and user interface design for making the processes available to scholars just starting out in technology.
A typical project might begin with a humanities dataset or a focus on a CS technique. In the former case, the goal would be to explore the data to learn and present new and interesting things about it. In the latter case, the goal would be to create or improve tools, languages, and interfaces that help scholars in the humanities.
No particular background is required beyond COS 217 and 226 and an interest in learning new things and applying that knowledge usefully.
COS IW 07: Mobile and Wearable Design for Sports and Assistive Technology
Instructor: Kyle Jamieson
Meeting Time: Wednesdays, 11:00am-12:20pm
Location: CS 301
Wearable/augmented-reality platforms, sports and medical sensors, and other technologies are creating new opportunities to help athletes excel in their sports, and help persons with disabilities perform the activities of daily living. At the same time, these exciting technologies also create new opportunities to help doctors, coaches, and therapists in the clinic and the home. Participants in this seminar will choose an assistive, sport, or medical application and develop a solution that will have real impact on people's lives.
Possible assistive applications include hearing impairment, cognitive impairment (aphasia, autism spectrum, Parkinson's disease, prosopagnosia), and vision impairment. Other in-scope applications targeted on medical, sports, and psychological contexts include dermatology (melanoma detection and diagnosis, jaundice), pulmonary spirometry, physical/occupational therapy (stroke rehabilitation), sports medicine and performance analysis, and health monitoring (blood glucose, EKG, blood oxygenation). Possible hardware platforms include mobile devices, wearable health monitors, augmented reality devices (Microsoft HoloLens, Fove VR), 360-degree cameras (Google), wearable body-cams, and an array of new-to-market medical and sports performance sensor technology (Garmin, Apple, Polar, and others).
[CANCELED] COS IW 08: The IW Seminar About Nothing
COS IW 09: You Be the Prof
Instructor: David Walker
Meeting Time: Tuesdays, 11:00am-12:20pm
Location: CS 301
Ever want to take over a course and show the prof how it's done? Now's your chance. In this IW seminar, develop technology, tools or materials to help other students learn. Doing so could entail:
- Developing a creative assignment for an existing course (one you've taken already at Princeton) designed to teach students a concept in computer science;
- Creating an interactive, web-based tutorial on an interesting topic that you've wanted to learn about (like automata tutor: https://automata-tutor.model.in.tum.de/);
- Developing a tool, app or web-based platform to aid teaching computer science in some way; or
- Developing a tool for teaching concepts in some other discipline outside of computer science.
Examples of past projects in this seminar include:
- A tutorial on reinforcement learning using PyGame;
- TigerStudy: A web-based interface and database for automating the creation of student study groups at Princeton;
- An assignment for COS 326 involving Sudoku solving via SAT and theorem proving;
- A framework for hosting competitive COS 445 game theory assignments; and
- A practice tool to help Princeton students applying to naval flight school succeed on the ASTB-E (Aviation Selection Test Battery Exam).
If you've taken any class and can think of a way to improve it, or if you know of some skill, idea, or concept outside of computer science that you can use computer science to help teach, you can explore such a project in this seminar.
COS IW 10: Deep Learning with Small Data
Instructor: Ilia Sucholutsky
Meeting Time: Tuesdays, 1:30pm-2:50pm
Location: CS 401
Training giant models on giant datasets is an increasingly popular and successful deep learning framework, but one that requires vast data and computational resources. Unfortunately, most businesses and researchers have neither the quantity of data nor the computational power required to benefit from the deep learning revolution. Indeed, by some estimates, for every large dataset within an organization, there are hundreds more small datasets. This is especially true in fields like medicine, where it is prohibitively expensive, or simply impossible, to collect large datasets. Simply put, getting deep learning to work with small data would help democratize AI. (For a glimpse at just how extreme this problem is getting, google “GPT3 cost to train”.)
While creating general data-efficient deep learning algorithms is an open research problem, in practice there is a surprisingly simple trick to make deep learning work in specific small-data settings: draw inspiration from humans. Humans are able to recognize new objects, understand new concepts, and complete new tasks after just a few examples or demonstrations, all thanks to our inductive biases. An inductive bias is a set of assumptions that a learner uses to make predictions for previously unseen inputs. For example, we assume that an object that has been rotated is still the same object (“rotation invariance”) so when we see someone doing a handstand, we still recognize them as being a person. By finding ways to encode these inductive biases into deep learning models, we can make the models almost as data-efficient as we are.
During this seminar, we will explore various popular inductive biases (e.g. Occam’s razor, translation invariance, rotation invariance) and methods of encoding them into deep learning models (e.g. data augmentation, convolutional layers, pre-training, transfer learning). Over the course of the semester, students will be encouraged to find a small-data task they find interesting and develop a deep learning model that can solve it. This process will involve thinking about how humans are able to solve that particular task (what assumptions we make, what prior knowledge we use, etc.) and then experimenting with different ways of injecting that knowledge into the models. This human-oriented approach is intended to be accessible, and as a result there are few prerequisites. However, some prior exposure to machine/deep learning (e.g. COS 324 or equivalent) and a working knowledge of Python are strongly recommended. An interesting and well-executed project could lead to publication or presentation at a workshop or conference.
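To make the idea of encoding an inductive bias through data augmentation concrete, here is a minimal sketch in plain Python. It is an illustrative assumption, not seminar material: images are tiny lists of rows, only 90-degree rotations are used, and the function names are hypothetical. Each labeled example is expanded into its four rotations with the label unchanged, which tells a learner that rotated objects are still the same object.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment_with_rotations(dataset):
    """Return every (image, label) pair plus its 90, 180, and 270 degree rotations.

    The label is kept unchanged, which encodes the assumption that
    rotating an object does not change what it is.
    """
    augmented = []
    for image, label in dataset:
        current = image
        for _ in range(4):
            augmented.append((current, label))
            current = rotate90(current)
    return augmented

# Hypothetical two-example "small data" set of 2x2 images.
tiny_dataset = [
    ([[1, 0], [0, 0]], "corner"),
    ([[1, 1], [0, 0]], "edge"),
]
augmented = augment_with_rotations(tiny_dataset)
print(len(tiny_dataset), "examples become", len(augmented))
```

Augmentation leaves the model unchanged and enlarges the training set instead; convolutional layers, by contrast, build an assumption such as translation invariance into the architecture itself.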
COS IW 11: Blockchain Data Science
Instructor: Pranay Anchuri
Meeting Time: Wednesdays, 11:00am-12:20pm
Location: CS 401
A decade ago, Bitcoin started as an alternative currency aiming to address the issues in central-bank-backed monetary systems. Today, blockchain networks have reached a market capitalization of 2 trillion USD. These networks are being used to deploy alternatives to traditional financial instruments, such as p2p payments and p2p lending. Governments and large financial institutions across the world are no longer ignoring these technological advancements and the increased adoption of public blockchain networks.
Users of a blockchain network interact with the network and with other users via pseudo-anonymous identities. All these interactions are publicly visible. This public visibility allows anyone to analyze these user interactions without depending on companies to release datasets to the public. One of the main aims of this seminar is to analyze the data available on blockchain networks. Students will get to work on projects that apply ML/Data Science tools to blockchain datasets. Examples include: (1) Create high-quality and open blockchain datasets and document them by using tools such as datasheets. (2) Apply graph neural network methods for identifying blockchain usage patterns, identifying fraud, and classifying addresses. (3) Apply time series methods for identifying trends and patterns in blockchain asset prices. (4) Analyze the rise of DeFi and NFTs by using social media data.
The first half of the seminar will consist of a few lectures focusing on the technical aspects of blockchain networks and will introduce students to various blockchain datasets. In the second half, students will work independently or in small groups to identify and work on an interesting blockchain data science project. Prior blockchain experience is not required. Programming experience with data analysis tools and libraries such as R, scikit-learn, or TensorFlow is recommended.