Computer Science 435: Information Retrieval, Discovery, and Delivery
Andrea LaPaugh | Spring 2017
      
    
    
      
    
    
 
    Course Summary
     
    
    This course examines the methods used to gather, organize, and search
    for information in large digital collections (e.g., web search
    engines). It also explores the discovery of information
    through the analysis of relationships between items of interest,
    including both information items and social objects. We study
    classic techniques for indexing documents and searching text, as well as
    algorithms that exploit properties of the Web (e.g., links), of
    social networks, and of other digital collections, including
    multimedia collections. Techniques include those for relevance and
    ranking of documents, exploiting user history, clustering, and
    network analysis. We also examine systems aspects of search
    technology: how distributed computing and storage are used to make
    information delivery efficient.
    Prerequisites
    COS 226 and MAT 202.
    Administrative Information
    Meeting time: Monday and Wednesday, 1:30-2:50pm
    Meeting place:  Friend 111
    Extra meetings: If we need to make up a class due to my
    schedule, we may have a class during reading period and/or an
    evening class during the semester. Class participants will be
    consulted before any make-up class time is chosen.
    Professor: Andrea LaPaugh, aslp@cs. ...
      304 Computer Science Building, 258-4568
      Office hours: Tuesdays, 2:30 to 4:00pm, or by appointment.
      The easiest way to make an appointment is by email.
    Teaching Assistant: Mayank Mahajan, mmahajan@ ...
      Office hours: Mondays, 3:00 to 4:30pm in the Tea Room
    
    Course secretary: Mitra Kelly, 323 CS building, 258-4562,
      mkelly@cs. ...
    
    For email addresses specified above, "..." stands for princeton.edu
    
    
    
    Reading
    Required reading:
    Options other than buying the printed books: the print version
    of each of the three books below is available online through a
    Princeton University Library subscription to Safari Books Online.
    You must access these from within the princeton.edu domain.
    Each of the books also has a version available for download as a
    PDF file. Details are given below.
    
    Primary textbook:

        - Manning, Christopher D.; Raghavan, Prabhakar; Schütze,
          Hinrich, Introduction to Information Retrieval, Cambridge
          University Press, 2008, reprinted 2009. The website for the
          book contains, among other things, links to complete HTML and
          PDF (6.6 MB) versions. A Safari Books Online version is also
          available.

    We will also use selections from the following two books.
      
    
      
        - Rajaraman, Anand; Leskovec, Jure; Ullman, Jeffrey D.,
          Mining of Massive Data Sets, Cambridge University Press, 2011.
          You can download a PDF file (2.9 MB) of the latest version of
          the book (March 2014 as of this writing) from the book site.
          Safari Books Online offers the earlier printed version (2011).
          I recommend the latest version.
 
        - Easley, David; Kleinberg, Jon, Networks, Crowds, and Markets:
          Reasoning about a Highly Connected World, Cambridge University
          Press, July 19, 2010. You can download a PDF file (18.6 MB) of
          a draft version dated June 10, 2010. A Safari Books Online
          version is also available.
    Supplemental reading (check back for additions as we progress in the
    semester):

    On reserve at Engineering Library:
    
      
        - Croft, Bruce; Metzler, Donald; Strohman, Trevor,
          Search Engines: Information Retrieval in Practice,
          Addison Wesley, 2010.
        - Langville, Amy N.; Meyer, Carl D., Google's PageRank and
          Beyond: The Science of Search Engine Rankings, Princeton
          University Press, 2006.
 
     
    
    
    
    
    
    Work of the Course 
    The course will have the following components weighted as indicated:
    
      - Problem sets 25%
      - Midterm exam 15%
      - Second exam 20%
      - Class participation 5%
      - Project 35%

    Problem sets
    There will be 6 problem sets distributed throughout the
    semester. 
    Exam
    There will be two take-home exams during the semester, each covering
    roughly half the course material. There is no exam during the final
    exam period.
    Project
    Students will do a final project in pairs. The choice of topic
    is up to each pair, but it must be related to the material of the
    course. The project must be approved in advance by the course
    instructor. See the project page for more information and a list
    of suggested projects.
    Communication
    
    
    The course has a Piazza account: COS 435 - Spring 2017. All
    assignments will be made available on the Piazza course account.
    Piazza will also be used for all course announcements and quick
    questions. Students are responsible for registering on Piazza,
    adding themselves to the course account, and monitoring the
    postings on the Piazza COS 435 site for important course
    announcements. Piazza is great for sharing questions and answers
    with the class (private questions addressed only to the instructors
    are also possible). However, an old-fashioned face-to-face meeting
    is still best for addressing deeper confusions and other technical
    discussions.
    
    The course schedule is on the Schedule and Assignments page of the
    course Web site. "Handouts" and copies of any slides used in class
    will be posted on this page. Schedule changes will be made on
    Schedule and Assignments and announced on Piazza.
    
    
    
    
    
    Syllabus 
      
    (This is the general list of topics and probably a superset of
      what we will have time to cover. Please see Schedule and Assignments
      for specific topics and reading assignments as the semester progresses.)
    
      - Modeling information objects
      - Query models for searching
      - Indexing and inverted files
      - Ranking documents
      - Using linking structure for Web content analysis
      - Personalized search
      - Recommender systems
      - Social networks as sources of meta-information
      - Discovering information from social network analysis
      - Privacy issues
      - Evaluating retrieval systems
      - Web crawling
      - Document similarity
      - Clustering
      - Non-text media search, e.g., music, images
      - Searching dynamic information sources
      - System design of search engines: distributed storage and
        computing
    A.S. LaPaugh. Content last changed Tue Feb 28 12:31:11 EST 2017