Princeton University
Computer Science Dept.

Richard L. Smith '70 Freshman Seminar

Google and Ye Shall Find???


Andrea LaPaugh

FRS 117

Fall 2007



 

General information pages for the remainder of the semester (subject to additions):




click here for weeks 4 through 6

Nov. 7:   No Class

Written assignment due this week:  Please visit the Assignment 5 page.

Remember: Final paper topic description due Monday, November 5, 2007 at 5pm
See Project Guidelines online.



Week 7, Nov. 14:

Guest instructors Prof. Edward Felten, Director of the Center for Information Technology Policy,  and David Robinson,  Associate Director of the Center for Information Technology Policy.

Topics:
Social  Issues
focus on privacy


Week 8 and first half Week 9, Nov. 21 and 28:  

Guest instructor Prof. Moses Charikar of the Computer Science Dept.

Topics:
Quality of search engine results
        trust in results
        quality of results versus goals of search
Improving search engine results
Comparing search engines

Class discussion:  Think about the quality of the searches you have done in the past year.  What kinds of searches have been easiest?  What have been most difficult for you?  Do the easy searches share common features?  Do the difficult ones?  What improvements to search - either new options for specifying the search or new ways to rank or present the results - would be helpful?  We discussed trusting the results of search a bit earlier in the semester;  what issues in trusting results would you like to revisit or introduce?

For our discussion of improving search engine results, we will start considering what search engines other than Google have to offer that is different, beginning with some well-known techniques that Google chooses not to implement.  We will eventually move to how we might fairly compare search engines.  In preparation for these topics, try out some of the search engines you don't usually use.  Certainly try some searches on Yahoo if you don't already use it at least occasionally.  Other well-known engines to try:  MSN Windows Live and Ask.  (AOL Web search is "enhanced" by Google, so we don't expect to see organic search results that are much different from Google's.  Of course, the user experience and the ads may be different.)
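One crude starting point for comparing engines is to measure how much their top results overlap for the same query.  The sketch below is only an illustration, not a method from the readings: the function name and the example URLs are made up, and it assumes you have copied each engine's top results into a list by hand.  Overlap says nothing about which engine is better; it only shows how different the result lists are.

```python
def top_k_overlap(results_a, results_b, k=10):
    """Jaccard overlap between the top-k results of two engines.

    Returns a value between 0.0 (no shared results) and 1.0 (same set).
    """
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    if not top_a and not top_b:
        return 0.0  # nothing to compare
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical result lists for one query, copied from two engines.
engine_a = ["example.org/a", "example.org/b", "example.org/c"]
engine_b = ["example.org/b", "example.org/c", "example.org/d"]

print(top_k_overlap(engine_a, engine_b, k=3))
```

A fairer comparison would repeat this over many queries of different kinds (for example, the categories in Broder's taxonomy) and would also consider the order of the results, which this measure ignores.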


Written assignment due this week:  NONE

Reading for discussion today:

*(Originally for week 6) Andrei Broder,  A taxonomy of web search, ACM Special Interest Group on Information Retrieval (SIGIR) Forum, Vol. 36 (2), Fall 2002, pp. 3-10.  This article is mentioned by Battelle in Chapter 2.   Is this paper, written over 5 years ago, still relevant today?

(Originally for week 6) Some articles on Wikipedia as we think about trusting search and sources:
*The word on Wikipedia: Trust but verify,  MSNBC and NBC News,  March 29, 2007.
*Why you can't cite Wikipedia in my class, Viewpoint piece in Communications of the ACM, Vol. 50(9), Sept. 2007, by the Middlebury College history professor referred to in the MSNBC article above.   This piece has some inaccuracies of its own (can you spot them?), but presents Professor Waters' view in his own words.
*Wikipedia 2.0 - now with added trust , NewScientist.com News Service, 20 September 2007.
*Why Wikipedia Must Jettison Its Anti-Elitism, by lsanger, on site kuro5hin.org ,  Fri Dec 31, 2004.  This is the article by Larry Sanger referred to in the MSNBC article above.
*Exploring the Digital Universe, eLearn Magazine,  an answer to Wikipedia.

*About the Open Directory Project


References for technical material:
*Crafting Your Query by using Special Characters in GoogleGuide by Nancy Blachman
*Google Web Search Help Center: Advanced Search Made Easy
*Boolean Searching on the Internet: A Primer in Boolean Logic on Internet Tutorials, maintained by Laura Cohen, Web Support Librarian, State University of New York (SUNY) at Albany.
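
The Boolean operators covered in these references (AND, OR, NOT) can be illustrated with set operations over a toy inverted index.  This is a minimal sketch under stated assumptions: the three "documents" and all names are invented for illustration, and real search engines do considerably more than this.

```python
# Toy collection: document id -> text (invented for illustration).
documents = {
    1: "princeton computer science seminar",
    2: "search engine privacy policy",
    3: "computer search engine ranking",
}

# Build an inverted index: word -> set of ids of documents containing it.
index = {}
for doc_id, text in documents.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)


def docs_with(word):
    """Ids of documents containing the word (empty set if none)."""
    return index.get(word, set())


both = docs_with("search") & docs_with("engine")      # search AND engine
either = docs_with("privacy") | docs_with("ranking")  # privacy OR ranking
without = docs_with("search") - docs_with("privacy")  # search AND NOT privacy

print(both, either, without)
```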


Second half Week 9, Nov. 28:

Guest instructor David Robinson,  Associate Director of the Center for Information Technology Policy.

Topics:
Intellectual Property and Copyright
Google Book Search

Class discussion:  Consider the claims of Google and those of the publishers and authors opposing Google's copying of copyrighted books without permission as discussed in the Salon article below.  What do you think and why?

Written assignment due this week:  NONE

Reading for discussion today:
Throwing Google at the book by Farhad Manjoo, Salon.com,  Nov 9, 2005.


Week 10, Dec. 5:

Topics:
Future of search
Understanding Intent
Human interventions
Concept-based search
Using the "Database of Intentions"

Class discussion:  Considering the new information you have from class last week and from the reading, what do you think are the most pressing needs for search improvement? What approaches do you think are most promising?

Written assignment due this week:  NONE

Reading for discussion today:

* Battelle, Chapter 11 ("Perfect Search")
* Some articles on new search engine features:
Danny Sullivan SEland blog: Yahoo Search Assist, July 25, 2007
BusinessWeek: crowd wisdom vs Google's genius, Dec 27, 2006.  Note that, as far as I can see, Wikiasari does not exist.
Update 1/7/08: Wikia Search launched: Wiki Citizens Taking on a New Area: Searching by Miguel Helft, The New York Times, January 7, 2008.
VentureBeat: Google’s first social search step — your vote, please, Nov. 29, 2007
Google page on experiment reported in VentureBeat

* Two clustering meta-search engines to try:
Clusty the clustering search engine
iBoogie - MetaSearch Document Clustering Engine and Personalized Search Engines Directory



Week 11, Dec. 12:

Topics:
The Semantic Web
Preserving digital content
Archiving the Web
Searching non-text media by features rather than keyword


Class discussion:  Be prepared to discuss the article "The Semantic Web" by Berners-Lee et al.  (Recall that Tim Berners-Lee is credited as the inventor of the World Wide Web.)  This article is over 6 years old.  Have you encountered any tools that approach the functionality described in this article?   What do you think would be needed to achieve the vision in the article?   Are you eager for such a tool?  What would you pay for it?  (The other readings point to some commercial projects.)

Consider the task of preserving knowledge in digital form.  Should everything be archived?  If not, then what?  This problem is older than the Web.  How many non-print forms of recording (any media) have you encountered that have disappeared or are disappearing?

We'll look at some methods of searching visual and audio media without depending on text labels.  Look at  and try out the sites listed below.
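
As a rough illustration of what searching "by features" can mean, the sketch below compares images by their color distributions rather than by any text label.  Color-histogram intersection is a classic content-based technique chosen here for simplicity; it is not claimed to be what any of the listed sites use.  The function names and the tiny pixel lists are made up, and images are assumed to be plain lists of (r, g, b) tuples with values from 0 to 255.

```python
def color_histogram(pixels, bins=4):
    """Normalized histogram over coarse RGB bins for a list of (r, g, b) pixels."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins  # width of each bin along one color channel
    for r, g, b in pixels:
        slot = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[slot] += 1
    total = len(pixels)
    return [count / total for count in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity between 0.0 and 1.0; higher means more similar colors."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical four-pixel "images".
query = [(250, 10, 10)] * 3 + [(10, 10, 250)]  # mostly red, one blue pixel
red_image = [(240, 20, 20)] * 4                # all red
blue_image = [(10, 10, 240)] * 4               # all blue

q = color_histogram(query)
sim_red = histogram_intersection(q, color_histogram(red_image))
sim_blue = histogram_intersection(q, color_histogram(blue_image))
print(sim_red, sim_blue)
```

Here the query scores higher against the red image than against the blue one, so a search by color alone would rank the red image first, even though no keywords are involved.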


Written assignment due this week:  Please visit the Assignment 6 page.

Reading for discussion today:
The Semantic Web:
*The Semantic Web by Tim Berners-Lee, James Hendler, and Ora Lassila.  Scientific American, May 2001.  The link is to a restricted copy of the pdf file on our course Web site.  Scientific American is available online through the Princeton University Library.
*What I Meant to Say Was Semantic Web, NY Times October 19, 2007.
*Connotate Technologies – Premium Web Data Extraction Solution for the “Predictive Web” (Is this offering internal semantic webs?)

Archiving Projects:
*Internet Archive: About
*The Internet gives birth to an 'official' online library,  Pittsburgh Post-Gazette Sunday, June 24, 2007.
*Library of Congress Web Archiving: MINERVA Home Page (Mapping the INternet Electronic Resources Virtual Archive)
*The Library of Congress: Web Capture
*The Library of Congress: Importance of Digital Preservation

Searching non-text media by non-text features:
*Content Based Visual Image Search : Tiltomo
*retrievr - search by sketch / search by image
*Back to the Drawing Board: Using Mouse-Made Sketches, Retrievr Searches Flickr Photos, (Time Waster column by Aaron Rutkoff), The Wall Street Journal Online, October 17, 2006

*Princeton 3D Model Search Engine

*Melodyhound: Search within the Music
watch for more on music




Week 12,  Thursday January 10, noon-2:50pm, Forbes Multi-purpose room  (across the hall from our usual room)

Topics:
The future of Google
Google serving all your needs
Monopoly of information
"Computing in the Clouds"
compare peer-to-peer systems


Class discussion:   What is Google emphasizing for the near future?  As a consumer, what would you like from Google in the future?

One direction receiving a lot of press is "Computing in the Clouds" -- and not only as a big part of Google's future, but as a big part of computing in general.  Do you use "cloud computing" services? What are the pros and cons?

Peer-to-peer networking has become famous for allowing users to share content and infamous for allowing users to side-step copyright.  Do you use peer-to-peer systems (e.g. Kazaa, BitTorrent)?   Do you expect "peer-to-peer" services and "cloud computing" to conflict, co-exist, or complement each other?

From Google itself:  “Google's mission is to organize the world's information and make it universally accessible and useful.”   What is Google's long-term vision for achieving this mission? 


Written assignment due this week:  NONE


Reading for discussion today:
* Review Battelle, Chapters 10, 11 and Afterword
* Look over Google's offerings beyond search:  Google Services & Tools and Google Labs

Computing in the Clouds:
* Software via the Internet: Microsoft in ‘Cloud’ Computing by John Markoff, The New York Times, September 3, 2007.
* Google Gets Ready to Rumble With Microsoft by Steve Lohr and Miguel Helft, The New York Times, December 16, 2007.
* I.B.M. to Push ‘Cloud Computing,’ Using Data From Afar by Steve Lohr, The New York Times, November 15, 2007.
*Computing in the Cloud? I’ll Keep my Data, Thank You, Dec. 17, 2007, blog by Michael Zimmer, 2007-2008 Microsoft Resident Fellow at the Information Society Project at Yale Law School.

Peer-to-peer:
* Peer-to-peer (P2P) and How Kazaa Works (Very brief introduction.)
* BitTorrent - What is it?  from BitTorrent Help. (Very brief introduction.)
* Perspective: The P2P mistake at Ohio University by Ashwin Navin, CNET News.com, May 7, 2007.  (by president and co-founder of BitTorrent.)


References for technical material:
* Computing in the Clouds, by Aaron Weiss.  netWorker, Volume 11, Issue 4 (Dec. 2007), ACM, pp. 16-25.
* Peer-to-peer - Wikipedia, the free encyclopedia


Final paper due Tuesday, January 15, 2008 at 5pm!




last revised Mon Jan  7 12:05 EST 2008
Copyright © 2007, 2008 Andrea S. LaPaugh