COS 435
Resources for Projects
Please help keep this list useful! Suggest free, stable, preferably open-source
software and free, open data sets. A recommendation based on personal experience
is best, but we will also consider other suggestions, pending further exploration.
Thanks!
Data Sets
Here are some data sets of possible value. Some have been used
in past COS 435 projects.
IMPORTANT NOTE: Other members of the faculty have data sets that they are
willing to share. If you need something (either a specific data set or a
specific kind of data), ask, and we'll see whether it is available in the
department.
- UCI Machine Learning Archive: University of California, Irvine data sets,
primarily for data mining tasks but also useful for other information
analysis and search tasks. 211 data sets as of February 2012. Some example
data sets:
  - NSF Research Awards Abstracts 1990-2003
  - OpinRank Review Dataset: car reviews for model years 2007-2009 and
    hotel reviews for 10 cities.
- LETOR data set: from Microsoft. The site describes it as "a package
of benchmark data sets for research on LEarning TO Rank. This
dataset contains standard features, relevance judgments, data
partitioning, evaluation tools, and several baselines, for the
OHSUMED data collection and the '.gov' data collection."
- Amazon Web Services (AWS) Public Data Sets: "a centralized repository of
public data sets that can be seamlessly integrated into AWS cloud-based
applications." Examples include the Sloan Digital Sky Survey and a
5-billion-page Web crawl (60 TB!), among many others.
- 4 Universities Data Set: from CMU. Computer science department Web pages
from various universities, hand-classified into 7 categories.
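The LETOR entry above mentions relevance judgments and evaluation tools. As a rough sketch of how such data is typically consumed, the Python below assumes the plain-text line format commonly used by LETOR releases (one query-document pair per line: a relevance label, `qid:<id>`, `<feature id>:<value>` pairs, then an optional `#` comment) and computes mean NDCG@k for a given scoring. The names `LetorExample`, `parse_letor_line`, `ndcg_at_k`, and `mean_ndcg` are hypothetical helpers, not part of LETOR. The NDCG uses the common (2^rel - 1) / log2(rank + 1) gain and may differ in details (ties, queries with no relevant documents) from the official LETOR evaluation tools, so use those tools for any reported results.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LetorExample:
    """One query-document pair from a LETOR-style text file (assumed format)."""
    label: int                 # graded relevance judgment, e.g. 0, 1, 2
    qid: str                   # query identifier taken from the "qid:" token
    features: dict[int, float] = field(default_factory=dict)
    comment: str = ""          # text after '#', e.g. a document id


def parse_letor_line(line: str) -> LetorExample:
    """Parse '<label> qid:<id> <fid>:<value> ... # <comment>'."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    if len(tokens) < 2 or not tokens[1].startswith("qid:"):
        raise ValueError(f"not a LETOR-style line: {line!r}")
    label = int(tokens[0])
    qid = tokens[1][len("qid:"):]
    features = {}
    for tok in tokens[2:]:
        fid, value = tok.split(":", 1)
        features[int(fid)] = float(value)
    return LetorExample(label, qid, features, comment.strip())


def dcg_at_k(rels: list[int], k: int) -> float:
    """Discounted cumulative gain of the top k labels, given in ranked order."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(rels: list[int], k: int) -> float:
    """DCG divided by the DCG of the ideal (sorted) order; 0.0 if nothing is relevant."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


def mean_ndcg(examples: list[LetorExample], scores: list[float], k: int = 10) -> float:
    """Rank each query's documents by score (highest first); average NDCG@k over queries."""
    by_query = defaultdict(list)
    for ex, score in zip(examples, scores):
        by_query[ex.qid].append((score, ex.label))
    per_query = []
    for pairs in by_query.values():
        ranked = [label for _, label in sorted(pairs, key=lambda p: p[0], reverse=True)]
        per_query.append(ndcg_at_k(ranked, k))
    return sum(per_query) / len(per_query) if per_query else 0.0


if __name__ == "__main__":
    # Made-up toy lines for illustration only, not real LETOR data.
    sample = [
        "2 qid:1 1:0.9 2:0.1 #docid = A",
        "0 qid:1 1:0.2 2:0.7 #docid = B",
        "1 qid:1 1:0.5 2:0.4 #docid = C",
    ]
    examples = [parse_letor_line(s) for s in sample]
    # Toy baseline scorer: rank each document by feature 1 alone.
    scores = [ex.features.get(1, 0.0) for ex in examples]
    print(mean_ndcg(examples, scores, k=3))  # prints 1.0
```

Replacing the toy feature-1 scorer with a trained ranker's scores gives a quick sanity check on a project's pipeline before running the official evaluation scripts.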
Software
Here are some free (as far as I know) software tools of possible
value. Some have been used in past COS 435 projects.
Lists last updated Fri Feb 17 14:03:58 EST 2012
Copyright 2008-2012 Andrea S. LaPaugh