Information Extraction from Informal Texts

Lyle Ungar
CIS, University of Pennsylvania

Recognizing and labeling named entities such as people, places, and movies across the entire web requires highly scalable data mining methods. We use an unsupervised method to automatically extract hundreds of millions of entity mentions from the web, labeled with dozens of types. The extracted data are then used to train a supervised model for labeling all entity mentions on the web. This talk will describe the machine learning methods used for web-scale named entity recognition and then show one use of such extracted entities: comparative sentiment analysis with entities extracted from product discussion boards. (Joint work with Alex Kehlenbeck, Casey Whitelaw, Ronen Feldman, Moshe Fresko, Jacob Goldenberg, and Oded Netzer.)

Web-Scale Named Entity Recognition, Casey Whitelaw, Alex Kehlenbeck, Nemanja Petrovic and Lyle Ungar, ACM 17th Conference on Information and Knowledge Management (CIKM), 2008

Extracting Product Comparisons from Discussion Boards, Ronen Feldman, Moshe Fresko, Jacob Goldenberg, Oded Netzer and Lyle Ungar, Seventh IEEE International Conference on Data Mining (ICDM), 469-474, 2007