CS 775: Seminar in Natural Language Understanding

Fall 1997


Topics

Friday, September 12
Zavrel and Daelemans. Memory-Based Learning: Using Similarity for Smoothing. Proceedings of ACL/EACL 97.
Friday, September 19
Mary Elaine Califf and Raymond J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. Working Papers of ACL-97 Workshop on Natural Language Learning.

In addition to presenting the paper, Sam will make some brief comparisons with the CRYSTAL and AUTOSLOG systems. It's not necessary to read about these systems, but if you're interested, here are some references:
- AUTOSLOG
- CRYSTAL
- Underlying ML framework for the Rapier paper.
Friday, September 26
M. Mitra, C. Buckley, A. Singhal, and C. Cardie. An Analysis of Statistical and Syntactic Phrases. 5th RIAO Conference, Computer-Assisted Information Searching on the Internet, 1997.
Friday, October 3
Adwait Ratnaparkhi. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. EMNLP-2, 1997.
Friday, October 17
Eugene Charniak. Statistical parsing with a context-free grammar and word statistics. Proceedings of the Fourteenth National Conference on Artificial Intelligence. AAAI Press/MIT Press, Menlo Park, 1997.
Friday, October 24
CS 775 will meet at the Cognitive Studies Colloquium. The speaker is:

Leslie Pack Kaelbling, Computer Science Department, Brown University
Co-sponsored with the Department of Computer Science
3:30 PM, 202 Uris Hall (Refreshments at 3:15)
"Planning and Learning in Uncertain Environments"

To prepare for the seminar, you can read about her research interests on Leslie's home page. The talk will probably be based on the papers listed below. Since we can't read all of them, pick a paper and skim it. Claire recommends the JAIR survey article.
- Leslie Pack Kaelbling, Michael L. Littman, and Anthony R. Cassandra. Planning and Acting in Partially Observable Stochastic Domains. Unpublished. Postscript (27 pages).
- Michael L. Littman, Anthony R. Cassandra, and Leslie Pack Kaelbling. Learning Policies for Partially Observable Environments: Scaling Up. Proceedings of the Twelfth International Conference on Machine Learning, 1995. Postscript (9 pages).
- Michael L. Littman, Thomas L. Dean, and Leslie Pack Kaelbling. On the Complexity of Solving Markov Decision Problems. Proceedings of the Eleventh International Conference on Uncertainty in Artificial Intelligence, 1995. Postscript (9 pages).
- An overview of the whole area of reinforcement learning is also available from her home page:
  Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, Volume 4, 1996. Postscript (40 pages). HTML version.
Friday, October 31
CS 775 will meet at the Cognitive Studies Colloquium again. The speaker is:

Geoffrey Hinton, Department of Computer Science, University of Toronto
Co-sponsored with the Department of Computer Science and the Department of Psychology Developmental Training Grant
3:30 PM, Uris Auditorium (Refreshments at 3:15)
"Learning and Perceptual Inferences in Hierarchical Communities of Experts"

Also note that Fernando Pereira is giving a talk on Tuesday, October 28, at the AI Seminar.

"Aggregate and mixed-order Markov models for statistical language processing"
Fernando Pereira
AT&T Labs
Tuesday, Oct. 28, 4:30 PM, Upson 5130
Refreshments will be served!
Friday, November 7
Dave will be giving the long-awaited "wild card" talk about the Base NP component of EMPIRE. No reading for this week.
Friday, November 14
Ellen Riloff and Jessica Shepherd (University of Utah). A Corpus-Based Approach for Building Semantic Lexicons. EMNLP-2, 1997.

Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon.
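
As a rough illustration of the input and output described in the abstract (seed words plus a corpus in, a ranked word list out), here is a minimal sketch. It is not the authors' procedure: the window-based co-occurrence score, the whitespace tokenization, the stopword filter, and the name rank_candidates are all assumptions made for the example.

```python
from collections import Counter


def rank_candidates(sentences, seeds, window=2, stopwords=()):
    """Rank non-seed words by how often they occur near a seed word.

    Assumed scoring (not taken from the paper):
        score(w) = (occurrences of w within `window` tokens of a seed)
                   / (total occurrences of w in the corpus)
    """
    seeds = {s.lower() for s in seeds}
    stopwords = {s.lower() for s in stopwords}
    total = Counter()      # corpus frequency of every token
    near_seed = Counter()  # occurrences that fall near a seed word

    for sentence in sentences:
        tokens = sentence.lower().split()
        total.update(tokens)
        for i, tok in enumerate(tokens):
            if tok in seeds or tok in stopwords:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            if any(c in seeds for c in context):
                near_seed[tok] += 1

    scores = {w: near_seed[w] / total[w] for w in near_seed}
    # Highest score first; ties broken by raw count, then alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], -near_seed[w], w))


if __name__ == "__main__":
    corpus = [
        "the truck and jeep exploded",
        "a jeep and truck arrived",
        "the mayor arrived",
    ]
    ranked = rank_candidates(corpus, seeds={"truck"},
                             stopwords={"the", "a", "and"})
    print(ranked)
```

In the workflow the abstract describes, a person would then review the top of such a list and decide which words belong in the lexicon; the sketch covers only the ranking step.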

Resources

ACL97
35th Annual Meeting of the Association for Computational Linguistics.
EMNLP-2
Second Conference On Empirical Methods in Natural Language Processing.
Computation and Language E-Print Archive

David Pierce
pierce@cs.cornell.edu