CS 674/INFO 630, Fall 2007

CS 674/INFO 630, Fall 2007: Advanced Language Technologies (or, IR, NLP, and special guests;)

Prof. Lillian Lee (follow link for contact information and office hours)
~~TR 1:25-2:40, Hollister 312.~~
Final exam: Thursday, December 13, 2-4:30pm, Thurston 202.

Short description: This course is a graduate-level introduction to research fundamentals for information retrieval and natural language processing. Please see the Course description and policies handout for more information.

Tentative syllabus: Three fundamental paradigms in information retrieval: the vector-space model; the (Robertson-Spärck Jones) probabilistic retrieval paradigm; the language-modeling approach. Relevance feedback, explicit and implicit. Latent Semantic Indexing (LSI). Feature-based context-free grammars (CFGs). Tree adjoining grammars (TAGs). Parsing. The Expectation-Maximization (EM) algorithm. Maximum-entropy modeling.

Administrative info:

Cornell Code of Academic Integrity, and Acknowledging the Work of Others. I strongly encourage students to read the “Acknowledging ...” document carefully; it's quite informative.
Course description and policies (distributed 8/23/07). Includes course philosophy, course work requirements, exam dates, and tentative syllabus.
Example problem for lecture-guide preparation (distributed 8/28)
Sample midterm (from 2006) and brief solution notes (see section 2 of document)
Final(-)exam guide, including sample problems (distributed 11/29)

Resources: (reference texts, other lecture notes and slides, etc.)

The homepage for the previous running of this course may also be useful.

Lectures:

Quick links: starts of the units on: the vector space model (8/28/07); RSJ probabilistic retrieval (9/11/07); language-modeling approaches to IR (9/20/07); relevance feedback (10/2/07); implicit relevance feedback (10/11/07); the SVD and LSI (10/30/07); CFGs (11/8/07); TAGs (11/15/07); EM (11/27/07)

8/23/07: "Sense and Sensibility: Automatically Analyzing Subject and Sentiment in Human-Authored Texts"
(a prefatory lecture; students are not responsible for the material that was covered)
- Handouts: course description and policies [pdf]
- Main lecture references:
  - Rie Kubota Ando and Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162, 2001. [pdf]
  - Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. ACL, pp. 271–278, 2004. [pdf]
- Additional material and references:
  - slides for an hour-long talk on IRR: pdf (Part One), pdf (Part Two)
  - Rie Kubota Ando. The Document Representation Problem: An Analysis of LSI and Iterative Residual Rescaling. Ph.D. thesis, 2001. [pdf]
  - Cornell movie-review datasets and congressional-debate datasets for performing sentiment-analysis experiments on.
8/28/07: Basics of information retrieval; the vector-space model
- Lecture guide
- Handouts: Example problem for lecture-guide preparation [pdf]
- Lecture references:
  - TREC conference proceedings. [website]. See especially the overviews.
- Additional references:
  - David Durbin. The most influential paper Gerard Salton never wrote. Library Trends, Spring 2004. [html] An interesting sidelight on the origins of the vector space model, the assumptions underlying the VSM, and phantom citations.
  - Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of information retrieval heuristics. SIGIR, pp. 49–56, 2004. Best paper award. [pdf, pdf2]
  - Donna Harman. The history of IDF and its influences on IR and other fields. In Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones, pp. 69–79, 2005.
  - Amit Singhal. Modern information retrieval: A brief overview. IEEE Data Engineering Bulletin 24(4), pp. 35–43, 2001. [pdf]
  - Karen Spärck Jones. IDF term weighting and IR research lessons. Journal of Documentation 60(5):521–523 (2004). [pdf]
8/30/07: length normalization (who'da think?)
- Lecture guide
- Lecture references:
  - Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando Pereira. AT&T at TREC-7. [pdf, pdf.gz]
9/4/07: pivoted document-length normalization
- Lecture guide
- Lecture references:
  - Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. SIGIR 1996. [pdf, pdf2]
- Additional references:
  - Tze Leung Chung, Robert Wing Pong Luk, Kam Fai Wong, Kui Lam Kwok, Dik Lun Lee. Adapting pivoted document-length normalization for query size: Experiments in Chinese and English. ACM Transactions on Asian Language Information Processing (TALIP) 5(3):245--263 (2006). [pdf]
  - Jaap Kamps, Maarten de Rijke, and Börkur Sigurbjörnsson. Length normalization in XML retrieval. SIGIR 2004 [abstract, pdf, pdf of 2005 version in Information Retrieval]
9/6/07: remarks on evaluation (of SBM SIGIR '96, and beyond)
- Lecture references:
  - Chris Buckley and Ellen M. Voorhees. Retrieval evaluation with incomplete information. SIGIR, pp. 25–32, 2004. [pdf, pdf2]
  - Stefan Büttcher, Charles L. A. Clarke, Peter C. K. Yeung, and Ian Soboroff. Reliable Information Retrieval Evaluation with Incomplete and Biased Judgements. SIGIR, pp. 63–70, 2007 [pdf, pdf2]
  - Abdur Chowdury, M. Catherine McCabe, David Grossman, Ophir Frieder. Document normalization revisited. SIGIR poster (2002). [pdf, pdf2]
  - Tetsuya Sakai. Alternatives to Bpref. SIGIR, pp. 71–78, 2007. [pdf, pdf of talk slides]
  - Section 3.3.2 of Ellen Voorhees and Donna Harman. Overview of the Eighth Text REtrieval Conference (TREC-8) [good ps, poor pdf].
  - Justin Zobel. How reliable are large-scale information retrieval experiments? SIGIR 1998 [pdf, pdf2, ]
- Additional references:
  - Ben He, Iadh Ounis. On setting the hyper-parameters of term frequency normalization for information retrieval. TOIS 25(3), 2007. [pdf]. For a different retrieval paradigm, but still suggestive in the context of this lecture.
  - Mark Sanderson and Hideo Joho. Forming test collections with no system pooling. SIGIR 2004. [pdf]
9/11/07: Introduction to probabilistic retrieval (Robertson-Spärck Jones version)
- Lecture references:
  - Karen Spärck Jones, Steve Walker, and Stephen Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf of Part 1, pdf of Part 2]
  - William S. Cooper. Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Transactions on Information Systems (TOIS), pp. 100–111, 1995. [pdf]
  - Michael D. Gordon and Peter Lenk. When is the probability ranking principle suboptimal? Journal of the American Society for Information Science 43:1–14 (1992). [pdf]
  - Stephen Robertston and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science 27, 129-46 (1976). [pdf]
  - C. J. “Keith” van Rijsbergen. The emergence of probabilistic accounts of information retrieval. Pp. 23–38 in In John I. Tait. ed., Charting a New Course: Natural Language Processing and Information Retrieval: Essays in Honour of Karen Spärck Jones (2005)
- Additional references:
  - Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain Campbell. "Is this document relevant? ...probably": A survey of probabilistic models in information retrieval. ACM Computing Surveys 30(4) (1998). [pdf, pdf2]
  - Norbert Fuhr. Probabilistic models in information retrieval. The Computer Journal (1992) [pdf]
  - Bojidar Mateev, Elke Mittendorf and Peter Schäuble. Where the Linked Dependence Assumption Fails and How to Move Beyond It. 19th Annual BCS-IRSG Colloquium on IR. [pdf]
  - M. E. Maron. Probabilistic approaches to the document retrieval problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for interpreting the probability of relevance, and also gives a personal historical perspective on information science (as Maron construes it).
  - Stephen Robertson. The probability ranking principle in IR. Journal of Documentation 33 (1977). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997). Appendix gives an example where ranking by probability of relevance is suboptimal.
9/13/07: Probabilistic retrieval with binary attribute variables: derivations of the IDF
- Lecture guide
- Lecture references:
  - W. Bruce Croft and D. Harper. Using probabilistic models of information retrieval without relevance information. Journal of Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
  - L. Lee. IDF revisited: A simple new derivation within the Robertson-Spärck Jones probabilistic model. SIGIR pp. 751–752, 2007. [pdf, pdf2]
  - Stephen E. Robertson and Karen Spärck Jones. Relevance weighting of search terms. Journal of the American Society for Information Science (JASIS) 27(3):129–146, 1976. Reprinted in Peter Willett (ed.), Document Retrieval Systems, pp. 143–160, 1988. [pdf].
  - Stephen E. Robertson and Steve Walker. On relevance weights with little relevance information. SIGIR, pp. 16–24 (1997) [pdf].
- Additional references:
  - Kenneth W. Church and William A. Gale. Inverse document frequency (IDF): A measure of deviations from Poisson. Third Workshop on Very Large Corpora (WVLC), pp. 121–130, 1995. [pdf, ps]
  - Kishore Papineni. Why inverse document frequency? NAACL (2001). [pdf]
  - Stephen Robertson. Understanding inverse document frequency: On theoretical arguments for IDF. Journal of Documentation 60(5):503–520 (2004). [pdf]
  - Arjen P. de Vries and Thomas Roelleke. Relevance Information: A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf, pdf2]
9/18/07: Probabilistic retrieval with attribute-count variables: Poisson-based models
- Lecture guide
- Lecture references:
  - Abraham Bookstein and D. R. Swanson. Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5):312–318 (1974). [pdf]
  - Stephen P. Harter. A probabilistic approach to automatic keyword indexing, part I: On the distribution of specialty words in a technical literature. Journal of the American Society for Information Science 26(4):197–206, 1975. [pdf]
- Additional references:
  - Eugene L. Margulis. N-Poisson document modelling. SIGIR, pp. 177–189 (1992). [pdf]
9/20/07: Derivation of BM/Okapi term weighting. Intro to the language-modeling paradigm for IR
- Lecture guides: part one, part two
  - John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds., Language Modeling and Information Retrieval (2003). [pdf]
  - Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. SIGIR (1998). [pdf]
  - Stephen E. Robertson and Steve Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. SIGIR, pp. 232–241 (1994) [pdf, pdf2]. Reprinted in Spärck Jones and Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references:
  - S. E. Robertson and S. Walker and S. Jones and M. M. Hancock-Beaulieu and M. Gatford. Okapi at TREC-3. Proceedings of Trec 3, pp. 109–126, 1995. [pdf, ps.gz]
  - S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7: automatic ad hoc, filtering, VLC and interactive. Proceedings of TREC 7, pp. 253–264, 1999. [ps.gz (might be better),pdf.gz, pdf2]
9/25/07: More on the LM approach
- Lecture guide
- Lecture references:
  - John Lafferty and Chengxiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR (2001) [ps, pdf; long version]
  - John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Croft and Lafferty, eds., Language Modeling and Information Retrieval (2003). [pdf, ps]
  - Victor Lavrenko and W. Bruce Croft. Relevance-based language models. SIGIR (2001) [pdf, pdf2]
  - ChengXiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. SIGIR (2001). [pdf, ps, pdf of long version (TOIS 22(2):179--214)]
- Additional references:
  - Language modeling and information retrieval. The Lemur project. [html]
  - Xiaoyong Liu and W. Bruce Croft. Statistical language modeling for information retrieval. Annual Review of Information Science and Technology 39 (2003). [pdf]
  - David MacKay and Linda Peto. A hierarchical Dirichlet language model. Natural Language Engineering 1(3):289–307 (1995). [pdf]
9/27/07: Completion of the LM approach
- Lecture references:
  - Djoerd Hiemstra and Wessel Kraaij. Twenty-One at TREC-7: Ad Hoc and Cross Language track. [pdf.gz]
  - ChengXiang Zhai and John Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. SIGIR (2001). [pdf, ps, pdf of long version (TOIS 22(2):179--214)]
10/2/07: Introduction to relevance feedback
- Lecture references:
  - Stephen Robertson. On term selection for query expansion. Journal of Documentation 46(4):359–364 (1990) [pdf]
  - J. J. Rocchio. Relevance feedback in information retrieval. In Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic Document Processing, pp. 313–323 (1971).
  - Ian Ruthven and Mounias Lalmas. A survey on the use of relevance feedback for information access systems. Knowledge Engineering Review, 18(1) (2003) [pdf]
  - Jaime Teevan, Eytan Adar, Rosie Jones and Michael Potts. Information Re-Retrieval: Repeat Queries in Yahoo's Logs. SIGIR (2007) [pdf,pdf2]
- Additional references:
  - Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval using modified freezing, residual collection, and test and control groups. In G. Salton, ed., The SMART retrieval system — experiments in automatic document processing, pp. 355–370 (1971).
  - Gerard Salton and Chris Buckley. Improving retrieval performance by relevance feedback. JASIS 41(4): 288–297 (1990). [pdf]
10/4/07: Automatic query expansion (AQE) and interactive query expansion (IQE)
- Lecture references:
  - Ian Ruthven. Re-examining the potential effectiveness of interactive query expansion. SIGIR, pp. 213–220 (2003). [pdf, pdf2] Best paper award.
10/11/07: Implicit feedback. Further notes on language models (in preparation for combining them with implicit feedback).
- Lecture references:
  - Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference: A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003). [pdf, pdf2]
  - Masahiro Morita and Yoichi Shinoda. Information filtering based on user behavior analysis and best match text retrieval. SIGIR, pp. 272–281 (1994). [pdf]
  - Jarkko Salojärvi, Kai Puolamäki and Samuel Kaski. Implicit Relevance Feedback from Eye Movements. ICANN (2005). [pdf]
  - Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing search via automated analysis of interests and activities. SIGIR, pp. 449–456 (2005). [pdf, pdf2]
- Additional references:
  - Diane Kelly and Nicholas J. Belkin. Display time as implicit feedback: Understanding task effects. SIGIR (2004). [pdf, pdf2]
  - Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri Kovanen, Ilpo Kojo and Samuel Kaski. Inferring Relevance from Eye Movements: Feature Extraction. Technical Report, Computer and Information Science, University of Helsinki (2005). [pdf]. Associated PASCAL challenge, 2005 [homepage]
10/16/07: More on language models, in preparation for combining them with relevance feedback
- Lecture references:
  - Thomas Cover and Joy A. Thomas. Elements of information theory. Wiley (1991) [book homepage].
  - John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR (2001) [ps, pdf; long version]

In-class midterm

10/23/07 An LM-based approach to implicit feedback
- Lecture guide
- Lecture references:
  - Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive information retrieval using implicit feedback. SIGIR, pp. 43–50 (2005). [pdf, pdf2]
  - John Lafferty and ChengXiang Zhai. Document language models, query models, and risk minimization for information retrieval. SIGIR (2001) [ps, pdf; long version]
10/25/07: Clickthrough data as implicit feedback
- Lecture guide
- Lecture references:
  - Filip Radlinski and Thorsten Joachims. Query chains: learning to rank from implicit feedback. KDD, pp. 239–248 (2005) [pdf, pdf2] Co-winner, best student paper award.
  - Thorsten Joachims. Optimizing search engines using clickthrough data. KDD, 2002 [pdf, pdf2]
  - Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. Accurately interpreting clickthrough data as implicit feedback. SIGIR, pp. 154–161 (2005). [pdf, pdf2]
  - Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip Radlinski, and Geri Gay. Evaluating the accuracy of implicit feedback from clicks and query reformulations in Web search. ACM Transactions on Information Systems (TOIS), 25(2) (2007) [pdf,pdf2]
- Additional references:
  - Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin rank boundaries for ordinal regression. Advances in Large Margin Classifiers, pp. 115–132 (2000) [ps.gz]
10/30/07: The term-document matrix, revisited (preparation for the SVD)
- Lecture guide
- Lecture references:
  - Lloyd N. (Nick) Trefethen and David Bau III. Numerical Linear Algebra (1997). [book homepage] See lectures one, four and five (all from part I, "Fundamentals").
11/1/07: The singular value decomposition and “approximating ± convex hulls ”
- Additional references:
  - Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition (1996). [book homepage]. The "Bible" on, well, matrix computations. Extensive coverage of the singular value decomposition (SVD), although certainly not as gentle an introduction as in class.
11/6/07: "Few-factor" representations (LSI, pLSI, and others)
- Lecture guide
- Lecture references:
  - David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. NIPS 16 (2003). [pdf]
  - Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society of Information Science 41(6):391–407 (1990). [pdf,pdf2]
  - Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR, pp. 50–57 (1999). [pdf,pdf2]
- Additional references:
  - Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling: An analysis and generalization of LSI. SIGIR, pp. 154–162 (2001) [pdf,pdf2]
  - Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared Saia. Spectral analysis of data. Symposium on Theory of Computing (STOC), pp. 619–626 (2001) [pdf,(longer?) ps]
  - Thomas Hofmann and Jan Puzicha. Statistical models for co-occurrence data. AI Memo 1625, MIT (1998). [pdf]
  - Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review 104(2):211–240 (1997). [pdf,html,html2]
  - Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791 (1999) [pdf]
  - Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and Santosh Vempala. Latent semantic indexing: a probabilistic analysis. Symposium on Principles of Database Systems, pp. 159–168 (1998) [pdf]. Journal version: Journal of Computer and Systems Science 61(20):217–235 (2000) [ps]
  - Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional clustering of English words. Proceedings of the ACL (1993). [pdf]
  - Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater: Challenging the validity of automated essay scoring (2001). [pdf] An interesting study in which knowledgeable humans were asked to produce essays that would be incorrectly scored by automatic methods.
  - Michael Steele and David Aldous. Introduction to the interface of probability and algorithms. Probability and Algorithms: A Panel Report. National Academy of Sciences (1993). [pdf] One subsection gives a quick calculation for the expected number of occupied tables in the Chinese restaurant process.
  - Demo: score your own essay (on psycholinguistics) with LSI. [home page]
11/8/07: Modeling syntactic structure: context-free grammars
- Lecture guide
- Lecture references:
  - Noam Chomsky. Three models for the description of language. In I.R.E. Transactions on Information Theory 2, pp. 113–124.
  - Igor Mel'čuk. Studies in Dependency Syntax, Karoma (1979).
  - Lucien Tesnière. Elements de Syntaxe Structurale, Klincksieck (1959).
- Additional references:
  - James Allen. Section 5.2: Movement phenomena in language. Natural language understanding, second edition. Benjamin/Cummings (1995).
  - Christopher Culy. The complexity of the vocabulary of Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer link]
  - Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for English. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Prentice Hall (2000). [homepage] In the second edition, this is chapter 12 (draft form).
  - Lillian Lee. “I'm sorry Dave, I'm afraid I can't do that”: Linguistics, Statistics, and Natural Language Processing circa 2001. In Computer Science: Reflections on the Field, Reflections from the Field, National Academies Press, pp. 111–118, 2004. [pdf, html]
  - Geoff Pullum, 1986. Footloose and context-free. Natural language and linguistic theory 4, pp. 283–289. Reprinted in The great Eskimo vocabulary hoax, U. of Chicago Press 1991. A very fun read.
  - Stuart Shieber. Evidence against the context-freeness of natural language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer link]
4/13/06: Augmented context-free grammars
- Lecture guide
- Lecture references:
  - James Allen. Section 5.3: Handling questions in context-free grammars. Natural Language Understanding, second edition. Benjamin/Cummings (1995).
- Additional references:
  - Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall (2000). [homepage] In the second edition, this is chapter 16 (draft form).
  - Fernando C. N. Pereira and Stuart M. Shieber. Prolog and natural-language analysis. CLSI (1987) [pdf]. Sections 4.2.3-4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe syntactic aspects of filler-gap dependencies.
11/15/07: Introduction to tree-adjoining grammars
- Additional references:
  - Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3 (Beyond words), pp. 69–123 (1997). [pdf]
  - K. Vijay-Shankar and Aravind K. Joshi. Some computational properties of tree adjoining grammars. Proceedings of the 23rd ACL, pp. 82–93 (1985). [pdf,pdf2]
  - The XTAG project. [home page]. Site includes a tutorial, grammars for English and Korean, synchronous-tag implementation, and a supertagging demo.
11/20/07: adjunction constraints and feature-based TAGS
- Lecture references:
  - K. Vijay-Shanker and Aravind K. Joshi. Feature structures based tree adjoining grammars. COLING, pp. 714–719 (1988) [pdf]
- Additional references:
  - Anne Abeillé and Yves Schabes. Parsing idioms in lexicalized TAGs. Proc. of the 4th EACL, pp. 1--9 (1989). [pdf]
  - Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian, Kimmen Sjölander, Rebecca C. Underwood, and David Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic Acids Research 22(23):5112–20 (1994). [pdf]
  - David B. Searls. The linguistics of DNA. American Scientist 80(6):579–591. (1992)
  - Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, Takashi Yokomori. Tree adjoining grammars for RNA structure prediction. Theoretical Computer Science 210(2): 277–303, special issue on genome informatics (1999) [pdf]
11/27/07: The EM algorithm, part one
- Lecture references:
  - Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free grammars. Computational Linguistics 24(2):298–305 (1998). [pdf]
  - Dempster, Laird, and Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B, 39(1):1–38 (1977) [ocr]
  - Frederick Jelinek. Statistical Methods for Speech Recognition, MIT Press, 1997.
  - Detlef Prescher. A tutorial on the Expectation-Maximization algorithm including maximum-likelihood estimation and EM training of probabilistic context-free grammars. Presented at the 15th European Summer School in Logic, Language and Information (ESSLLI-03). [pdf]
- Additional references: Note that notation varies across different presentations, with the same variable often used to refer to different things, unfortunately.
  - Adam Berger. Convexity, maximum likelihood and all that. [ps]
  - Michael Collins. The EM algorithm (1997) [ps]
  - Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm and extensions. Second edition. Wiley (2006). [errata for first printing]
  - Ted Pedersen. The EM algorithm: selected readings (2001) [pdf]
  - Stefan Riezler. Getting EM to work (talk at EMNLP 2001) [ps.gz]
11/29/07: Conclusion of EM, and the course
- Lecture references:
  - Detlef Prescher. A tutorial on the Expectation-Maximization algorithm including maximum-likelihood estimation and EM training of probabilistic context-free grammars. Presented at the 15th European Summer School in Logic, Language and Information (ESSLLI-03). [pdf]
  - Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2):165–201. [pdf,html,tech-report ps]
- Additional references: