CS 674/INFO 630, Fall 2007
CS 674/INFO 630, Fall 2007: Advanced Language Technologies (or, IR,
NLP, and special guests;)
Prof. Lillian
Lee (follow link for contact information and office hours)
TR 1:25-2:40, Hollister 312.
Final exam: Thursday, December 13, 2-4:30pm, Thurston 202.
Short description: This course is a graduate-level introduction to research fundamentals for
information retrieval and natural language processing. Please see the
Course description and policies
handout for more information.
Tentative
syllabus: Three fundamental paradigms in
information retrieval: the vector-space model; the
(Robertson-Spärck Jones) probabilistic retrieval paradigm; the
language-modeling approach. Relevance feedback, explicit and
implicit. Latent Semantic Indexing (LSI).
Feature-based context-free grammars (CFGs). Tree adjoining grammars
(TAGs). Parsing. The Expectation-Maximization (EM) algorithm.
Maximum-entropy modeling.
Administrative info:
Resources: (reference texts, other lecture
notes and slides, etc.)
The homepage for the
previous running of this course may also be useful.
Lectures:
Quick links: starts of the units on: the vector
space model (8/28/07); RSJ probabilistic
retrieval (9/11/07); language-modeling
approaches to IR (9/20/07); relevance
feedback (10/2/07); implicit relevance
feedback (10/11/07); the SVD and LSI
(10/30/07); CFGs (11/8/07); TAGs (11/15/07); EM (11/27/07)
- 8/23/07: "Sense and Sensibility:
Automatically Analyzing
Subject and Sentiment in Human-Authored Texts"
(a prefatory lecture; students are not responsible for the material that was covered)
- Handouts: course description and
policies [pdf]
- Main lecture references:
- Rie Kubota Ando and Lillian Lee. Iterative residual rescaling: An analysis and generalization of LSI.
SIGIR, pp. 154–162, 2001. [pdf]
- Bo Pang and Lillian Lee. A
sentimental education: Sentiment analysis using subjectivity
summarization based on minimum cuts.
ACL, pp. 271–278, 2004. [pdf]
- Additional material and references:
- 8/28/07: Basics of information retrieval; the
vector-space model
- Lecture guide
- Handouts: Example problem for lecture-guide preparation [pdf]
- Lecture references:
- TREC conference
proceedings. [website].
See especially the overviews.
- Additional references:
- David Durbin. The most influential paper Gerard Salton never
wrote. Library Trends, Spring 2004. [html]
An interesting sidelight on the origins of the vector space model, the
assumptions underlying the VSM, and
phantom citations.
- Hui Fang, Tao Tao, and ChengXiang Zhai. A formal study of
information retrieval heuristics. SIGIR, pp. 49–56, 2004. Best
paper award. [pdf,
pdf2]
- Donna Harman. The history of IDF and its influences on IR and
other fields. In Charting a New Course: Natural Language Processing
and Information Retrieval: Essays in Honour of Karen Spärck
Jones, pp. 69–79, 2005.
- Amit Singhal. Modern
information retrieval: A brief overview
. IEEE Data Engineering
Bulletin 24(4), pp. 35–43, 2001. [pdf]
- Karen Spärck Jones. IDF term weighting and IR research lessons.
Journal of Documentation 60(5):521–523 (2004).
[pdf]
- 8/30/07: length normalization (who'da think?)
- Lecture guide
- Lecture references:
- Amit Singhal, John Choi, Donald Hindle, David D. Lewis, Fernando
Pereira. AT&T at TREC-7. [pdf, pdf.gz]
- 9/4/07: pivoted document-length normalization
- Lecture guide
- Lecture references:
- Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document
length normalization. SIGIR 1996. [pdf, pdf2]
- Additional references:
-
Tze Leung Chung,
Robert Wing Pong Luk,
Kam Fai Wong,
Kui Lam Kwok,
Dik Lun Lee. Adapting pivoted document-length normalization for query
size: Experiments in Chinese and English. ACM Transactions on Asian
Language Information Processing (TALIP) 5(3):245--263 (2006). [pdf]
-
Jaap Kamps,
Maarten de Rijke, and
Börkur Sigurbjörnsson. Length normalization in XML retrieval.
SIGIR 2004 [abstract,
pdf,
pdf
of 2005 version in Information Retrieval]
- 9/6/07: remarks on evaluation (of SBM SIGIR '96, and beyond)
- Lecture references:
- Chris Buckley and Ellen M. Voorhees. Retrieval evaluation with
incomplete information. SIGIR, pp. 25–32, 2004. [pdf,
pdf2]
- Stefan Büttcher, Charles L. A. Clarke, Peter C. K. Yeung, and
Ian Soboroff. Reliable Information Retrieval Evaluation with
Incomplete and Biased Judgements. SIGIR, pp. 63–70, 2007 [pdf, pdf2]
- Abdur Chowdury, M. Catherine McCabe, David Grossman, Ophir
Frieder. Document normalization revisited. SIGIR poster (2002). [pdf,
pdf2]
- Tetsuya Sakai. Alternatives to Bpref. SIGIR,
pp. 71–78, 2007. [pdf,
pdf
of talk slides]
- Section 3.3.2 of Ellen Voorhees and Donna Harman. Overview of the Eighth Text
REtrieval Conference (TREC-8) [good ps, poor
pdf].
- Justin Zobel. How reliable are large-scale information retrieval
experiments? SIGIR 1998 [pdf,
pdf2, ]
- Additional references:
- Ben He, Iadh Ounis.
On setting the hyper-parameters of term frequency normalization for
information retrieval. TOIS 25(3), 2007. [pdf].
For a different retrieval paradigm, but still suggestive in the
context of this lecture.
- Mark Sanderson and Hideo Joho. Forming test collections with no
system pooling. SIGIR 2004. [pdf]
- 9/11/07: Introduction to probabilistic retrieval
(Robertson-Spärck Jones version)
- Lecture references:
- Karen Spärck Jones, Steve Walker, and Stephen Robertson. A
probabilistic model of information retrieval:
Development and comparative experiments. Information Processing
and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf
of Part 1, pdf
of Part 2]
- William S. Cooper. Some inconsistencies and misidentified
modeling assumptions in probabilistic information retrieval. ACM
Transactions on Information Systems (TOIS), pp. 100–111, 1995.
[pdf]
- Michael D. Gordon and Peter Lenk. When is the probability ranking
principle suboptimal? Journal of the American Society for Information
Science 43:1–14 (1992). [pdf]
- Stephen Robertston and Karen Spärck Jones. Relevance
weighting of search terms. Journal of the American Society for
Information Science 27, 129-46 (1976). [pdf]
- C. J. “Keith” van Rijsbergen. The emergence of
probabilistic accounts of information retrieval. Pp. 23–38 in In John
I. Tait. ed., Charting a New Course: Natural Language Processing and
Information Retrieval: Essays in Honour of Karen Spärck Jones (2005)
- Additional references:
- Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain
Campbell. "Is this document relevant? ...probably": A survey
of probabilistic models in information retrieval. ACM Computing
Surveys 30(4) (1998). [pdf,
pdf2]
- Norbert Fuhr. Probabilistic models in information retrieval. The
Computer Journal (1992) [pdf]
- Bojidar Mateev, Elke Mittendorf and Peter Schäuble. Where
the Linked Dependence Assumption Fails and How to Move Beyond It.
19th Annual BCS-IRSG Colloquium on IR. [pdf]
- M. E. Maron. Probabilistic approaches to the document retrieval
problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for
interpreting the probability of relevance, and also gives a personal
historical perspective on information science (as Maron construes it).
- Stephen Robertson. The probability ranking principle in IR.
Journal of Documentation 33 (1977). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
Appendix gives an example where ranking by probability of relevance
is suboptimal.
- 9/13/07: Probabilistic retrieval with binary attribute variables:
derivations of the IDF
- Lecture guide
- Lecture references:
- W. Bruce Croft and D. Harper. Using probabilistic models of
information retrieval without relevance information. Journal of
Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
- L. Lee. IDF revisited: A simple new derivation within the
Robertson-Spärck Jones probabilistic model. SIGIR
pp. 751–752, 2007. [pdf,
pdf2]
- Stephen E. Robertson and Karen Spärck Jones. Relevance
weighting of search terms. Journal of the American Society for
Information Science (JASIS) 27(3):129–146, 1976. Reprinted in
Peter Willett (ed.), Document Retrieval Systems, pp. 143–160, 1988. [pdf].
- Stephen E. Robertson and Steve Walker. On relevance weights with
little relevance information. SIGIR, pp. 16–24 (1997) [pdf].
- Additional references:
- Kenneth W. Church and William A. Gale. Inverse document
frequency (IDF): A measure of deviations from Poisson. Third Workshop
on Very Large Corpora (WVLC), pp. 121–130, 1995. [pdf, ps]
- Kishore Papineni. Why inverse document frequency? NAACL (2001).
[pdf]
- Stephen Robertson. Understanding inverse document frequency: On
theoretical arguments for IDF. Journal of Documentation
60(5):503–520 (2004). [pdf]
- Arjen P. de Vries and Thomas Roelleke. Relevance Information:
A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf,
pdf2]
- 9/18/07: Probabilistic retrieval with attribute-count variables:
Poisson-based models
- Lecture guide
- Lecture references:
- Abraham Bookstein and D. R. Swanson. Probabilistic models for
automatic indexing. Journal of the American Society for Information
Science, 25(5):312–318 (1974). [pdf]
- Stephen P. Harter. A probabilistic approach to automatic keyword
indexing, part I: On the distribution of specialty words in a
technical literature. Journal of the American Society for Information
Science 26(4):197–206, 1975. [pdf]
- Additional references:
-
Eugene L. Margulis. N-Poisson document modelling. SIGIR,
pp. 177–189 (1992). [pdf]
- 9/20/07: Derivation of BM/Okapi term weighting. Intro to the
language-modeling paradigm for IR
- Lecture guides: part one, part two
- John Lafferty and Chengxiang Zhai. Probabilistic relevance models
based on document and query generation. In Croft and Lafferty, eds.,
Language Modeling and
Information Retrieval (2003). [pdf]
- Jay M. Ponte and W. Bruce Croft. A language modeling approach to
information retrieval. SIGIR (1998). [pdf]
- Stephen E. Robertson and Steve Walker. Some simple effective
approximations to the 2-Poisson model for probabilistic weighted
retrieval. SIGIR, pp. 232–241 (1994) [pdf, pdf2]. Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references:
- S. E. Robertson and S. Walker and S. Jones and
M. M. Hancock-Beaulieu and M. Gatford. Okapi at TREC-3. Proceedings
of Trec 3, pp. 109–126, 1995. [pdf,
ps.gz]
- S. E. Robertson, S. Walker, and M. Beaulieu. Okapi at TREC-7:
automatic ad hoc, filtering, VLC and interactive. Proceedings of TREC
7, pp. 253–264, 1999. [ps.gz
(might be better),pdf.gz,
pdf2]
- 9/25/07: More on the LM approach
- Lecture guide
- Lecture references:
- John Lafferty and Chengxiang Zhai. Document language models,
query models, and risk minimization for information retrieval. SIGIR
(2001) [ps,
pdf;
long
version]
- John Lafferty and Chengxiang Zhai. Probabilistic relevance models
based on document and query generation. In Croft and Lafferty, eds.,
Language Modeling and
Information Retrieval (2003). [pdf,
ps]
- Victor Lavrenko and W. Bruce Croft. Relevance-based language
models. SIGIR (2001) [pdf,
pdf2]
- ChengXiang Zhai and John Lafferty. A study of smoothing methods
for language models applied to ad hoc information retrieval. SIGIR
(2001). [pdf,
ps,
pdf
of long version (TOIS 22(2):179--214)]
- Additional references:
- Language modeling and information retrieval. The Lemur project. [html]
- Xiaoyong Liu and W. Bruce Croft. Statistical language modeling
for information retrieval. Annual Review of Information Science and
Technology 39 (2003). [pdf]
-
David MacKay and Linda Peto. A hierarchical Dirichlet language model.
Natural Language Engineering 1(3):289–307 (1995). [pdf]
- 9/27/07: Completion of the LM approach
- Lecture references:
- Djoerd Hiemstra and Wessel Kraaij. Twenty-One at TREC-7: Ad Hoc
and Cross Language track. [pdf.gz]
- ChengXiang Zhai and John Lafferty. A study of smoothing methods
for language models applied to ad hoc information retrieval. SIGIR
(2001). [pdf,
ps,
pdf
of long version (TOIS 22(2):179--214)]
- 10/2/07: Introduction to
relevance feedback
-
- Lecture references:
- Ide, E. 1971. New experiments in relevance feedback. In
Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic
Document Processing, pp. 337–354 (1971).
- E. Ide and G. Salton. Interactive search strategies and dynamic
file organization in information retrieval. In
Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic
Document Processing, pp. 373–393 (1971).
- Stephen Robertson. On term selection for query expansion.
Journal of Documentation 46(4):359–364 (1990) [pdf]
- J. J. Rocchio. Relevance feedback in information retrieval. In
Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic
Document Processing, pp. 313–323 (1971).
- Ian Ruthven and Mounias Lalmas. A survey on the use of relevance
feedback for information access systems. Knowledge Engineering Review,
18(1) (2003) [pdf]
- Jaime Teevan, Eytan Adar, Rosie Jones and Michael
Potts. Information Re-Retrieval: Repeat Queries in Yahoo's Logs. SIGIR
(2007) [pdf,pdf2]
- Additional references:
- Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback
retrieval using modified freezing, residual collection, and test and
control groups. In G. Salton, ed., The SMART retrieval system —
experiments in automatic document processing, pp. 355–370
(1971).
- Gerard Salton and Chris Buckley. Improving retrieval performance
by relevance feedback. JASIS 41(4): 288–297 (1990). [pdf]
- 10/4/07: Automatic query expansion (AQE) and interactive query
expansion (IQE)
- Lecture references:
- Ian Ruthven. Re-examining the potential effectiveness of
interactive query expansion. SIGIR, pp. 213–220 (2003). [pdf, pdf2] Best paper award.
- 10/11/07: Implicit feedback. Further notes on language models
(in preparation for combining them with implicit feedback).
- Lecture references:
- Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference:
A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003). [pdf, pdf2]
- Masahiro Morita and Yoichi Shinoda. Information filtering
based on user behavior analysis and best match text
retrieval. SIGIR, pp. 272–281 (1994). [pdf]
- Jarkko Salojärvi, Kai Puolamäki and Samuel Kaski.
Implicit Relevance Feedback from Eye Movements. ICANN (2005). [pdf]
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing
search via automated analysis of interests and activities. SIGIR,
pp. 449–456
(2005). [pdf,
pdf2]
- Additional references:
- Diane Kelly and Nicholas J. Belkin. Display time as implicit
feedback: Understanding task effects. SIGIR (2004). [pdf,
pdf2]
- Jarkko Salojärvi, Kai Puolamäki, Jaana Simola, Lauri
Kovanen, Ilpo Kojo and Samuel Kaski. Inferring Relevance from Eye
Movements: Feature Extraction. Technical Report, Computer and
Information Science, University of Helsinki (2005). [pdf].
Associated PASCAL challenge, 2005 [homepage]
- 10/16/07: More on language models, in preparation for combining
them with relevance feedback
- Lecture references:
- Thomas Cover and Joy A. Thomas. Elements of information theory.
Wiley (1991) [book
homepage].
- John Lafferty and ChengXiang Zhai. Document language models,
query models, and risk minimization for information retrieval. SIGIR
(2001) [ps,
pdf;
long
version]
- 10/23/07 An LM-based approach to implicit feedback
- Lecture guide
- Lecture references:
-
Xuehua Shen, Bin Tan, and
ChengXiang Zhai. Context-sensitive information retrieval using
implicit feedback. SIGIR, pp. 43–50 (2005).
[pdf,
pdf2]
- John Lafferty and ChengXiang Zhai. Document language models,
query models, and risk minimization for information retrieval. SIGIR
(2001) [ps,
pdf;
long
version]
- 10/25/07: Clickthrough data as implicit feedback
- Lecture guide
- Lecture references:
- Filip Radlinski and Thorsten Joachims.
Query chains: learning to rank from implicit feedback. KDD,
pp. 239–248 (2005)
[pdf,
pdf2]
Co-winner, best student paper award.
- Thorsten Joachims. Optimizing search engines using clickthrough
data. KDD, 2002 [pdf,
pdf2]
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and
Geri Gay. Accurately interpreting clickthrough data as implicit
feedback. SIGIR, pp. 154–161 (2005). [pdf,
pdf2]
-
Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Filip
Radlinski, and Geri Gay. Evaluating the accuracy of implicit feedback
from clicks and query reformulations in Web search. ACM Transactions
on Information Systems (TOIS), 25(2) (2007) [pdf,pdf2]
- Additional references:
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin
rank boundaries for ordinal regression. Advances in Large Margin
Classifiers, pp. 115–132 (2000) [ps.gz]
- 10/30/07: The term-document matrix, revisited (preparation for the
SVD)
- Lecture guide
- Lecture references:
- Lloyd N. (Nick) Trefethen and David
Bau III. Numerical Linear Algebra (1997). [book
homepage] See lectures one,
four
and five
(all from part I, "Fundamentals").
- 11/1/07: The singular value decomposition and “approximating
± convex hulls ”
- Additional references:
- Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition
(1996). [book
homepage]. The "Bible" on, well, matrix
computations. Extensive coverage of the singular value
decomposition (SVD), although certainly not as gentle an introduction
as in class.
- 11/6/07: "Few-factor" representations (LSI, pLSI, and others)
- Lecture guide
- Lecture references:
- David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B.
Tenenbaum. Hierarchical topic models and the nested Chinese restaurant
process. NIPS 16 (2003). [pdf]
- Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas
K. Landauer, and Richard Harshman. Indexing by latent semantic
analysis. Journal of the American Society of Information Science
41(6):391–407 (1990). [pdf,pdf2]
- Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR,
pp. 50–57 (1999). [pdf,pdf2]
- Additional references:
- Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling:
An analysis and generalization of LSI. SIGIR, pp. 154–162
(2001) [pdf,pdf2]
- Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared
Saia. Spectral analysis of data. Symposium on Theory of
Computing (STOC), pp. 619–626 (2001) [pdf,(longer?)
ps]
- Thomas Hofmann and Jan Puzicha. Statistical models for
co-occurrence data. AI Memo 1625, MIT (1998). [pdf]
- Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem:
The Latent Semantic Analysis theory of acquisition, induction and
representation of knowledge. Psychological Review 104(2):211–240
(1997). [pdf,html,html2]
- Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by
non-negative matrix factorization. Nature 401:788–791
(1999) [pdf]
-
Christos H. Papadimitriou,
Hisao Tamaki,
Prabhakar Raghavan, and
Santosh Vempala. Latent semantic indexing: a probabilistic analysis.
Symposium on Principles of Database Systems, pp. 159–168
(1998)
[pdf].
Journal version: Journal of Computer and Systems Science
61(20):217–235 (2000) [ps]
- Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional
clustering of English words. Proceedings of the ACL (1993). [pdf]
- Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater:
Challenging the validity of automated essay scoring (2001). [pdf]
An
interesting study in which knowledgeable humans were asked to produce
essays that would be incorrectly scored by automatic methods.
- Michael Steele and David Aldous. Introduction to the interface of
probability and algorithms. Probability and Algorithms: A Panel
Report. National Academy of Sciences (1993). [pdf]
One subsection gives a quick calculation for the expected number of
occupied tables in the Chinese restaurant process.
- Demo: score your own essay (on psycholinguistics) with LSI. [home page]
- 11/8/07: Modeling syntactic structure: context-free grammars
- Lecture guide
- Lecture references:
- Noam Chomsky. Three models for the description of language. In
I.R.E. Transactions on Information Theory 2, pp. 113–124.
- Igor Mel'čuk. Studies in Dependency Syntax, Karoma (1979).
- Lucien Tesnière. Elements de Syntaxe Structurale, Klincksieck (1959).
- Additional references:
- James Allen. Section 5.2: Movement phenomena in language. Natural language
understanding, second edition. Benjamin/Cummings (1995).
- Christopher Culy. The complexity of the vocabulary of
Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer
link]
- Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for
English. Speech and language processing: An introduction to natural
language
processing, computational linguistics, and speech
recognition. Prentice Hall (2000). [homepage]
In the second edition, this is chapter
12 (draft form).
- Lillian Lee. “I'm sorry Dave, I'm afraid I can't do that”:
Linguistics, Statistics, and Natural Language Processing circa
2001. In Computer Science: Reflections on the Field, Reflections from the
Field, National Academies Press, pp. 111–118, 2004. [pdf, html]
- Geoff Pullum, 1986. Footloose and context-free. Natural language and
linguistic theory 4, pp. 283–289. Reprinted in The great Eskimo
vocabulary hoax, U. of Chicago Press 1991. A very fun read.
- Stuart Shieber. Evidence against the context-freeness of natural
language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer
link]
- 4/13/06: Augmented context-free grammars
- Lecture guide
- Lecture references:
- James Allen. Section 5.3: Handling questions in context-free grammars.
Natural Language
Understanding, second edition. Benjamin/Cummings (1995).
- Additional references:
- Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech
Recognition. Prentice Hall (2000). [homepage]
In the second edition, this is chapter
16 (draft form).
- Fernando C. N. Pereira and Stuart M. Shieber. Prolog and
natural-language analysis. CLSI (1987) [pdf].
Sections 4.2.3-4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe
syntactic aspects of filler-gap dependencies.
- 11/15/07: Introduction to
tree-adjoining grammars
- Lecture references:
- Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct
grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
- Additional references:
- Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars.
In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages,
volume 3 (Beyond words), pp. 69–123
(1997). [pdf]
- K. Vijay-Shankar and Aravind K. Joshi. Some computational
properties of tree adjoining grammars. Proceedings of the 23rd ACL,
pp. 82–93 (1985). [pdf,pdf2]
- The XTAG project. [home
page]. Site includes a tutorial, grammars for English and Korean, synchronous-tag
implementation, and a supertagging demo.
- 11/20/07: adjunction constraints and feature-based TAGS
- Lecture references:
- K. Vijay-Shanker and Aravind K. Joshi. Feature structures based
tree adjoining grammars. COLING, pp. 714–719 (1988) [pdf]
- Additional references:
- Anne Abeillé and Yves Schabes. Parsing idioms in
lexicalized TAGs. Proc. of the 4th EACL, pp. 1--9 (1989). [pdf]
- Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian,
Kimmen Sjölander, Rebecca C. Underwood, and David
Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic
Acids Research 22(23):5112–20 (1994). [pdf]
- David B. Searls. The linguistics of DNA. American Scientist
80(6):579–591. (1992)
- Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, Takashi Yokomori.
Tree adjoining grammars for RNA structure prediction. Theoretical
Computer Science 210(2): 277–303, special issue on genome
informatics (1999) [pdf]
- 11/27/07: The EM algorithm, part one
- Lecture references:
- Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free
grammars. Computational Linguistics 24(2):298–305 (1998). [pdf]
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society,
B, 39(1):1–38 (1977) [ocr]
- Frederick Jelinek. Statistical Methods for Speech Recognition,
MIT Press, 1997.
- Detlef Prescher. A tutorial on the Expectation-Maximization
algorithm including maximum-likelihood estimation and EM training of
probabilistic context-free grammars. Presented at the 15th European
Summer School in Logic, Language and Information (ESSLLI-03). [pdf]
- Additional references: Note that notation varies
across different presentations, with the same variable often used to
refer to different things, unfortunately.
- Adam Berger. Convexity, maximum likelihood and all that. [ps]
- Michael Collins. The EM algorithm (1997) [ps]
- Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm
and extensions. Second edition. Wiley (2006). [errata for first printing]
- Ted Pedersen. The EM algorithm: selected readings (2001) [pdf]
- Stefan Riezler. Getting EM to work (talk at EMNLP 2001) [ps.gz]
- 11/29/07: Conclusion of EM, and the course
- Lecture references:
- Detlef Prescher. A tutorial on the Expectation-Maximization
algorithm including maximum-likelihood estimation and EM training of
probabilistic context-free grammars. Presented at the 15th European
Summer School in Logic, Language and Information (ESSLLI-03). [pdf]
- Andreas Stolcke. An Efficient Probabilistic Context-Free Parsing
Algorithm that Computes Prefix Probabilities. Computational
Linguistics 21(2):165–201. [pdf,html,tech-report
ps]
- Additional references: