CS/INFO 630, Spring 2006: Human Language Technology
(or, IR, NLP, and special guests; or, Representing and accessing digital information)
Note: After the Spring 2006 semester, this course was
renumbered; the current version is CS674/INFO 630.
Prof. Lillian
Lee (follow link for contact information and office hours)
TR 2:55-4:10, Bard 140
Official description (from the course catalog):
Information retrieval has evolved from the problem of locating books in a library to a multitude of tasks ubiquitous in business, science, and personal life. Modern information systems automatically compose newspapers, extract facts from the web, and analyze usage patterns. This course covers the necessary techniques for representing, organizing, and accessing digital information that is in textual or semistructured form. Topics combine information retrieval, natural language processing, and machine learning, with links to work in databases and data mining.
Administrative info:
Resources: (reference texts, other lecture
notes and slides, etc.)
Lectures:
- 1/24/06: "Sense and Sensibility:
Automatically Analyzing
Subject and Sentiment in Human-Authored Texts" (preface)
- Handouts: course description and
policies [pdf]
- 1/26/06: Basics of information retrieval; the vector-space model
- Lecture guides: cover sheet,
Haque/Shaparenko, Rabkin/Krafft
- Handouts: Examples for lecture-guide preparation [pdf] and an example (for a different class)
of the scribe-notes portion [ps]
- Lecture references:
- Amit Singhal. Modern
information retrieval: A brief overview. IEEE Data Engineering
Bulletin 24(4), pp. 35–43, 2001. [ps]
- TREC conference
proceedings. [website].
See especially the overviews.
- Additional references:
- See the course resources page for
other overviews.
- David Dubin. The most influential paper Gerard Salton never
wrote. Library Trends, Spring 2004. [html]
An interesting sidelight on the origins of the vector space model, the
assumptions underlying the VSM, and
phantom citations.
- Karen Spärck Jones. IDF term weighting and IR research lessons.
Journal of Documentation 60(5):521–523 (2004).
[pdf]
Reviews the history of the introduction of the three
"consensus" term-weighting quantities.
Notes that Spärck Jones's data consisted of presence/absence
wordlists, which explains why term-frequency information was not
incorporated.
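To make the vector-space model concrete, here is a minimal sketch of tf-idf weighting and cosine-similarity ranking; the toy corpus and the particular idf formula (log N/df, no tf damping) are illustrative choices, not the only ones discussed in lecture.

```python
import math
from collections import Counter

def tfidf_vector(tokens, df, n_docs):
    """Weight each term by raw term frequency times log(N / df)."""
    tf = Counter(tokens)
    return {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["the cat sat on the mat".split(),
        "the dog chased the cat".split(),
        "information retrieval ranks documents".split()]
df = Counter(t for d in docs for t in set(d))  # document frequencies
vecs = [tfidf_vector(d, df, len(docs)) for d in docs]
# Query terms are assumed to occur in the collection's vocabulary.
query = tfidf_vector("cat mat".split(), df, len(docs))
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, vecs[i]),
                 reverse=True)  # most similar document first
```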
- 1/31/06: An example of empirical IR research: length normalization
- Lecture guides: cover sheet for
Danis/Rogan (statement of problem corrected 2/21/06), Danis/Rogan
scribe notes, Danis/Rogan
problems, Leshed/Shami (cover
sheet for Leshed/Shami superseded by updated cover sheet for Danis/Rogan)
- Lecture references:
- Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document
length normalization. SIGIR 1996. [ps, pdf]
- Justin Zobel. How reliable are large-scale information retrieval
experiments? SIGIR (1998). [pdf, pdf2]
- Additional references:
- Mark Sanderson and Hideo Joho. Forming test collections with no
system pooling. SIGIR 2004. [pdf]
- Ellen Voorhees and Donna Harman. Overview of the Eighth Text
REtrieval Conference (TREC-8) [good ps, poor
pdf]. Section 3 discusses the effects of pooling.
- 2/2/06: Completion of pivoted document-length normalization
- Lecture guides: cover sheet, Dejgosha/Hu scribe notes, Hamatake/Park
- Additional references:
- Abdur Chowdury, M. Catherine McCabe, David Grossman, Ophir
Frieder. Document normalization revisited. SIGIR poster (2002). [pdf,
pdf2]
- Jaap Kamps, Maarten de Rijke, and Börkur Sigurbjörnsson. Length
normalization in XML retrieval. SIGIR (2004). [abstract, pdf]
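As a rough sketch of the idea behind Singhal et al.'s scheme: the normalization factor actually applied is a linear interpolation between a pivot (typically the average factor over the collection) and the document's own factor, which deflates the penalty on long documents and inflates it on short ones. The slope value and lengths below are illustrative.

```python
def pivoted_norm(doc_factor, pivot, slope=0.3):
    # (1 - slope) * pivot + slope * factor: equals the pivot for an
    # average document, and grows more slowly than the raw factor,
    # so long documents are penalized less than under plain
    # length normalization (and short documents more).
    return (1.0 - slope) * pivot + slope * doc_factor

doc_lengths = [50, 100, 200]
pivot = sum(doc_lengths) / len(doc_lengths)
norms = [pivoted_norm(length, pivot) for length in doc_lengths]
```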
- 2/7/06: Introduction to probabilistic retrieval (classic case)
- Lecture guides: cover sheet, Au/Gerner/Kot, Babinski/Lin
- Lecture references:
- Karen Spärck Jones, Steve Walker, and Stephen Robertson. A
probabilistic model of information retrieval:
Development and comparative experiments. Information Processing
and Management, pp. 779–808, 809–840 (2000). [html of preface, pdf
of Part 1, pdf
of Part 2]
- William S. Cooper. Some inconsistencies and misidentified
modeling assumptions in probabilistic information retrieval. ACM
Transactions on Information Systems (TOIS), pp. 100–111, 1995.
[pdf]
- Additional references:
- Fabio Crestani, Mounia Lalmas, C. J. Van Rijsbergen, and Iain
Campbell. "Is this document relevant? ...probably": A survey
of probabilistic models in information retrieval. ACM Computing
Surveys 30(4) (1998). [pdf,
pdf2]
- M. E. Maron. Probabilistic approaches to the document retrieval
problem. SIGIR, pp. 98–107 (1982). [pdf] Discusses models for
interpreting the probability of relevance, and also gives a personal
historical perspective on information science (as Maron construes it).
- Stephen Robertson. The probability ranking principle in IR.
Journal of Documentation 33 (1977). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
Appendix gives an example where ranking by probability of relevance
is suboptimal.
- Stephen Robertson and Karen Spärck Jones. Relevance
weighting of search terms. Journal of the American Society for
Information Science 27:129–146 (1976). [pdf]
- 2/9/06: Completion of a derivation of (a version of) the RSJ model;
the case of binary attributes; an isolated-Poisson model for term frequencies
- Lecture guides: cover sheet, Connolly/Mujeeb,
Haque/Shaparenko
- Lecture references:
- W. Bruce Croft and D. Harper. Using probabilistic models of
information retrieval without relevance information. Journal of
Documentation 35: 285–295 (1979). Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references, including samples from the continuing examination of
explanations for/derivations of IDF weighting.
- Kishore Papineni. Why inverse document frequency? NAACL (2001).
[pdf]
- Stephen Robertson. Understanding inverse document frequency: On
theoretical arguments for IDF. Journal of Documentation
60(5):503–520 (2004). [pdf]
- Arjen P. de Vries and Thomas Roelleke. Relevance Information:
A Loss of Entropy but a Gain for IDF? SIGIR (2005). [pdf,
pdf2]
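For reference, a sketch of the Robertson/Spärck Jones relevance weight with the usual 0.5 smoothing; with no relevance information (r = R = 0) it reduces to an IDF-like quantity, which connects to the IDF-derivation papers above.

```python
import math

def rsj_weight(r, R, n, N):
    """RSJ relevance weight: r of the R known relevant documents contain
    the term, which occurs in n of the N documents overall; the 0.5
    terms smooth the contingency-table estimates."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# With no relevance information, this is an IDF-like weight:
w = rsj_weight(0, 0, 10, 1000)
```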
- 2/14/06: The two-Poisson model and approximations (complete
classic probabilistic retrieval)
- Lecture guides: cover
sheet (3/9/06: minor update to note that a corrected version of the lecture
handout has been posted), Krafft/Rabkin, Leshed/Shami
- Handout: pdf (3/7/06: corrected
lecture number and noted typo in Singhal's Okapi equation)
- Lecture references:
- Abraham Bookstein and D. R. Swanson. Probabilistic models for
automatic indexing. Journal of the American Society for Information
Science, 25:312–318 (1974).
- Stephen P. Harter. A probabilistic approach to automatic keyword
indexing, part I: On the distribution of specialty words in a
technical literature. Journal of the American Society for Information
Science 26(4) (1975). [pdf]
- Stephen E. Robertson and Steve Walker. Some simple effective
approximations to the 2-Poisson model for probabilistic weighted
retrieval. SIGIR, pp. 232–241 (1994) [pdf, pdf2]. Reprinted in Spärck Jones and
Peter Willett, eds., Readings in Information Retrieval (1997).
- Additional references:
- Eugene L. Margulis. N-Poisson document modelling. SIGIR,
pp. 177–189 (1992). [pdf]
- S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and
M. Gatford. Okapi at TREC-3 (1994). [pdf, ps.gz]
Introduces BM25.
- Origin of the name "Okapi". [html]
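Since BM25 comes up here, a sketch of the scoring function may help fix ideas: a saturating tf component (motivated by the 2-Poisson approximations above), length normalization via b, and an IDF-style term weight. The parameter values k1 = 1.2 and b = 0.75 are common defaults, not values prescribed by the lecture.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_len, df, N, k1=1.2, b=0.75):
    """Okapi BM25: tf saturates as it grows, long documents are
    discounted via b, and each term carries an IDF-style weight."""
    score = 0.0
    for t in query_terms:
        tf = doc_tf.get(t, 0)
        if tf == 0 or t not in df:
            continue
        idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
        denom = tf + k1 * (1.0 - b + b * doc_len / avg_len)
        score += idf * tf * (k1 + 1.0) / denom
    return score
```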
- 2/16/06: Introduction to the language modeling approach
- Lecture guide: cover sheet, Danis/Rogan
scribe notes, Danis/Rogan problems
- Lecture references:
- Jay M. Ponte and W. Bruce Croft. A language modeling approach to
information retrieval. SIGIR (1998). [pdf]
- John Lafferty and Chengxiang Zhai. Probabilistic relevance models
based on document and query generation. In Croft and Lafferty, eds.,
Language Modeling and
Information Retrieval (2003). [pdf,
ps]
- ChengXiang Zhai and John Lafferty. A study of smoothing methods
for language models applied to ad hoc information retrieval. SIGIR
(2001). [pdf,
ps]
- Additional references:
- Language modeling and information retrieval. The Lemur project. [html]
- Xiaoyong Liu and W. Bruce Croft. Statistical language modeling
for information retrieval. Annual Review of Information Science and
Technology 39 (2003). [pdf]
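A minimal sketch of the query-likelihood approach with Dirichlet-prior smoothing (the smoothing method studied in the Zhai and Lafferty paper above); the toy documents and the value of mu are illustrative.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection, mu=2000):
    """log p(q|d) under a unigram document LM smoothed with a Dirichlet
    prior: p(w|d) = (tf(w,d) + mu * p(w|C)) / (|d| + mu)."""
    tf, cf = Counter(doc), Counter(collection)
    dlen, clen = len(doc), len(collection)
    score = 0.0
    for w in query:
        p_coll = cf[w] / clen  # assumes each query word occurs somewhere in C
        score += math.log((tf[w] + mu * p_coll) / (dlen + mu))
    return score

doc1 = "a b a".split()
doc2 = "c d".split()
collection = doc1 + doc2
```

Note that smoothing gives every document a nonzero score even for query words it lacks, which is what makes the ranking well defined.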
- 2/21/06: More on the LM approach
- Lecture guides: cover
sheet for Babinski/Lin, Babinski/Lin
(3/10/06: lecturer-induced typo in exponent for HMM LM incorporated
into guide and hence notice removed from
cover sheet), Hamatake/Park scribe
notes (3/16: a few typos corrected), Hamatake/Park problems
- Additional references:
- David MacKay and Linda Peto. A hierarchical Dirichlet language model.
Natural Language Engineering 1(3):289–307 (1995). [pdf]
- ChengXiang Zhai. Notes on the KL-divergence retrieval formula
and Dirichlet prior smoothing (2003). [pdf]
Our discussion of the effects of using corpus-dependent Dirichlet
smoothing is modeled on the derivation given in this paper,
although the initial scoring function is a bit different.
- 2/23/06: Completion of alternate LM formulation; introduction to
relevance feedback
- Lecture guides (cover sheet
announces the altered derivation presented in both sets of scribe notes): Au/Gerner/Kot (3/29: a few
typos corrected),
Connolly/Mujeeb
- Lecture references:
- Ian Ruthven and Mounia Lalmas. A survey on the use of relevance
feedback for information access systems. Knowledge Engineering Review,
18(1) (2003) [pdf]
- Additional references:
- ChengXiang Zhai and John Lafferty. Document language models,
query models, and risk minimization for information retrieval. SIGIR
(2001) [ps,
pdf;
long
version]
Considers the query and the relevant documents to have been
generated by the same source LM.
- Karen Spärck Jones. Language modelling's generative model:
is it rational? (2004) [pdf]
- 2/28/06: Relevance feedback methods
- Lecture guides: cover
sheet (updated 6/12/06),
Haque/Shaparenko, Krafft/Rabkin
- Lecture references:
- Victor Lavrenko and W. Bruce Croft. Relevance-based language
models. SIGIR (2001) [pdf,
pdf2]
- Stephen Robertson. On term selection for query expansion.
Journal of Documentation 46(4):359–364 (1990) [pdf]
- J. J. Rocchio. Relevance feedback in information retrieval. In
Gerard Salton, ed., The SMART retrieval system: Experiments in Automatic
Document Processing (1971)
- E. Ide and G. Salton. Interactive search strategies and dynamic file organization in information retrieval. Ibid, pp. 373–393 (1971)
- Additional references:
- Gerard Salton and Chris Buckley. Improving retrieval performance
by relevance feedback. JASIS 41(4): 288–297 (1990). [pdf]
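A sketch of the classic Rocchio update: move the query vector toward the centroid of the judged-relevant documents and away from the centroid of the judged-nonrelevant ones. The coefficient values and the choice to drop nonpositive weights are common conventions, not fixed by the original paper.

```python
def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    """Vectors are dicts mapping term -> weight."""
    new = {t: alpha * w for t, w in query_vec.items()}
    # Add beta times the relevant centroid, subtract gamma times the
    # nonrelevant centroid.
    for vecs, coeff in ((rel_vecs, beta), (nonrel_vecs, -gamma)):
        for v in vecs:
            for t, w in v.items():
                new[t] = new.get(t, 0.0) + coeff * w / len(vecs)
    return {t: w for t, w in new.items() if w > 0}  # drop nonpositive weights

expanded = rocchio({"apple": 1.0},
                   [{"apple": 1.0, "fruit": 1.0}],
                   [{"computer": 1.0}])
```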
- 3/2/06: Relevance feedback: further explorations and evaluation issues
- Lecture guides: Danis/Rogan, Leshed/Shami
- Lecture references:
- Y.K. Chang, C. Cirillo, J. Razon. Evaluation of feedback retrieval
using modified freezing, residual collection, and test and control
groups. In G. Salton, ed., The SMART retrieval system — experiments
in automatic document processing, pp. 355–370 (1971).
- Helene Fowkes and Micheline Beaulieu. Interactive searching
behavior: Okapi experiments for TREC8. Proceedings of 22nd BCS-IRSG
European Colloquium on IR Research (2000).
- Ian Ruthven. Re-examining the potential effectiveness of
interactive query expansion. SIGIR, pp. 213–220 (2003). [pdf, pdf2]
- Xuehua Shen and ChengXiang Zhai. Active feedback in ad hoc
information retrieval. SIGIR, pp. 59–66 (2005). [pdf, pdf2]
- Additional references:
- Micheline Beaulieu, Helene Fowkes, Nega Alemayehu, and
Mark Sanderson. Interactive
Okapi at Sheffield — TREC-8 (1999). [pdf]
- 3/7/06: Completion of relevance feedback; implicit feedback sources
- Lecture guides: Hamatake/Park
- Lecture references:
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Personalizing
search via automated analysis of interests and activities. SIGIR,
pp. 449–456
(2005). [pdf,
pdf2]
- Masahiro Morita and Yoichi Shinoda. Information filtering
based on user behavior analysis and best match text
retrieval. SIGIR, pp. 272–281 (1994). [pdf]
- Diane Kelly and Nicholas J. Belkin. Display time as implicit
feedback: Understanding task effects. SIGIR (2004). [pdf,
pdf2]
- Additional references:
- NIPS 2005 workshop on machine learning for implicit feedback and
user modeling [homepage]
and associated PASCAL "inferring relevance from eye movements
challenge" 2005 [homepage]
- Diane Kelly and Jaime Teevan. Implicit feedback for inferring user preference:
A bibliography. SIGIR Forum 37(2), pp. 18–28 (2003). [pdf, pdf2]
- Kai Puolamäki, Jarkko Salojärvi, Eerika Savia, Jaana Simola, and
Samuel Kaski. Combining eye movements and collaborative filtering for
proactive information retrieval. SIGIR (2005). [pdf, pdf2]
- Jaime Teevan, Susan T. Dumais, and Eric Horvitz. Beyond the
commons: Investigating the value of personalizing web search.
Workshop on New Technologies for Personalized Information Access (PIA)
(2005). [pdf]
A study showing that different individuals can give the same results
different relevance judgments even when they share the same
information need, that the same query can represent different
information needs, and related findings.
- 3/9/06: An LM-based approach to implicit feedback
- Lecture guides: cover
sheet (updated 5/15/06 to correct a mistake in notation), Au/Gerner/Kot, Babinski/Lin
- Lecture references:
- Xuehua Shen, Bin Tan, and ChengXiang Zhai. Context-sensitive
information retrieval using implicit feedback. SIGIR, pp. 43–50
(2005). [pdf, pdf2]
- Additional references:
- Thomas Cover and Joy A. Thomas. Elements of information theory.
Wiley (1991). [book homepage] A well-known introduction to the field.
- Lillian Lee. Similarity-based approaches to natural language
processing (1997). [pdf]
Section 2.3 contains some introductory material describing motivations for the use of the Kullback-Leibler (KL) divergence.
- 3/14/06: Clickthrough data as implicit feedback: human validation
- Lecture guides: Connolly/Mujeeb, Haque/Shaparenko
- Lecture references:
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and
Geri Gay. Accurately interpreting clickthrough data as implicit
feedback. SIGIR, pp. 154–161 (2005). [pdf,
pdf2]
- 3/16/06: Midterm exam (see also the mid(-)term notes (3/28/06) and
the Lecture 15 Krafft/Rabkin guide, which includes brief solution
sketches for some midterm questions)
- 3/28/06: Clickthrough data as relative implicit feedback
- Lecture guides: Au/Gerner, Krafft/Rabkin (includes brief solution
sketches for some midterm questions)
- Handouts: mid(-)term notes
- Lecture references:
- Thorsten Joachims. Optimizing search engines using clickthrough
data. KDD, 2002 [pdf, pdf2]
- Filip Radlinski and Thorsten Joachims.
Query chains: learning to rank from implicit feedback. KDD,
pp. 239–248 (2005)
[pdf,
pdf2]
Co-winner, best student paper award.
- Additional references:
- Ralf Herbrich, Thore Graepel, and Klaus Obermayer. Large margin
rank boundaries for ordinal regression. Advances in Large Margin
Classifiers, pp. 115–132 (2000) [ps.gz]
- 3/30/06: Matrix-theoretic characterizations of corpus structure
(introduction to the singular value decomposition (SVD))
- Lecture guides: Danis/Rogan, Kot/Leshed
- Additional references:
- Lloyd N. (Nick) Trefethen and David
Bau III. Numerical Linear Algebra (1997). [book
homepage] This book, and especially lectures one,
four
and five
(all from part I, "Fundamentals"),
should have been referenced in class: our presentation turns out to
resemble theirs, but is, not surprisingly, not nearly as nice.
- 4/4/06: The SVD
- Lecture guides: Connolly/Lin/Mujeeb, Hamatake/Park (Update, 6/12/06: corrigenda)
- Additional references: (see also previous lecture)
- Gene H. Golub and Charles F. Van Loan. Matrix Computations, third edition
(1996). [book
homepage]. The "Bible" on, well, matrix
computations. Extensive coverage of the singular value
decomposition (SVD), although certainly not as gentle an introduction
as in class.
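A small numerical illustration of the rank-k use of the SVD (the term-document matrix below is a made-up toy, in the spirit of the Deerwester et al. example cited under the next lecture): by the Eckart–Young theorem, truncating to the k largest singular values gives the best rank-k approximation in the Frobenius norm.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
A = np.array([[1., 1., 0., 0.],   # "ship"
              [1., 0., 0., 0.],   # "boat"
              [0., 1., 1., 0.],   # "ocean"
              [0., 0., 1., 1.],   # "tree"
              [0., 0., 0., 1.]])  # "wood"

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U diag(s) Vt
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation

# The Frobenius-norm error equals the norm of the dropped singular values:
err = np.linalg.norm(A - A_k)
```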
- 4/6/06: "Few-factor" representations (LSI, pLSI, and others)
- Lecture guides: Haque/Shaparenko.
Note: In class, I incorrectly stated that the GMAT automated scoring system,
e-rater,
uses LSI; I confused two different systems. This lecture guide
corrects the error.
- Handouts: Lecture aid
- Lecture references:
- David M. Blei, Thomas L. Griffiths, Michael I. Jordan, and Joshua B.
Tenenbaum. Hierarchical topic models and the nested Chinese restaurant
process. NIPS 16 (2003). [pdf]
- Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas
K. Landauer, and Richard Harshman. Indexing by latent semantic
analysis. Journal of the American Society for Information Science
41(6):391–407 (1990). [pdf, pdf2]
- Peter W. Foltz, Darrell Laham, and Thomas
K. Landauer. Automated essay scoring: Applications to education
technology. Proceedings of ED-MEDIA (1999). [html]
- Thomas Hofmann. Probabilistic latent semantic indexing. SIGIR,
pp. 50–57 (1999). [pdf,pdf2]
- Thomas K. Landauer and Susan T. Dumais. A solution to Plato's problem:
The Latent Semantic Analysis theory of acquisition, induction and
representation of knowledge. Psychological Review 104(2):211–240
(1997). [pdf,html,html2]
- Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by
non-negative matrix factorization. Nature 401:788–791
(1999) [pdf]
- Christos H. Papadimitriou, Hisao Tamaki, Prabhakar Raghavan, and
Santosh Vempala. Latent semantic indexing: A probabilistic analysis.
Symposium on Principles of Database Systems, pp. 159–168 (1998). [pdf]
Journal version: Journal of Computer and System Sciences
61(2):217–235 (2000). [ps]
- Additional references:
- Rie Kubota Ando and Lillian Lee. Iterative Residual Rescaling:
An analysis and generalization of LSI. SIGIR, pp. 154–162
(2001) [pdf,pdf2]
- Yossi Azar, Amos Fiat, Anna Karlin, Frank McSherry, and Jared
Saia. Spectral analysis of data. Symposium on Theory of
Computing (STOC), pp. 619–626 (2001) [pdf,(longer?)
ps]
- Thomas Hofmann and Jan Puzicha. Statistical models for
co-occurrence data. AI Memo 1625, MIT (1998). [pdf]
- Fernando Pereira, Naftali Tishby, and Lillian Lee. Distributional
clustering of English words. Proceedings of the ACL (1993). [pdf]
- Donald E. Powers, Jill C. Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich. Stumping e-rater:
Challenging the validity of automated essay scoring (2001). [pdf]
An
interesting study in which knowledgeable humans were asked to produce
essays that would be incorrectly scored by automatic methods.
- Michael Steele and David Aldous. Introduction to the interface of
probability and algorithms. Probability and Algorithms: A Panel
Report. National Academy of Sciences (1993). [pdf]
One subsection gives a quick calculation for the expected number of
occupied tables in the Chinese restaurant process.
- Demo: score your own essay (on psycholinguistics) with LSI. [home page]
- 4/11/06: Modeling syntactic structure: context-free grammars
- Lecture guides: Au, Krafft/Rabkin
- Handouts: Lecture aid
- Lecture references:
- Gerald Gazdar, Ewan Klein, Geoffrey K. Pullum, and Ivan A. Sag.
Generalized phrase structure grammar. Harvard University Press
(1985).
- Maurice Gross. Méthodes en syntaxe: régime des constructions
complétives. Hermann (1975).
- Additional references:
- James Allen. Section 5.2: Movement phenomena in language. Natural language
understanding, second edition. Benjamin/Cummings (1995).
- Christopher Culy. The complexity of the vocabulary of
Bambara. Linguistics and Philosophy 8:345–351 (1985). [Springer
link]
- Daniel Jurafsky and James H. Martin. Chapter 9: Context-free grammars for
English. Speech and language processing: An introduction to natural
language processing, computational linguistics, and speech
recognition. Prentice Hall (2000).
- Fernando C. N. Pereira and Stuart M. Shieber. Prolog and
natural-language analysis. CSLI (1987). [pdf]
Sections 4.2.3-4.2.5 and 4.3.3 (pages 102ff. of the pdf file) describe
syntactic aspects of filler-gap dependencies.
- Geoff Pullum. Footloose and context-free. Natural Language and
Linguistic Theory 4:283–289 (1986). Reprinted in The Great Eskimo
Vocabulary Hoax, U. of Chicago Press (1991). A very fun read.
- Stuart Shieber. Evidence against the context-freeness of natural
language. Linguistics and Philosophy 8:333–343 (1985). [pdf,Springer
link]
- 4/13/06: Feature-based context-free grammars; introduction to
tree-adjoining grammars
- Lecture guides: Kot/Leshed, Danis/Rogan
- Lecture references:
- Aravind K. Joshi, Leon S. Levy, and Masako Takahashi. Tree adjunct
grammar. Journal of Computer and System Sciences 10(1):136–163 (1975).
- Additional references:
- James Allen. Section 5.3: Handling questions in context-free grammars.
Natural language
understanding, second edition. Benjamin/Cummings (1995). Motivated by gap feature propagation in GPSG, as described in
Gazdar et al. (1985) (full citation in references for the previous
lecture).
- Aravind K. Joshi and Yves Schabes. Tree-adjoining grammars.
In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages,
volume 3 (Beyond words), pp. 69–123
(1997). [pdf]
- Daniel Jurafsky and James H. Martin. Chapter 11: Features and unification. Speech and Language Processing: An Introduction to Natural
Language Processing, Computational Linguistics, and Speech
Recognition. Prentice Hall (2000).
- K. Vijay-Shanker and Aravind K. Joshi. Some computational
properties of tree adjoining grammars. Proceedings of the 23rd ACL,
pp. 82–93 (1985). [pdf,pdf2]
- The XTAG project. [home page] The site includes a tutorial,
grammars for English and Korean, a synchronous-TAG implementation,
and a supertagging demo.
- 4/18/06: TAGs, continued
- Lecture guides: Connolly/Lin/Mujeeb
- Additional references: (see also above)
- Lillian Lee. "I'm sorry Dave, I'm afraid I can't do
that": Linguistics, Statistics, and Natural Language Processing
circa 2001. In Computer science: Reflections on the Field, Reflections
from the Field, National Academies Press, pp. 111–118
(2004). [pdf]
- 4/20/06: TAGs for idiom analysis; adjunction constraints
- Lecture guides: Hamatake/Park
- Lecture references:
- K. Vijay-Shanker and Aravind K. Joshi. Feature structures based
tree adjoining grammars. COLING, pp. 714–719 (1988) [pdf]
- Additional references: (see also above)
- Anne Abeillé and Yves Schabes. Parsing idioms in
lexicalized TAGs. Proc. of the 4th EACL, pp. 1–9 (1989). [pdf]
- 4/25/06: Algorithms for grammar parsing and learning
- Lecture references:
- Yasubumi Sakakibara, Michael Brown, Richard Hughey, I. Saira Mian,
Kimmen Sjölander, Rebecca C. Underwood, and David
Haussler. Stochastic context-free grammars for tRNA modeling. Nucleic
Acids Research 22(23):5112–5120 (1994). [pdf]
- David B. Searls. The linguistics of DNA. American Scientist
80(6):579–591 (1992).
- Andreas Stolcke. An efficient probabilistic context-free parsing
algorithm that computes prefix probabilities. Computational
Linguistics 21(2):165–201 (1995). [pdf, html, tech-report ps]
- Yasuo Uemura, Aki Hasegawa, Satoshi Kobayashi, and Takashi Yokomori.
Tree adjoining grammars for RNA structure prediction. Theoretical
Computer Science 210(2): 277–303, special issue on genome
informatics (1999) [pdf]
- Additional references:
- Daniel Jurafsky and James H. Martin. Chapter 10: Parsing with context-free grammars. Speech and language processing: An introduction to natural
language processing, computational linguistics, and speech
recognition. Prentice Hall (2000).
- Lillian Lee. Fast context-free grammar parsing requires fast
Boolean matrix multiplication. Journal of the ACM 49(1):1–15
(2002). [pdf,pdf2]
- Giorgio Satta. Tree adjoining grammar parsing and Boolean matrix
multiplication. Computational Linguistics 20(2):173–192
(1994). [pdf]
- Giorgio Satta. Tutorial on tree adjoining grammar parsing.
Seventh International Workshop on Tree Adjoining Grammars and Related
Formalisms (TAG+7) (2004). [ps]
- 4/27/06: The EM algorithm
- Lecture references:
- James K. Baker. Trainable grammars for speech recognition.
Speech communication papers for the 97th meeting of the Acoustical
Society of America, pp. 547–550 (1979)
- Karim Lari and Steve J. Young. The estimation of stochastic context-free grammars
using the inside-outside algorithm. Computer Speech
and Language 4:35–56 (1990)
- Detlef Prescher. A tutorial on the Expectation-Maximization
algorithm including maximum-likelihood estimation and EM training of
probabilistic context-free grammars. [pdf]
The handout for lecture consists of pages 44–47.
- Additional references: Note that notation varies across
presentations; unfortunately, the same variable is often used to
refer to different things.
- Adam Berger. Convexity, maximum likelihood and all that. [ps]
- Zhiyi Chi and Stuart Geman. Estimation of probabilistic context-free
grammars. Computational Linguistics 24(2):298–305 (1998). [pdf]
- Michael Collins. The EM algorithm (1997) [ps]
- Dempster, Laird, and Rubin. Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society,
B, 39(1):1–38 (1977) [ocr]
- Geoffrey J. McLachlan and Thriyambakam Krishnan. The EM algorithm
and extensions. Second edition. Wiley (2006). [errata for first printing]
- Ted Pedersen. The EM algorithm: selected readings (2001) [pdf]
- Stefan Riezler. Getting EM to work (talk at EMNLP 2001) [ps.gz]
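To ground the abstract discussion, here is a tiny worked instance of EM, on a two-coin mixture rather than a PCFG (the data and initialization are invented for illustration): the E-step computes each coin's posterior responsibility for each run of flips, and the M-step re-estimates each coin's heads probability from those fractional counts.

```python
def em_two_coins(flips, theta=(0.6, 0.5), iters=20):
    """EM for a mixture of two coins with fixed equal mixing weights.
    Each element of `flips` is a (heads, tails) count from one hidden coin."""
    tA, tB = theta
    for _ in range(iters):
        headsA = tailsA = headsB = tailsB = 0.0
        for h, t in flips:
            likeA = tA ** h * (1 - tA) ** t
            likeB = tB ** h * (1 - tB) ** t
            rA = likeA / (likeA + likeB)      # E-step: responsibility of coin A
            headsA += rA * h; tailsA += rA * t
            headsB += (1 - rA) * h; tailsB += (1 - rA) * t
        tA = headsA / (headsA + tailsA)       # M-step: new heads probabilities
        tB = headsB / (headsB + tailsB)
    return tA, tB

# Runs generated (by stipulation) by one heads-heavy and one tails-heavy coin:
flips = [(9, 1), (8, 2), (2, 8), (1, 9), (9, 1)]
tA, tB = em_two_coins(flips)
```

The inside-outside algorithm in the references is the same scheme with expected rule counts in place of expected coin flips.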
- 5/2/06: Conclusion of EM; introduction to maximum-entropy models.
- Handouts: lecture aid
- Lecture references:
- Adam Berger, Stephen Della Pietra, and Vincent Della Pietra. A
maximum entropy approach to natural language processing. Computational
Linguistics 22(1):39–71 (1996). [pdf,ps]
- Additional references:
- Adam Berger's page on "MaxEnt and exponential models"
[html]
Contains many tutorials on relevant topics.
- 5/4/06: Conclusion of maximum entropy and of the class
- Handouts: Information regarding
the final exam
- Lecture references:
- John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional
random fields: Probabilistic models for segmenting and labeling
sequence data. ICML, pp. 282–289 (2001) [pdf]