Meetings: (approximately) alternate Wednesdays, 4-5 pm, Upson 5130
Occasionally co-meeting with CS772, the Artificial Intelligence Seminar
Resources: NLP at CUCS | CS674 | Other CS course web pages | Cognitive studies talks
Abstract: Different on-line documents on the same subject frequently contain substantially similar information. Hence, summaries that synthesize common information across documents and emphasize the differences between them would significantly help readers. In the talk, I will present MultiGen, a summarization system developed at Columbia University, which generates a concise summary of news articles presenting different descriptions of the same event. During the analysis stage, MultiGen identifies repeated information in the input articles; through this process it selects phrases that adequately convey the common information. The generation component orders those phrases and combines them into a fluent summary.
Aside from MultiGen, I will give a brief overview of automatic summarization, covering genres of summary, techniques for automated summarization, and evaluation strategies.
Abstract: Why should linguists care about probabilities? Wasn't the move to studying language at the purely symbolic level one of the great advances of the generative revolution of the 1950s and '60s? In this talk I suggest that while it is critical to continue to model deep, rich structural knowledge at many linguistic levels, it is equally critical to understand the way this knowledge is used probabilistically by human language users. I will summarize a number of results from our lab on the role of probability, and the directly related ideas of frequency, informativeness, and entropy, in human language processing. Our claim, drawing from our work and that of colleagues and predecessors dating back to Jespersen and even earlier, is that language processing is "probabilistic all the way down". Humans compute the probability of an interpretation in order to resolve lexical, syntactic, and thematic ambiguities. Humans compute the probability of words in language production to help determine the surface form the words should take. I will focus especially on our recent results on language production and the probabilistic representation of relations between words. This talk describes joint work with Alan Bell, Eric Fosler-Lussier, Daniel Gildea, Cynthia Girand, Michelle Gregory, Srini Narayanan, William D. Raymond, and Doug Roland.
This talk is jointly sponsored by Cornell's departments of Linguistics, Computer Science, and Cognitive Studies.
CS775, Spring '00
Lillian Lee