Department of Computer Science
COLLOQUIUM
Thursday, October 21, 2004
4:15pm
B17 Upson Hall
Raymond Mooney
University of Texas at Austin
Learning to Extract Proteins and their Interactions from Medline Abstracts
Automatically extracting information
from biomedical text holds the promise of easily consolidating large amounts of
biological knowledge in computer-accessible form. This strategy is particularly
attractive for extracting data on human genes from the 11 million abstracts in
Medline. We have developed and evaluated a variety of learned
information-extraction systems for identifying human proteins and their
interactions in Medline abstracts. We demonstrate that machine-learning
approaches using support-vector machines, maximum-entropy models, and conditional
random fields are able to identify human proteins with higher accuracy than
several previous approaches. We also demonstrate that various rule induction
methods are able to identify protein interactions more accurately than
manually developed rules. I will also discuss our recent results on collectively
extracting all protein names in an abstract using Relational Markov Networks
that utilize specific relations between possible protein references.
Joint work with Razvan Bunescu, Edward Marcotte, Ruifang Ge, Rohit Kate, Yuk-Wah
Wong, and Arun Ramani.
Bio Sketch:
Raymond J. Mooney is a Professor in the Department of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in 1988 from the University of Illinois at Urbana-Champaign. He is an author of over 100 published papers in artificial intelligence, primarily in the area of machine learning, a former co-chair of the International Conference on Machine Learning, and a former editor of the Machine Learning journal. His recent research has focused on text mining, learning for natural-language processing, information extraction, text categorization and clustering, recommender systems, relational learning and inductive logic programming, and semi-supervised learning. Additional information is available on the web at http://www.cs.utexas.edu/users/mooney.