Speaker: Golan Yona
Affiliation: Stanford
Date: 11/23/99 - Tuesday
Time & Location: 4:15PM, 155 Olin
Title: Methods for global self-organization of all known proteins -
Towards a map of the protein space
Abstract:
In recent years we have witnessed a massive flow of new biological data. Large-scale sequencing projects throughout the world turn out new sequences and create new challenges for investigators. These ongoing sequencing efforts have already uncovered the sequences of over 400,000 proteins. Such projects continue to yield the sequences of many new proteins whose functions are not known.
Given a new protein sequence, the traditional approach to predicting its function and analyzing its properties hinges on pairwise comparisons with the sequences of other proteins whose properties are already known. However, in many cases this method fails to provide clues about the function of the protein in question. As early as a decade ago, biologists began to realize that the biological data accumulated in the databases could no longer be analyzed by pairwise sequence comparison alone, and several large-scale analyses were carried out. These studies have led to the compilation of useful databases of protein families and domains. However, they did not yield a mathematical representation of protein sequences, and did not provide a global view of the sequence space. Such a view can lead to the discovery of high-level features of the protein space. This is extremely important because the common methods for protein sequence analysis still fail to assign a clear biological function to more than 40% of the sequences in the databases.
Starting from the novel concept of global organization, my work focuses on methods for organizing all known protein sequences, aiming to obtain a bird's-eye view of this space. I will talk about three different approaches that I have tested within the last few years: (i) a Euclidean embedding approach, (ii) a graph-based approach, and (iii) a unified sequence- and structure-based mapping of proteins. These studies resulted in pioneering maps of the protein space.