
Exploring Distant Protein Sequence Relationships

Searching Protein Sequence Databases: What Are We Looking For?

Similarity <=> Homology


Biological <=> Statistical

Protein sequence similarity searches are most powerful when they are used to infer sequence homology by identifying statistically significant sequence similarity. Homologous sequences share a common ancestor: at some point in evolutionary history there was a single protein sequence which, through speciation or gene duplication followed by divergence, produced the two homologous sequences we see today.

The inference of homology (common ancestry) is the most powerful conclusion that one can draw from a similarity search because homologous proteins share similar three-dimensional structures. This can be seen in Fig. 1, which shows the structures of three members of the serine protease superfamily. Two of these proteins, bovine chymotrypsin and S. griseus trypsin, share strong sequence similarity, while the third, S. griseus protease A, does not share significant sequence similarity with them (E() < 66), yet it has a very similar structure. Thus, as will be seen throughout this chapter, homologous proteins need not share statistically significant, or even detectable, sequence similarity.

Fig. 1 - Expectation values (E()), percent identity, and the length of the alignment are shown with respect to bovine trypsin. The last two numbers report the length of the alignment and the length of the library sequence whose structure is shown.
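
To make the role of the expectation value concrete, the short Python sketch below converts an E() value into the probability of seeing at least one match that good by chance. It assumes the usual interpretation of E() as the expected number of chance matches, with those matches following a Poisson distribution, so that P = 1 - exp(-E). The function names and the 0.001 significance cutoff are illustrative assumptions and are not taken from this chapter or from any search program.

```python
import math


def chance_probability(e_value: float) -> float:
    """P(at least one chance match) for an expectation value, assuming
    chance matches follow a Poisson distribution: P = 1 - exp(-E)."""
    # expm1 keeps precision when e_value is tiny (1 - exp(-E) would round to 0).
    return -math.expm1(-e_value)


def is_significant(e_value: float, threshold: float = 1e-3) -> bool:
    """True if the expectation value falls below the chosen cutoff.
    The default cutoff is an illustrative assumption, not a value from the text."""
    return e_value < threshold


if __name__ == "__main__":
    # 1e-20 is a made-up strong hit; 66 is the bound quoted for protease A.
    for label, e in [("hypothetical strong hit", 1e-20), ("E() = 66", 66.0)]:
        print(f"{label}: P(chance) = {chance_probability(e):.3g}, "
              f"significant = {is_significant(e)}")
```

Under these assumptions, an E() near 66 corresponds to a chance probability of essentially 1, which is why such a score on its own provides no evidence for homology, even though the structures show that the proteins are related.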

Endochitinase is an example of a very high-scoring but unrelated protein whose structure is known. This high-scoring unrelated sequence shares no structural similarity with trypsin or other serine proteases. If two proteins are not homologous, one cannot draw any conclusion about their structural similarity, even when their similarity scores are high.

We infer that S. griseus protease A (Fig. 1) is homologous to the other serine proteases because (1) they share very similar three-dimensional structures and (2) they have similar functions. The first criterion is the more important, since many homologous proteins perform different functions. However, as is clear in both this example and Fig. 3, homologous proteins often do not share significant sequence similarity. Thus, statistically significant sequence similarity can be used to infer homology, but the converse is not true: homology does not imply significant sequence similarity, and distantly related homologous proteins need not share it.
