Protein sequence similarity searches are most powerful when they are used to infer sequence homology by identifying statistically significant sequence similarity. Homologous sequences are thought to share a common ancestor; i.e. at some point in evolutionary history, there was a single protein sequence, which, through processes of speciation or gene duplication and divergence, produced the two homologous sequences we see today.
The inference of homology (common
ancestry) is the most powerful conclusion that one can draw
from a similarity search because homologous proteins share similar
three-dimensional structures. This can be seen in Fig. 1, where the
structures of three members of the serine protease superfamily are
shown. Two of these proteins, bovine chymotrypsin and
S. griseus trypsin, share strong sequence similarity while
the third related sequence, S. griseus protease A, does not
share significant similarity (E() < 66), yet the protein has a very
similar structure. Thus, as will be seen throughout this chapter,
homologous proteins need not share statistically significant, or even
detectable, sequence similarity.
Fig. 1 - Expectation values (E()), percent identity, and the length of the alignment are shown with respect to bovine trypsin. The last two numbers report the length of the alignment and the length of the library sequence whose structure is shown.
We infer that S. griseus protease A (Fig. 1) is homologous to the other serine proteases because (1) they share very similar three-dimensional structures and (2) they have similar functions. The first criterion is the more important one, because many homologous proteins perform different functions. However, as both this example and Fig. 3 show, homologous proteins often do not share significant sequence similarity. Thus, statistically significant sequence similarity can be used to infer homology, but the converse does not hold: the absence of significant similarity does not imply that two proteins are non-homologous. Distantly related homologous proteins need not share significant sequence similarity.
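The one-directional nature of this inference can be sketched in a few lines of Python. This is only an illustration: the function name `interpret_evalue` and the significance threshold of 0.001 are assumptions made for the example, not values taken from this chapter or from any particular search program.

```python
def interpret_evalue(evalue: float, threshold: float = 1e-3) -> str:
    """Interpret a search E() value using one-directional logic.

    A significant E() value supports an inference of homology.
    A non-significant E() value supports no conclusion at all,
    because homologous proteins may lack detectable similarity.
    The default threshold is an assumed, illustrative cutoff.
    """
    if evalue < 0:
        raise ValueError("E() values cannot be negative")
    if evalue <= threshold:
        return "homology inferred"
    # Deliberately NOT "non-homologous": absence of significant
    # similarity is not evidence against common ancestry.
    return "no conclusion"


# A strongly significant score versus a non-significant one.
print(interpret_evalue(1e-20))
print(interpret_evalue(66.0))
```

The second branch never returns a "not homologous" verdict. A non-significant score, such as the one for S. griseus protease A, leaves the question open, and in that case it is settled by structural evidence instead.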