Biochem 503, Fall 2008
Protein sequence comparison and Protein evolution
(9/2,4/2008)
Mathews and van Holde, Chapter 7, 233-239
Topics
- Protein homology is inferred from "unusually" similar structures
- Homologous proteins have similar structures, but not necessarily similar functions
- Sequences that share statistically significant similarity are homologous (and have similar structures)
- Unrelated sequences behave like random sequences
- Similarity searches seek homologous proteins
- Protein sequence similarity is measured with a PAM-like scoring matrix
- Protein sequence comparison (or translated DNA sequence comparison) is MUCH more sensitive than DNA sequence comparison
- Homology (over the same domain of a protein) is transitive
- BLAST and FASTA perform rapid similarity searches with accurate statistics
- Statistical estimates are much more reliable than percent identity
- With two exceptions, statistical estimates are accurate
- Proteins can be composed of duplicated homologous domains
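To make the idea of a statistical estimate concrete, here is a minimal Python sketch of how an expectation value can be computed from a normalized (bit) score with the Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length and n is the total database length. This is the BLAST-style form; FASTA derives E() from the distribution of scores seen in the search, so its numbers will differ. The function names and the example lengths are illustrative assumptions, not values from the course.

```python
import math


def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring >= bit_score.

    Karlin-Altschul relation for bit scores: E = m * n * 2**(-S').
    Assumption: raw lengths are used; real programs apply
    edge-effect ("effective length") corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def prob_at_least_one(e_value: float) -> float:
    """Probability of one or more chance hits, assuming the number
    of chance hits is Poisson distributed: P = 1 - exp(-E)."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Hypothetical search: 200-residue query, 5,000,000-residue database.
    for bits in (30, 40, 50):
        e = expected_hits(bits, 200, 5_000_000)
        print(f"bits={bits}  E={e:.3g}  P(>=1 hit)={prob_at_least_one(e):.3g}")
```

Each additional bit halves the expectation, and the expectation grows in proportion to the database size, which is one reason the same alignment can be significant in a small database and not in a large one.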
Protein evolution
Doolittle, R. F., Feng, D. F., Johnson, M. S., and McClure,
M. A. (1986) Relationships of human protein sequences to
those of other organisms. Cold Spring Harb. Symp. Quant.
Biol. 51:447-455.
***Pearson, W. R., "Protein sequence comparison and protein
evolution," ISMB2000 - Tutorial, San Diego, CA (2000).
PDF version
Statistics and evaluation of matches
***Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J.
C. (1994) Issues in searching molecular sequence databases.
Nature Genet. 6:119-129.
Sites for sequence database searching:
BLAST search page at the National Center for Biotechnology
Information: http://www.ncbi.nlm.nih.gov/BLAST/.
FASTA search page at the U. of Virginia: http://fasta.bioch.virginia.edu/fasta
EXPASY server for similarity searches, motifs, pI, molecular weight, etc.
http://expasy.hcuge.ch
WWW sites discussing sequence comparison
An excellent tutorial on using the BLAST programs: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html.
http:prot_talk12-95.html
http://twod.med.harvard.edu/seqanal/
http://www.public.iastate.edu/~pedro/research_tools.html
Exercises:
Use the FASTA WWW search page and/or BLAST to search protein databases or specific genomes:
- Search the SwissProt database with the mouse voltage-gated potassium
  channel cik4_mouse. Identify the phylogenetically most distant
  homologous sequence. Is this the most distantly related sequence?
  What is the expectation value (E()-value) of the highest-scoring
  unrelated sequence? (Why do you think the sequence is unrelated?)
- Can you identify a p53_human homolog from a non-vertebrate?
  (Hint: search SwissProt; if no non-vertebrate homolog is found, search
  BLAST NR. Use advanced BLAST to limit the search to different organisms,
  e.g. C. elegans.)
- Does M. jannaschii contain a glutamate dehydrogenase? Search the
  M. jannaschii proteome with dhe4_human as the query, using SSEARCH.
  No homologs? Try searching again with dhe2_sulso.
Questions on this topic from previous exams
- Give two reasons why percent identity is not as useful as the
  E()-value in establishing protein or DNA homology. What does
  E()-value mean? What range of values can an E()-value take?
- The mouse voltage-gated potassium channel cik4_mouse shares weak, but
  not statistically significant, similarity with a hypothetical
  Methanococcus jannaschii open reading frame (ORF, a putative protein
  sequence) with an expectation value E() ~ 0.1. Suggest some
  additional sequence-based strategies to confirm that this putative
  M. jannaschii protein is homologous to mammalian voltage-gated ion
  channels, indicating which strategies would provide the most reliable
  inferences.
- Present one argument to support the assertion that "random" protein
  sequences have similarity scores that behave like those of "real,
  unrelated" protein sequences. How is this assertion used to justify
  the inference of homology from similarity?
- (20 pts) You have just cloned and sequenced a gene encoding a novel
  protein (<50% identical to any known protein) induced during liver
  regeneration. Describe the different analyses you could perform on
  the sequence to infer its secondary (2°) and tertiary (3°) structure.
  Which are the most reliable? The least?
- Why is protein sequence comparison more effective than DNA sequence
  comparison? Do you expect to be able to detect homology relationships
  between human and yeast genes with DNA sequences? With protein
  sequences? Describe a sequence comparison experiment to demonstrate
  the difference in sensitivity of the two methods.
- In genome research, investigators use the phrase "transitive
  catastrophe" when referring to large sets of exceptionally misleading,
  incorrect annotations based on BLAST or FASTA similarity searches on
  proteins from newly sequenced genomes. What is the most likely
  explanation for these errors? How could they be avoided?
- Shown below are the results of a FASTA sequence similarity search
  using a Drosophila glutathione transferase query sequence. (a)
  Assuming the statistics are accurate, what is likely to be the
  highest-scoring unrelated sequence? (b) Given that there are hundreds
  of glutathione transferase sequences known from animals, plants, and
  bacteria, how might you confirm that your candidate unrelated
  sequence is truly unrelated? (c) How might you demonstrate, using
  sequence similarity alone, that A37378 glutathione transferase pi is
  homologous to the Drosophila query sequence?
The best scores are:                                  bits  E(14548)
XUFF11 glutathione transferase 1 - fruit fl    ( 210)  326  3.8e-90
XUZM32 glutathione transferase III - maize     ( 223)   48  3.3e-06
XUZM31 glutathione transferase III - maize     ( 221)   43  5.6e-05
XUZM1  glutathione transferase I - maize       ( 214)   41  0.00031
RGECSS stringent starvation protein E. coli    ( 213)   40  0.00058
XURTG  glutathione transferase class alpha Ya1 ( 223)   40  0.0007
XURT8C glutathione transferase 8, - rat        ( 223)   30  0.53
XURTG4 glutathione transferase 4 - rat         ( 219)   29  0.98
A37378 glutathione transferase pi - human      ( 211)   27  5.4
OUBP22 antirepressor protein ant - phage P22   ( 301)   27  6.8
NOBY2  phosphopyruvate hydratase 2 - yeast     ( 438)   27  7.4
S30223 translation elongation factor eEF-1 b   ( 228)   26  7.8
PWBYD  H+-transporting ATP synthase delta      ( 213)   26  8.7
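
When working through a hit list like the one above, it can help to parse the lines and group hits by E(). The following is a minimal Python sketch. The regular expression assumes the line layout shown above (identifier, description, length in parentheses, bit score, E()), and the 0.001 cutoff is an arbitrary illustrative choice, not a course-endorsed threshold. A hit above the cutoff is not thereby shown to be unrelated.

```python
import re
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    ident: str
    description: str
    length: int
    bits: int
    evalue: float


# Assumed layout: ID, free-text description, "( len)", bit score, E().
HIT_LINE = re.compile(r"^(\S+)\s+(.*?)\s*\(\s*(\d+)\)\s+(\d+)\s+(\S+)$")

SAMPLE = """\
The best scores are: bits E(14548)
XUFF11 glutathione transferase 1 - fruit fl ( 210) 326 3.8e-90
XUZM1 glutathione transferase I - maize ( 214) 41 0.00031
A37378 glutathione transferase pi - human ( 211) 27 5.4
"""


def parse_hits(text: str) -> List[Hit]:
    """Parse hit lines; lines that do not fit the layout (such as
    the header) are skipped."""
    hits = []
    for line in text.splitlines():
        m = HIT_LINE.match(line.strip())
        if m is None:
            continue
        ident, desc, length, bits, evalue = m.groups()
        hits.append(Hit(ident, desc, int(length), int(bits), float(evalue)))
    return hits


def split_by_evalue(hits: List[Hit],
                    threshold: float = 0.001) -> Tuple[List[Hit], List[Hit]]:
    """Return (hits with E() < threshold, all remaining hits)."""
    below = [h for h in hits if h.evalue < threshold]
    rest = [h for h in hits if h.evalue >= threshold]
    return below, rest


if __name__ == "__main__":
    significant, remaining = split_by_evalue(parse_hits(SAMPLE))
    print("E() below cutoff:", [h.ident for h in significant])
    print("E() at or above cutoff:", [h.ident for h in remaining])
```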