Computational Genomics, November 2005
Workshop III - BLAST, PSI-BLAST, and PSI-SEARCH
NCBI BLAST WWW site
These exercises use the ECG2005 CHAPS and ECG2005 PSI-BLAST WWW pages,
and the wrpmg5b PSI-SEARCH and xs00.achs PSI-SEARCH WWW pages.
CHAPS allows you to enter a set of sequences, generate a multiple alignment,
and use that multiple alignment for a PSI-BLAST search.
Additional information on the CHAPS program can be found here.
Looking at profiles/PSSMs -- the effect of diversity
-
Using the CHAPS WWW
page, make a multiple alignment and generate a PSSM using
the two sequences gstm1_human and gstm2_human (run CHAPS). After generating the alignment with
Run ClustalW Now, select Generate PSSM Now.
Examine the PSSM (position-specific scoring matrix). Compare the values to BLOSUM62.
The weights of each residue are shown on the right half of the PSSM.
-
Try the same process with gstm1_human, gstm2_human, gstm3_human,
and gstm1_mouse (run CHAPS). Does the scoring matrix or weighting
change much?
-
Try the sequences gstm1_human, gstm3_human, gstp1_human, gsta1_human,
gstt1_human, gsto1_human, and ptgd2_human
(run CHAPS). Now look at the scoring matrix
and weighting.
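To see why a more diverse alignment changes the matrix, it may help to work through a single column by hand. The following is a minimal sketch, not the method CHAPS or PSI-BLAST actually uses: it assumes uniform background frequencies, a flat pseudocount for every residue, and no sequence weighting, and the function name `column_scores` and the two example columns are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_scores(column, pseudocount=1.0, background=None):
    """Log-odds scores (rounded half-bits) for one alignment column.

    Assumptions (not from the workshop): uniform background frequencies,
    one flat pseudocount per residue type, no sequence weighting.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Count observed residues, skipping gaps and unknown characters.
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    scores = {}
    for aa in AMINO_ACIDS:
        freq = (counts[aa] + pseudocount) / total
        scores[aa] = round(2.0 * math.log2(freq / background[aa]))
    return scores

# A fully conserved column versus a column of mixed hydrophobic residues.
conserved = column_scores("WWWWWWW")
diverse = column_scores("WFYLWIM")
print("W:", conserved["W"], diverse["W"])
print("F:", conserved["F"], diverse["F"])
```

In the conserved column the score for W is high and every other residue is penalized; in the diverse column the W score drops and the other observed residues become acceptable. The same trend is worth looking for when comparing the two-sequence and seven-sequence PSSMs.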
Iterative searching with PSI-BLAST
-
Using the ECG2005 PSI-BLAST page, search the PIR1
database using gstt1_drome (gi|121694).
-
Set Iterations to
10 and E() cutoff to 1e-4. Are the E()-values for
XURTG and XURT8C the same as the ones you saw in problem
3? Does PSI-BLAST ever include a non-glutathione transferase homolog?
-
Do the same search turning composition statistics off. Check the
E()-values for XURTG and XURT8C.
-
Search with gstt1_drome (gi|121694) against PIR1, setting the E() cutoff to 0.01. Do any non-homologs obtain E()-values better than 0.01?
-
Try the same search, setting the E() cutoff to 0.2. What is the final E()-value for
SYEP_HUMAN, Bifunctional aminoacyl-tRNA synthetase?
-
Do the same series of searches using OOHU (gi|476517). Set the
E() cutoff to 1e-4 and search the PIR1 database
for 10 iterations. Compare the converged results for a search
with and without composition-based statistics.
-
Try searching using the PSSMs you generated in the CHAPS/PSSM section.
Search the swissprot database, which has been annotated to indicate most GST homologs.
- Search with two sequences: gstm1_human, gstm2_human (run CHAPS).
- Search with four sequences: gstm1_human, gstm2_human, gstm3_human,
gstm1_mouse (run CHAPS).
- Search with seven sequences: gstm1_human, gstm3_human, gstp1_human, gsta1_human,
gstt1_human, gsto1_human, ptgd2_human
(run CHAPS).
In each of the searches, try to determine how broad the initial search was, and watch out for high-scoring unrelated sequences.
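The role of the E() cutoff in the iterations above can be sketched in a few lines. This is a toy model under stated assumptions, not PSI-BLAST itself: `search` is a hypothetical callable standing in for one search round, the hit names and E()-values in `toy_search` are invented, and included sequences are simply accumulated from round to round. The `evalue_from_bits` helper uses the standard relation E = m * n * 2^(-S') for a bit score S', ignoring effective-length corrections.

```python
def evalue_from_bits(bit_score, query_len, db_len):
    """Expected number of chance hits for a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)

def iterate(search, query, e_cutoff=1e-4, max_iterations=10):
    """Repeat a profile search until no new sequence passes the E() cutoff.

    `search(query, included)` is a hypothetical stand-in for one round;
    it must return a dict mapping sequence id -> E()-value.
    """
    included = set()
    for iteration in range(1, max_iterations + 1):
        hits = search(query, frozenset(included))
        passing = {sid for sid, e in hits.items() if e <= e_cutoff}
        if passing <= included:  # nothing new was added: converged
            return included, iteration
        included |= passing
    return included, max_iterations

def toy_search(query, included):
    """Invented data: an unrelated hit appears only after GST_B joins the profile."""
    hits = {"GST_A": 1e-30, "GST_B": 1e-3}
    if "GST_B" in included:
        hits["UNRELATED"] = 5e-3
    return hits

strict, strict_rounds = iterate(toy_search, "query", e_cutoff=1e-4)
loose, loose_rounds = iterate(toy_search, "query", e_cutoff=0.01)
print(sorted(strict), strict_rounds)
print(sorted(loose), loose_rounds)
```

With the strict cutoff the toy search converges on the clear homolog alone. With the looser cutoff a marginal hit enters the profile, which then pulls in an unrelated sequence. That is the pattern to watch for when comparing the 1e-4, 0.01, and 0.2 searches.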
Looking at profiles/PSSMs -- statistics
-
We have set up a special version of PSI-BLAST at two different sites, PSI-SEARCH and PSI-SEARCH2,
that provides three statistical estimates: the normal PSI-BLAST
statistics, statistics from the distribution of unrelated scores
calculated by SSEARCH using the PSI-BLAST PSSM, and
statistics calculated using PRSS. The program first does a search at the NCBI, and then uses the PSSM from that search to run Smith-Waterman (SSEARCH) and PRSS against the same library.
Try using PSI-SEARCH with the sequence GSTM1_HUMAN (121735) against the
SwissProt database. At the end of each iteration, look at
the E() values calculated in the three different ways for new
sequences about to be included (or not) in the next iteration. Pay
particular attention to the discrepancies between BLAST and
SSEARCH/PRSS after the second iteration. Examine some of the sequences
where the E()-values differ substantially, and consider whether the
"homologies" are genuine.
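The idea behind shuffle-based statistics can be illustrated with a small sketch. It is a simplification under stated assumptions, not what PRSS or PSI-SEARCH does internally: it scores with a fixed match/mismatch scheme and a linear gap penalty rather than a PSSM, and it reports a plain empirical fraction rather than fitting an extreme-value distribution to the shuffled scores. The function names and the example sequence are made up for illustration.

```python
import random

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

def shuffle_pvalue(query, subject, n_shuffles=200, seed=0):
    """Fraction of shuffled subjects scoring at least as well as the real one.

    Shuffling keeps the subject's length and composition but destroys its
    order, so the shuffled scores approximate scores of unrelated sequences.
    """
    rng = random.Random(seed)
    real = sw_score(query, subject)
    residues = list(subject)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if sw_score(query, "".join(residues)) >= real:
            at_least += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return real, (at_least + 1) / (n_shuffles + 1)

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
score, p = shuffle_pvalue(seq, seq)
print(score, p)
```

Because the shuffled sequences keep the original composition, a score that is high only because of biased composition will also be high after shuffling. That is one reason the shuffle-based estimate can disagree with the BLAST estimate for the same alignment.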