Workshop V Part A: Exercises in profile HMM methods

1. Practical use of a profile HMM software package

(To do these exercises, you must logon to the cerebus Unix server using an Xterm:
On the PCs:On the Macs:
double click the "SSH Secure Shell Client" icon found on the lower left of your desktop;
  • click "Quick Connect" in the upper left corner
  • hostname: cerebus.cshl.org
  • user: courseXX
  • password: [ask us]
using "Terminal" (the black screen icon right of Netscape in the dock):
  • slogin cerebus.cshl.org
  • password: [ask us]
In both cases the login is "courseXX" and the password is "[ask us]". You should now get a new window with a UNIX command line prompt. Try typing a few simple UNIX commands such as pwd or hostname or more /ecg/slib/pir1.

HMMER 2 is installed in your path, in /ecg/bin. The files mentioned in the tutorial are in /ecg/data/hmmer/demos/. Copy any/all of these files to your home directory.
cp /ecg/data/hmmer/demos/globins* . (<<-- the "." is important)

I. Core functionality

  1. Use the program hmmbuild to build an HMM from an alignment (for example, the alignment globins50.msf in demos/).
    hmmbuild globins.hmm globins50.msf

  2. Search the /ecg/slib/blast/wormpep.lseg and/or /ecg/slib/pir1 database with your globin hmm:
    hmmsearch globins.hmm /ecg/slib/pir1 > glob_src1.pir_out
    Check to see how many globins the hmm can find in pir1. (View the file by typing:
    more glob_src1.pir_out
    Then do a search against the C. elegans wormpep database:
    hmmsearch globins.hmm /ecg/slib/wormpep20 > glob_src1.worm_out
    Look at the E() values for the high scoring worm globins and non-globins.

  3. Use the program hmmcalibrate to determine some statistics for your new HMM, so that HMMER can estimate E-values fairly accurately in any subsequent searches you do with the HMM.
    hmmcalibrate globins.hmm

  4. Do the wormpep search again with the calibrated globin.hmm for new globins.
    hmmsearch globins.hmm /ecg/slib/wormpep20 > glob_src2.worm_out

    Can the C. elegans globins (worm_glob.html)found by hmmsearch be identified by single sequence search (blast, PSI-blast, fasta, ssearch) ? What is the higest scoring unrelated sequences? Is the worm globin WO1C9.5 found with the globin HMM?

  5. Use the program hmmalign to align a large set of globins (globins630.fa) using the model you've built from the smaller set of 50.

  6. Use the program hmmsearch to search the start with some known globins (demos/globins630.fa), then maybe "parse" a multidomain globin (Artemia globin is in demos/Artemia.fa)
    hmmsearch globins.hmm globins630.fa > globins630.out
    pg globins630.out

II. Comparison of HMM searching with other approaches

We will do the example described by Sean Eddy in the handout entitled "Multiple alignment and multiple sequence based searches".

  1. Use the FASTA search page and/or PSI-BLAST to compare sra 4_caeel (gi 1176649) to the SwissProt database.

  2. Using PSI-Blast, identify the protein family.

  3. Login to the unix server.

  4. Copy the library of 32 sequences clearly related to sra4_caeel in /ecg/data/eddy/sra.lib to your unix directory:
    cp /ecg/data/eddy/sra.lib sra.lib

  5. Use the clustalw program to align the sra.lib sequences.
    clustalw sra.lib
    (Alternatively, copy the multiple alignment
    cp /ecg/data/eddy/sra.aln sra.aln

  6. Use hmmbuild to build a multiple alignment of the sra.aln alignment.
    hmmbuild sra.hmm sra.aln

  7. Use hmmcalibrate to calibrate the sra.hmm.
    hmmcalibrate sra.hmm
    (This takes a few minutes.)

  8. Use hmmsearch to search SwissProt (this will take a long time - please do it in groups).
    hmmsearch sra.hmm /ecg/slib/swissprot > sra_hmm.results
    (You can then view the results by typing more sra_hmm.results

    Compare a few of your results to what you can find at the PFAM website's SwissPFAM repository.


    You might also try running PSI-BLAST via the CHAPS interface using the sra.lib files for input. Also, compare the form of the PSI-BLAST profile (PSSM) with the HMM model file.