Hand in: Record your results (copy and paste where appropriate) and answer the questions in your lab report, then attach the report to an email to burhansd@canisius.edu
Shown below are the amino acid sequence of an 'unknown' bacteriocin (carnobacteriocin A9b) from lactic acid bacteria and five sequences of 'known' (published) bacteriocins. Note the X's, which stand for 'not determined'. The 'unknown' bacteriocin kills Listeria monocytogenes, which is a human-pathogenic, food-borne bacterium. The illness listeriosis attacks people who have a weak immune system, as well as foetuses (by infecting the pregnant mother), and the mortality rate is approximately 25 per cent. In Denmark, approximately 40 people are affected by listeriosis per year.
> carnobacteriocin A9b
V N Y G N G V S X X K K X X
> V1a piscicocin
K Y Y G N G V S C N K N G C
> V1b piscicocin
A I S Y G N G V Y C N K E K C
> bacteriocin B2
V N Y G N G V S C S K T K C
> leucocin A
K Y Y G N G V H C T K S G C
> mesentericin Y105
K Y Y G N G V H C T K S G C
Copy and paste the sequences above into the ClustalW multiple alignment website, http://www.ebi.ac.uk/clustalw/. Read about ClustalW so that you can understand your output. View your output using JalView, the Java applet graphical viewer, and also look at the other output files. What are the default parameters used for computing this alignment? How might they be changed, and what would you expect from changing them? Discuss a couple of scenarios.
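If the web form does not accept the space-separated residues as printed above, the sequences can be converted to compact FASTA first. The following is a minimal sketch, not part of the original exercise; the function name to_fasta and the choice to replace spaces in the names with underscores are assumptions made for illustration.

```python
# Sequences copied from the exercise text (residues separated by spaces).
RAW = """\
> carnobacteriocin A9b
V N Y G N G V S X X K K X X
> V1a piscicocin
K Y Y G N G V S C N K N G C
> V1b piscicocin
A I S Y G N G V Y C N K E K C
> bacteriocin B2
V N Y G N G V S C S K T K C
> leucocin A
K Y Y G N G V H C T K S G C
> mesentericin Y105
K Y Y G N G V H C T K S G C
"""


def to_fasta(raw):
    """Return (name, sequence) pairs with all spaces removed from the residues."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: underscores are safer than spaces in sequence names.
            records.append([line[1:].strip().replace(" ", "_"), ""])
        else:
            records[-1][1] += line.replace(" ", "")
    return [(name, seq) for name, seq in records]


records = to_fasta(RAW)
fasta_text = "".join(f">{name}\n{seq}\n" for name, seq in records)
print(fasta_text)
```

The printed text can then be pasted into the alignment form in place of the spaced version.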
The protein family has a name. What do you think it is? Find out by searching SwissProt using protein BLAST (on the BLAST web site) with one of the known sequences. If you search the nr (non-redundant) database instead, you may need to increase the Expect value to 1000 or 10000 to make BLAST less stringent; otherwise you will get no hits. This is because the nr database is much larger than SwissProt, and the query sequences are very short.
The file carboxypep.fasta contains 18 sequences for carboxypeptidases from humans, cows, rats, pigs, etc. Download the file and align the sequences using ClustalW.
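To see why database size matters for such short queries, the sketch below uses the usual approximation E = m * n * 2^(-S'), where m is the query length, n the database size in residues and S' the bit score (edge-effect corrections are ignored). The bit score and both database sizes are hypothetical round numbers chosen for illustration, not actual figures for SwissProt or nr.

```python
def expect_value(query_len, db_residues, bit_score):
    """Approximate BLAST E-value: E = m * n * 2**(-S'), without edge corrections."""
    return query_len * db_residues * 2.0 ** (-bit_score)


QUERY_LEN = 14     # length of the bacteriocin fragments in this exercise
BIT_SCORE = 30.0   # hypothetical bit score for a good match to a short query
SMALL_DB = 2e8     # hypothetical size in residues of a curated database
LARGE_DB = 1e11    # hypothetical size in residues of a much larger database

e_small = expect_value(QUERY_LEN, SMALL_DB, BIT_SCORE)
e_large = expect_value(QUERY_LEN, LARGE_DB, BIT_SCORE)

print(f"E-value in the small database: {e_small:.3g}")
print(f"E-value in the large database: {e_large:.3g}")
print(f"Ratio large/small: {e_large / e_small:.0f}")
```

The same match with the same score gets an E-value that grows in proportion to the database size, so a hit that passes a given Expect threshold in the small database can fall below it in the large one.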
How well conserved are the sequences? Are there any sequences that seem to be outliers (more distantly related)? Try removing the outliers by copying and editing the file and then redo the alignment. How does the new alignment compare to the original?
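One rough way to spot outliers is to compute the average pairwise percent identity of each sequence against all the others in the alignment; a sequence with a much lower average is a candidate outlier. The sketch below assumes the aligned rows are already available as equal-length strings with '-' for gaps. The four toy sequences and their names are invented for illustration and are not taken from carboxypep.fasta.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def mean_identity(alignment):
    """Map each sequence name to its mean identity against all other rows."""
    totals = {name: 0.0 for name in alignment}
    for n1, n2 in combinations(alignment, 2):
        pid = percent_identity(alignment[n1], alignment[n2])
        totals[n1] += pid
        totals[n2] += pid
    others = len(alignment) - 1
    return {name: total / others for name, total in totals.items()}


# Hypothetical aligned rows (equal length, '-' marks a gap).
toy_alignment = {
    "seqA": "MKTAYIAKQR",
    "seqB": "MKTAYIAKQR",
    "seqC": "MKTAYLAKQR",
    "seqD": "MRS-WLPEQN",
}

means = mean_identity(toy_alignment)
for name, value in sorted(means.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.1f}% mean identity")

likely_outlier = min(means, key=means.get)
print("Lowest mean identity:", likely_outlier)
```

This is only a heuristic; the guide tree and the alignment view in JalView give a fuller picture of which sequences are distantly related.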
The file Chloroperoxidase.fasta contains eight sequences for proteins that are chloroperoxidases, or related to chloroperoxidases.
Use ClustalW to compute a multiple alignment of these sequences. Note that some sequences are short and others are very long. What does the output indicate in terms of a common motif among the sequences? Are the different sequence lengths problematic?
Now use T-COFFEE instead of ClustalW to make a multiple alignment of the same eight sequences. You can run T-COFFEE on the T-COFFEE web server. Select the Advanced option under T-Coffee (first set of choices). Inspect the HTML version of the alignment. How does this output compare to the ClustalW output?
Did either or both programs find the P[AS]YPSGHAT motif without problems? Explain your answer/include your data.
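To check independently where the motif occurs in each unaligned sequence, the pattern P[AS]YPSGHAT can be used directly as a regular expression, since [AS] means 'A or S' in both notations. The sketch below uses three invented toy sequences, not the contents of Chloroperoxidase.fasta; when working from alignment rows, remove the gap characters before searching.

```python
import re

# The motif from the exercise; [AS] matches either alanine or serine.
MOTIF = re.compile(r"P[AS]YPSGHAT")


def find_motif(sequence):
    """Return (1-based start position, matched text) for every motif occurrence."""
    ungapped = sequence.replace("-", "")  # drop alignment gaps if present
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(ungapped)]


# Hypothetical sequences for illustration only.
toy_sequences = {
    "short_fragment": "GGPAYPSGHATLL",
    "long_protein": "MKLVNDEPSYPSGHATWW",
    "no_motif": "MKPAYPSGHGT",
}

hits = {name: find_motif(seq) for name, seq in toy_sequences.items()}
for name, found in hits.items():
    print(name, found if found else "motif not found")
```

Comparing these positions with the alignment columns shows whether a program placed all copies of the motif in the same columns.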
Assuming this is the coding strand, can you help her to identify the most likely translation frame?
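A common first check for the translation frame of a coding strand is to translate all three forward frames and compare the number of stop codons; the frame with the fewest internal stops is the most likely candidate. The sketch below uses the standard genetic code and a short made-up DNA string, because the sequence this question refers to is not shown here. The names translate and best_frame are illustrative.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate the coding strand from offset frame (0, 1 or 2); '*' is a stop."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_frame(dna):
    """Return the frame whose translation contains the fewest stop codons."""
    return min(range(3), key=lambda frame: translate(dna, frame).count("*"))


# Hypothetical DNA for illustration; replace it with the real sequence.
toy_dna = "ATGAAATACTATGGTAATGGTGTT"

for frame in range(3):
    protein = translate(toy_dna, frame)
    print(f"frame {frame + 1}: {protein} ({protein.count('*')} stop codons)")

print("Most likely frame:", best_frame(toy_dna) + 1)
```

Counting stops is a heuristic only; a BLAST search of each translated frame against a protein database gives stronger evidence.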
Some of the exercises have been taken and modified from the following websites; thanks to the original authors:
http://www.bioalgorithms.info/practical-problems.php
http://www.matfys.kvl.dk/bioinformatik/exercise-multiple.html