Question: Med11, obtaining phylogenetic profiles of homologs by multiple means.
As introduced in the lectures, for protein complexes we want to know in which species homologs and orthologs of their components occur. Here we will look at med11, a small subunit of the mediator complex. For med11 the problem is finding homologs rather than distinguishing paralogs from orthologs.
Run iteration 2 of the PSI-BLAST search with MED11. What happens to YALI0E20053p and WP_043903783.1? Give an explanation for why this happens.
How are the sequences that are significantly similar to yeast med11 according to (PSI-)BLAST annotated with regard to function? Do you think you can now improve on some of these annotations, and if so, how? Check the GenBank entries of some hits.
Look for the protein with identifier KEY78982 in the output of your second PSI-BLAST iteration and try to find it in the output of the first iteration. Explain your findings.
To find homologs (which are likely also orthologs) of med11 we may not need PSI-BLAST at all. Go to PFAM and look up the med11 entry. Download the HMM of med11 from PFAM (link to "curation and models" in the side bar; then the hyperlink "download" at the bottom). Run the PFAM domain against full UniProt (i.e. change the target database) using the online HMMSEARCH. After the hmmsearch has finished, look for the med11 hit in Trichomonas via the taxonomy link / filter at the top. How is this protein annotated with regard to function, i.e. what function does the database predict for this protein?
Get the sequence of the Trichomonas protein that is hit by the med11 HMM. BLAST (slow) or phmmer (somewhat faster) this sequence. How many significant hits do you obtain? Try to explain this. Could you have done an iteratively generated profile search starting from the Trichomonas sequence?
Go to http://orthomcl.org/orthomcl/. In the "Groups Quick Search" search box, search for med11.
How many orthologous groups do you find? Why did we find these groups, i.e. why are they found with the text query med11?
Click on the phyletic patterns tab.
Inspect the phyletic patterns and the species composition. To what extent do they overlap or avoid each other? (Feel free to do something smarter than using your eyes, although I do not know what.) What do you think is going on here?
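One possible way to go beyond eyeballing the phyletic patterns is to treat each orthologous group as a set of species and compare the sets. The sketch below is only an illustration: the function name `compare_patterns` and the species codes are made up and are not taken from OrthoMCL. You would replace them with the species you read off the phyletic patterns tab.

```python
def compare_patterns(group_a, group_b):
    """Compare two phyletic patterns given as collections of species codes."""
    a, b = set(group_a), set(group_b)
    shared = a & b
    union = a | b
    # Jaccard index: 1.0 means identical species sets, 0.0 means no overlap.
    jaccard = len(shared) / len(union) if union else 0.0
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": jaccard,
    }


# Hypothetical species codes, for illustration only.
group_1 = ["hsap", "mmus", "dmel", "cele"]
group_2 = ["scer", "spom", "ncra", "cele"]

result = compare_patterns(group_1, group_2)
print("shared species:", result["shared"])
print("only in group 1:", result["only_a"])
print("only in group 2:", result["only_b"])
print("Jaccard index: %.2f" % result["jaccard"])
```

A Jaccard index close to 0 combined with two large "only in" lists would suggest that the groups avoid each other, i.e. that they have complementary species distributions.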
- Obtain the protein sequence of Med11 from here. At NCBI, perform a PSI-BLAST search with the protein sequence of Med11 as query and with adjusted parameters. To perform a PSI-BLAST instead of a BLAST, change the search from blast to psi-blast under "Program Selection". Set "max target sequences" to 5000 (you need to expand the "Algorithm parameters" section to see this and other options). Also check that your search is not still limited to a few species because of previous blast searches (if it is, press the reset page button and re-apply the changes described in this question).
(link to PSI-BLAST). Is there a Conserved Domain (Putative conserved domains) match at the top of the output page, before the blast output? If so, to which family/domain?
- In the PSI-BLAST output, look at the hit with XP_504172 and look at the hit with WP_043903783.
Specifically, note the bit score, the Expect value (= e-value) and the Identity.
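When comparing these numbers, it may help to remember that the e-value depends on the size of the search space as well as on the score. The sketch below assumes the standard approximation E = m * n * 2^(-S), with S the bit score, m the query length and n the total database length. BLAST itself uses effective (corrected) lengths, so reported e-values will differ somewhat. The function name and the lengths used are made up for illustration.

```python
def expect_from_bits(bit_score, query_len, db_len):
    """Approximate e-value from a bit score: E = m * n * 2^(-S).

    query_len (m) and db_len (n) are raw lengths here; BLAST uses
    effective lengths, so this is only a rough estimate.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical lengths: a 100-residue query against a database of 1e9 residues.
for bits in (30, 40, 50):
    print(bits, "bits ->", "%.2g" % expect_from_bits(bits, 100, 1e9))
```

Every additional bit halves the e-value, so a difference of 10 bits corresponds to roughly a 1000-fold difference in e-value for the same query and database.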
- Now we are going to see how both "grey zone" hits fit with the significantly similar sequences of blast (without iterating). Do the following:
- Save the page with the first psiblast results. Collect the protein sequences of (a selection of) significant hits and put them in a text file. To do so, first do something smart with NCBI here, which frees you from having to cut-and-paste each individual protein sequence and sends the sequences directly into a file (e.g. select the sequences you want to download, press download, select FASTA (Complete sequences), and save your file).
Add the protein sequences of the two hits (XP_504172 and WP_043903783) to the FASTA file. You can do this by opening the file with the significant hits in a text editor such as WordPad and pasting the two sequences into it in FASTA format.
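If you prefer a script over a text editor, adding the two sequences could look roughly like the sketch below. The helper names, the file name `med11_hits.fasta` and the sequence strings are all placeholders invented for this example. The strings are not the real XP_504172 or WP_043903783 sequences, which you still have to obtain from NCBI.

```python
import tempfile
import textwrap
from pathlib import Path


def fasta_record(identifier, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    cleaned = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    return ">" + identifier + "\n" + "\n".join(textwrap.wrap(cleaned, width)) + "\n"


def append_records(fasta_path, records):
    """Append (identifier, sequence) pairs to a FASTA file."""
    path = Path(fasta_path)
    existing = path.read_text() if path.exists() else ""
    if existing and not existing.endswith("\n"):
        existing += "\n"  # make sure the new header starts on its own line
    path.write_text(existing + "".join(fasta_record(i, s) for i, s in records))


def count_records(fasta_path):
    """Count FASTA headers, as a quick check that nothing got merged."""
    lines = Path(fasta_path).read_text().splitlines()
    return sum(1 for line in lines if line.startswith(">"))


# Demo in a temporary directory with placeholder sequences (NOT real data).
demo_file = Path(tempfile.mkdtemp()) / "med11_hits.fasta"
demo_file.write_text(">hit_1 placeholder\nACDEFGHIKL")  # stands in for the NCBI download
append_records(demo_file, [
    ("XP_504172", "ACDEFGHIKLMNPQRSTVWY"),
    ("WP_043903783", "acdefghik lmnpqrstvwy"),
])
print(count_records(demo_file), "sequences in", demo_file)
```

Checking the number of headers afterwards guards against the most common mistake when editing by hand: a missing line break that glues a new header onto the end of the previous sequence.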
Go to Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) online.
Submit your downloaded sequences for multiple sequence alignment. Inspect the alignment (use Jalview, which should be installed on the university computers, or see whether the Jalview button works from the Clustal output; if not, switch browsers / laptops until you find one in which it works).