Question: Med11, obtaining phylogenetic profiles of homologs by multiple means.
As introduced in the lecture, for protein complexes such as Mediator we want to know in which species homologs and orthologs of their components occur. Here we will look at Med11, a small subunit of the Mediator complex. For Med11 the problem is finding homologs rather than distinguishing paralogs from orthologs.
Obtain the protein sequence of Med11 from here. At NCBI, perform a PSI-BLAST search with the Med11 protein sequence as query (use ctrl-F to find "Program Selection" on the page and change from blast to psi-blast). Set "max target sequences" to 5000 (you need to expand the "Algorithm parameters" section to see this and other options).
(link to PSI-BLAST). Is there a CDD (Conserved Domain Database) match at the top of the output page? If so, to which family/domain?
Save the page with the first PSI-BLAST results. Collect the protein sequences of all significant hits and put them in a text file. To do so, first do something smart with NCBI here, which frees you from having to cut and paste each individual protein sequence and sends the sequences directly to a file (e.g. select the sequences you want to download, press download, select FASTA (Complete sequences), and save your file).
Go to Clustal Omega (http://www.ebi.ac.uk/Tools/msa/clustalo/) online. Submit your downloaded sequences for multiple sequence alignment. Inspect the alignment (use Jalview, or use a local ClustalX).
In the PSI-BLAST output, look at the hit with YALI0E20053p / XP_504172 and look at the hit with bifunctional ornithine acetyltransferase/N-acetylglutamate synthas / WP_043903783.1.
Specifically, you should note the bit score, the Expect value (= E-value) and the Identity.
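To get a feeling for how the bit score and the E-value relate, the standard BLAST approximation E = m * n * 2^(-S') can be tried out, where S' is the bit score, m the query length and n the total length of the database. The sketch below is only an illustration: the function names are made up for this exercise, the numbers in the example are hypothetical, and the E-values that PSI-BLAST actually reports use effective lengths and composition-based statistics, so they will not match exactly.

```python
def expect_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Uses raw lengths; BLAST itself uses effective (edge-corrected) lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def percent_identity(identical: int, aligned: int) -> float:
    """Identity as reported by BLAST, e.g. 'Identities = 25/100' gives 25.0."""
    return 100.0 * identical / aligned


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the real Med11 search.
    query_len = 120          # assumed query length in residues
    db_len = 100_000_000     # assumed total database length in residues
    for score in (30.0, 40.0, 50.0):
        e = expect_from_bit_score(score, query_len, db_len)
        print(f"bit score {score:5.1f} -> approximate E-value {e:.3g}")
```

Each extra bit halves the expected number of chance hits, which is why two hits with similar identity can still differ a lot in E-value.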
Add the protein sequences of these two hits to the FASTA file, redo the alignment and look at the output. You can add sequences by opening the file with the significant hits in a text editor such as WordPad and pasting the two sequences into it in FASTA format. Which of the two sequences fits better?
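If you prefer not to edit the file by hand, adding records can also be scripted. The following is a minimal sketch in plain Python, assuming the downloaded file is ordinary FASTA; the helper names and the file name in the usage comment are hypothetical, and the sequences of the two hits still have to be copied from their NCBI entries.

```python
from pathlib import Path


def seq_id(header: str) -> str:
    """First whitespace-separated word of a FASTA header (the accession)."""
    parts = header.split()
    return parts[0] if parts else ""


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def append_records(fasta_text: str, new_records, width: int = 60) -> str:
    """Return fasta_text with new_records appended, skipping known accessions."""
    existing = {seq_id(h) for h, _ in read_fasta(fasta_text)}
    out = fasta_text.rstrip("\n") + "\n" if fasta_text.strip() else ""
    for header, seq in new_records:
        if seq_id(header) in existing:
            continue  # already present in the file
        out += f">{header}\n"
        for i in range(0, len(seq), width):
            out += seq[i:i + width] + "\n"
    return out


def add_to_file(path: str, new_records) -> None:
    """Rewrite the FASTA file at path with the extra records appended."""
    p = Path(path)
    p.write_text(append_records(p.read_text(), new_records))

# Usage (hypothetical file name; paste the real sequences from NCBI):
# add_to_file("med11_hits.fasta", [("XP_504172 YALI0E20053p", "..."),
#                                  ("WP_043903783.1", "...")])
```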
Run PSI-BLAST iteration 2 of the MED11 search. What happens to YALI0E20053p and WP_043903783.1? Give an explanation for why this happens.
How are the sequences that are significantly similar to yeast med11 according to (PSI-)BLAST annotated with regard to function? Do you think you can now improve on some of these annotations, and if so, how? Check the GenBank entries of some hits.
Look for the gene "MGG_06741 / XP_003709455" in your second iteration psiblast and try to find it in the output of the first iteration. Explain your findings.
To find homologs (which are likely also orthologs) of med11 we may not need PSI-BLAST at all. Go to PFAM and look up the med11 entry. Download the HMM model of med11 from PFAM (link to curation and models in the side bar; then the hyperlink "download" at the bottom). Run the PFAM model against the full UniProt database (i.e. change the target database) using the online HMMSEARCH. After the hmmsearch has finished, search for the med11 hit in Trichomonas via the taxonomy link / filter at the top. How is this protein annotated?
Get the sequence of the protein in Trichomonas that is hit by the med11 HMM. Run BLAST or phmmer with this sequence. How many significant hits do you obtain? Try to explain this. Could you have built an iteratively generated profile search starting from the Trichomonas sequence?
Go to http://orthomcl.org/orthomcl/. In the "Groups Quick Search" search box, search for med11.
How many orthologous groups do you find? Why did we find these groups, i.e. why are they found with the text query med11?
Click on the phyletic patterns tab.
Inspect the phyletic patterns and the species composition. To what extent do they overlap or avoid each other (feel free to do something smarter than using your eyes, although I do not know what)? What do you think is going on here?
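One possible way to go beyond eyeballing is to treat each phyletic pattern as a set of species and quantify the overlap, for instance with the Jaccard index (size of the intersection divided by size of the union). The sketch below assumes you have typed or copied the species codes of two groups into Python sets yourself; the function names and the species codes in the example are placeholders, not real OrthoMCL output.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: 1.0 for identical species sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_patterns(group_a, group_b, all_species) -> dict:
    """Summarise how two phyletic patterns overlap or avoid each other."""
    a, b = set(group_a), set(group_b)
    return {
        "both": sorted(a & b),                        # species with a member in each group
        "only_a": sorted(a - b),                      # species seen in group A only
        "only_b": sorted(b - a),                      # species seen in group B only
        "neither": sorted(set(all_species) - a - b),  # species in neither group
        "jaccard": jaccard(a, b),
    }


if __name__ == "__main__":
    # Placeholder species codes, NOT real OrthoMCL data.
    species = {"sp1", "sp2", "sp3", "sp4", "sp5"}
    group_a = {"sp1", "sp2"}
    group_b = {"sp3", "sp4"}
    for key, value in compare_patterns(group_a, group_b, species).items():
        print(key, value)
```

A Jaccard index close to 0 together with a short "neither" list means the two groups largely avoid each other while jointly covering most species; an index close to 1 means they co-occur in the same species.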