MSC1

Evolutionary relationship of pombe msc1 to genes in cerevisiae

In a recent paper comparing protein complexes between two fungal species (ref), there is conflicting information on the evolutionary relationship of the msc1 protein from Schizosaccharomyces pombe to proteins in other species, most notably Saccharomyces cerevisiae. In one table the authors list msc1 as having no ortholog in S. cerevisiae. In contrast, in one figure (and in the main text) they propose that ecm5 is the ortholog of msc1 and that it is absent from the S. cerevisiae swr1 complex.

One of the themes we care about is the interplay between genome evolution and the evolution of function and networks, or in other words the evolution of function. It should also be clear by now that we care a lot about evolution after speciation versus evolution after duplication. In this particular case, the paper claims in figure 5 and in the main text that the function of msc1/ecm5 evolved even though the two are orthologs. This does happen in biology and can provide interesting insights, but there are two reasons to take a close look at this particular case: (1) a change in function after speciation (i.e. between orthologs) is the exception, and (2) the authors' inference about what precisely happened during genome evolution appears inconsistent within the paper. Is msc1 orthologous to ecm5 or not? And are we looking only at function evolution, or also at genome evolution (e.g. gene duplication or loss)?

So what is going on in the evolution and orthology of msc1 and ecm5? Let's try to find out! NB: in a way this question is quite similar to some of the mini projects, so it is a fairly big exercise.

Find the protein sequences of S. pombe msc1 (SPAC343.11c) and S. cerevisiae ecm5 (YMR176W). Submit both to Pfam, the NCBI Conserved Domain Database (CD-Search) or SMART, and describe the similarities and differences in their domain composition.
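To make the comparison above systematic, it can help to write down each domain hit as a name plus start and end coordinates and compare the two architectures side by side. The following is a minimal Python sketch, assuming you copy the hits by hand from Pfam, CD-Search or SMART; `DomainHit`, `architecture` and `compare_architectures` are hypothetical helpers, and the domain names and coordinates in the example are placeholders, not real search results.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One domain hit copied from a domain search (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int


def architecture(hits: list[DomainHit]) -> list[str]:
    """Domain names ordered from N- to C-terminus."""
    return [hit.name for hit in sorted(hits, key=lambda hit: hit.start)]


def compare_architectures(a: list[DomainHit], b: list[DomainHit]) -> dict[str, set[str]]:
    """Split domain names into those shared by both proteins and those unique to one."""
    names_a = {hit.name for hit in a}
    names_b = {hit.name for hit in b}
    return {
        "shared": names_a & names_b,
        "only_a": names_a - names_b,
        "only_b": names_b - names_a,
    }


# Placeholder hits for illustration only; these are NOT real search results.
protein_a = [DomainHit("DomB", 300, 350), DomainHit("DomA", 10, 60)]
protein_b = [DomainHit("DomA", 15, 70), DomainHit("DomC", 400, 500)]
print(architecture(protein_a))                      # ['DomA', 'DomB']
print(compare_architectures(protein_a, protein_b))  # {'shared': {'DomA'}, 'only_a': {'DomB'}, 'only_b': {'DomC'}}
```

Comparing sets of names ignores domain order and copy number; `architecture` keeps the order, which can matter just as much when judging how similar two proteins really are.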
We want to collect sequences for a tree by finding homologs of msc1 and ecm5, so both should serve as queries, and we want to collect only the relevant homologs. Before you start collecting homologs, note also that the paper describes ecm5 as part of a complex in S. cerevisiae together with snt2 and rpd3, whereas in S. pombe snt2 is in a complex with lid2 and jmj3.
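Because both msc1 and ecm5 serve as queries, the same target will often show up in both result lists. A minimal Python sketch of merging the hits into one non-redundant set, assuming you export each search as (accession, species, bitscore) tuples; `Homolog` and `merge_hits` are hypothetical names, and the accessions and scores are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Homolog:
    """A target sequence found by one or more queries."""
    accession: str
    species: str
    scores: dict[str, float] = field(default_factory=dict)  # query name -> best bitscore


def merge_hits(results: dict[str, list[tuple[str, str, float]]]) -> dict[str, Homolog]:
    """Merge per-query hit lists into one non-redundant set keyed by accession.

    `results` maps a query name (e.g. "msc1") to (accession, species, bitscore) tuples.
    Bitscores are kept per query because scores from different queries are not
    directly comparable.
    """
    merged: dict[str, Homolog] = {}
    for query, hits in results.items():
        for accession, species, bitscore in hits:
            entry = merged.setdefault(accession, Homolog(accession, species))
            entry.scores[query] = max(entry.scores.get(query, bitscore), bitscore)
    return merged


# Placeholder accessions and scores, for illustration only.
results = {
    "msc1": [("ACC1", "Species X", 250.0), ("ACC2", "Species Y", 90.0)],
    "ecm5": [("ACC1", "Species X", 310.0), ("ACC3", "Species Z", 120.0)],
}
merged = merge_hits(results)
print(sorted(merged))         # ['ACC1', 'ACC2', 'ACC3']
print(merged["ACC1"].scores)  # {'msc1': 250.0, 'ecm5': 310.0}
```

Keeping the scores per query, rather than a single best score, preserves the information that a target was found by only one of the two queries, which can itself hint at a partial domain match.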
- Collecting the sequences is perhaps easiest with phmmer, because this server breaks down the results of the similarity search by taxonomy in a tree that you can expand or collapse. You could also use BLAST and restrict your search to the target species listed below.
- Note that it is easy to mistakenly collect what is effectively the same protein more than once instead of different proteins, e.g. (splice) variants from the same species or orthologous proteins from closely related strains or species. Besides keeping a keen eye on gene names and very similar bitscores, using RefSeq in the case of BLAST, or using phmmer and browsing its taxonomy output, can partly mitigate this problem.
- Collect the two or three best significant hits (if possible) from each of the following organisms: Schizosaccharomyces pombe (including msc1 itself!), Saccharomyces cerevisiae, one or two animals, one or two plant species, and two or three other fungi: one ascomycete, one basidiomycete and one early-branching fungus (i.e. a fungus that is neither an ascomycete nor a basidiomycete). These early-branching fungi can be very informative; see e.g. the example of RAL in the lectures.
- For every hit you collect, also write down the location of the HSP (high-scoring segment pair) in both query and target, in other words the regions over which the two proteins are homologous. Explain why we care where in the sequences the proteins are homologous to each other.
- Rename the sequences in your sequence file to names you like (ideally each name includes the species name, and all names must be different). Open the file in Clustal (ClustalX locally or Clustal Omega on the web) and align the sequences. Then make a neighbor-joining tree, or use the neighbor-joining tree provided by the Clustal Omega server. Open the tree in iTOL on the web or in TreeView locally and examine it.
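The collection and renaming steps above can be partly scripted once the hits are exported. A minimal Python sketch, assuming each hit is copied as a (name, species, bitscore, E-value) tuple; the E-value cut-off of 1e-5 and the 2% relative bitscore tolerance for flagging possible splice variants or strain duplicates are illustrative assumptions, not values prescribed by the exercise, and all function names are hypothetical. Labels are reduced to letters, digits and underscores because tree formats such as Newick treat characters like parentheses, colons and commas specially.

```python
import re
from collections import defaultdict


def select_hits(hits, max_evalue=1e-5, per_species=3):
    """Keep the best `per_species` significant hits per species, ranked by bitscore.

    `hits` is a list of (name, species, bitscore, evalue) tuples from one search.
    """
    by_species = defaultdict(list)
    for name, species, bitscore, evalue in hits:
        if evalue <= max_evalue:
            by_species[species].append((name, bitscore))
    return {
        species: sorted(group, key=lambda hit: hit[1], reverse=True)[:per_species]
        for species, group in by_species.items()
    }


def flag_near_duplicates(selected, rel_tol=0.02):
    """List same-species pairs whose bitscores differ by at most `rel_tol` (relative).

    Such pairs are candidates for splice variants or strain duplicates; check them by eye.
    """
    flagged = []
    for species, group in selected.items():
        for i, (name1, score1) in enumerate(group):
            for name2, score2 in group[i + 1:]:
                if abs(score1 - score2) <= rel_tol * max(score1, score2):
                    flagged.append((species, name1, name2))
    return flagged


def unique_labels(selected):
    """Make unique labels like 'Species_name_gene', safe for FASTA headers and Newick."""
    labels, seen = [], set()
    for species, group in selected.items():
        for name, _ in group:
            base = re.sub(r"[^A-Za-z0-9]+", "_", f"{species} {name}").strip("_")
            label, k = base, 2
            while label in seen:  # append a counter if the label is already taken
                label, k = f"{base}_{k}", k + 1
            seen.add(label)
            labels.append(label)
    return labels


# Placeholder hits for illustration only.
hits = [
    ("geneA", "Species one", 500.0, 1e-100),
    ("geneA_v2", "Species one", 495.0, 1e-99),
    ("geneB", "Species one", 200.0, 1e-30),
    ("geneC", "Species one", 50.0, 0.5),  # not significant, dropped
    ("geneD", "Species two", 300.0, 1e-60),
]
selected = select_hits(hits)
print(flag_near_duplicates(selected))  # [('Species one', 'geneA', 'geneA_v2')]
print(unique_labels(selected))  # ['Species_one_geneA', 'Species_one_geneA_v2', 'Species_one_geneB', 'Species_two_geneD']
```

A flagged pair is only a prompt to look closer: two genuine paralogs can also have similar bitscores.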
What is the evolutionary relationship between msc1, ecm5 and other S. pombe genes, also taking into account the results of the domain searches and the locations of the HSPs? If you cannot make up your mind, include homologs from additional genomes that could help, and make sure you have included S. pombe lid2. Also look at the domain composition of your homologs in domain search servers such as Pfam, SMART or the Conserved Domain Database, and perhaps play around with the thresholds there to reveal grey-zone domain hits.
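One way to combine the HSP locations with the domain search results discussed above is to check which query domains each HSP actually covers, and what fraction of the query is covered at all. A hit confined to a single shared domain suggests homology through that domain only, rather than over the whole protein. A minimal Python sketch; the domain names and coordinates are placeholders, the 50% coverage threshold is an arbitrary assumption, and the function names are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def domains_covered(hsp, domains, min_fraction=0.5):
    """Query domains covered by an HSP over at least `min_fraction` of their length.

    `hsp` is (query_start, query_end); `domains` maps domain name -> (start, end) on the query.
    """
    q_start, q_end = hsp
    return [
        name
        for name, (d_start, d_end) in domains.items()
        if overlap(q_start, q_end, d_start, d_end) / (d_end - d_start + 1) >= min_fraction
    ]


def query_coverage(hsps, query_length):
    """Fraction of query residues covered by the union of one hit's HSPs."""
    covered = set()
    for start, end in hsps:
        covered.update(range(start, end + 1))
    return len(covered) / query_length


# Placeholder coordinates, for illustration only.
query_domains = {"DomA": (10, 60), "DomB": (300, 350)}
print(domains_covered((20, 120), query_domains))   # ['DomA']
print(query_coverage([(1, 100), (51, 150)], 300))  # 0.5
```

Using the union of HSPs avoids double-counting residues when a hit has overlapping HSPs, which is common for proteins with repeated domains.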
What do you think this means for the evolution of protein complex membership in the ancestral lineage of msc1, also given the proteomics data in the original paper (ref) and the complex memberships of snt2 in S. cerevisiae and S. pombe described there? In other words, do you think msc1 gained its complex membership after speciation or after duplication?
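Once you have a gene tree, one way to reason about when complex membership was gained is small parsimony: code membership as 1 or 0 at the leaves and reconstruct the ancestral states. The sketch below swaps in Fitch's algorithm as one possible method; it assumes a rooted binary tree written as nested `(label, left, right)` tuples with unique leaf and node names, takes the membership calls from the proteomics data as given, and uses a toy tree whose names are placeholders, not an answer to the question above.

```python
def node_name(tree):
    """Leaves are strings; internal nodes are (label, left, right) tuples."""
    return tree if isinstance(tree, str) else tree[0]


def fitch(tree, states, node_sets):
    """Bottom-up Fitch pass on a rooted binary tree.

    `states` maps each leaf to its observed state (e.g. 1 = complex member, 0 = not).
    Fills `node_sets` with the Fitch state set of every node and returns the minimum
    number of state changes needed in this subtree.
    """
    if isinstance(tree, str):
        node_sets[tree] = {states[tree]}
        return 0
    label, left, right = tree
    changes = fitch(left, states, node_sets) + fitch(right, states, node_sets)
    left_set, right_set = node_sets[node_name(left)], node_sets[node_name(right)]
    if left_set & right_set:
        node_sets[label] = left_set & right_set
    else:
        node_sets[label] = left_set | right_set
        changes += 1
    return changes


def assign_states(tree, node_sets, parent_state=None, assigned=None):
    """Top-down pass: pick one most-parsimonious state per node.

    Keeps the parent's state when allowed, otherwise takes min() of the node's set, so ties
    are broken arbitrarily and other equally parsimonious reconstructions may exist.
    """
    if assigned is None:
        assigned = {}
    options = node_sets[node_name(tree)]
    state = parent_state if parent_state in options else min(options)
    assigned[node_name(tree)] = state
    if not isinstance(tree, str):
        _, left, right = tree
        assign_states(left, node_sets, state, assigned)
        assign_states(right, node_sets, state, assigned)
    return assigned


# Toy gene tree with placeholder names; internal node labels are arbitrary.
tree = ("root", "outgroup", ("n1", "a1", ("n2", "a2", "b2")))
states = {"outgroup": 0, "a1": 0, "a2": 1, "b2": 1}
node_sets = {}
print(fitch(tree, states, node_sets))        # 1
print(assign_states(tree, node_sets)["n2"])  # 1: the single gain lies on the branch n1 -> n2
```

To use this for the question above, you would mark which internal nodes of your own tree are duplications and which are speciations; the type of the node at the top of the branch carrying the 0 to 1 change then suggests whether membership was gained after duplication or after speciation.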