Nudix

Alignments, units of homology and trees.

  1. Save the following FASTA file (http://bioinformatics.bio.uu.nl/snel/support/nudix_selection.fsa) to your machine. Open it in Clustal Omega, align the sequences and look at the alignment. Which segments of which sequences are aligned quite well, and which are not?
  2. Make a tree and open it in a tree viewer, for example iTOL (web). DROMO, DROWI and DROME are different species from the genus Drosophila, AEDAE and CULQU are different species of mosquito, and ACYPI is an aphid. The remaining sequences are from vertebrates and from primitive animals (NEMWE, TRIAD), except CHLRE, which is an algal sequence. Are all the insect sequences grouped together?
  3. Search PFAM with the sequence of "Q9VB64_DROME" (as present in your FASTA file) using HMMSCAN (https://www.ebi.ac.uk/Tools/hmmer/search/hmmscan), but set both the significance and the reporting E-value thresholds to 1. Write down the domain organisation. Do the same for the proteins "NUD14_HUMAN" and "Q17M97_AEDAE". How does the domain organisation differ between these proteins?
  4. Which regions of the proteins NUD14_HUMAN, Q17M97_AEDAE and Q9VB64_DROME are homologous? Explain. For example, you could sketch a very simple scenario of what you think happened and from it deduce which regions of which proteins are homologous, both within and between proteins.
  5. Now go back to the alignment. How are your sequences aligned in relation to their domain organisation?
  6. In a multiple sequence alignment, homologous residues should be in the same column, i.e. aligned. To what extent is that not the case in the current alignment?
  7. Make a copy of the original FASTA file. In this copy, split the composite proteins (i.e. Q9VB64_DROME, B4K755_DROMO and B4NBF6_DROWI) into their homologous units, i.e. the duplicated units, each consisting of a nudix domain plus some surrounding amino acids. If you are not sure about the domain boundaries, submit the sequences to PFAM. Give the protein fragments you have created distinct, informative names. Align the sequences again and make a new tree. Annotate this tree in terms of speciations, (gene/domain) losses and (gene/domain) duplications. In what way is this new tree a more accurate depiction of the evolutionary relationships between the proteins than the tree you obtained in question 2?
  8. We have already constructed a tree of the same domains and proteins using PhyML (maximum likelihood) instead of neighbor-joining. Download and open this tree (nudix_regions_phyml.ph). How does it differ from your tree? Does this tree imply a more parsimonious (i.e. simpler) scenario of evolution than the one implied by your neighbor-joining tree?
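For question 1, a rough numerical complement to eyeballing the alignment is to score every column. The Python sketch below assumes you downloaded the Clustal Omega alignment in FASTA format, that gaps are written as '-', and that the identifier is the first word of each header. The function names and the default thresholds in well_aligned_segments (80% occupancy, 50% identity, at least 10 columns) are illustrative choices of ours, not part of the exercise.

```python
from collections import Counter


def read_fasta(path):
    """Read a FASTA file into {identifier: sequence}; the identifier is the
    first word of each header line."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif line and name is not None:
                seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def column_scores(aligned, gap="-"):
    """Return one (occupancy, identity) pair per alignment column.

    occupancy: fraction of sequences that have a residue (not a gap) there.
    identity:  fraction of those residues equal to the most common residue.
    """
    rows = list(aligned.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [row[col].upper() for row in rows if row[col] != gap]
        occupancy = len(residues) / len(rows)
        if residues:
            identity = Counter(residues).most_common(1)[0][1] / len(residues)
        else:
            identity = 0.0
        scores.append((occupancy, identity))
    return scores


def well_aligned_segments(scores, min_occ=0.8, min_id=0.5, min_len=10):
    """Return 1-based inclusive column ranges of at least min_len consecutive
    columns that all pass both thresholds (the thresholds are arbitrary)."""
    segments, start = [], None
    for i, (occ, ident) in enumerate(scores):
        good = occ >= min_occ and ident >= min_id
        if good and start is None:
            start = i
        elif not good and start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start + 1, len(scores)))
    return segments
```

For example, well_aligned_segments(column_scores(read_fasta("nudix_aln.fasta"))), with a hypothetical file name, lists the column ranges that look well aligned; everything outside them is gappy or poorly conserved. Compare those ranges with the sequences you judged by eye.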
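For questions 2 and 8, whether the insect sequences group together can also be checked programmatically. In an unrooted tree, such as a neighbor-joining tree, a set of leaves forms a group if a single branch separates it from all other leaves (a split). The sketch below uses a minimal hand-written Newick parser, assuming labels contain no spaces, quotes or comments, and assuming the leaf labels keep the NAME_SPECIES identifiers from the FASTA file. robinson_foulds counts the splits found in only one of two trees (the Robinson-Foulds distance), which gives one simple measure of how the PhyML tree differs from your own; it ignores branch lengths and support values.

```python
import re

# Species codes of the insect sequences, as listed in question 2.
INSECT_CODES = {"DROME", "DROMO", "DROWI", "AEDAE", "CULQU", "ACYPI"}


def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names.

    Branch lengths and internal labels (e.g. bootstrap values) are dropped.
    Assumes labels contain no whitespace, quotes or [comments].
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", re.sub(r"\s+", "", text))
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            if tokens[pos] != ")":
                raise ValueError("malformed Newick: expected ')'")
            pos += 1
            if pos < len(tokens) and tokens[pos] not in {"(", ")", ",", ";"}:
                pos += 1  # skip internal label and/or ':branch_length'
            return children
        label = tokens[pos]
        pos += 1
        return label.split(":")[0]

    return node()


def leaves(tree):
    """All leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial splits of the tree, read as unrooted.

    Each split is stored as the side that does not contain a fixed reference
    leaf, so the same split taken from two trees compares equal.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - below if ref in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    walk(tree)
    return found


def is_clade(tree, group):
    """True if one branch of the unrooted tree separates group from all other leaves."""
    all_leaves = leaves(tree)
    group = frozenset(group)
    if not group <= all_leaves:
        raise ValueError("group contains names that are not leaves of the tree")
    if len(group) <= 1 or len(group) >= len(all_leaves) - 1:
        return True  # trivially separated by a single branch
    side = all_leaves - group if min(all_leaves) in group else group
    return side in splits(tree)


def insect_leaves(tree):
    """Leaves whose species code (text after the last '_') is an insect."""
    return {leaf for leaf in leaves(tree) if leaf.rpartition("_")[2] in INSECT_CODES}


def robinson_foulds(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must have identical leaf names")
    return len(splits(tree_a) ^ splits(tree_b))
```

Typical use is tree = parse_newick(text) on the contents of your tree file, followed by is_clade(tree, insect_leaves(tree)). robinson_foulds only works when both trees use exactly the same leaf names, so for question 8 you may first need to rename your fragments to match those in nudix_regions_phyml.ph.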
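For question 3 the web form is sufficient. If you prefer to run the search locally with HMMER3, a command such as hmmscan -E 1 --domE 1 --incE 1 --incdomE 1 --domtblout hits.tbl Pfam-A.hmm query.fasta writes a per-domain table; this assumes you have downloaded Pfam-A.hmm and prepared it with hmmpress, and the file names are hypothetical. The sketch below reads that table and lists the domains of each query in sequence order. It relies on the domtblout column layout (0-based: 0 target/domain name, 3 query name, 12 independent E-value, 17-18 alignment start and end on the query).

```python
from collections import defaultdict


def parse_domtblout(lines, max_ievalue=1.0):
    """Collect domain hits per query from HMMER3 hmmscan --domtblout output.

    Returns {query: [(domain, ali_from, ali_to, i_evalue), ...]} sorted by start.
    Coordinates are 1-based and inclusive on the query sequence.
    """
    hits = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        fields = line.split(maxsplit=22)  # the 23rd field is a free-text description
        domain, query = fields[0], fields[3]
        i_evalue = float(fields[12])  # independent E-value of this domain hit
        ali_from, ali_to = int(fields[17]), int(fields[18])
        if i_evalue <= max_ievalue:
            hits[query].append((domain, ali_from, ali_to, i_evalue))
    return {query: sorted(found, key=lambda hit: hit[1]) for query, found in hits.items()}


def organisation(domain_hits):
    """Render hits as 'NAME(start-end) - NAME(start-end) - ...'."""
    return " - ".join(f"{name}({start}-{end})" for name, start, end, _ in domain_hits)
```

For example, open hits.tbl, pass the file handle to parse_domtblout, and print organisation(result[query]) for each of the three proteins. The alignment coordinates are used here; the envelope coordinates (columns 19-20, 0-based) are slightly wider and may be handier when you cut out domains in question 7.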
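For questions 5 and 6, note that hmmscan reports domain boundaries in ungapped sequence coordinates, whereas the alignment is indexed by column. The sketch below converts one into the other, so you can check, for example, which alignment columns the first and the second nudix domain of Q9VB64_DROME occupy and whether either overlaps the columns of the domain(s) of NUD14_HUMAN. It assumes gaps are written as '-' or '.', and that the residue ranges you pass in are the ones you wrote down in question 3; the function names are our own.

```python
def residue_columns(aligned_seq, gap_chars="-."):
    """Element i is the 1-based alignment column that holds residue i + 1."""
    return [col for col, ch in enumerate(aligned_seq, start=1) if ch not in gap_chars]


def domain_columns(aligned_seq, start, end):
    """First and last alignment column of residues start..end (1-based,
    inclusive, ungapped coordinates as reported by hmmscan)."""
    cols = residue_columns(aligned_seq)
    if not 1 <= start <= end <= len(cols):
        raise ValueError(f"residue range {start}-{end} outside 1-{len(cols)}")
    return cols[start - 1], cols[end - 1]


def column_overlap(span_a, span_b):
    """Number of alignment columns shared by two inclusive (first, last) spans."""
    return max(0, min(span_a[1], span_b[1]) - max(span_a[0], span_b[0]) + 1)
```

If a domain of one protein shares few or no columns with the homologous domain of another, those homologous residues are not aligned, which is exactly what question 6 asks you to assess.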
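For question 7, the splitting can be done by hand in a text editor, but a small script makes the chosen boundaries explicit and reproducible. The sketch assumes the identifier is the first word of each FASTA header and that names end in a UniProt-style _SPECIES code. It inserts your suffix before that code (a hypothetical suffix "d1" turns Q9VB64_DROME into Q9VB64d1_DROME), which keeps the species code at the end of each name so it remains easy to read in the tree. The boundaries are yours to choose from the PFAM results plus a few flanking residues; no real coordinates are given here.

```python
def parse_fasta(text):
    """Parse FASTA text into {identifier: sequence} (identifier = first word of header)."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def split_into_units(seqs, boundaries):
    """Replace composite proteins by their homologous units.

    boundaries: {protein: [(suffix, start, end), ...]} with 1-based, inclusive,
    ungapped coordinates. Proteins without an entry are copied unchanged.
    The suffix goes before the species code: Q9VB64_DROME + 'd1' -> Q9VB64d1_DROME.
    """
    units = {}
    for name, seq in seqs.items():
        if name not in boundaries:
            units[name] = seq
            continue
        entry, _, species = name.rpartition("_")
        for suffix, start, end in boundaries[name]:
            if not 1 <= start <= end <= len(seq):
                raise ValueError(f"{name}: range {start}-{end} outside 1-{len(seq)}")
            new_name = f"{entry}{suffix}_{species}" if entry else f"{name}{suffix}"
            units[new_name] = seq[start - 1:end]
    return units


def format_fasta(seqs, width=60):
    """Write {identifier: sequence} back out as FASTA text."""
    lines = []
    for name, seq in seqs.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

A typical run reads nudix_selection.fsa with parse_fasta, calls split_into_units with one (suffix, start, end) entry per unit for Q9VB64_DROME, B4K755_DROMO and B4NBF6_DROWI, and writes the result of format_fasta to a new file that you then align again.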
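For the annotation in question 7, one simple rule (the species-overlap rule) labels an internal node of a rooted gene tree as a duplication when the species sets below two of its children overlap, and as a speciation otherwise. The sketch below applies that rule to a tree written as nested Python tuples, which you would transcribe from your own tree after rooting it (for example on the CHLRE sequence; treat that choice as an assumption you should justify). It assumes the species code is the text after the last '_' in each leaf name. It does not infer losses: those follow from comparing each node with the known species relationships, for instance a speciation node where one expected species has no copy.

```python
from itertools import combinations


def species_of(leaf):
    """Species code = text after the last '_' (e.g. 'NUD14_HUMAN' -> 'HUMAN')."""
    return leaf.rpartition("_")[2]


def label_events(tree):
    """Label each internal node of a rooted gene tree by the species-overlap rule.

    tree: nested tuples of leaf names, e.g. (("a_DROME", "b_AEDAE"), "c_HUMAN").
    Returns [(sorted leaf names, 'duplication' or 'speciation'), ...],
    with children listed before their parents.
    """
    events = []

    def walk(node):
        if isinstance(node, str):
            return [node], {species_of(node)}
        results = [walk(child) for child in node]
        names = sorted(leaf for child_names, _ in results for leaf in child_names)
        species_sets = [child_species for _, child_species in results]
        # A species present under two different children implies a duplication.
        overlap = any(a & b for a, b in combinations(species_sets, 2))
        events.append((names, "duplication" if overlap else "speciation"))
        return names, set().union(*species_sets)

    walk(tree)
    return events
```

Because the rule only looks at species codes, it is sensitive to misplaced leaves: a single sequence grouped in the wrong place can turn a speciation into an apparent duplication, which is worth keeping in mind when you compare the neighbor-joining and PhyML trees in question 8.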