RPL22. Interpreting a complex tree of a simple ribosomal protein

We will discuss how to interpret the tree of a ribosomal protein subunit (see below). The following species are present in this tree. A_thaliana is the plant model organism for molecular biology. We assume you know H_sapiens and D_melanogaster. A_fulgidus, P_horikoshii and M_thermophila are archaea (archaebacteria). R_prowazekki is an alpha-proteobacterium. Nostoc_sp and P_marinus are cyanobacteria. B_subtilis is a Gram-positive bacterium. The tree of life for these species is generally considered to be as follows:

We made a PhyML tree for the ribosomal subunit protein using the following non-default options: data type: AA; model of amino-acid substitution: WAG; proportion of invariable sites: estimated; number of substitution rate categories: 6; gamma distribution parameter: estimated. This resulted in the following tree:
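For readers who want to reproduce the run outside a web form, the options listed above can be sketched as a command line. This is only a sketch: it assumes the command-line version of PhyML is installed under the name `phyml`, and the alignment file name `rpl22.phy` is hypothetical (the alignment itself is not part of this exercise text).

```python
import shlex


def phyml_command(alignment_path):
    """Build a PhyML argument list matching the non-default options above."""
    return [
        "phyml",               # executable name is an assumption
        "-i", alignment_path,  # input alignment in PHYLIP format
        "-d", "aa",            # data type: amino acids
        "-m", "WAG",           # model of amino-acid substitution
        "-v", "e",             # proportion of invariable sites: estimated
        "-c", "6",             # number of substitution rate categories
        "-a", "e",             # gamma distribution parameter: estimated
    ]


# "rpl22.phy" is a hypothetical file name used for illustration only.
cmd = phyml_command("rpl22.phy")
print(shlex.join(cmd))

# To actually run it (requires PhyML on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems if the alignment path contains spaces.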

  1. Spend about 10 minutes annotating the gene tree in terms of duplications and losses. How many gene losses and gene duplications does your reconstruction (approximately) imply, and at which time point / ancestor?
  2. Search the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein) for the entry NP_051097.1 (the RefSeq version of RK22_ARATH). By inspecting the entry, find out on which chromosome its gene is encoded and what the subcellular location of the protein is.
  3. Do the same for the NP_567805.1 protein. On which chromosome is its gene encoded and what is its subcellular location? You can also find the chromosome by following the link given at "GeneID". (Your findings should also hold for the human and fly orthologs of NP_567805.1, but double-check in case of doubt.)
  4. How would endosymbiosis explain the tree? Indicate symbiosis events in the tree.
  5. Moreover, if we consider the chromosomal locations of the A. thaliana genes as retrieved in questions 7.2 and 7.3, what other event happened to one of the two genes (NP_567805.1 vs. RK22_ARATH) that did not happen to the other?
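
The database lookups in the questions above can also be done from a script instead of the web page. The sketch below uses the NCBI E-utilities `efetch` endpoint to request each record in GenPept flat-file format and then keeps only the lines that mention a chromosome, an organelle or a GeneID. The helper names are made up for this example, and it is an assumption that the relevant information appears in the `/chromosome`, `/organelle` and `GeneID` fields of a given record, so inspect the full entry if the filter returns nothing.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genpept_url(accession):
    """Return the efetch URL for a protein record in GenPept text format."""
    params = {"db": "protein", "id": accession, "rettype": "gp", "retmode": "text"}
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genpept(accession):
    """Download the record (needs network access; not called automatically)."""
    with urlopen(genpept_url(accession)) as response:
        return response.read().decode("utf-8")


def location_qualifiers(record_text):
    """Keep only lines that may reveal where the gene is encoded."""
    wanted = ("/chromosome=", "/organelle=", "GeneID:")
    return [
        line.strip()
        for line in record_text.splitlines()
        if any(tag in line for tag in wanted)
    ]


# Print the URLs for the two accessions used in the questions.
for acc in ("NP_051097.1", "NP_567805.1"):
    print(genpept_url(acc))

# Example use with network access:
# for line in location_qualifiers(fetch_genpept("NP_051097.1")):
#     print(line)
```

The download is kept in its own function so that the URL construction and the filtering can be checked offline. The subcellular location of the protein is usually described elsewhere in the entry, so the filtered lines are a starting point rather than a complete answer.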