Due Monday, April 14th. Bring a hard copy of your results, and be prepared to share your findings with the class in a brief (10-15 minute) presentation.
Groups:
- Group 1: Hillary, Seth, Annette.
Theme: Cetacean phylogeny and evolution. What is the relationship of the cetaceans (whales and dolphins) to the rest of Mammalia?
Species for this lab: Physeter catodon, any baleen whale (suborder Mysticeti), Sus scrofa, Homo sapiens, Hippopotamus amphibius.
Helpful reading: A complete phylogeny of the whales, dolphins and even-toed hoofed mammals (Cetartiodactyla) (pubmed id: 16094808); Evidence from milk casein genes that cetaceans are close relatives of hippopotamid artiodactyls (pubmed id: 8752004).
- Group 2: Huimin, Gregg, Lindsey, Shanshan.
Theme: Bat phylogeny and evolution. How do bats fit in the mammalian phylogeny? Are megachiropterans and microchiropterans monophyletic? What are their closest relatives?
Species for this lab: Any megachiropteran, Myotis lucifugus, Homo sapiens, Mus musculus, Bos taurus.
Helpful reading: A molecular phylogeny for bats illuminates biogeography and the fossil record (pubmed id: 15681385); Flying primates? (pubmed id: 3945827); Base compositional bias and phylogenetic analyses: a test of the "flying DNA" hypothesis (pubmed id: 10051393).
- Group 3: Sarah, Kristin, Feng.
Theme: Human and primate phylogeny and evolution.
Species for this lab: Homo sapiens, Tupaia belangeri, Mus musculus, Tarsius (any species in this genus), Macaca mulatta.
Helpful reading: Molecular and Genomic Data Identify the Closest Living Relative of Primates (pubmed id: 17975064); Primate molecular divergence dates (pubmed id: 16815047).
Everyone should read the following article: Mammalian evolution and biomedicine: new views from phylogeny (pubmed id: 17624960).
Supplemental optional readings:
You will make a first attempt at a phylogenetic hypothesis by picking a representative protein or DNA sequence and using UPGMA (unweighted pair group method with arithmetic mean) and neighbor-joining.
Note that choosing and aligning the sequences is largely independent of the programming part of this assignment, and these parts can mostly be done by different group members.
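To see why the two methods can disagree, it may help to compare how each one picks the first pair to join. The sketch below is only an illustration, not the starter program: the function names and the small 5-taxon matrix are made up for this example. UPGMA joins the pair with the smallest distance, whereas neighbor-joining joins the pair that minimizes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k).

```python
from itertools import combinations


def upgma_first_pair(dist):
    """Pair (i, j) with the smallest raw distance (the UPGMA choice)."""
    n = len(dist)
    return min(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])


def nj_first_pair(dist):
    """Pair (i, j) minimizing the neighbor-joining Q criterion."""
    n = len(dist)
    totals = [sum(row) for row in dist]  # row sums: total distance from each taxon

    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    return min(combinations(range(n), 2), key=q)


# Hypothetical symmetric distance matrix for five taxa (indices 0..4).
EXAMPLE = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

if __name__ == "__main__":
    print("UPGMA joins first:", upgma_first_pair(EXAMPLE))
    print("NJ joins first:   ", nj_first_pair(EXAMPLE))
```

On this matrix the two criteria select different pairs, because Q discounts taxa that are far from everything else. A full neighbor-joining program still has to compute branch lengths, update the matrix after each join, and write the Newick string.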
Feel free to ask me for help.
- Choose a DNA sequence with known protein orthologs in each of the five given species. Do not use a cytochrome oxidase or a ribosomal protein (such as SSU proteins). You want to choose something with a moderate level of variability between the species - something like a histone might not vary enough, whereas an immunoglobulin might vary too much. Start with the least-studied species of the five (you are allowed to substitute closely related species if it helps - for example, it is acceptable to use Rattus norvegicus instead of Mus musculus). Explain your choice.
It may be helpful to use NCBI's Taxonomy database, with some of the optional items checked such as showing the number of Nucleotide and Protein sequences available at each taxonomic level.
- Get a multiple alignment of the five DNA sequences. The ClustalW server at EMBL-EBI is one recommended option. To get better results for your phylogeny, you may wish to trim some sequences to a well-aligned region - often some sequences are much longer than others, and these long gaps can distort measures of similarity and the quality of the multiple alignment. The well-aligned parts of your sequences should be at least 200-300 nucleotides and probably less than 20000.
- Write a program that outputs the neighbor-joined tree in Newick format from a distance matrix. The file here will get you started - it has a program that outputs the UPGMA tree (also available from the published list on the 5233 server). Here is an example DNA .aln file you can use as practice (it has 6 species).
Use these programs to compute the UPGMA and neighbor-joined tree for your data, using Jukes-Cantor or Kimura distances.
- Are your results qualitatively and/or quantitatively consistent with the literature? Are there any controversial phylogenies that relate to your theme?
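For the distance step, here is a minimal sketch of the Jukes-Cantor and Kimura two-parameter distances between two aligned DNA sequences. It assumes the sequences come from the same alignment (equal length) and skips any column where either sequence has a gap or an ambiguous base; that handling, and all function names, are assumptions for illustration rather than part of the provided starter file. With p the proportion of differing sites, Jukes-Cantor gives d = -(3/4) ln(1 - 4p/3). With P and Q the proportions of transitions and transversions, Kimura gives d = -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q)).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) over columns where both are A/C/G/T."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    return sites, transitions, transversions


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: -(3/4) ln(1 - 4p/3)."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    p = (ts + tv) / sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura(seq1, seq2):
    """Kimura two-parameter distance: -(1/2) ln((1 - 2P - Q) sqrt(1 - 2Q))."""
    sites, ts, tv = count_differences(seq1, seq2)
    if sites == 0:
        raise ValueError("no comparable sites")
    P, Q = ts / sites, tv / sites
    if 1 - 2 * P - Q <= 0 or 1 - 2 * Q <= 0:
        raise ValueError("sequences too divergent for Kimura")
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
```

Applying one of these functions to every pair of rows in your trimmed alignment yields the distance matrix that the UPGMA and neighbor-joining programs take as input.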