Natural Selection Project

Project Member

Genevieve Kendall

Project Description

A March 2008 paper in Nature Genetics by Barreiro et al. claims to have identified genomic elements that have been selected for or against in human populations. The authors reached this conclusion by using an FST analysis to measure allele frequency differences among the HapMap populations at 2.8 million Phase 2 SNPs. From their analysis they conclude that, overall, "negative selection has globally reduced population differentiation at amino acid-altering mutations, particularly in disease causing genes". They also assert that "positive selection has ensured the regional adaptation of human populations by increasing population differentiation in gene regions, primarily at nonsynonymous and 5'-UTR variants". My project involves replicating their methods and generating the same types of figures, to see whether I reach similar conclusions.
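Claims such as "reduced population differentiation at amino acid-altering mutations" come down to comparing FST values between classes of SNPs. The Python sketch below shows one simple way to do that comparison, by averaging per-SNP FST within each functional category. It assumes FST has already been computed for each SNP; the function name `mean_fst_by_category`, the category labels, and the numbers are hypothetical, and the statistical tests Barreiro et al. actually used may differ.

```python
from collections import defaultdict
from statistics import mean


def mean_fst_by_category(snps):
    """Group per-SNP FST values by functional category and return each category's mean.

    snps: iterable of (category, fst) pairs. Hypothetical input format.
    """
    groups = defaultdict(list)
    for category, fst in snps:
        groups[category].append(fst)
    return {category: mean(values) for category, values in groups.items()}


# Made-up per-SNP values, for illustration only
snps = [
    ("nonsynonymous", 0.10), ("nonsynonymous", 0.20),
    ("synonymous", 0.12), ("synonymous", 0.14),
    ("intergenic", 0.13),
]
for category, value in sorted(mean_fst_by_category(snps).items()):
    print(f"{category}\t{value:.3f}")
```

A category whose mean sits well above or below a background class (here, the hypothetical intergenic SNPs) would suggest increased or reduced differentiation, though a real comparison would also need a significance test.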

Project Timeline

Week 1 (April 21-27)

  • Pick project, make Wiki, literature search

Week 2 (April 28-May 4)

  • Read all background information, gather HapMap data, determine how to best employ the various analyses

Week 3 (May 5-11)

  • Replicate Figures 1 and 2; make PowerPoint slides covering this section

Week 4 (May 12-18)

  • Replicate Figures 3 and 4; make PowerPoint slides covering this section

Week 5 (May 19-25)

  • Replicate Figure 5; see whether I get the same results as those in Table 1; make PowerPoint slides covering this section

Week 6 (May 26-June 1)

  • Fix anything that has gone wrong; finalize the PowerPoint and make a cohesive presentation

Week 7 (June 2-8)

  • Present! Add suggestions from class into the finalized project

Week 8 (June 9-15)

  • Turn in finished project

Detailed Progress

Week 1: April 21-27

1. What I did this week: Picked my project, made a Wiki page, did a background literature search for papers I need to read, and started reading
2. What I plan to do next week: Read the main paper in more depth, read the necessary background papers, start getting the HapMap data I will need, and figure out the best programs for analyzing the data
3. How what I did compared to what I planned to do: Awesome!
4. My grade for myself this week: A

Week 2: April 28-May 4

1. What I did this week: I read the paper, methods, and supplemental tables/notes in detail. I figured out that the best program for analyzing the data is probably R, and that I need to collect the relevant SNP data using HapMart, which has all of the filters I need. I haven't started collecting the datasets yet because I was studying for the test!
2. What I plan to do next week: Start collecting the data and try to replicate Figure 1
3. How what I did compared to what I planned to do: I got a little behind, but I at least have a plan in place, and now that the test is over I will have more time to focus on the project
4. My grade for myself this week: A-; maybe I was a little too ambitious for Week 2, i.e., with the test coming up that Monday

Week 3: May 5-11

1. What I did this week: Got all of the SNP frequency data I need for the different populations/groups from HapMart, and got hold of Peter Dalgaard's book Introductory Statistics with R as a backup to help me write my FST statistic
2. What I plan to do next week: Write my statistic in R and start analyzing the data
3. How what I did compared to what I planned to do: Pretty good: I got all of my data, but I still need to write my statistic in R. I think I need to rearrange my timeline, because the figures will be the easiest part to make; the hard part is actually analyzing the data, which should take up a greater chunk of time.
4. My grade for myself this week: B+; I need to write the statistic in R!!

Week 4: May 12-18

1. What I did this week: Worked on defining the terms in the paper and tried to calculate FST for just a few alleles
2. What I plan to do next week: I don't think this statistic is going to work (it is way too difficult), so I need to come up with a Plan B
3. How what I did compared to what I planned to do: I'm really trying and I spent a lot of time on it this week
4. My grade for myself this week: A

Week 5: May 19-25

1. What I did this week: Switched to a much simpler statistic, FST = (HT - HS)/HT. For each SNP I gathered the allele frequency in each subpopulation to calculate 2pq, the expected Hardy-Weinberg heterozygosity, and averaged these across subpopulations to get HS. Calculating HT is a lot simpler: it only needs the average allele frequency across all populations (the sum of the frequencies divided by N)
2. What I plan to do next week: Start cranking out results
3. How what I did compared to what I planned to do: It took me forever to figure out a plan of action, but now I think I have a clear goal for what needs to happen
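The calculation described above can be sketched in Python for a single SNP. This is a minimal sketch, assuming a biallelic SNP where p is the frequency of the same allele in each subpopulation (so q = 1 - p) and every subpopulation is weighted equally, with no sample-size correction. The function name `fst_from_freqs` and the example frequencies are hypothetical, and this simple estimator is not necessarily the one Barreiro et al. used.

```python
def fst_from_freqs(freqs):
    """Return FST = (HT - HS) / HT for one biallelic SNP.

    freqs: frequencies p of the same allele, one per subpopulation.
    Subpopulations are weighted equally (no sample-size correction).
    Returns None when the SNP is monomorphic overall (HT == 0).
    """
    n = len(freqs)
    if n == 0:
        raise ValueError("need at least one subpopulation")
    # HS: mean expected Hardy-Weinberg heterozygosity (2pq) within subpopulations
    hs = sum(2 * p * (1 - p) for p in freqs) / n
    # HT: expected heterozygosity from the average allele frequency (sum of p / N)
    p_bar = sum(freqs) / n
    ht = 2 * p_bar * (1 - p_bar)
    if ht == 0:
        # No variation in the pooled sample, so differentiation is undefined
        return None
    return (ht - hs) / ht


# One SNP in three populations, using made-up frequencies
print(fst_from_freqs([0.10, 0.50, 0.90]))  # roughly 0.4267
```

FST is 0 when every subpopulation has the same frequency and 1 when the subpopulations are fixed for different alleles (for example, frequencies 0.0 and 1.0).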
4. My grade for myself this week: A

Week 6: May 26-June 1

1. What I did this week: Started by trying to parse and analyze the data in Excel (not a good idea). Switched to Perl, asked a lot of questions, and wrote Perl code to both parse and analyze the data!! Started making figures for my presentation, etc.
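The parsing itself was done in Perl; the Python sketch below is only a stand-in showing the general idea of turning a downloaded frequency table into per-SNP frequencies ready for the FST calculation. It assumes a hypothetical tab-separated layout with a header row and three columns (SNP ID, population, allele frequency), which is not necessarily how HapMart exports data. The function name `read_frequencies` and the `snp_A` ID are hypothetical; CEU, YRI, and CHB+JPT are the HapMap Phase 2 population groups.

```python
import csv
from collections import defaultdict


def read_frequencies(lines):
    """Parse tab-separated rows of (snp_id, population, allele frequency).

    Returns {snp_id: {population: frequency}}. The three-column layout
    with a header row is a hypothetical format, not HapMart's actual one.
    """
    table = defaultdict(dict)
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        snp_id, population, freq = row[0], row[1], row[2]
        table[snp_id][population] = float(freq)
    return dict(table)


example = [
    "snp_id\tpopulation\tfreq",
    "snp_A\tCEU\t0.25",
    "snp_A\tYRI\t0.60",
    "snp_A\tCHB+JPT\t0.40",
]
print(read_frequencies(example))
# {'snp_A': {'CEU': 0.25, 'YRI': 0.6, 'CHB+JPT': 0.4}}
```

For a real file, `open(path, newline="")` can be passed in place of the example list, since `csv.reader` accepts any iterable of lines.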
2. What I plan to do next week: Finish powerpoint and figures
3. How what I did compared to what I planned to do: Success!!!! Yay!!
4. My grade for myself this week: A+

Week 7: June 2-8

1. What I did this week: Finalized presentation and figures, presented on the 4th
2. What I plan to do next week: If anybody has suggestions, add those in, and write the 2-page summary report
3. How what I did compared to what I planned to do: Just as planned!!
4. My grade for myself this week: A+

Week 8: June 9-15

1. What I did this week: Re-analyzed the 3' UTR region with new data from HapMap, changed a few slides at the end of my PowerPoint presentation to reflect my new results, and wrote my 2-page summary of my final project
2. What I plan to do next week: Project is done!
3. How what I did compared to what I planned to do: Very well!
4. My grade for myself this week: A+

Relevant Papers

Barreiro et al. (2008). Natural selection has driven population differentiation in modern humans.

Akey et al. (2002). Interrogating a high-density SNP map for signatures of natural selection.

Williamson et al. (2007). Localizing recent adaptive evolution in the human genome.

Carlson et al. (2005). Genomic regions exhibiting positive selection identified from dense genotype data.

Sabeti et al. (2007). Genome-wide detection and characterization of positive selection in human populations.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License