Disease-causing pathogens are among the most intriguing forces shaping human evolution: they have a tremendous impact on our genome, and they themselves evolve over time. Through natural selection, genetic variants that confer resistance to infectious diseases can spread through human populations, leaving distinctive patterns in the human genome. The classic example of such an effect is the evolution of human genetic resistance to malaria across Africa.
In the era of genomics, we can now probe the information buried in the millions of sequence variations that have arisen and persisted in the human genome, in search of signatures of recent evolution. We have developed computational methods, such as the Composite of Multiple Signals (CMS), Long-Range Haplotype (LRH) and Cross Population Extended Haplotype Homozygosity (XP-EHH) tests, to detect genetic variants under positive selection. These methods identify variants that have recently emerged and spread through populations, relying on the breakdown of long-range haplotypes by recombination as a clock for estimating the ages of alleles. We have applied these methods to large datasets of human genetic variation, finding many novel candidates for selection.
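The haplotype-based idea behind tests such as LRH and XP-EHH can be illustrated with extended haplotype homozygosity (EHH): the probability that two randomly chosen chromosomes carrying a given core allele are identical across the whole interval from the core to a more distant site. What follows is a minimal sketch on made-up toy data, not the lab's software. The function name `ehh`, the string encoding of haplotypes, and the example values are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH of `allele` at site `core`, extended out to site `end` (inclusive).

    `haplotypes` is a sequence of equal-length strings (or lists) of alleles,
    one per chromosome. This layout is a hypothetical choice for the sketch.
    """
    # Keep only the chromosomes that carry the core allele.
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # homozygosity is undefined for fewer than 2 chromosomes

    # Group carriers by their full haplotype between the core and the end site.
    lo, hi = sorted((core, end))
    groups = Counter(tuple(h[lo:hi + 1]) for h in carriers)

    # Fraction of chromosome pairs that are identical over the interval.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Toy data (invented): site 0 is the core. Allele "1" sits on a long,
# mostly unbroken haplotype; allele "0" sits on many different backgrounds.
toy_haplotypes = [
    "1111", "1111", "1111", "1110",
    "0101", "0010", "0100", "0011",
]

ehh_young = [ehh(toy_haplotypes, 0, "1", end) for end in range(4)]
ehh_old = [ehh(toy_haplotypes, 0, "0", end) for end in range(4)]
```

In this toy example the homozygosity around allele "1" stays high with distance from the core, while that around allele "0" decays quickly, which is the contrast a common allele of young age is expected to show relative to an older one. Real analyses additionally require phased data, genetic map distances, and calibration against the genome-wide distribution, none of which this sketch attempts.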
We are developing methods to further refine the signals from large candidate regions in order to localize the underlying selected polymorphism. We have also developed software that makes detection of selection, by these and other methods, possible for the rapidly expanding empirical data on genetic variation in humans and other species. The Sabeti lab continues to refine existing methods and tools, and to develop new ones, to detect and localize signals of selection in humans and other organisms. Our approaches take advantage of rapidly expanding datasets of genetic variation, larger population samples, increasingly affordable full-genome sequencing, and new insights into the structure of genetic variation in the genome. We will apply our methods to look for instances of natural selection, using our own data and data collected from humans in two international efforts: the International Haplotype Map Consortium (1000 individuals genotyped for 1 million polymorphisms) and the 1000 Genomes Consortium (full genome sequences from 1000 individuals).
Through our search for positively selected genes in the human genome, we discovered a novel target of selection in a Nigerian study population: the LARGE gene, which encodes a protein that is critical for infection with Lassa virus. We continue to carry out studies of genetic susceptibility to Lassa hemorrhagic fever, investigate the genetic properties of resistance alleles, and seek to develop genetic association methods that utilize signals of natural selection.
Figure 1. The strongest signal of natural selection in the Yoruba of Nigeria is a 300 kb genomic region that lies entirely within the gene LARGE. (A) The signal, based on the LRH test, on Chromosome 22. (B) A visual display of an allele at 39% prevalence with long-range associations (a long haplotype) extending across the region on either side, suggesting that it is a common allele of young age. The bottom diagram, for comparison, shows another common allele, not believed to be under selection, with much greater decay of long-range associations, suggesting an older age.