More Uniform Sampling of Human Genetic Diversity (Xing et al. 2010)


High-throughput genotyping data are useful for making inferences about human evolutionary history. However, the populations sampled to date are unevenly distributed, and some areas (e.g., South and Central Asia) have rarely been sampled in large-scale studies.

To assess human genetic variation more evenly, we sampled 296 individuals from 13 worldwide populations that are not covered by previous studies. By combining these samples with a data set from our laboratory and the HapMap II samples, we assembled a final dataset of ~ 250,000 SNPs in 850 individuals from 40 populations. With more uniform sampling, the estimate of global genetic differentiation (FST) substantially decreases from ~ 16% with the HapMap II samples to ~ 11%. A panel of copy number variations typed in the same populations shows patterns of diversity similar to the SNP data, with highest diversity in African populations.

This unique sample collection also permits new inferences about human evolutionary history. The comparison of haplotype variation among populations supports a single out-of-Africa migration event and suggests that the founding population of Eurasia may have been relatively large but isolated from Africans for a period of time. We also found a substantial affinity between populations from central Asia (Kyrgyzstani and Mongolian Buryat) and America, suggesting a central Asian contribution to New World founder populations.
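To make the FST figures in the abstract more concrete, here is a minimal sketch of the textbook definition, FST = (HT - HS) / HT, computed from allele frequencies. The function name `fst`, the equal weighting of populations, and the toy frequencies are my own assumptions for illustration. The excerpt above does not say which estimator the authors used, so this is not their method.

```python
import numpy as np

def fst(freqs):
    """Textbook FST from a (populations x SNPs) array of allele frequencies.

    Assumes biallelic SNPs and equal weight for every population
    (an illustrative simplification, not the paper's estimator).
    """
    P = np.asarray(freqs, dtype=float)
    p_bar = P.mean(axis=0)                 # pooled allele frequency per SNP
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, total
    h_s = (2 * P * (1 - P)).mean(axis=0)   # mean within-population heterozygosity
    # Ratio of sums across SNPs rather than a mean of per-SNP ratios
    return (h_t - h_s).sum() / h_t.sum()

# Toy example (made-up frequencies at a single SNP):
# three widely separated populations, then the same three
# plus two geographically "intermediate" ones.
sparse = [[0.1], [0.5], [0.9]]
dense = [[0.1], [0.3], [0.5], [0.7], [0.9]]
print(fst(sparse), fst(dense))
```

In this toy case, adding the intermediate populations lowers the estimate, which is the direction of the effect the abstract reports for more uniform sampling.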

The Eurasian PCA is interesting:

At the sub-continental level, we focus first on Eurasia, where most of our samples have been selected (Figure 4A). Overall, PC1 and PC2 mainly reflect the geographic distribution of the populations, with the majority of genetic variation accounted for by their locations. PC1 (accounting for 62.7% of the variance) reflects an east-west gradient, while PC2 (3.3% of the variance) reflects a north-south gradient.
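For readers who want to see where such "percent of variance" figures come from, here is a rough sketch of PCA on a genotype matrix using plain NumPy. The function name `genotype_pca`, the 0/1/2 coding, and the simulated data are assumptions of mine. The paper's actual pipeline (SNP filtering, normalization, software) is not described in the excerpt and may well differ.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Returns the coordinates of each individual on the leading PCs and
    the fraction of total variance explained by each of them.
    """
    G = np.asarray(genotypes, dtype=float)
    G = G - G.mean(axis=0)                      # center each SNP
    U, S, _ = np.linalg.svd(G, full_matrices=False)
    explained = S**2 / np.sum(S**2)             # variance share per PC
    coords = U[:, :n_components] * S[:n_components]
    return coords, explained[:n_components]

# Simulated data (hypothetical): two populations with independently
# drawn allele frequencies, i.e. far more differentiated than real
# human groups, so that the structure is easy to see.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
pop_a = rng.binomial(2, freq_a, size=(20, n_snps))
pop_b = rng.binomial(2, freq_b, size=(20, n_snps))

coords, explained = genotype_pca(np.vstack([pop_a, pop_b]))
print(explained)
```

With real data, one would plot `coords` and color the points by population to look for the kind of east-west and north-south gradients described in the quoted passage.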

The results of the ADMIXTURE analysis of Eurasian individuals are presented graphically in the paper itself, for K=7.

There is not much to comment on here that hasn't been seen before:

One observation is the presence of some "red" West Asian component in the N. European sample, which is not found in Slovenians. This may parallel the peculiarity of the Caucasoid components observed recently for Russians and Lithuanians, although the several Caucasoid components detected in that study are folded into two in the current one.

Notice also how "red" is the main extraneous component in Indian Brahmins. As expected, even Brahmins are predominantly of "indigenous" origin, as these Brahmins are from Tamil Nadu and Andhra Pradesh, and not from North India.

The West Asian affiliations of the main Caucasoid component are evident, and agree with Behar et al. (2010), where South Asians had a major overlap with West Asians (light green) and a minor one with Europeans (dark blue). In this paper, with a lower K, the different European and West Asian subclusters are not visible.

The most interesting part of the study, for me, was the inclusion of three novel African samples: the Luhya, Alur, and Hema. Notice the blue component in these people, which resolves partially to orange at K=12. This is an indication of Eurasian affinities that are mostly lacking in other black Africans.

The Luhya are Bantu speakers from Kenya, so they are not indigenous to East Africa, but have probably picked up some native East African ancestry from their non-Bantu neighbors.

The Hema are from the Democratic Republic of Congo, but they are Nilo-Saharan pastoralists. Their fairly noticeable West Eurasian component may reflect origins outside the Congo. Are they another group of non-Bantu pastoralists that expanded from East Africa to the south? It would be interesting to take a look at these people's Y chromosomes.

All in all, this is a very interesting paper that adds important new populations to the discussion of human origins. Also of note is the free availability of the paper's genotype data and supplementary material at the Jorde Lab.