There's been a fair bit of talk in recent years about how human genetics mirror geography. In other words, how PCA or MDS plots often show a high correlation with geographic maps - for example, see here. However, the correlation is never perfect, and I think the reasons for this are pretty obvious. These include barriers to gene flow, like mountains or linguistic and cultural differences, as well as relatively recent admixtures from distant populations. In other words, people move around a lot and mix, but the world isn't flat, so sometimes they take detours, and don't always mix with their nearest neighbors.
This really complicates things if you want to place a genetic plot over a geographic map, and put all samples close to their geographic points of origin, perhaps in order to find the origins of individuals with unknown ancestry. In most cases, if you position the plot so that some of the samples fit the geographic map fairly well, many of the others won't. So there's usually a lot of fiddling involved in such exercises, and often the plots have to be reshaped, which basically means a more subjective and less useful analysis in the end.
However, there's now a better way to do all that, thanks to a new plotting algorithm called SPatial Ancestry Analysis (SPA), which I introduced in my last blog post. SPA offers some very interesting options, one of which allows the samples to be split into halves and plotted as two different points, and I already discussed that. Another option is that real world geographic coordinates can be used to obtain more accurate results.
Basically, "pure" samples of known origin are marked with the longitude and latitude of where they come from, and the program is asked to find allele frequency gradients for each of the SNPs in the analysis. After that, all of the test samples are run together, and placed on a plot according to those gradients. The results should resemble geography very closely, and they did in this, my latest analysis of West Eurasia.
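For anyone who wants a feel for how that works, here's a rough toy sketch in Python. It is not the SPA program itself. It assumes that the frequency of each SNP's allele changes across the map along a logistic gradient, fits those gradients from reference samples with known coordinates, and then places a test sample at the point where its genotypes are most likely. The function names, the simulated data, and the plain gradient-ascent fitting are all my own assumptions for illustration, and coordinates are taken to be standardized rather than raw degrees.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_snp_gradients(coords, genotypes, steps=300):
    """Fit one logistic allele frequency gradient per SNP.

    coords:    (n, 2) standardized longitude/latitude of reference samples
    genotypes: (n, m) allele counts (0, 1 or 2) at m SNPs
    Returns slopes (m, 2) and intercepts (m,).
    """
    n = coords.shape[0]
    Z = np.column_stack([coords, np.ones(n)])          # add intercept column
    # Safe step size from an upper bound on the log-likelihood curvature.
    step = 1.0 / (0.5 * np.linalg.eigvalsh(Z.T @ Z)[-1])
    W = np.zeros((3, genotypes.shape[1]))
    for _ in range(steps):
        freq = sigmoid(Z @ W)                          # predicted frequencies
        W += step * Z.T @ (genotypes - 2.0 * freq)     # gradient ascent
    return W[:2].T, W[2]

def place_sample(genotype, slopes, intercepts, steps=200):
    """Find the map position that maximizes the likelihood of one genotype."""
    step = 1.0 / (0.5 * np.linalg.eigvalsh(slopes.T @ slopes)[-1])
    x = np.zeros(2)                                    # start at the map center
    for _ in range(steps):
        freq = sigmoid(slopes @ x + intercepts)
        x += step * slopes.T @ (genotype - 2.0 * freq)
    return x

# Toy demonstration on simulated data (every number here is made up).
rng = np.random.default_rng(0)
n_ref, n_snps = 300, 2000
ref_coords = rng.uniform(-1.5, 1.5, size=(n_ref, 2))
true_slopes = rng.normal(0.0, 0.5, size=(n_snps, 2))
true_intercepts = rng.normal(0.0, 0.5, size=n_snps)
ref_genotypes = rng.binomial(
    2, sigmoid(ref_coords @ true_slopes.T + true_intercepts))

slopes, intercepts = fit_snp_gradients(ref_coords, ref_genotypes)

true_location = np.array([0.8, -0.5])
test_genotype = rng.binomial(
    2, sigmoid(true_slopes @ true_location + true_intercepts))
estimate = place_sample(test_genotype, slopes, intercepts)
print("true location:", true_location, "estimated:", estimate)
```

In this toy setup the test sample ends up near its simulated origin simply because the gradients were learned from samples with known coordinates. Real data is messier, since admixed individuals don't have a single point of origin.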
The data sheet for that particular plot can be downloaded here. I also ran an extended version of that analysis, including individuals from Pakistan and North India, just to test their effect on the other samples. I'm not a huge fan of the latter, largely because it forces many European groups to overlap. However, it's still useful just to see where South Central Asians cluster compared to Europeans and West Asians, and the effect they have on the Iranian samples. The data sheet for that plot can be downloaded here.
There's a lot of interesting stuff on those plots, if you know what to look for. That shouldn't be too difficult here, because I marked the major geographic features of West Eurasia. This was obviously possible because the PC coordinates correlated with longitude and latitude. It'd take me a whole day to properly discuss everything I've spotted, so I'll only focus on a few points...
- Mountain ranges, deserts, steppes and large seas have been very effective at preventing gene flow between many groups in Europe and West Asia, even those that aren't far away from each other in terms of distance. So it's not a coincidence that these often fall in empty areas on the plots.
- The Mediterranean Sea has generally done a good job of separating Europe from North Africa and West Asia, but trails of migrations across this sea are visible on the plot. They're formed from such groups as Cypriots, Sicilians and Jews.
- The Black Sea, Caucasus Mountains and possibly the Pontic-Caspian Steppe have done an amazing job of blocking off West Asia from Eastern Europe, creating the widest gap on the plots.
- In Europe, the most effective genetic barrier seems to be the Alps. They've basically cut off North Italy from Central Europe. The Carpathians and Pyrenees have also played a role in separating certain populations. However, it seems Hungarians have been affected by significant gene flow from north of the Carpathians, because they appear too northwestern for their geography.
- The Irish Sea has not proved much of a genetic barrier, because the Irish actually cluster east of the Irish Sea, which is totally out of whack with geography. Thus, continuous gene flow from England and Scotland, and probably also directly from continental Europe, is suggested.
- Iranians cluster north of the Zagros Mountains on the first plot, but are pushed much further south on the second plot, which features South Central Asian samples. This possibly indicates some gene flow from Pakistan and surrounds into Iran. However, the Lut and Great Salt deserts have otherwise put a fairly wide gap between those populations.
- Many of the South Central Asian samples look too northern for their geography, and this is particularly true for the North Indians and Pathans, who are pushed past the central geographic point of the Hindu Kush.
And yes, I also ran a dual ancestry analysis with the same samples as on the first plot. However, I didn't make use of geographic coordinates in this case, largely because the results look a bit too tight for most samples when I do this, and thus aren't as much fun. You can view that plot here. The data sheet can be downloaded here.
A genetic map of West Eurasia with a difference (aka. the ancestral dichotomy in our genomes)