Sunday, September 9, 2012

STRUCTURE analysis of Eastern Europe at K=6


I've gone back to the old school for this latest admixture analysis of Eastern Europe, because it was the only way to achieve a sound result. In other words, I used STRUCTURE (the latest, 2009 version) instead of ADMIXTURE. The former is much slower, but based on what I've seen, it performs better in unsupervised runs with limited samples of very closely related groups. And that's exactly what I needed in this instance.

I used all of my Eastern European samples (those from east of Scandinavia, and north of Hungary), as well as various other relevant groups from Europe and Asia. Despite that, the dataset was still fairly lightweight, containing only around 400 individuals (and 65K SNPs to make sure the run didn't last for days). But I'm impressed with the results; the six clusters make perfect sense, and there's very little noise.


Key: Red = Siberian, Yellow = West Asian, Green = North European, Light Blue = Volga-Ural, Dark Blue = South Baltic, Pink = Mediterranean. See spreadsheet for details.




Now, it's important to note that programs like STRUCTURE have problems picking up ancient admixture events, even between highly divergent groups (see here). However, I think there's still a lot of very useful information in these results, and it correlates very well with linguistics and archeology.

For instance, based on genetic distances, the three Northern European components look closely related, and I'd say they all come from a single source somewhere in North/Central Europe. My bet is that this source is the Corded Ware cultural horizon (perhaps also with some Bell Beaker influence).
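For anyone wondering what "genetic distance" between clusters can look like in practice, here's a minimal sketch in Python. It is not the statistic STRUCTURE itself reports, and it's not how the distances for this run were obtained; it's just a simple two-population Fst (heterozygosity-based, summed over SNPs) computed from per-cluster allele frequencies. The function name `fst` and every frequency below are hypothetical, made up purely for illustration.

```python
def fst(p1, p2):
    """Two-population Fst from allele frequencies at the same SNPs.

    Computes (H_T - H_S) / H_T with sums taken over all loci and
    equal weight per cluster. Illustrative estimator only; this is
    an assumption, not STRUCTURE's own divergence output.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same SNPs")
    h_s_sum = 0.0  # mean within-cluster expected heterozygosity
    h_t_sum = 0.0  # expected heterozygosity of the pooled clusters
    for a, b in zip(p1, p2):
        h_s_sum += (2 * a * (1 - a) + 2 * b * (1 - b)) / 2
        p_bar = (a + b) / 2
        h_t_sum += 2 * p_bar * (1 - p_bar)
    if h_t_sum == 0:
        return 0.0  # no variation at any locus, so no measurable divergence
    return (h_t_sum - h_s_sum) / h_t_sum


# Hypothetical allele frequencies at four SNPs (NOT values from this run).
north_european = [0.30, 0.55, 0.70, 0.15]
south_baltic = [0.33, 0.50, 0.72, 0.18]
siberian = [0.80, 0.10, 0.25, 0.60]

print(fst(north_european, south_baltic))  # small: closely related clusters
print(fst(north_european, siberian))      # much larger: divergent clusters
```

With real data you'd feed in the inferred allele frequencies for each cluster across all the SNPs in the run; the smaller the value, the closer the two clusters.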

The dark blue "South Baltic" cluster is probably the result of the relatively recent Balto-Slavic expansion. The Volga-Ural cluster is likely much older and has a more complex story. Perhaps it represents the eastward movement of Corded Ware and derived groups to the Volga, and then backflow with Uralic tribes? This would explain the high correlation between the Volga-Ural and Siberian components in Baltic and Volga Finns, because based on the latest linguistic data, it seems the pre-proto-Uralics originated in Siberia.
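The correlation between two components mentioned above can be checked straight from the per-individual membership proportions. Here's a rough sketch in Python using a plain Pearson correlation. The function name `pearson` and the numbers are hypothetical placeholders; the real values would come from the relevant columns of the spreadsheet.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical membership proportions for five individuals
# (NOT values from this run).
volga_ural = [0.05, 0.10, 0.18, 0.22, 0.30]
siberian = [0.01, 0.03, 0.06, 0.07, 0.11]

print(pearson(volga_ural, siberian))
```

A value near 1 would mean individuals with more Volga-Ural also tend to carry more Siberian, which is the pattern described for Baltic and Volga Finns.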

I do realise that the Chuvashs, who show the highest levels of the Volga-Ural component in this test, are Turkic speakers. However, it's likely they're mostly natives of the Volga region who shifted languages from Uralic to Turkic during historic times. If the Volga-Ural cluster were of Turkic origin, it would be difficult to explain its very close genetic relationship to the other West Eurasian clusters.

The North European cluster, which peaks in Baltic Finns and Scandinavians, is found at high levels in all samples from non-Mediterranean Europe. It shows a loose correlation with the Germanic language group, but I think it probably predates the ethnogenesis of this group. I suspect it represents all the allele frequencies from across Northern Europe not scooped up by Balto-Slavic and Volga-Ural founder effects and/or expansions. In other words, it's probably closest to the aforementioned ancestral Northern cluster.

Reference...

Jonathan K. Pritchard, Matthew Stephens and Peter Donnelly, Inference of Population Structure Using Multilocus Genotype Data, Genetics 155: 945–959 (June 2000)

8 comments:

  1. Are not the "Siberian" component values quite high for Georgians and Abkhasians? What is causing that?

  2. ^ They're higher than in ADMIXTURE runs, but that doesn't mean they're too high. As we now know, in reality, they're higher in all the samples than shown here.

  3. "As we now know, in reality, they're higher in all the samples than shown here."

    I meant high for informal admixture analyses like STRUCTURE and ADMIXTURE. Georgians and Abkhasians never appear so Mongoloid-admixed in ADMIXTURE analyses, and I don't see a particular reason for their STRUCTURE results to be so different from their ADMIXTURE results because, as far as I know, ADMIXTURE and STRUCTURE are quite similar programs that usually give quite similar results.

  4. Also, Georgians and Abkhasians are genetically West Asians and should be much less affected by the ancient East Eurasian element than Europeans. So they should not appear much more Mongoloid-admixed in formal analyses than in informal analyses.

  5. ^ ADMIXTURE and STRUCTURE can give very different results.

    I ran a similar K=6 to the one above with ADMIXTURE, and most Finns ended up in a Finnish cluster with 0% Siberian. The Chuvash had their own cluster at K=7, and most also had 0% Siberian.

    So it appears that ADMIXTURE has a greater tendency to lump continental admixtures into ethnic clusters than STRUCTURE.

  6. Anyway, the latest formal analysis results have made me very skeptical of informal analysis results.

  7. The Abkhasians and Georgians usually show lower East Eurasian influence because it's contained within more western clusters, like "West Asian" in Dienekes' K=7.

    If you take away East Eurasian influence from that West Asian cluster, you'll just get his "Southern" cluster, which is what the West Asian cluster in the above K=6 is.

    In other words, the West Asian here is like Dienekes' Southern + West Asian minus Siberian and East Asian.

  8. The ancient Mongoloid element in West Asian populations is nowhere near the relatively high values that we see in European populations.
