Sunday, September 9, 2012
STRUCTURE analysis of Eastern Europe at K=6
I've gone back to the old school for this latest admixture analysis of Eastern Europe, because it was the only way to achieve a sound result. In other words, I used STRUCTURE (the latest, 2009 version) instead of ADMIXTURE. The former is much slower, but based on what I've seen, it performs better in unsupervised runs with limited samples of very closely related groups. And that's exactly what I needed in this instance.
I used all of my Eastern European samples (those from east of Scandinavia, and north of Hungary), as well as various other relevant groups from Europe and Asia. Despite that, the dataset was still fairly lightweight, containing only around 400 individuals (and 65K SNPs to make sure the run didn't last for days). But I'm impressed with the results; the six clusters make perfect sense, and there's very little noise.
Key: Red = Siberian, Yellow = West Asian, Green = North European, Light Blue = Volga-Ural, Dark Blue = South Baltic, Pink = Mediterranean. See spreadsheet for details.
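For anyone who wants to rebuild population-level figures from per-individual output, here is a minimal sketch of averaging membership coefficients within each population. It assumes the six coefficients per individual have already been parsed into (population, list) pairs. The function name, the input layout and the example values are all hypothetical and are not taken from this run.

```python
from collections import defaultdict

# Cluster labels in the order given in the key (assumed column order).
CLUSTERS = ["Siberian", "West Asian", "North European",
            "Volga-Ural", "South Baltic", "Mediterranean"]

def population_means(rows):
    """Average per-individual membership coefficients within each population.

    rows: iterable of (population_name, [q1, ..., qK]) pairs, where the K
    coefficients for each individual sum to roughly 1.
    """
    sums = {}
    counts = defaultdict(int)
    for pop, q in rows:
        if pop not in sums:
            sums[pop] = [0.0] * len(q)
        for k, value in enumerate(q):
            sums[pop][k] += value
        counts[pop] += 1
    return {pop: [s / counts[pop] for s in totals]
            for pop, totals in sums.items()}

# Made-up illustrative values, NOT results from this analysis.
example = [
    ("PopA", [0.10, 0.00, 0.50, 0.30, 0.10, 0.00]),
    ("PopA", [0.20, 0.00, 0.40, 0.30, 0.10, 0.00]),
    ("PopB", [0.00, 0.10, 0.30, 0.00, 0.50, 0.10]),
]
means = population_means(example)
for pop, avg in means.items():
    print(pop, dict(zip(CLUSTERS, (round(v, 3) for v in avg))))
```

Because every individual's coefficients sum to about 1, each population average does too, which is a handy sanity check on the parsing step.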
Now, it's important to note that programs like STRUCTURE have problems picking up ancient admixture events, even between highly divergent groups (see here). However, I think these results still contain a lot of very useful information that correlates very well with linguistics and archeology.
For instance, based on genetic distances, the three Northern European components look closely related, and I'd say they all come from a single source somewhere in North/Central Europe. My bet is that this source is the Corded Ware cultural horizon (perhaps also with some Bell Beaker influence).
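To give a rough idea of what a genetic distance between two inferred components can look like, here is a small sketch of a Wright-style Fst computed from per-SNP allele frequencies of two clusters. This is a swapped-in, simplified measure (a ratio of sums across loci, with the two clusters weighted equally) and not necessarily the distance the program reports. The function name and the example frequencies are hypothetical.

```python
def pairwise_fst(freqs_a, freqs_b):
    """Simplified two-population Fst from allele frequencies.

    For each SNP with frequencies p1 and p2:
      variance between populations = (p1 - p2)^2 / 4
      expected heterozygosity term = pbar * (1 - pbar), pbar = (p1 + p2) / 2
    The estimate is the sum of variances divided by the sum of the
    heterozygosity terms (a ratio of sums across loci).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("frequency lists must have the same length")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        pbar = (p1 + p2) / 2.0
        numerator += (p1 - p2) ** 2 / 4.0
        denominator += pbar * (1.0 - pbar)
    if denominator == 0.0:
        # Every SNP is fixed for the same allele in both clusters.
        return 0.0
    return numerator / denominator

# Made-up frequencies for three SNPs, NOT values from this analysis.
cluster_x = [0.10, 0.50, 0.80]
cluster_y = [0.15, 0.45, 0.70]
print(round(pairwise_fst(cluster_x, cluster_y), 4))
```

Values near zero mean the two clusters have almost the same allele frequencies, which is the pattern I'm describing for the three Northern European components.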
The dark blue "South Baltic" cluster is probably the result of the relatively recent Balto-Slavic expansion. The Volga-Ural cluster is likely much older and has a more complex story. Perhaps it represents the eastward movement of Corded Ware and derived groups to the Volga, and then backflow with Uralic tribes? This would explain the high correlation between the Volga-Ural and Siberian components in Baltic and Volga Finns, because based on the latest linguistic data, it seems the pre-proto-Uralics originated in Siberia.
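The correlation between two components can be checked directly from per-individual membership coefficients. Below is a minimal sketch using a plain Pearson correlation. The function name and the numbers are made up for illustration and are not taken from the spreadsheet.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical coefficients for four individuals, NOT real results.
volga_ural = [0.10, 0.20, 0.30, 0.40]
siberian = [0.02, 0.05, 0.06, 0.09]
r = pearson(volga_ural, siberian)
print(round(r, 3))
```

A value close to 1 across the Baltic and Volga Finnic samples would be in line with the two components having arrived together.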
I do realise that the Chuvash, who show the highest levels of the Volga-Ural cluster in this test, are Turkic speakers. However, it's likely they're mostly natives of the Volga region who shifted languages from Uralic to Turkic during historic times. If the Volga-Ural cluster were of Turkic origin, it would be difficult to explain its very close genetic relationship to the other West Eurasian clusters.
The North European cluster, which peaks in Baltic Finns and Scandinavians, is found at high levels in all samples from non-Mediterranean Europe. It shows a loose correlation with the Germanic language group, but I think it probably predates the ethnogenesis of this group. I suspect it represents all the allele frequencies from across Northern Europe not scooped up by Balto-Slavic and Volga-Ural founder effects and/or expansions. In other words, it's probably closest to the aforementioned ancestral Northern cluster.
Jonathan K. Pritchard, Matthew Stephens and Peter Donnelly, Inference of Population Structure Using Multilocus Genotype Data, Genetics 155: 945–959 (June 2000)