Saturday, August 25, 2018

Global25 workshop 3: genes vs geography in Northern Europe


To produce the intra-North European Principal Components Analysis (PCA) plot below, download this datasheet and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components or Discriminant Analysis. This is what you should end up with...


I'd say that the result more or less resembles a geographic map of Northern Europe. Of course, if you have your own Global25 coordinates, you can add yourself to this plot to check whether your position matches your geographic origin.

Please keep in mind, however, that the vast majority (>90%) of your ancestry must be from north of the Alps, Balkans and Pyrenees to obtain a sensible outcome. Also please ensure that all of the columns in the datasheet are filled out correctly, including the group column, otherwise your position on the plot will be skewed.
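If you want to check your own row before pasting it into the datasheet, a few lines of Python will do it. This is only a minimal sketch: the row layout (label, then group, then 25 coordinates) and the name `validate_row` are assumptions made for illustration, so adjust them to match the datasheet you actually downloaded.

```python
def validate_row(row, n_coords=25):
    """Return a list of problems found in one datasheet row.

    Assumed (hypothetical) layout: [label, group, c1, ..., c25].
    """
    problems = []
    expected = 2 + n_coords
    if len(row) != expected:
        problems.append(f"expected {expected} fields, got {len(row)}")
        return problems

    label, group, *coords = row
    if not label.strip():
        problems.append("empty label")
    if not group.strip():
        problems.append("empty group")

    # Every coordinate must parse as a number.
    for i, value in enumerate(coords, start=1):
        try:
            float(value)
        except ValueError:
            problems.append(f"coordinate {i} is not a number: {value!r}")
    return problems


# Made-up rows, not real Global25 coordinates.
good = ["Me", "Personal"] + ["0.01"] * 25
bad = ["Me", ""] + ["0.01"] * 24 + ["n/a"]

print(validate_row(good))
print(validate_row(bad))
```

An empty list means the row passed these basic checks. It says nothing about whether the coordinates themselves are sensible.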

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 workshop 4: a neighbour joining tree

Modeling genetic ancestry with Davidski: step by step

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)

Sunday, August 12, 2018

Global25 workshop 2: intra-European variation


Even though the Global25 focuses on world-wide human genetic diversity, it can also reveal a lot of information about genetic substructures within continental regions.

Several of the dimensions, for instance, reflect Balto-Slavic-specific genetic drift. I ensured that this would be the case by including a lot of Slavic groups in the analysis. A useful by-product of this strategy is that the Global25 is very good at exposing relatively recent intra-European genetic variation.

To see this for yourself, download the datasheet below and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components.

G25_Europe_scaled.dat

You should end up with the plot below. Note that to see the group labels and outlines, you need to tick the appropriate boxes in the panel to the right of the image. To improve the experience, it might also be useful to color-code different parts of Europe, which you can do by choosing Edit > Row colors/symbols. Of course, if you have Global25 coordinates, you can add yourself to the datasheet to see where you plot.


Components 1 and 2 pack the most information and, more or less, recapitulate the geographic structure of Europe. However, many details can only be seen by plotting the less significant components. For instance, a plot of components 1 and 3 almost perfectly separates Northeastern Europe into two distinct clusters made up of the speakers of Indo-European and Finno-Ugric languages.
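For anyone who prefers a script to PAST, the same kind of analysis can be sketched in Python with NumPy. This is a generic PCA on mean-centred data computed via SVD. It is not a reproduction of PAST's internals, and its settings may differ from the ones PAST uses. The data here are random placeholders, not real Global25 coordinates, and the function name `pca` is just for illustration.

```python
import numpy as np


def pca(data):
    """PCA via SVD on mean-centred data.

    Returns (scores, explained_variance_ratio), with components
    ordered from most to least variance explained.
    """
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u * s                        # sample positions on each component
    variance = s ** 2 / (len(data) - 1)   # variance captured by each component
    return scores, variance / variance.sum()


rng = np.random.default_rng(0)
# Placeholder matrix: 30 samples x 25 columns (NOT real Global25 data).
data = rng.normal(size=(30, 25)) * np.linspace(3.0, 0.5, 25)

scores, ratio = pca(data)

# Components are zero-indexed, so components 1 and 3 are columns 0 and 2.
pc1_vs_pc3 = scores[:, [0, 2]]
print(pc1_vs_pc3.shape)
print(ratio[:3])
```

Plotting `pc1_vs_pc3` instead of the first two columns of `scores` is the scripted equivalent of switching the axes in PAST to components 1 and 3.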


This plot might also be useful for exploring potential Jewish ancestry, because Ashkenazi, Italian and Sephardi Jews appear to be relatively distinct in this space. Thus, people with significant European Jewish ancestry will "pull" towards the lower left corner of the plot. For example, someone who is half Ashkenazi and half German will probably land in the empty space between the Northwest Europeans and Jews.
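To get a rough feel for where a person of mixed ancestry might land, you can interpolate between two group positions. The sketch below assumes that a mixed individual falls approximately on the straight line between the two source groups, which is only an approximation. The coordinates are invented for illustration and are not real values for any population, and `mix` is a hypothetical helper.

```python
def mix(coords_a, coords_b, fraction_a=0.5):
    """Linearly interpolate between two positions in PCA space.

    fraction_a is the assumed share of ancestry from group A (0 to 1).
    """
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    if len(coords_a) != len(coords_b):
        raise ValueError("both positions need the same number of components")
    return [fraction_a * a + (1.0 - fraction_a) * b
            for a, b in zip(coords_a, coords_b)]


# Invented positions on the two plotted components (NOT real data).
group_a = [-0.04, -0.02]
group_b = [0.02, 0.01]

half_and_half = mix(group_a, group_b)            # 50/50 mixture
mostly_b = mix(group_a, group_b, fraction_a=0.25)
print(half_and_half)
print(mostly_b)
```

A 50/50 mixture sits at the midpoint of the two groups, and lowering `fraction_a` slides the point towards group B.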

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 3: genes vs geography in Northern Europe

Global25 workshop 4: a neighbour joining tree

Modeling genetic ancestry with Davidski: step by step

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)

Global25 workshop 1: that classic West Eurasian plot


In this Global25 workshop I'm going to show you how to reproduce the classic plot of West Eurasian genetic diversity seen regularly in ancient DNA papers and on this blog (for instance, here). To do this you'll need the datasheet below, which I'll be updating regularly, and the PAST program, which is freely available here.

G25_West_Eurasia_scaled.dat

Download the datasheet, plug it into PAST, select all of the columns by clicking on the empty cell above the labels, and go to Multivariate > Ordination > Principal Components. Here's a screen cap of me doing it:


This is what you should end up with. Please note that I also ticked the "convex hulls" box to define the populations from the "group" column in the datasheet.
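In case it's not obvious what a convex hull is: it's the smallest convex outline that encloses all of the points in a group. Below is a minimal Python sketch using Andrew's monotone chain algorithm, a standard way to compute one. It is an illustration, not a claim about how PAST draws its hulls, and the points are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Made-up (PC1, PC2) positions keyed by group name (NOT real data).
groups = {
    "GroupA": [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)],
    "GroupB": [(5, 5), (6, 7), (7, 5), (6, 6)],
}

hulls = {name: convex_hull(pts) for name, pts in groups.items()}
for name, hull in hulls.items():
    print(name, hull)
```

Points that fall inside a group's outline, such as (1, 1) in GroupA, are not part of the hull.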


Here I also ticked the "group labels" box. It's generally a useful feature, even though it makes a mess of the plot in this case due to the large number of populations.


See also...

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 workshop 4: a neighbour joining tree

Modeling genetic ancestry with Davidski: step by step

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)