Eurogenes Genetic Ancestry Project

Monday, August 9, 2021

Genetic ancestry online store

Update 05/02/2025: To get your Global25 coords, please use the app HERE. The whole process usually takes a couple of days. Feel free to spread the word.

Please don't order the Global25 unless you have experience in modeling Global25 data with the Vahaduo analysis tools.

Note that the conversion of VCF, BAM, CRAM and/or fastq files is 30 to 50€ extra depending on the case. For enquiries please email teepean47 on g25requests@gmail.com.

...

Following a rigorous testing phase, the awesome Global 25 analysis is now available at the store for $12 USD. What's so awesome about this test, you might ask? See here and here.

Please send your request and autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) to eurogenesblog at gmail dot com.

However, note that this test is free for anyone who already has Global 10 coordinates (see here). That's right, if you already have Global 10 coordinates, all you have to do is to send me your data and say what it's for. Simple as that.

...

My Celtic vs Germanic Principal Component Analysis (PCA) is now available via the store for $6 USD (see here). Please note that this test is only really useful for people of Central, Northern and/or Western European origin, and indeed geared for those of overwhelmingly Northwestern European ancestry.

Please send your request and autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) to eurogenesblog at gmail dot com.

...

The popular Basal-rich K7 admixture test is now available via the store for $6 USD. It's suitable for everyone, except people with significant (>10%) Sub-Saharan ancestry. For more information about this test and some ideas about what to do with the output see here and here.

Please send your request and autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) to eurogenesblog at gmail dot com.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Modeling genetic ancestry with Davidski: step by step

Thursday, August 6, 2020

New Global25 interpretation tools

They're available at Ancestry Calculator and GENOPLOT. Unfortunately, I can't tell you exactly how to get the most out of them. All I can recommend is robust experimentation.

Tuesday, January 14, 2020

Modeling your ancestry has never been easier

An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is freely available HERE. It works offline too, after downloading the web page onto your computer. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.

Another free, easy to use online tool that works with Global25 coordinates is the Principal Component Analysis (PCA) runner HERE. Below is a screen cap of me checking out one of the many PCA that it offers.

See also...

Getting the most out of the Global25

Saturday, December 14, 2019

Avalon vs Valhalla revisited

Pictured below is a new version of my Celtic vs Germanic genetic map. It's based on the same Principal Component Analysis (PCA) as the original (which can be seen here), but more focused on Northwestern Europe and produced with a different program.

To see the interactive online version, navigate to Vahaduo Custom PCA and copy paste the text from here into the empty space under the PCA DATA tab. Then press the PLOT PCA button under the PCA PLOT tab. For more guidance, refer to the screen caps here and here.

To include a wider range of populations in the key, just edit the data accordingly. For instance, to break up the ancient grouping into more specific populations, delete the Ancient: prefix in all of the relevant rows. This is what you should see:

Conversely, you can leave the ancient sample set intact and instead reorder the present-day linguistic groupings into, say, geographic groupings. To achieve this just delete all of the linguistic prefixes, such as Celtic:, Germanic:, and so on. You should end up with a datasheet like this and plot like this.

Of course, you can design your own plot by using any combination of the ancient and present-day individuals and populations that I've already run in this PCA. Their coordinates are listed here. Indeed, if you're in the possession of your own Celtic vs Germanic PCA coordinates, you can add yourself to the plot. And if you're not, see here.

It's also possible to re-process PCA data via the SOURCE tab. But I don't recommend doing this with the Celtic vs Germanic data, which are derived from a fine scale analysis and don't pack much variation. On the other hand, Global25 data are ideal for such re-processing. I made the plots below from subsets of Global25 coordinates available in a zip file here. To see how, refer to the screen caps here and here.

Friday, July 12, 2019

Getting the most out of the Global25

The first thing you need to know about the Global25 is that I update the relevant datasheets regularly, usually every few weeks, but they're always at these links:

Global25 datasheet ancient scaled

Global25 pop averages ancient scaled

Global25 datasheet ancient

Global25 pop averages ancient

...

Global25 datasheet modern scaled

Global25 pop averages modern scaled

Global25 datasheet modern

Global25 pop averages modern

Each sample has a population code and an individual code. The population codes represent the countries, ethnic groups and/or archeological affinities of the samples, and I often modify these codes to suit my needs. On the other hand, the individual codes are unique to most of the samples and I usually don't change them.

So if you'd like to know more details about the samples try searching for their individual codes via a decent online search engine. Basic information about many of the samples is also available in the "anno" files here.

The main purpose of the Global25 is to provide data for mixture modeling. In other words, for estimating ancestry proportions, both ancient and modern (see here). This can be done on your computer with the R program and the nMonte R script, or online with a couple of different tools, which I discuss below.

If you don't have R installed on your computer, you can get it here, while nMonte is available here. For this tutorial please download nMonte and nMonte3, and store them in your main working folder (usually My Documents).

Once you have R set up, make sure its working directory is the same place where you stored nMonte. You can check this in R by clicking on "File" and then "Change dir". Additionally, you'll need two nMonte input files in the working directory titled "data" and "target". Examples of these files are available here. We'll be using them to test the ancient ancestry proportions of a sample set from present-day England.

Before you can begin the analysis you need to first call the nMonte script by typing or copy pasting source('nMonte.R') into the R console window, and then hitting "enter" on your keyboard. This is what you should see in the R console window afterwards.

To start the mixture modeling process, type or copy paste getMonte('data.txt', 'target.txt') into the R console window, hit "enter", and wait for the results. After a short time, probably less than a minute or two, you should see this output.

The data and target files contain population averages. And, as you can see, the results that these population averages have produced are in line with what one would expect from such a model focusing on the genetic shifts in Northern Europe during the Late Neolithic. Very similar ancient ancestry proportions have been reported for the English and other Northern Europeans recently in scientific literature.

However, when focusing on exceptionally fine-scale genetic variation that isn't reflected too well in the Global25 population averages, a more effective strategy might be to use multiple individuals from each reference population and let nMonte3 aggregate and average the inferred ancestry proportions.

This is often the case when attempting to model ancestry proportions for more recent periods, such as the Middle Ages. So let's try this with the English sample set using a modified data file, which is available here.

Replace the old data file with the new one in your working directory, and, like before, copy paste into the R console window the following two commands, hitting "enter" after each one: source('nMonte3.R') and getMonte('data.txt', 'target.txt'). This is what you should eventually see.

It's difficult to say how accurate these estimates are. But they look more or less correct considering the limited and less than ideal reference samples. For instance, the individuals labeled SWE_Viking_Age_Sigtuna are supposed to be stand ins for Danish and Norwegian Vikings, but they're a relatively heterogeneous group from Sweden, possibly with some British or Irish ancestry, so they might be skewing the results.

However, I'll be adding many more ancient samples to the Global25 datasheets as they become available, including lots of new Vikings, which should greatly improve the accuracy of these sorts of fine-scale mixture models.

An exceedingly simple, yet feature-packed, online tool ideal for modeling ancestry with Global25 coordinates is the VahaduoJS. It's freely available HERE, and it works offline too after downloading the web page. Just copy paste the coordinates of your choice under the "source" and "target" tabs, and then mess around with the buttons to see what happens. The screen caps below show me doing just that.

However, it's important to note that the Global25 is a Principal Component Analysis (PCA), so it makes good sense to also use it for producing PCA graphs. To do this just plot any combination of two or three of its Principal Components (PCs) to create 2D or 3D graphs, respectively. This can be done with a wide variety of programs, including PAST, which is freely available here.

To produce a 2D graph, open a Global25 datasheet in PAST, choose comma as the separator, highlight any two columns of data, click on the "Plot" tab and, from the drop down list, pick "XY graph". Below is a series of graphs that I created in exactly this way. I also color coded the samples according to their geographic origins. This was done by ticking the "Row attributes" tab.

PAST can also be used to run PCA on subsets of the Global25 scaled data to produce remarkably accurate plots of fine-scale population structure. For instance, here's a plot based on present-day populations from north of the Alps, Balkans and Pyrenees.

To try this create a new text file with your choice of populations from the Global25 scaled datasheet, open it with PAST and choose Multivariate > Ordination > Principal Components Analysis. I've already put together several datasheets limited to European, Northern European, West Eurasian and South Asian populations. They're available at the links below along with more details on how to run them with PAST.

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

The South Asian cline that no longer exists

Another free, easy to use online tool that works with Global25 coordinates is the Vahaduo Global25 Views [LINK]. Below is a screen cap of me checking out one of the many PCA that it offers.

And if you're fond of tree-like structures as a means to describe fine-scale genetic variation, please see this blog post...

Global25 workshop 4: a neighbour joining tree

See also...

New Global25 interpretation tools

Sunday, June 9, 2019

Global25 nMonte runner

Those of you who are having trouble with making use of your Global25 coordinates on your own computers, please be aware that there's an online tool that might be of help. It's called the Global25 nMonte runner and very easy to use. For more info see here.

Saturday, August 25, 2018

Global25 workshop 3: genes vs geography in Northern Europe

To produce the intra-North European Principal Components Analysis (PCA) plot below, download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components or Discriminant Analysis. This is what you should end up with...

I'd say that the result more or less resembles a geographic map of Northern Europe. Of course, if you're in the possession of your own personal Global25 coordinates, you can add yourself to this plot to check whether your position matches your geographic origin.

Please keep in mind, however, that the vast majority (>90%) of your ancestry must be from north of the Alps, Balkans and Pyrenees to obtain a sensible outcome. Also please ensure that all of the columns in the datasheet are filled out correctly, including the group column, otherwise your position on the plot will be skewed.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 workshop 4: a neighbour joining tree

Modeling genetic ancestry with Davidski: step by step

Getting the most out of the Global25

Genetic ancestry online store (to be updated regularly)

Search This Blog