Thursday, September 6, 2018

Global25 nMonte runner


Those of you who are having trouble using your Global25 coordinates on your own computers, please be aware that there's an online tool that might help. It's called the Global25 nMonte runner, and it's very easy to use. For more info see here.


See also...

Genetic ancestry online store (to be updated regularly)

Modeling genetic ancestry with Davidski: step by step

If you're using my tools to find Jewish ancestry please read this

Saturday, August 25, 2018

Global25 workshop 3: genes vs geography in Northern Europe


To produce the intra-North European Principal Components Analysis (PCA) plot below, download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components or Discriminant Analysis.


I'd say that the result more or less resembles a geographic map of Northern Europe. Of course, if you have your own personal Global25 coordinates, you can add yourself to this plot to check whether your position matches your geographic origin.

Please keep in mind, however, that the vast majority (>90%) of your ancestry must be from north of the Alps, Balkans and Pyrenees to obtain a sensible outcome. Also please ensure that all of the columns in the datasheet are filled out correctly, including the group column, otherwise your position on the plot will be skewed.
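A quick sanity check for a newly added row might look like the sketch below. It assumes, purely for illustration, that the datasheet is a plain-text table with one sample per line and a comma as the delimiter; the real PAST file layout may well differ, and `check_row` is a hypothetical helper name, not part of PAST.

```python
def check_row(reference_row, new_row, delimiter=","):
    """Return a list of problems found in new_row, using reference_row as the template."""
    problems = []
    ref_fields = reference_row.rstrip("\n").split(delimiter)
    new_fields = new_row.rstrip("\n").split(delimiter)
    # A row with a different number of fields will shift values into the wrong columns.
    if len(new_fields) != len(ref_fields):
        problems.append(f"expected {len(ref_fields)} fields, found {len(new_fields)}")
    # Empty fields (for example a missing group label) are reported by position.
    for position, field in enumerate(new_fields, start=1):
        if field.strip() == "":
            problems.append(f"field {position} is empty")
    return problems


# Made-up rows: label, group, then three coordinates (real sheets have more columns).
existing = "Polish_1,Polish,0.12,-0.03,0.05"
good = "Me,Me,0.11,-0.02,0.04"
bad = "Me,,0.11,-0.02"

print(check_row(existing, good))
print(check_row(existing, bad))
```

The idea is simply to compare your own row against one that's already in the sheet before loading the file into PAST.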

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Modeling genetic ancestry with Davidski: step by step

Genetic ancestry online store (to be updated regularly)

Sunday, August 12, 2018

Global25 workshop 2: intra-European variation


Even though the Global25 focuses on world-wide human genetic diversity, it can also reveal a lot of information about genetic substructures within continental regions.

Several of the dimensions, for instance, reflect Balto-Slavic-specific genetic drift. I ensured that this would be the case by running a lot of Slavic groups in the analysis. A useful by-product of this strategy is that the Global25 is very good at exposing relatively recent intra-European genetic variation.

To see this for yourself, download the datasheet below and plug it into the PAST program, which is freely available here. Then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Ordination > Principal Components.

G25_Europe_scaled.dat

You should end up with the plot below. Note that to see the group labels and outlines, you need to tick the appropriate boxes in the panel to the right of the image. To improve the experience, it might also be useful to color-code different parts of Europe, and you can do that by choosing Edit > Row colors/symbols. Of course, if you have Global25 coordinates you can add yourself to the datasheet to see where you plot.


Components 1 and 2 pack the most information and, more or less, recapitulate the geographic structure of Europe. However, many details can only be seen by plotting the less significant components. For instance, a plot of components 1 and 3 almost perfectly separates Northeastern Europe into two distinct clusters made up of the speakers of Indo-European and Finno-Ugric languages.
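For those curious about what happens under the hood, here's a minimal sketch of a Principal Components Analysis in plain Python. It assumes the analysis is run on the variance-covariance matrix of the coordinates; PAST may differ in the details, and the function names and toy numbers are made up for illustration.

```python
import math


def center(rows):
    """Subtract each column's mean so that every column averages zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=500):
    """Leading eigenvector and eigenvalue of a symmetric matrix via power iteration."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, value


def pca(rows, n_components=2):
    """Return (scores, eigenvalues) for the first n_components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes, values = [], []
    for _ in range(n_components):
        vec, val = top_component(cov)
        axes.append(vec)
        values.append(val)
        # Deflate: remove the component just found before looking for the next one.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    scores = [[sum(a * b for a, b in zip(row, axis)) for axis in axes]
              for row in centered]
    return scores, values


# Made-up coordinates for four samples in three dimensions (not real Global25 data).
rows = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
scores, eigenvalues = pca(rows)
for row_scores in scores:
    print([round(s, 3) for s in row_scores])
```

The eigenvalues come out in decreasing order, which is why the first couple of components carry the most information and the later ones pick up finer detail.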


This plot might also be useful for exploring potential Jewish ancestry, because Ashkenazi, Italian and Sephardi Jews appear to be relatively distinct in this space. Thus, people with significant European Jewish ancestry will "pull" towards the lower left corner of the plot. For example, someone who is half Ashkenazi and half German will probably land in the empty space between the Northwest Europeans and Jews.
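The "pull" can be pictured with a bit of arithmetic. The sketch below assumes, as a rough approximation only, that a mixed person's position is the weighted average of the positions of their source populations. The coordinates are invented for the example and aren't real Global25 values.

```python
def mix(sources, weights):
    """Weighted average of coordinate vectors; weights are expected to sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [sum(w * vec[i] for w, vec in zip(weights, sources))
            for i in range(len(sources[0]))]


# Made-up 2D plot positions for two population averages.
german = [0.40, 0.30]
ashkenazi = [-0.20, -0.10]

# Half and half: expected to land midway between the two averages.
half_and_half = mix([german, ashkenazi], [0.5, 0.5])
print(half_and_half)
```

Changing the weights to, say, 0.75 and 0.25 moves the expected position proportionally closer to the first population.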

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Modeling genetic ancestry with Davidski: step by step

Genetic ancestry online store (to be updated regularly)

Global25 workshop 1: that classic West Eurasian plot


In this Global25 workshop I'm going to show how to reproduce, more or less, that classic plot of West Eurasian genetic diversity seen regularly in ancient DNA papers and at this blog (for instance, here). To do this you'll need the datasheet below, which I'll be updating regularly, and the PAST program, which is freely available here.

G25_West_Eurasia_scaled.dat

This is what you'll get if you follow my instructions to the letter. Note the fairly strong correlation with geography. I think this is impressive for so many reasons.

OK, so, download the datasheet, plug it into PAST, select columns 1 to 8, and go to Multivariate > Ordination > Principal Components. Here's a screen cap of me doing it:


The initial output won't resemble my plot above. So you'll need to place PC2 on the X axis, PC1 on the Y axis, and set the image size to 1206x706. After doing that, you should end up with exactly this:


Then, export the image, flip it horizontally with any imaging software that can do the job, and that's it, unless you want to add some labels like I did. Feel free to ask questions and make suggestions in the comments below.
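If you'd rather do the axis swap and the flip on the numbers instead of the image, the two steps boil down to a simple transformation of the PCA scores. This is only a sketch: `to_plot_xy` is a hypothetical helper and the scores are made up.

```python
def to_plot_xy(pc1, pc2):
    """Map PCA scores to the final picture: PC2 on X (mirrored), PC1 on Y."""
    # Flipping an image horizontally is the same as negating its X values.
    return (-pc2, pc1)


# Made-up (PC1, PC2) scores for two samples.
samples = {"sample_a": (0.10, -0.30), "sample_b": (-0.20, 0.25)}
for name, (pc1, pc2) in samples.items():
    print(name, to_plot_xy(pc1, pc2))
```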

See also...

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Modeling genetic ancestry with Davidski: step by step

Genetic ancestry online store (to be updated regularly)

Global25 PAST-compatible datasheets


I'm planning to run regular workshops over the next few months on how to get the most out of Global25 data with various programs, and especially PAST (see here). So if you have Global25 coordinates, please stay tuned.

To that end, I've put together four color-coded, PAST-compatible Global25 datasheets with thousands of present-day and ancient samples, available at the links below:

Global_25_PCA.dat

Global_25_PCA_pop_averages.dat

Global_25_PCA_scaled.dat

Global_25_PCA_pop_averages_scaled.dat

PAST is an awesome little statistical program and simple to use. The manual is available here. To kick things off, here's a quick guide on how to run a Neighbor Joining tree on your Global25 coordinates:

- download the Global_25_PCA_pop_averages_scaled.dat from the last link above

- open the dat file with something a little more advanced than Windows Notepad, like, say, TextPad (see here)

- stick your scaled coordinates at the bottom of the sheet, so that they look exactly like those of the other samples, except give yourself an original symbol, like, say, a black star

- open the edited dat file with PAST and choose all of the columns and rows by clicking the empty tab above the labels

- then, at the top, go to Multivariate > Clustering > Neighbor joining

After a few seconds you should see a nice, color-coded tree like the one below, except you'll also be on it, in black text. I'm very happy with these results, by the way. As far as I can see, all of the populations and individuals cluster exactly where they should.
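A Neighbor Joining tree is built from a matrix of pairwise distances between the samples. The sketch below shows that first step only, assuming plain Euclidean distance (PAST lets you pick the similarity index, so its result may differ), and then looks up the closest population to a made-up "Me" sample. All names and coordinates are invented.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples):
    """Pairwise distances as a nested dict: matrix[a][b] is the distance from a to b."""
    names = list(samples)
    return {a: {b: euclidean(samples[a], samples[b]) for b in names} for a in names}


def nearest(name, matrix):
    """Name of the sample with the smallest distance to `name`, excluding itself."""
    others = {k: v for k, v in matrix[name].items() if k != name}
    return min(others, key=others.get)


# Made-up 2D coordinates; real Global25 rows have 25 dimensions.
coords = {
    "Me": [0.12, 0.03],
    "Pop_A": [0.10, 0.02],
    "Pop_B": [-0.30, 0.20],
    "Pop_C": [0.11, -0.25],
}
matrix = distance_matrix(coords)
print(nearest("Me", matrix))
```

Samples separated by small distances tend to end up on neighboring branches, which is why your position on the tree is informative.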


Those of you who are already very proficient in using PAST, feel free to go nuts with these new datasheets and show us the results in the comments below. I'll try to put together a workshop for beginners within the next couple of weeks.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Modeling genetic ancestry with Davidski: step by step

Genetic ancestry online store (to be updated regularly)

Monday, March 19, 2018

If you're using my tools to find Jewish ancestry please read this


It's come to my attention that many people are still using the Jtest and taking the results very seriously. Indeed, perhaps too seriously.

Also, some users are doing weird stuff with the Jtest output in an attempt to estimate their supposedly "true" Ashkenazi ancestry proportions, like multiplying their Ashkenazi coefficient by three, because Ashkenazi Jews "only" score around 30% Ashkenazi in this test. Ouch! Please don't do that!

Let me reiterate that this test was only supposed to be a fun experiment. It was never meant to be the definitive online Ashkenazi ancestry test. And even as fun experiments with ADMIXTURE go, it's now horribly outdated, and probably useless for anyone with less than 15-20% Ashkenazi ancestry.

So it might be time to move on. If you really want to confirm your Jewish ancestry, whether Ashkenazi, Sephardi, or both, then you need to look at much more powerful and sophisticated options. One of these options is the Global25 analysis (see HERE), which can pick up minor Jewish ancestry of just a few per cent. But it's not free (USD $12), and it's a DIY test that requires a bit of time and effort to get the most out of it. Also, you'd need to send me your autosomal file so that I can estimate your Global25 coordinates. But I can help you get started and even quickly check if you have any hope at all of confirming Jewish ancestry.

If, for whatever reason, you'd rather not take advantage of the Global25 offer, because, say, you don't want to share your data with me, then it might be an idea to join the Anthrogenica discussion board and ask the experienced members there about other options [LINK].

In any case, whatever you choose to do, please remember the following points, and feel free to share them with others who are still using the Jtest:

- do not multiply your Jtest Ashkenazi score by 3 in an attempt to find your "true" Ashkenazi ancestry proportion, because this won't work for the vast majority of users

- but do compare your Jtest Ashkenazi score to those of other people of the same or very similar ancestry to yours to get a rough idea whether you might have any Ashkenazi ancestry (the Jtest population averages will be useful for this, see here)

- if you're still not sure what your Jtest results mean, then just focus on your Jtest Oracle-4 output at GEDmatch, and if you don't see AJ at the top of the oracle list, then this is a strong signal that you don't have substantial Ashkenazi ancestry

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Modeling genetic ancestry with Davidski: step by step

Genetic ancestry online store (to be updated regularly)

Sunday, February 18, 2018

The powerful Global 25 now available via the Eurogenes genetic ancestry online store


Following a rigorous testing phase, the awesome Global 25 analysis is now available via my genetic ancestry online store for $12 USD (see here). What's so awesome about this test, you might ask? See here and here.


Please send your request, autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) and money (via PayPal) to eurogenesblog at gmail dot com.

However, note that this test is free for anyone who already has Global 10 coordinates (see here). That's right, if you already have Global 10 coordinates, all you have to do is to send me your data and say what it's for. Simple as that.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Modeling genetic ancestry with Davidski: step by step

Thursday, February 15, 2018

Modeling genetic ancestry with Davidski: step by step


There are many different ways to model your genetic ancestry. I prefer the Global25/nMonte method (see here). This is a step by step guide to modeling ancient ancestry proportions with this simple but powerful method using my own genome.


As far as I know, the vast majority of my recent ancestors came from the northern half of Europe. This may or may not be correct, but it gives me somewhere to start, so that I can come up with a coherent model. If you don't have this sort of information, because, perhaps, you were adopted, then just look in the mirror, and work from there. Like I say, it's not imperative that you know anything whatsoever about your ancestry, because your genetic data will do the talking, but you do need a model when modeling.

In scientific literature nowadays, Northern Europeans are often described as a three-way mixture between Yamnaya-related pastoralists, Anatolian-derived early farmers, and Western European Hunter-Gatherers (WHG). So let's see if this model works for me. Obviously, if it does, then it'll confirm the information that I have about my origins, but it might also reveal finer details that I'm not aware of. The datasheet that I'm using for this model is available here.

[1] distance%=6.9025 / distance=0.069025

Davidski

Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0

Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from scientific literature and the plethora of analyses that I've featured at this blog on the topic. Please note that I've kept things very simple, using only four reference populations and individuals as proxies for three distinct streams of ancestry. But I've put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane.
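To make the distance figure a little less mysterious, here's a rough sketch of how such a number can be computed. It assumes that the distance is the Euclidean distance between the target's coordinates and the weighted mix of the reference coordinates. As the output above shows, distance% is simply the distance multiplied by 100. The coordinates below are invented, and `model_distance` is a hypothetical helper, not a function from nMonte.

```python
import math


def model_distance(target, sources, percentages):
    """Euclidean distance between the target and a percentage-weighted mix of sources."""
    weights = [p / 100.0 for p in percentages]
    mixture = [sum(w * src[i] for w, src in zip(weights, sources))
               for i in range(len(target))]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixture)))


# Made-up 3D coordinates; real Global25 rows have 25 dimensions.
target = [0.10, 0.05, 0.03]
sources = [[0.20, 0.00, 0.00], [0.00, 0.10, 0.00]]

d = model_distance(target, sources, [50, 50])
print(f"distance%={100 * d:.4f} / distance={d:.6f}")
```

The smaller the number, the closer the modeled mixture sits to the target.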

Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model's distance is a little higher than what I normally see for really strong models. So let's check if I can get a better fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.

[1] distance%=5.9331 / distance=0.059331

Davidski

Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0

The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome.
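How can an algorithm end up giving a reference exactly 0? The toy fitter below is not nMonte, just a simplified stand-in that I'm assuming captures the general idea. It fills a fixed number of slots with reference populations, then keeps swapping one slot at a time whenever the swap brings the mix closer to the target. The names, coordinates and parameters (`slots`, `iterations`, `seed`) are all made up for the example.

```python
import math
import random


def distance(target, mixture):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mixture)))


def fit(target, sources, slots=100, iterations=20000, seed=0):
    """Toy hill-climb: fill `slots` with source names, swap one at a time if it helps."""
    rng = random.Random(seed)
    names = sorted(sources)
    dims = len(target)
    batch = [rng.choice(names) for _ in range(slots)]
    # Running coordinate sum of the batch, so each trial swap is cheap to evaluate.
    total = [sum(sources[n][i] for n in batch) for i in range(dims)]
    best = distance(target, [t / slots for t in total])
    for _ in range(iterations):
        slot = rng.randrange(slots)
        candidate = rng.choice(names)
        if candidate == batch[slot]:
            continue
        trial = [total[i] - sources[batch[slot]][i] + sources[candidate][i]
                 for i in range(dims)]
        d = distance(target, [t / slots for t in trial])
        # Keep the swap only if it moves the mix strictly closer to the target.
        if d < best:
            batch[slot], total, best = candidate, trial, d
    percentages = {n: 100.0 * batch.count(n) / slots for n in names}
    return percentages, best


# Made-up 2D coordinates: the target sits exactly halfway between Src_A and Src_B.
sources = {"Src_A": [0.0, 0.0], "Src_B": [1.0, 0.0], "Src_C": [0.0, 1.0]}
target = [0.5, 0.0]

percentages, best = fit(target, sources)
print(percentages, best)
```

In this setup any slot given to Src_C only pushes the mix away from the target, so those slots get swapped out over time, much like a reference that contributes nothing to a model.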

What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins. Indeed, let's follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Holy shit! To be honest, I wasn't expecting this sort of resolution and accuracy, and I can't promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn't a fluke. It can't be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here).

But of course, the Baltic and nearby regions haven't been isolated from migrations and invasions since the Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking peoples moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It's generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongolians. Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let's explore this possibility by adding a few more reference populations.

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic people from Siberia), and Sarmatians to the model. But what if I throw in the only ancient Slav in my datasheet?

[1] distance%=2.9904 / distance=0.029904

Davidski

Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Considering that the vast majority of my recent ancestors were Poles, thus a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance! But the problem now is that I'm overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I'll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounds for meaningful analyses of this type.

For a more technical guide to running Global25-type data with nMonte, please refer to this post by regular Eurogenes commentator Onur: An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests.

See also...

Global25 workshop 1: that classic West Eurasian plot

Global25 workshop 2: intra-European variation

Global25 workshop 3: genes vs geography in Northern Europe

Global25 PAST-compatible datasheets

Genetic ancestry online store (to be updated regularly)

Tuesday, January 30, 2018

Support this blog, buy a Haplotee


If you buy a Haplotee or any other DNAGeeks merchandise through this blog via this LINK, I'll get some cash.

Why is this important? Because 2018 is going to be a huge year for population genetics, and especially for ancient DNA, and if this blog is also going to be huge, then I'll need some money. So if you like this blog, or even if you hate it, but you like spending time here hating it, then buy a Haplotee. Or several.


Please note also that I've recently launched a genetic ancestry online store, which will be updated regularly with different tests throughout the year (see here). By purchasing tests from the store, you'll not only be helping to make this blog awesome, but also getting some of the most accurate ancestry analyses available anywhere. Thank you for your support.