
Monday, March 19, 2018

If you're using my tools to find Jewish ancestry please read this


It's come to my attention that many people are still using the Jtest and taking the results very seriously. Indeed, perhaps too seriously.

Also, some users are doing weird stuff with the Jtest output in an attempt to estimate their supposedly "true" Ashkenazi ancestry proportions, like multiplying their Ashkenazi coefficient by three, because Ashkenazi Jews "only" score around 30% Ashkenazi in this test. Ouch! Please don't do that!

Let me reiterate that this test was only supposed to be a fun experiment. It was never meant to be the definitive online Ashkenazi ancestry test. And even as fun experiments with ADMIXTURE go, it's now horribly outdated, and probably useless for anyone with less than 15-20% Ashkenazi ancestry.

So it might be time to move on. If you really want to confirm your Jewish ancestry, whether Ashkenazi, Sephardi or both, then you need to look at much more powerful and sophisticated options. One of these options is the Global25 analysis (see HERE), which can pick up minor Jewish ancestry of just a few per cent. But it's not free (USD $12), and it's a DIY test that requires a bit of time and effort to get the most out of it. Also, you'd need to send me your autosomal file so that I can estimate your Global25 coordinates. But I can help you get started and even quickly check if you have any hope at all of confirming Jewish ancestry.

If, for whatever reason, you'd rather not take advantage of the Global25 offer, because, say, you don't want to share your data with me, then it might be an idea to join the Anthrogenica discussion board and ask the experienced members there about other options [LINK].

In any case, whatever you choose to do, please remember the following points, and feel free to share them with others who are still using the Jtest:

- do not multiply your Jtest Ashkenazi score by 3 in an attempt to find your "true" Ashkenazi ancestry proportion, because this won't work for the vast majority of users

- but do compare your Jtest Ashkenazi score to those of other people of the same or very similar ancestry to yours to get a rough idea whether you might have any Ashkenazi ancestry (the Jtest population averages will be useful for this, see here)

- if you're still not sure what your Jtest results mean, then just focus on your Jtest Oracle-4 output at GEDmatch, and if you don't see AJ at the top of the oracle list, then this is a strong signal that you don't have substantial Ashkenazi ancestry
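To make the first two points concrete, here is a tiny Python sketch of the arithmetic. Every number in it is invented for illustration, and the function names are hypothetical; real population averages should be taken from the Jtest averages sheet mentioned above.

```python
def naive_times_three(jtest_ashkenazi_percent: float) -> float:
    """The shortcut warned against above: scale the raw score by three."""
    return jtest_ashkenazi_percent * 3.0


def excess_over_population(jtest_ashkenazi_percent: float,
                           population_average_percent: float) -> float:
    """How far a score sits above the average for people of similar ancestry."""
    return jtest_ashkenazi_percent - population_average_percent


# Hypothetical values, not taken from any real Jtest output.
user_score = 6.0
population_average = 4.5

print(naive_times_three(user_score))                            # 18.0
print(excess_over_population(user_score, population_average))   # 1.5
```

In this made-up case the shortcut suggests 18%, while the comparison with similar people shows the score is only 1.5 points above what is typical for them, which is a much weaker signal.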

See also...

Genetic ancestry online store (to be updated regularly)

Sunday, February 18, 2018

The powerful Global 25 now available via the Eurogenes genetic ancestry online store


Following a rigorous testing phase, the awesome Global 25 analysis is now available via my genetic ancestry online store for $12 USD (see here). What's so awesome about this test, you might ask? See here and here.


Please send your request, autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) and money (via PayPal) to eurogenesblog at gmail dot com.

However, note that this test is free for anyone who already has Global 10 coordinates (see here). That's right, if you already have Global 10 coordinates, all you have to do is to send me your data and say what it's for. Simple as that.

Thursday, February 15, 2018

Modeling genetic ancestry with Davidski: step by step


There are many different ways to model your genetic ancestry. I prefer the Global25/nMonte method (see here). This is a step by step guide to modeling ancient ancestry proportions with this simple but powerful method using my own genome.


As far as I know, the vast majority of my recent ancestors came from the northern half of Europe. This may or may not be correct, but it gives me somewhere to start, so that I can come up with a coherent model. If you don't have this sort of information, because, perhaps, you were adopted, then just look in the mirror, and work from there. Like I say, it's not imperative that you know anything whatsoever about your ancestry, because your genetic data will do the talking, but you do need a model when modeling.

In scientific literature nowadays, Northern Europeans are often described as a three-way mixture between Yamnaya-related pastoralists, Anatolian-derived early farmers, and Western European Hunter-Gatherers (WHG). So let's see if this model works for me. Obviously, if it does, then it'll confirm the information that I have about my origins, but it might also reveal finer details that I'm not aware of. The datasheet that I'm using for this model is available here.

[1] distance%=6.9025 / distance=0.069025

Davidski

Yamnaya_Samara 53.9
Barcin_N 30.75
Rochedane 15.35
Tepecik_Ciftlik_N 0

Yep, the model does work, with a fairly reasonable distance of almost 7%. The ancestry proportions more or less match those from scientific literature and the plethora of analyses that I've featured at this blog on the topic. Please note that I've kept things very simple, using only four reference populations and individuals as proxies for four distinct streams of ancestry. But I've put my own twist on this Neolithic/Bronze Age model by including two populations from Neolithic Anatolia (Barcin_N and Tepecik_Ciftlik_N), just to see what would happen. The WHG proxy is Rochedane.
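For readers wondering what the distance figure measures, the sketch below shows the general idea in Python. It assumes the reported distance is the Euclidean distance between the target's coordinates and a weighted average of the reference coordinates, with distance% being that value times 100 (as in the output above, 0.069025 and 6.9025). The search routine is a simplified stand-in written for this example, not the actual nMonte code, and the coordinates are toy values.

```python
import math
import random


def mix_coordinates(weights, references):
    """Weighted average of the reference coordinates (weights sum to 1)."""
    dims = len(next(iter(references.values())))
    return [
        sum(weights[name] * coords[d] for name, coords in references.items())
        for d in range(dims)
    ]


def fit_mixture(target, references, slots=200, rounds=20000, seed=0):
    """Random search for the reference mix closest to the target.

    The mix is held as `slots` equal shares spread over the references.
    Each round moves one share between two random references and keeps
    the move only if the Euclidean distance to the target shrinks.
    """
    rng = random.Random(seed)
    names = list(references)
    counts = {name: 0 for name in names}
    for _ in range(slots):
        counts[rng.choice(names)] += 1

    def current_distance():
        weights = {name: counts[name] / slots for name in names}
        return math.dist(mix_coordinates(weights, references), target)

    best = current_distance()
    for _ in range(rounds):
        source, dest = rng.sample(names, 2)
        if counts[source] == 0:
            continue  # nothing to move away from this reference
        counts[source] -= 1
        counts[dest] += 1
        trial = current_distance()
        if trial < best:
            best = trial
        else:
            # The move did not help, so undo it.
            counts[source] += 1
            counts[dest] -= 1
    percentages = {name: 100.0 * counts[name] / slots for name in names}
    return percentages, best


# Toy two-dimensional coordinates, invented for this example.
toy_references = {"Ref_A": (0.0, 0.0), "Ref_B": (1.0, 0.0), "Ref_C": (0.0, 1.0)}
toy_target = (0.3, 0.2)
percentages, distance = fit_mixture(toy_target, toy_references)
print(percentages, f"distance%={100 * distance:.4f}")
```

A reference that ends up with no shares is reported at 0, which mirrors how a reference can appear in the output with a zero contribution.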

Admittedly, though, my Yamnaya cut of ancestry appears somewhat bloated at over 53%, and the model's distance is a little higher than what I normally see for really strong models. So let's check if I can get a better fitting and more sensible result by adding a slightly more easterly forager proxy than Rochedane: Narva_Lithuania.

[1] distance%=5.9331 / distance=0.059331

Davidski

Yamnaya_Samara 45.75
Barcin_N 31.45
Narva_Lithuania 22.8
Rochedane 0
Tepecik_Ciftlik_N 0

The statistical fit does improve, and when given a choice between Rochedane and Narva_Lithuania, the algorithm picks the latter as the only source of extra forager input in my genome.

What could this mean? It might mean that a large part of my ancestry derives from the Baltic region. Actually, I know for a fact that this is true. But even if I had no idea about my genealogy, this result would be a very strong hint about my genetic origins. Indeed, let's follow this trail and try to further improve the fit of the model by adding a more relevant Yamnaya-related proxy, such as early Baltic Corded Ware (CWC_Baltic_early).

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Holy shit! To be honest, I wasn't expecting this sort of resolution and accuracy, and I can't promise that everyone using the Global25/nMonte method will see such incredibly nuanced outcomes, but this isn't a fluke. It can't be, because it gels so well with everything that I know about my ancestry. Please note also that I belong to Y-chromosome haplogroup R1a-M417, which is a lineage intimately associated with the Corded Ware expansion across Northern Europe (for instance, see here).

But of course, the Baltic and nearby regions haven't been isolated from migrations and invasions since the Corded Ware times. For instance, at some point, probably during the Bronze Age, Uralic-speaking peoples moved west across the forest zone of Northeastern Europe and into the East Baltic and northern Scandinavia. It's generally accepted that they brought Siberian admixture with them (see here). Moreover, from the Iron Age to the Middle Ages, East Central Europe was under intense pressure from a wide range of nomadic steppe groups with complex ancestry, such as the Sarmatians, Avars, Huns, and Mongolians. Did any of these peoples leave their mark on my genome? At the risk of overfitting the model, let's explore this possibility by adding a few more reference populations.

[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Han 0
Mongolian 0
Nganassan 0
Rochedane 0
Sarmatian_Pokrovka 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Nothing changes when I add the Han Chinese, Mongolians, Nganassans (a Uralic people from Siberia), and Sarmatians to the model. But what if I throw in the only ancient Slav in my datasheet?

[1] distance%=2.9904 / distance=0.029904

Davidski

Slav_Bohemia 85.9
CWC_Baltic_early 7.7
Narva_Lithuania 6.4
Barcin_N 0
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0

Considering that the vast majority of my recent ancestors were Poles, thus a Slavic-speaking people from near the Baltic, this outcome makes perfect sense. And check out the new distance! But the problem now is that I'm overfitting the model by using two very similar and probably very closely related references, CWC_Baltic_early and Slav_Bohemia. And overfitting should be avoided at all costs. So it might be useful to break up this effort into two models: one focusing on the Neolithic and Bronze Age, and the other on the Iron Age and Middle Ages. I'll do that soon, but not just yet, because there are still too few Iron Age and Medieval samples available from the Baltic region and surrounds for meaningful analyses of this type.
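One rough way to spot references that are too similar to sit in the same model is to measure how close they are to each other in the coordinate space. The Python sketch below does that. Treating a small coordinate distance as a sign of near-duplicate references is an assumption made for this example, the threshold is arbitrary, and the coordinates are toy values rather than real datasheet rows.

```python
import math
from itertools import combinations


def close_reference_pairs(references, threshold):
    """Return (name_a, name_b, distance) for every pair closer than threshold."""
    pairs = []
    for (name_a, coords_a), (name_b, coords_b) in combinations(references.items(), 2):
        gap = math.dist(coords_a, coords_b)
        if gap < threshold:
            pairs.append((name_a, name_b, gap))
    # Closest pairs first, since they are the most likely to compete.
    return sorted(pairs, key=lambda pair: pair[2])


# Toy coordinates, invented for this example.
toy_references = {
    "Ref_X": (0.00, 0.00),
    "Ref_Y": (0.01, 0.00),
    "Ref_Z": (1.00, 1.00),
}
for name_a, name_b, gap in close_reference_pairs(toy_references, threshold=0.05):
    print(f"{name_a} and {name_b} are only {gap:.3f} apart")
```

Any pair flagged this way is a candidate for being split across two separate models, as suggested above.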

For a more technical guide to running Global25-type data with nMonte, please refer to this post by regular Eurogenes commentator Onur: An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests.

Tuesday, January 30, 2018

Support this blog, buy a Haplotee


If you buy a Haplotee or any other DNAGeeks merchandise through this blog via this LINK, I'll get some cash.

Why is this important? Because 2018 is going to be a huge year for population genetics, and especially for ancient DNA, and if this blog is also going to be huge, then I'll need some money. So if you like this blog, or even if you hate it, but you like spending time here hating it, then buy a Haplotee. Or several.


Please note also that I've recently launched a genetic ancestry online store, which will be updated regularly with different tests throughout the year (see here). By purchasing tests from the store, you'll not only be helping to make this blog awesome, but also getting amongst the most accurate ancestry analyses available anywhere. Thank you for your support.

Tuesday, October 31, 2017

Genetic ancestry online store (to be updated regularly)


It's an unfortunate reality that most commercial genetic ancestry tests out there are rather lame. They're not wrong per se, but that's probably the best that can be said about them. And let's be honest, that's no longer enough considering how far this area of science has come in recent years.

To try and remedy this problem, I'll be offering a wide range of highly accurate and unique, but low cost, ancestry tests here, in my makeshift online store, based on analyses on my other blog (see here). These tests will focus on either recent or ancient ancestry, or both, using the latest reference samples from scientific literature whenever possible. To make a purchase, send your request, autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) and money (via PayPal) to eurogenesblog at gmail dot com.

Let's start things rolling with my genetic and linguistic landscape of Europe north of the Alps, Balkans and Pyrenees (see here). For a mere $6 USD I will pinpoint your location on the plot below amongst a variety of modern-day and ancient individuals. You'll also receive the principal component coordinates, which you can use to model your ancestry proportions (for instance, like here). Please keep in mind, however, that to ensure sensible results in this particular analysis, practically all of your ancestry has to derive from Central, Eastern and/or Northern Europe. Most of my other tests won't be so restrictive.

The relevant datasheet is available here. I'll be updating this plot regularly with many more ancient samples as they become available, but your coordinates will remain relevant as I do so.

...

Following a rigorous testing phase, the awesome Global 25 analysis is now available at the store for $12 USD (see here). What's so awesome about this test, you might ask? See here and here.


Please send your request, autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) and money (via PayPal) to eurogenesblog at gmail dot com.

However, note that this test is free for anyone who already has Global 10 coordinates (see here). That's right, if you already have Global 10 coordinates, all you have to do is to send me your data and say what it's for. Simple as that.

...

The popular Basal-rich K7 admixture test is now available via the store for $6 USD. It's suitable for everyone, except people with significant (>10%) Sub-Saharan ancestry. For more information about this test and some ideas about what to do with the output see here and here.


Please send your request, autosomal genotype data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) and money (via PayPal) to eurogenesblog at gmail dot com.

Thursday, August 10, 2017

Basal-rich K7 & Global 10 updates (10/08/2017)


I've updated the Basal-rich K7 spreadsheet and the Global 10 datasheets with a plethora of ancient individuals and populations, including Anglo-Saxons, British Celts (labeled England_IA), Minoans, Mycenaeans, Bronze Age Iberians and many more.

Basal-rich K7 spreadsheet

Global 10 main datasheet

Global 10 ancient averages datasheet

Please keep in mind that the K7 can be somewhat conservative with minor ancestry proportions, especially Ancient North Eurasian (ANE) admixture, and low coverage samples can behave in odd ways in the Global 10. So when modeling ancestry with ancient samples it might be useful to stick to high coverage individuals that show consistent results. If you don't know what the Basal-rich K7 and Global 10 are, then these links will be useful.

The Basal-rich K7

Global 10: A fresh look at global genetic diversity

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests

Please note that the discussion pertaining to this post is at my other blog HERE.

Monday, May 8, 2017

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests


Copied from a thread at the Anthrogenica forum because unfortunately it seems that a lot of people can't access the post:

This is an nMonte and 4mix guide I have written for people who donated to the Eurogenes Project in order to take part in the Basal-rich K7 and/or Global 10 tests of that project and subsequently received their test results. For information on how to participate in one or both of the Basal-rich K7 and Global 10 tests, see the link below:

Fund-raising offer: Basal-rich K7 and/or Global 10 genetic map

In the results you receive from Davidski by email, you are provided with your Basal-rich K7 component percentages and your position on the Basal-rich K7 PCA if you took the Basal-rich K7 test, and your Global 10 PCA coordinates and your position on the Global 10 PCA if you took the Global 10 test. You will need your Basal-rich K7 component percentages and/or Global 10 PCA coordinates in order to make use of nMonte and 4mix, which model you as a mix of different populations, reporting the ancestry percentages and the distance of each fit, based on either your Basal-rich K7 or your Global 10 results. You can download nMonte and 4mix from these links respectively:

nMonte

4Mix

Because it can run multiple targets at the same time, I gave the link to 4mix_multi rather than classical 4mix. They are basically the same in all other respects.

In order to use nMonte and 4mix you need to have the R software installed on your PC. You can download it from one of the mirrors here:

CRAN mirrors

Making a target file for Basal-rich K7:

Open Notepad and copy and paste the Basal-rich K7 component names and your Basal-rich K7 component percentages along with your name in this format:


Note the use of commas. Save the file as target.
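If you would rather generate the file than type it by hand, the Python sketch below writes a comma-separated target file. The layout is an assumption made for this example (a header row with an empty first cell followed by the column names, then one row per person starting with the name), and write_target_file and the placeholder column names are hypothetical; use the real component names and values from your results email.

```python
import csv


def write_target_file(path, column_names, rows):
    """Write a comma-separated target file.

    column_names: list of component or coordinate names.
    rows: dict mapping a person's name to a list of values, one per column.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed layout: empty first header cell, then the column names.
        writer.writerow([""] + list(column_names))
        for name, values in rows.items():
            if len(values) != len(column_names):
                raise ValueError(
                    f"{name}: expected {len(column_names)} values, got {len(values)}"
                )
            writer.writerow([name] + list(values))
```

For instance, write_target_file('target.txt', names, {'YourName': values}) would produce a single-target file in the current directory.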

If you will use your target file with 4mix_multi, you can add multiple targets to it. So if you have entered several individuals in the Eurogenes tests, you can put them all in the same target file and get their 4mix results in the same run. Below is shown how to add multiple targets to your target file:


Note that classical 4mix and nMonte cannot run multiple targets at the same time, so your target file should have only one target if you will use it with classical 4mix or nMonte.

Making a target file for Global 10:

Open Notepad and copy and paste the Global 10 PCA coordinate names and your Global 10 PCA coordinates along with your name in this format:


Save the file as target.

As in the target file for Basal-rich K7, you can add multiple targets to your target file for Global 10 if you will use it with 4mix_multi.

Using nMonte with R:

First make an input file for the kind of modeling you want to do. Making an input file for nMonte is similar to making a target file (whether for Basal-rich K7 or Global 10). The difference is that, instead of your own values or those of other people you want to model, you add the Basal-rich K7 component percentages (for a Basal-rich K7 model) or Global 10 PCA coordinates (for a Global 10 model) of the population averages or individual population members that you want to use as references, taken from the Basal-rich K7 spreadsheet or Global 10 datasheet. Below are the links to the Basal-rich K7 spreadsheet and Global 10 datasheet respectively:

Basal-rich K7 spreadsheet

Global 10 datasheet

Save the input file as input.

Here is an example of a Basal-rich K7 input file for nMonte:


Here is an example of a Global 10 input file for nMonte:


nMonte can run an endless number of reference population averages and individuals, so you can enrich the list generously in your nMonte input files.
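Picking reference rows out of a large datasheet by hand is error-prone, so here is a Python sketch that copies only the wanted rows into an input file. It assumes the datasheet has been saved as a comma-separated text file with a header row and the population or individual name in the first column; build_input_file and the file names are hypothetical.

```python
import csv


def build_input_file(datasheet_path, input_path, wanted_names):
    """Copy the header and the wanted reference rows into a new input file.

    Returns the sorted list of wanted names that were not found, so that
    spelling mistakes in reference names are easy to spot.
    """
    wanted = set(wanted_names)
    found = set()
    with open(datasheet_path, newline="") as source, \
            open(input_path, "w", newline="") as dest:
        reader = csv.reader(source)
        writer = csv.writer(dest)
        writer.writerow(next(reader))  # keep the header row as it is
        for row in reader:
            if row and row[0] in wanted:
                writer.writerow(row)
                found.add(row[0])
    return sorted(wanted - found)
```

A call such as build_input_file('Global_10.txt', 'input.txt', ['Yamnaya_Samara', 'Barcin_N']) would write the two rows if they exist and return the names it could not find.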

Before using nMonte with R, make sure that the nMonte R files, input file and target file are in the same directory. Open R, click Change dir… from the File menu and choose the directory where the nMonte R files, input file and target file are located. Then type source('nMonte.r') at the R command prompt (type source('nMonte2.r') instead if you want to make use of more functions of nMonte) and press enter. Now type getMonte('input.txt','target.txt'), press enter and enjoy your result!

Using 4mix_multi with R:

In 4mix and its 4mix_multi version, you do not need to specify in the input file which reference population averages and/or individuals to use in your modeling. Instead, you can copy and paste all the population averages and individual population members from the Basal-rich K7 spreadsheet or Global 10 datasheet into the input file, albeit in the comma-separated format shown above. Once you have made that input file, save it as Basal-rich_K7 if it consists of Basal-rich K7 data and as Global_10 if it consists of Global 10 data (the links to the Basal-rich K7 spreadsheet and Global 10 datasheet have already been provided in the section Using nMonte with R above).

Before using 4mix_multi with R, make sure that the 4mix_multi R file, input file and target file are in the same directory. Open R, click Change dir… from the File menu and choose the directory where the 4mix_multi R file, input file and target file are located. Then type source('4mix_multi.r') at the R command prompt and press enter. In 4mix and its 4mix_multi version, the references are specified at the command prompt.

So now for Basal-rich K7 you will write:

getMix('Basal-rich_K7.txt','target.txt','ref1','ref2','ref3','ref4')

For Global 10 you will write:

getMix('Global_10.txt','target.txt','ref1','ref2','ref3','ref4')

In these calls, ref1, ref2, ref3 and ref4 stand for the names of the reference population averages and/or individual population members from the Basal-rich K7 spreadsheet or Global 10 datasheet. An example for Basal-rich K7 would be:

getMix('Basal-rich_K7.txt','target.txt','Belarusian:average','Clovis:Anzick','Boncuklu_Neolithic:average','Palestinian:average')

An example for Global 10 would be:

getMix('Global_10.txt','target.txt','Makrani','Levant_Neolithic:I1699','Bell_Beaker_Czech:RISE569','Latvian')

Once you have specified the references for your modeling in the command prompt, you can press enter to enjoy the ensuing hurly burly!
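R only accepts straight quotation marks, and text copied from a web page or word processor often carries typographic ones, so it can help to generate the command text rather than retype it. The Python helper below is a hypothetical convenience (format_getmix_call is not part of 4mix); it only builds the string to paste at the R prompt, following the four-reference form of the calls shown above.

```python
def format_getmix_call(datasheet, target, references):
    """Build a getMix(...) command string using plain single quotes."""
    if len(references) != 4:
        raise ValueError("the getMix calls in this guide use exactly four references")
    arguments = [datasheet, target, *references]
    for argument in arguments:
        if "'" in argument:
            raise ValueError(f"name contains a quote character: {argument}")
    return "getMix(" + ",".join(f"'{argument}'" for argument in arguments) + ")"


print(format_getmix_call(
    "Global_10.txt",
    "target.txt",
    ["Makrani", "Levant_Neolithic:I1699", "Bell_Beaker_Czech:RISE569", "Latvian"],
))
# getMix('Global_10.txt','target.txt','Makrani','Levant_Neolithic:I1699','Bell_Beaker_Czech:RISE569','Latvian')
```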

Some tips:

Know thyself, i.e., choose the references in the way that is most logical for your modeling in light of your known ancestry. Try to model yourself in many different ways to get a good sense of your ancestry from many different angles. Try to bring the distance down to around 2-3% in your modeling, but refrain from overfitting: 5 or 6 references are enough in nMonte most of the time. In most cases you can safely remove references that show only tiny contributions. And most important of all: be patient, as it might take days of trial and error to find a good model of yourself. But always keep in mind that there is no set-in-stone rule in modeling with nMonte and 4mix.
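The tip about dropping references with tiny contributions can be sketched in Python. The parser assumes the plain "name value" lines that nMonte prints, as in the outputs quoted elsewhere on this blog, and the 1% cut-off is an arbitrary illustration rather than a rule from this guide.

```python
def parse_nmonte_output(text):
    """Collect 'name value' lines into a dict, skipping everything else."""
    proportions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # header lines, the target name, blank lines
        try:
            proportions[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return proportions


def references_to_keep(proportions, min_percent=1.0):
    """Names of references contributing at least min_percent."""
    return [name for name, value in proportions.items() if value >= min_percent]


sample_output = """[1] distance%=5.444 / distance=0.05444

Davidski

CWC_Baltic_early 54.95
Barcin_N 26.7
Narva_Lithuania 18.35
Rochedane 0
Tepecik_Ciftlik_N 0
Yamnaya_Samara 0
"""
print(references_to_keep(parse_nmonte_output(sample_output)))
# ['CWC_Baltic_early', 'Barcin_N', 'Narva_Lithuania']
```

The surviving names can then be used for the next, leaner run, while keeping in mind that a dropped reference may matter again once the others change.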

Onur Dinçer
FTDNA Anatolia-Balkans-Caucasus project admin
https://www.familytreedna.com/groups/anatol-balkan-caucas/about
https://www.facebook.com/groups/800912433320422/

See also...

A more specific guide to modeling your genome with the Global 10/nMonte method:

Your ancient ancestry #1