Tuesday, October 31, 2017

Genetic ancestry online store (to be updated regularly)


It's an unfortunate reality that most commercial genetic ancestry tests out there are rather lame. They're not wrong per se, but that's probably the best that can be said about them. And let's be honest, that's no longer enough considering how far this area of science has come in recent years.

To try and remedy this problem, I'll be offering a wide range of highly accurate and unique, but low cost, ancestry tests here, in my makeshift online store, based on analyses on my other blog (see here). These tests will focus on either recent or ancient ancestry, or both, using the latest reference samples from scientific literature whenever possible. To make a purchase, send your request, autosomal genotype data (from AncestryDNA, FTDNA or 23andMe) and money (via PayPal) to eurogenesblog [at] gmail [dot] com.

Let's start things rolling with my genetic and linguistic landscape of Europe north of the Alps, Balkans and Pyrenees (see here). For a mere $6 USD I will pinpoint your location on the plot below amongst a variety of modern-day and ancient individuals. You'll also receive the principal component coordinates, which you can use to model your ancestry proportions or produce heat plots (for instance, like here). Please keep in mind, however, that to ensure sensible results in this particular analysis, practically all of your ancestry has to derive from Central, Eastern and/or Northern Europe. Most of my other tests won't be so restrictive.

I'll be updating this plot regularly with many more ancient samples as they become available, but your coordinates will remain relevant as I do so.

Please note that my online store will be closed throughout December, but stay tuned next year for many more offers.

See also...

Fund-raising offer: Basal-rich K7 and/or Global 10 genetic map

Sunday, September 10, 2017

Your ancient ancestry #1


This is the first of a series of guides to modeling your ancient ancestry with the Global 10/nMonte2 method.

I do already have a user guide for running Global 10 and Basal-rich K7 data with nMonte and 4Mix (see here). However, in this series I’m going to recommend specific models that produce results similar to those from my experiments with other methods, such as qpAdm, as well as from scientific literature. Hopefully, this will help users achieve more sensible and accurate outcomes, and avoid problems such as overfitting.

Let’s start with models for modern-day Europeans that focus on Yamnaya-related ancestry, which very likely represents a genetic signal of early Indo-European dispersals during the Early to Middle Bronze Age from the Pontic-Caspian steppe.

It’s now clear via a wide range of methods that about half of the genomes of modern-day Eastern and Northern Europeans, and up to about a quarter of the genomes of modern-day Southern Europeans, are derived from such Yamnaya-related sources. Any tests dealing with ancient European substructures that don’t, one way or another, reflect this robust inference must be considered inadequate.

So if my models are to be useful, then this is what they must show. And indeed they do. Here are a few examples focusing on modern-day and ancient England, in chronological order:

England_Iron_Age
Yamnaya_Samara 49.75
Barcin_N 32.3
Hungary_HG 17.95

distance%=0.5318 / distance=0.005318

England_Roman
Yamnaya_Samara 45.65
Barcin_N 33.35
Hungary_HG 21

distance%=0.4668 / distance=0.004668

England_Anglo-Saxon
Yamnaya_Samara 44.95
Barcin_N 31.6
Hungary_HG 23.45

distance%=0.5409 / distance=0.005409

English_Cornwall
Yamnaya_Samara 44.55
Barcin_N 36.95
Hungary_HG 18.5

distance%=0.3699 / distance=0.003699

English_Kent
Yamnaya_Samara 45.2
Barcin_N 36.85
Hungary_HG 17.95

distance%=0.4875 / distance=0.004875

The full output is available in a zip folder HERE. I’m not claiming that these ancestry proportions are perfect, especially for Southern Europeans, who generally have very complex ancestry, but they do make a lot of sense.
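For readers wondering what the distance figures in these models measure, here's a minimal Python sketch. It is not the nMonte code, and it assumes that the reported distance is the plain Euclidean distance between the target's coordinates and the percentage-weighted average of the reference coordinates, with distance% being that value times 100 (as in the outputs above). The function name and the three-dimensional coordinates are made up for illustration; they are not Global 10 values.

```python
import math


def mixture_distance(target, references, proportions):
    """Euclidean distance between a target and a weighted mix of references.

    target      -- list of coordinates for the individual being modeled
    references  -- dict: reference name -> list of coordinates
    proportions -- dict: reference name -> percentage (should sum to 100)
    """
    total = sum(proportions.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("proportions must sum to 100")

    # Build the weighted average of the reference coordinates.
    mix = [0.0] * len(target)
    for name, pct in proportions.items():
        for i, value in enumerate(references[name]):
            mix[i] += value * pct / 100.0

    # Straight-line distance between the target and the mix.
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))


# Placeholder coordinates (hypothetical, NOT real Global 10 data).
references = {
    "Ref_A": [0.010, 0.020, 0.000],
    "Ref_B": [0.030, 0.000, 0.010],
}
proportions = {"Ref_A": 50.0, "Ref_B": 50.0}
target = [0.020, 0.010, 0.009]

distance = mixture_distance(target, references, proportions)
print(f"distance%={distance * 100:.4f} / distance={distance:.6f}")
```

A fitting tool then searches for the proportions that make this number as small as possible; the sketch only scores one given set of proportions.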

One obvious problem with the Global 10 is that some of its dimensions, or PCs, exaggerate affinity between modern-day and Mesolithic Europeans. This is especially true for PC6. Hence, to try and mitigate this problem I decided to remove PC6 from the Global 10 datasheet used in my analysis.

To try these models on your own genome, remove PC6 from your Global 10 coordinates file, and use the data text files provided in the zip folder linked to above. It’s best to rely on the datasheets specifically designed for your ethnic group or region of Europe. But feel free to tweak my models. There’s no harm in experimenting if you’re cautious and sensible about it. Indeed, using Iberia_HG or Loschbour along with Hungary_HG appears to produce more accurate outcomes for many Western Europeans.
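If you'd rather not delete the PC6 column by hand, something like the following Python sketch could do it. It assumes that the coordinates file is comma-separated, with a header row naming the dimensions PC1 to PC10 and one row per individual; check your own file first, because that layout is my assumption here. The function name and the sample values are hypothetical.

```python
import csv
import io


def drop_column(csv_text, column="PC6"):
    """Return csv_text with the named column removed from every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = rows[0]
    idx = header.index(column)  # raises ValueError if the column is absent

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Keep everything before and after the unwanted column.
        writer.writerow(row[:idx] + row[idx + 1:])
    return out.getvalue()


# Hypothetical file contents: header row of PC names, one row per individual.
example = (
    ",PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10\n"
    "Me,0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.10\n"
)
trimmed = drop_column(example)
print(trimmed)
```

To use it on a real file, read the file's text, pass it to drop_column, and write the result to a new file so that the original stays intact.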

The important, but often neglected, point to keep in mind is that I designed the Global 10 to help replicate results from more reliable but technically less accessible methods, and not to challenge any generally accepted models.

In the near future, a wider choice of ancient samples should enable me to fine tune and improve the models. For instance, a slightly more eastern-shifted forager reference population than Hungary_HG, such as the yet to be published Lithuanian Narva samples (see here), will probably shift the results slightly for Northeast Europeans, perhaps by bringing down their Yamnaya-related ancestry proportions by a few per cent.

Moreover, adding a wide range of yet to be published Middle to Late Neolithic European samples, such as those from the Globular Amphora Culture (GAC), should prove an interesting exercise.

Please note that the discussion pertaining to this post is at my other blog HERE.

See also...

Global 10: A fresh look at global genetic diversity

Thursday, August 10, 2017

Basal-rich K7 & Global 10 updates (10/08/2017)


I've updated the Basal-rich K7 spreadsheet and the Global 10 datasheets with a plethora of ancient individuals and populations, including Anglo-Saxons, British Celts (labeled England_IA), Minoans, Mycenaeans, Bronze Age Iberians and many more.

Basal-rich K7 spreadsheet

Global 10 main datasheet

Global 10 ancient averages datasheet

Please keep in mind that the K7 can be somewhat conservative with minor ancestry proportions, especially Ancient North Eurasian (ANE) admixture, and low coverage samples can behave in odd ways in the Global 10. So when modeling ancestry with ancient samples it might be useful to stick to high coverage individuals that show consistent results. If you don't know what the Basal-rich K7 and Global 10 are, then these links will be useful.

The Basal-rich K7

Global 10: A fresh look at global genetic diversity

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests

Please note that the discussion pertaining to this post is at my other blog HERE.

Monday, May 8, 2017

An nMonte and 4mix guide for the participants of the Basal-rich K7 and/or Global 10 tests


Copied from a thread at the Anthrogenica forum because unfortunately it seems that a lot of people can't access the post:

This is an nMonte and 4mix guide I have written for people who donated to the Eurogenes Project in order to take part in the Basal-rich K7 and/or Global 10 tests of that project and subsequently received their test results. For information on how to participate in one or both of the Basal-rich K7 and Global 10 tests, see the link below:

Fund-raising offer: Basal-rich K7 and/or Global 10 genetic map

In the results you receive from Davidski by email, you are provided with your Basal-rich K7 component percentages and your position on the Basal-rich K7 PCA if you took the Basal-rich K7 test, and your Global 10 PCA coordinates and your position on the Global 10 PCA if you took the Global 10 test. You will need your Basal-rich K7 component percentages and/or Global 10 PCA coordinates in order to make use of nMonte and 4mix, which allow you to be modeled as a mix of different populations in varying ancestry percentages and varying distance levels based on either of your Basal-rich K7 and Global 10 results. You can download nMonte and 4mix from these links respectively:

nMonte

4Mix

Because it can run multiple targets at the same time, I gave the link to 4mix_multi rather than classical 4mix. They are basically the same in all other respects.

In order to use nMonte and 4mix you need to have the R software installed on your PC. You can download it from one of the mirrors here:

CRAN mirrors

Making a target file for Basal-rich K7:

Open Notepad and copy and paste the Basal-rich K7 component names and your Basal-rich K7 component percentages along with your name in this format:


Note the use of commas. Save the file as target.

If you will use your target file with 4mix_multi, you can add multiple targets to it. So if you have participated in the Eurogenes tests with multiple individuals, you can put them all in one target file and get their 4mix results in the same run. Multiple targets are added to the target file like this:


Note that classical 4mix and nMonte cannot run multiple targets at the same time, so your target file should have only one target if you will use it with classical 4mix or nMonte.

Making a target file for Global 10:

Open Notepad and copy and paste the Global 10 PCA coordinate names and your Global 10 PCA coordinates along with your name in this format:


Save the file as target.

As in the target file for Basal-rich K7, you can add multiple targets to your target file for Global 10 if you will use it with 4mix_multi.
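Since nMonte and classical 4mix take only one target at a time, a multi-target file made for 4mix_multi has to be broken up before it can be used with them. Here is a rough Python sketch of one way to do that. It assumes the target file is comma-separated with one header row followed by one row per person, with the person's name in the first column; that layout, the function name and the sample values are assumptions for illustration only.

```python
import csv
import io


def split_targets(csv_text):
    """Map each target name to CSV text holding the header plus that one row."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, targets = rows[0], rows[1:]

    singles = {}
    for row in targets:
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)  # every single-target file keeps the header
        writer.writerow(row)
        singles[row[0]] = out.getvalue()
    return singles


# Hypothetical multi-target file with three made-up dimensions.
multi = (
    ",PC1,PC2,PC3\n"
    "Person_A,0.01,0.02,0.03\n"
    "Person_B,0.04,0.05,0.06\n"
)
singles = split_targets(multi)
for name, text in singles.items():
    print(name)
    print(text)
```

Each value in the returned dictionary can then be saved as its own target file and run separately.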

Using nMonte with R:

First make an input file for the kind of modeling you want to do. Making an input file for nMonte is similar to making a target file (whether for Basal-rich K7 or Global 10). The difference is that, instead of your own data or that of other people you want to model, you add the Basal-rich K7 component percentages (for Basal-rich K7) or Global 10 PCA coordinates (for Global 10) of the population averages or individual population members that you want to use as references, taken from the Basal-rich K7 spreadsheet or Global 10 datasheet. The links to the Basal-rich K7 spreadsheet and Global 10 datasheet are below, respectively:

Basal-rich K7 spreadsheet

Global 10 datasheet

Save the input file as input.

Here is an example of a Basal-rich K7 input file for nMonte:


Here is an example of a Global 10 input file for nMonte:


nMonte can run an endless number of reference population averages and individuals, so you can enrich the list generously in your nMonte input files.

Before using nMonte with R, make sure that the nMonte R files, input file and target file are in the same directory. Open R and click Change dir… from the File menu and choose the directory where the nMonte R files, input file and target file are located. Then write source('nMonte.r') in the R command prompt (write source('nMonte2.r') instead if you want to make use of more functions of nMonte) and press enter. Now write getMonte('input.txt','target.txt'), press enter and enjoy your result!

Using 4mix_multi with R:

In 4mix and its 4mix_multi version, you do not need to specify the reference population averages and/or individuals to use in your modeling in the input file. Instead, you can copy and paste all the population averages and individual population members from the Basal-rich K7 spreadsheet or Global 10 datasheet to the input file, albeit in the comma-separated format shown above. Once you have made that input file, save it as Basal-rich_K7 if it consists of Basal-rich K7 data and as Global_10 if it consists of Global 10 data (the links to the Basal-rich K7 spreadsheet and Global 10 datasheet have already been provided in the section Using nMonte with R above).

Before using 4mix_multi with R, make sure that the 4mix_multi R file, input file and target file are in the same directory. Open R and click Change dir… from the File menu and choose the directory where the 4mix_multi R file, input file and target file are located. Then write source('4mix_multi.r') in the R command prompt and press enter. In 4mix and its 4mix_multi version, reference specification is done in the command prompt.

So now for Basal-rich K7 you will write:

getMix('Basal-rich_K7.txt','target.txt','ref1','ref2','ref3','ref4')

For Global 10 you will write:

getMix('Global_10.txt','target.txt','ref1','ref2','ref3','ref4')

In these arguments ref1, ref2, ref3 and ref4 refer to the names of the reference population averages and/or individual population members from the Basal-rich K7 spreadsheet or Global 10 datasheet. An example for Basal-rich K7 would be:

getMix('Basal-rich_K7.txt','target.txt','Belarusian:average','Clovis:Anzick','Boncuklu_Neolithic:average','Palestinian:average')

An example for Global 10 would be:

getMix('Global_10.txt','target.txt','Makrani','Levant_Neolithic:I1699','Bell_Beaker_Czech:RISE569','Latvian')

Once you have specified the references for your modeling in the command prompt, you can press enter to enjoy the ensuing hurly burly!

Some tips:

Know thyself, i.e., choose the references in the way that is most logical for your modeling in light of your known ancestry. Try to model yourself in many different ways to get a good sense of your ancestry from many different angles. Try to bring the distance down to around 0.005 (0.5%) or lower in your modeling, but refrain from overfitting: 5 or 6 references are enough in nMonte most of the time. In most cases you can safely remove references that show only tiny contributions. And most important of all: be patient, as it might take days of trial and error to find a good model of yourself. But always keep in mind that there is no set-in-stone rule in modeling with nMonte and 4mix.
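The rules of thumb in these tips can be written down as a small checklist. The Python sketch below is only an illustration: the function name is hypothetical, the proportions and distance are made-up numbers rather than real results, and the 1% cutoff for a "tiny" contribution is my own assumption, since the tips do not give a figure. The 0.005 distance level and the limit of 6 references come from the tips above.

```python
def review_fit(proportions, distance, min_share=1.0, max_distance=0.005,
               max_refs=6):
    """Check a model against the rules of thumb from the tips.

    proportions -- dict: reference name -> percentage in the model
    distance    -- distance reported for the model (0.005 means 0.5%)
    min_share   -- assumed cutoff (in %) below which a reference is "tiny"
    """
    negligible = sorted(
        name for name, pct in proportions.items() if pct < min_share
    )
    return {
        "distance_ok": distance <= max_distance,
        "too_many_refs": len(proportions) > max_refs,
        "negligible": negligible,  # candidates to drop before re-running
    }


# Made-up model output for illustration only.
fit = {"Ref_A": 48.2, "Ref_B": 35.1, "Ref_C": 16.3, "Ref_D": 0.4}
report = review_fit(fit, distance=0.0047)
print(report)
```

Any reference listed as negligible is a candidate for removal; after dropping it, re-run the model and compare the new distance with the old one.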

Onur Dinçer
FTDNA Anatolia-Balkans-Caucasus project admin
https://www.familytreedna.com/groups/anatol-balkan-caucas/about
https://www.facebook.com/groups/800912433320422/

See also...

A more specific guide to modeling your genome with the Global 10/nMonte method:

Your ancient ancestry #1

Sunday, March 19, 2017

Fund-raising offer: Basal-rich K7 and/or Global 10 genetic map


I'm now taking donations for 2017. Anyone who donates $12 USD or $16 AUD, or more, will get the Basal-rich K7 ancestry proportions. Of course, you'll need to send me your genotype data for that to happen (Ancestry.com, FTDNA or 23andMe).

The Basal-rich K7

Please send your non-tax deductible donations via PayPal to eurogenesblog at gmail dot com. E-mail your genotype data to the same address. Please don't assume that I already have your data. I'll try and get back to everyone within a day and will put things on hold if that becomes an unrealistic target.

Using your Basal-rich K7 ancestry proportions, I'll show you where you cluster in the new and improved Fateful Triangle. Many people will probably land somewhere along the cline made up of Late Neolithic/Early Bronze Age Europeans and Middle/Late Bronze Age steppe herders and warriors. This, to me, looks like a cline produced by the expansion of early Indo-Europeans into Western and Central Europe.


For an extra donation of $16 USD or $21 AUD, those of you feeling more adventurous will also receive the Global 10 genetic map, and, more importantly, coordinates for ten dimensions.

A fresh look at global genetic diversity

The Basal-rich K7 is the best ancient ancestry test that I've been able to come up with. It correlates strongly with the latest research reported in scientific literature. And, in fact, in some instances it probably trumps the latest scientific literature.

For instance, Broushaki et al. 2016 characterized Early Neolithic farmers from the Zagros Mountains, Iran, as 62% Basal Eurasian and 38% Ancient North Eurasian-related (Figure S52). This, considering formal statistics like the D-stat below, with AfontovaGora3 (AG3) as the ANE proxy, is unlikely to be correct, despite the fact that AG3 is a relatively low quality sample.

D(Yoruba,Iran_Neolithic)(Villabruna,AfontovaGora3) 0.0223 Z 2.812

On the other hand, the Basal-rich K7 models the early Zagros farmers as 39.05% Ancient North Eurasian and 56.67% Basal-rich (which is probably a composite of Basal Eurasian and something Villabruna-related). To me this appears to be the more sensible solution.

Moreover, Lazaridis et al. 2016 characterized South Caspian forager Iran_HotuIIIb as more Basal Eurasian than the early Zagros farmers (Supplementary Information 4). The Basal-rich K7, on the other hand, shows the opposite. The D-stat below suggests that the Basal-rich K7 is closer to the truth.

D(Chimp,Ust_Ishim)(Iran_Neolithic,Iran_Hotu) 0.0156 Z 1.337

There are other such examples, and I might post them in the comments. In any case, the point I'm making is that the Basal-rich K7 is a solid piece of work and it's likely to remain relevant for a long time. Indeed, I'll be updating the Basal-rich K7 spreadsheet regularly as new ancient samples roll in, which means that you'll be able to model yourself as newly sampled ancient populations using the Basal-rich K7 ancestry proportions (for instance, see here).

The only problem with this test is that it's optimized for Eurasians. As a result, it might be sensible for anyone with significant (>5%) Sub-Saharan ancestry to skip the Basal-rich K7 and just ask for the Global 10 genetic map and coordinates.


You can use the Global 10 coordinates to model your ancient and recent fine-scale ancestry, just as you would using mixture proportions. In fact, I'd say the Global 10 coordinates are more useful in this respect than any mixture test, including the Basal-rich K7.

Thanks in advance for your support. Keep in mind that the more cash I raise the busier things will be on this blog in 2017, which, by all accounts, is shaping up to be the year for ancient DNA.