
Sunday, September 17, 2017

Ancient IBD/cM matrix analysis offer


I've had a few requests from personal genomics customers to add their data to an identity-by-descent (IBD)/cM matrix like the one at the link below. Please also check out the accompanying comments thread for ideas of what can be done with the output.

A Bronze Age dominion from the Atlantic to the Altai

I can do this for $15 (USD) per individual. Please e-mail the data files to eurogenesblog [at] gmail [dot] com and send the payment via PayPal to the same address. The deadline for sending the data files (which, in this run, can only be from 23andMe, Ancestry or FTDNA) is this time on Tuesday.

I'll send out the results to each participant over e-mail. However, participants are encouraged to post their results in the comments thread at my other blog here, so that they can be discussed and analyzed further.

Update 20/09/2017: The analysis is underway. Please don't send any more data files. If there's enough interest, I'll do another run soon.

Update 22/09/2017: I've just sent out the results to the participants in the form of two text files titled "ancients_only" and "full_column". The former is a matrix of overall shared haplotype tracts in centimorgans (cM) that includes the user and 65 ancient genomes, and the latter a list of haplotype tracts, also in cM, shared between the user and well over 3000 public samples.

So what can we do with these files? For one, we can look at them, because simply eyeballing these sorts of stats can be very informative. Sorting the data in some way and calculating population averages might help with that.
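As a rough illustration of the sorting and averaging step, here is a minimal Python sketch. It assumes, hypothetically, that each line of the "full_column" file (or a version you've prepared yourself) holds three whitespace-separated fields: sample name, population label and shared cM. The actual file layout may differ, and the function names are made up for this example.

```python
from collections import defaultdict


def parse_rows(lines):
    """Parse hypothetical 'sample population cM' lines into tuples.

    Lines that don't have three fields or whose cM value isn't a number
    (for example a header line) are skipped.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        sample, population, cm = parts
        try:
            rows.append((sample, population, float(cm)))
        except ValueError:
            continue
    return rows


def population_averages(rows):
    """Mean shared cM per population, sorted from highest to lowest."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _sample, population, cm in rows:
        totals[population] += cm
        counts[population] += 1
    averages = {p: totals[p] / counts[p] for p in totals}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def top_samples(rows, n=10):
    """The n individual samples sharing the most cM with the user."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]
```

For example, `population_averages(parse_rows(open("full_column.txt")))` would list populations by their average sharing, assuming the file name and layout described above.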

The "ancients_only" file can be used for slightly more advanced analyses. For instance, below is a neighbor-joining tree produced with the Past 3 program (freely available here). I simply loaded my "ancients_only" file into Past 3, selected all of the columns and rows, and then went to Multivariate > Clustering > Neighbor joining. Note that I cluster on the same branch as Slav_Bohemia, which makes perfect sense considering my Polish ancestry. By the way, I dropped Oetzi from this run because he was behaving strangely, which is not unusual for low-coverage genomes.
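For anyone curious about what happens under the hood, here is a minimal Python sketch of the standard neighbor-joining algorithm. It is not Past 3's code. It assumes, as one plausible choice, that each row of the "ancients_only" matrix is treated as a sharing profile and that rows are compared by Euclidean distance; the helper names are hypothetical.

```python
import math


def row_distances(matrix):
    """Euclidean distance between every pair of rows (each row is one
    sample's shared-cM profile across all columns)."""
    n = len(matrix)
    return [[math.dist(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


def neighbor_joining(names, dist):
    """Standard neighbor joining on a symmetric distance matrix.

    Returns (joins, newick): joins lists (left, right, len_left, len_right)
    for each merge, and newick is the final unrooted tree as a string.
    """
    nodes = list(names)
    d = [row[:] for row in dist]
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        sums = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - S_i - S_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - sums[i] - sums[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the two joined nodes to the new internal node.
        li = d[bi][bj] / 2 + (sums[bi] - sums[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        joins.append((nodes[bi], nodes[bj], li, lj))
        new_node = f"({nodes[bi]}:{li:.4g},{nodes[bj]}:{lj:.4g})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new internal node to every remaining node.
        new_row = [(d[bi][k] + d[bj][k] - d[bi][bj]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    if len(nodes) == 1:
        return joins, f"{nodes[0]};"
    a, b = nodes
    return joins, f"({a},{b}:{d[0][1]:.4g});"
```

Calling `neighbor_joining(names, row_distances(matrix))` with your matrix rows and their labels yields a Newick string that most tree viewers can draw.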

Indeed, Past 3 can do a lot of interesting things with matrix files, anything from linear models to rotating three-dimensional plots. If you'd like to repeat the linear models from the blog post I linked to above, choose the relevant two columns in your matrix and go to Model > Generalized Linear Model. You should see something like this.
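A generalized linear model with a normal error distribution and an identity link is equivalent to ordinary least squares, so the simplest version of such a two-column fit can be sketched in a few lines of Python. This assumes those settings, which may not match whatever options you pick in Past 3, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared) for two equal-length columns.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Proportion of variance in y explained by the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared
```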


Moreover, a matrix of the 3000+ public samples can be downloaded here and combined, in part or in whole, with your other files, so that you can analyze yourself alongside a larger number of individuals.
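One way to do the combining is to add your "full_column" values as an extra row and column of the public matrix. The Python sketch below assumes both have already been loaded into dictionaries keyed by sample name; the function name, the optional `keep` subset and the value placed on your own diagonal cell (`self_value`) are all hypothetical choices, not part of the files as supplied.

```python
def add_user(matrix, user_name, user_column, keep=None, self_value=0.0):
    """Return a new symmetric matrix with the user added.

    matrix: dict of sample -> dict of sample -> shared cM.
    user_column: dict of sample -> cM shared with the user.
    keep: optional iterable of sample names to retain (the "in part" case).
    self_value: assumed value for the user's own diagonal cell.
    """
    wanted = set(matrix) if keep is None else set(keep)
    # Keep only samples present in both the public matrix and the user's column.
    names = [n for n in matrix if n in wanted and n in user_column]
    combined = {a: {b: matrix[a][b] for b in names} for a in names}
    for n in names:
        combined[n][user_name] = user_column[n]
    combined[user_name] = {n: user_column[n] for n in names}
    combined[user_name][user_name] = self_value
    return combined
```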

Sunday, September 10, 2017

Your ancient ancestry #1


This is the first of a series of guides to modeling your ancient ancestry with the Global 10/nMonte2 method.

I do already have a user guide for running Global 10 and Basal-rich K7 data with nMonte and 4Mix (see here). However, in this series I’m going to recommend specific models that produce results similar to those from my experiments with other methods, such as qpAdm, as well as from the scientific literature. Hopefully, this will help users achieve more sensible and accurate outcomes, and avoid problems such as overfitting.

Let’s start with models for modern-day Europeans that focus on Yamnaya-related ancestry, which very likely represents a genetic signal of early Indo-European dispersals during the Early to Middle Bronze Age from the Pontic-Caspian steppe.

It’s now clear from a wide range of methods that about half of the genomes of modern-day Eastern and Northern Europeans, and up to about a quarter of the genomes of modern-day Southern Europeans, are derived from such Yamnaya-related sources. Any tests dealing with ancient European substructures that don’t, one way or another, reflect this robust inference must be considered inadequate.

So if my models are to be useful, then this is what they must show. And indeed they do. Here are a few examples focusing on modern-day and ancient England, in chronological order:

England_Iron_Age
Yamnaya_Samara 49.75
Barcin_N 32.3
Hungary_HG 17.95

distance%=0.5318 / distance=0.005318

England_Roman
Yamnaya_Samara 45.65
Barcin_N 33.35
Hungary_HG 21

distance%=0.4668 / distance=0.004668

England_Anglo-Saxon
Yamnaya_Samara 44.95
Barcin_N 31.6
Hungary_HG 23.45

distance%=0.5409 / distance=0.005409

English_Cornwall
Yamnaya_Samara 44.55
Barcin_N 36.95
Hungary_HG 18.5

distance%=0.3699 / distance=0.003699

English_Kent
Yamnaya_Samara 45.2
Barcin_N 36.85
Hungary_HG 17.95

distance%=0.4875 / distance=0.004875

The full output is available in a zip folder HERE. I’m not claiming that these ancestry proportions are perfect, especially for Southern Europeans, who generally have very complex ancestry, but they do make a lot of sense.
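Each model above consists of mixture proportions plus a distance to the target, and the distance% figure is simply the distance multiplied by 100 (0.005318 becomes 0.5318). The Python sketch below illustrates the general idea of fitting such a mixture by Monte Carlo slot swapping: a fixed number of slots is filled with source populations, and random swaps are kept only when they bring the averaged coordinates closer to the target. It is a simplified stand-in, not nMonte's actual code; the Euclidean distance, slot count, iteration count and function names are all assumptions.

```python
import math
import random


def fit_mixture(target, sources, slots=200, iterations=20000, seed=1):
    """Fit target coordinates as a convex mix of source coordinates.

    Each slot holds one source; a source's proportion is its share of the
    slots. A random slot is reassigned to a random source, and the change
    is kept only if the Euclidean distance to the target decreases.
    Returns (proportions in per cent, distance).
    """
    rng = random.Random(seed)
    names = list(sources)
    assignment = [rng.choice(names) for _ in range(slots)]
    # Running sum of slot coordinates; the fitted model is total / slots.
    total = [0.0] * len(target)
    for name in assignment:
        for k, value in enumerate(sources[name]):
            total[k] += value

    def distance(tot):
        return math.dist([t / slots for t in tot], target)

    best = distance(total)
    for _ in range(iterations):
        i = rng.randrange(slots)
        old, new = assignment[i], rng.choice(names)
        if old == new:
            continue
        trial = [t - o + n for t, o, n in zip(total, sources[old], sources[new])]
        d = distance(trial)
        if d < best:
            assignment[i], total, best = new, trial, d
    proportions = {n: 100.0 * assignment.count(n) / slots for n in names}
    return proportions, best
```

With your coordinates (PC6 removed) as `target` and the reference populations' coordinates as `sources`, the returned distance times 100 corresponds to the distance% column in the output above, under the assumptions stated.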

One obvious problem with the Global 10 is that some of its dimensions, or PCs, exaggerate affinity between modern-day and Mesolithic Europeans. This is especially true for PC6. Hence, to try and mitigate this problem I decided to remove PC6 from the Global 10 datasheet used in my analysis.

To try these models on your own genome, remove PC6 from your Global 10 coordinates file, and use the data text files provided in the zip folder linked to above. It’s best to rely on the datasheets specifically designed for your ethnic group or region of Europe. But feel free to tweak my models. There’s no harm in experimenting if you’re cautious and sensible about it. Indeed, using Iberia_HG or Loschbour along with Hungary_HG appears to produce more accurate outcomes for many Western Europeans.
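Removing PC6 amounts to deleting one field from each line of a coordinates file. The Python sketch below assumes a comma-separated file with the sample name in the first field followed by PC1 to PC10, so PC6 sits at index 6; check your own file's layout before using it, and note that the function names are hypothetical.

```python
def drop_column(line, index=6, sep=","):
    """Remove one field from a delimited line.

    With the sample name in field 0 and PC1..PC10 in fields 1..10,
    the default index of 6 removes PC6.
    """
    fields = line.rstrip("\r\n").split(sep)
    del fields[index]
    return sep.join(fields)


def drop_pc6(in_path, out_path):
    """Copy a coordinates file, dropping PC6 from every non-blank line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(drop_column(line) + "\n")
```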

The important, but often neglected, point to keep in mind is that I designed the Global 10 to help replicate results from more reliable but technically less accessible methods, and not to challenge any generally accepted models.

In the near future, a wider choice of ancient samples should enable me to fine-tune and improve the models. For instance, a slightly more eastern-shifted forager reference population than Hungary_HG, such as the yet-to-be-published Lithuanian Narva samples (see here), will probably shift the results slightly for Northeast Europeans, perhaps by bringing down their Yamnaya-related ancestry proportions by a few per cent.

Moreover, adding a wide range of yet-to-be-published Middle to Late Neolithic European samples, such as those from the Globular Amphora Culture (GAC), should prove an interesting exercise.

Please note that the discussion pertaining to this post is at my other blog HERE.

See also...

Global 10: A fresh look at global genetic diversity