Sunday, August 19, 2012

ADMIXTURE and STRUCTURE tests aren’t formal mixture tests


Update 20/02/2013: Ancient Amerindian-like admixture in Europe - something doesn't add up

...

There are a lot of people out there who believe that programs like ADMIXTURE and STRUCTURE can accurately measure exotic admixtures in their genomes. But this is not true.

If run properly, and with the right reference samples, ADMIXTURE and STRUCTURE do indeed show very high accuracy in classifying the ethnic origins of individuals, even at intra-national level. However, this process relies on finding relative differences between the samples in the given run, based on modern allele frequencies, and doesn’t usually provide true admixture rates.

So the fact that someone gets, say, 100% European and 0% East Asian, doesn’t mean they don’t have East Asian admixture. That’s because the European cluster is unlikely to be purely European, but rather a composite of all the things that make up modern Europeans.
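To make that point concrete, here is a toy sketch in Python. It is not how ADMIXTURE works internally (ADMIXTURE fits a likelihood model); it uses a simple least-squares stand-in, and the allele frequencies, the 10% figure, and the function name `estimate_share` are all invented for illustration.

```python
import numpy as np


def estimate_share(target, ref_a, ref_b):
    """Least-squares share of ref_a in target, assuming
    target ~ share * ref_a + (1 - share) * ref_b, clipped to [0, 1]."""
    d = ref_a - ref_b
    share = float(np.dot(target - ref_b, d) / np.dot(d, d))
    return min(1.0, max(0.0, share))


rng = np.random.default_rng(0)
n_snps = 5000

# Hypothetical allele frequencies for two unadmixed source populations.
west = rng.uniform(0.05, 0.95, n_snps)
east = rng.uniform(0.05, 0.95, n_snps)

# Assumption: the modern "European" cluster already carries 10% old
# East Asian-like ancestry.
old_east_share = 0.10
european_cluster = (1 - old_east_share) * west + old_east_share * east

# An individual whose expected frequencies match the modern population.
individual = european_cluster.copy()

# Share of the modern European cluster, with East Asians as the other reference.
share_vs_modern = estimate_share(individual, european_cluster, east)
# Share of the unadmixed western source, with the same second reference.
share_vs_source = estimate_share(individual, west, east)

print(f"European share vs modern cluster: {share_vs_modern:.2f}")
print(f"Western share vs unadmixed source: {share_vs_source:.2f}")
```

Under these assumptions the individual is scored as entirely "European" against the modern cluster, even though a tenth of the ancestry behind that cluster is East Asian-like. The old admixture is hidden inside the reference itself.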

I recently e-mailed David Reich, a well-known population geneticist, asking him to give me some tips on how to find “true” levels of East Asian ancestry in Northern Europeans using the ADMIXTURE software. He was kind enough to reply, but basically said that he couldn’t help because he wasn’t an expert on ADMIXTURE. Also, he said that ADMIXTURE wasn’t a formal mixture test, and thus could easily give false results.

The quote below, from one of David Reich’s studies, Reconstructing Indian population history, explains the concept of the formal mixture test in more detail.

We developed a model to study the historical relationship of Indian groups to those worldwide, on the basis of the hypothesis that most groups can be approximated as a mixture of two ancestral populations followed by group-specific drift. To fit the model to the data, we computed the squared allele frequency difference between all pairs of groups, and chose parameters by minimizing the difference between observation and expectation (Supplementary Note 4). The idea of fitting allele frequency differentiation to historical models was first explored by Cavalli-Sforza and Edwards, and here we extend it to trees with mixture. This approach contrasts with the STRUCTURE algorithm, which fits data without a tree, or a tree in which many groups split simultaneously from an ancestral population followed by mixture. Although STRUCTURE is accurate for estimating individual mixture proportions in recently mixed groups, it is not clear whether its estimates of ancient mixture are biased because it does not model hierarchical relationships among groups, which could lead to inaccurate estimates of allele frequencies in ancestral populations. In contrast, we use a more realistic tree model, and provide a test of fit.
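As a rough illustration of the idea in that quote, the Python sketch below computes the squared allele frequency difference between pairs of groups and picks the mixture proportion that best matches observation to expectation. It is a deliberately simplified toy, not the procedure from the paper: it ignores group-specific drift and sampling noise, and the frequencies, the 70/30 split, and the names `f2` and `fit_mixture` are assumptions made for the example.

```python
import numpy as np


def f2(p, q):
    """Mean squared allele frequency difference between two groups."""
    return float(np.mean((p - q) ** 2))


def fit_mixture(p_mix, p_a, p_b, n_grid=1001):
    """Grid search for the share of A that minimizes the squared gap
    between observed and expected f2 values (no drift in this toy model)."""
    alphas = np.linspace(0.0, 1.0, n_grid)
    base = f2(p_a, p_b)
    # If mix = alpha*A + (1-alpha)*B exactly, then
    # f2(mix, A) = (1-alpha)^2 * f2(A, B) and f2(mix, B) = alpha^2 * f2(A, B).
    expected_a = (1.0 - alphas) ** 2 * base
    expected_b = alphas ** 2 * base
    loss = (f2(p_mix, p_a) - expected_a) ** 2 + (f2(p_mix, p_b) - expected_b) ** 2
    return float(alphas[np.argmin(loss)])


rng = np.random.default_rng(1)
# Hypothetical allele frequencies for two ancestral populations.
p_a = rng.uniform(0.05, 0.95, 10000)
p_b = rng.uniform(0.05, 0.95, 10000)
# A group that is 70% A and 30% B, with no drift after the mixture.
p_mix = 0.7 * p_a + 0.3 * p_b

alpha_hat = fit_mixture(p_mix, p_a, p_b)
print(f"Estimated share of A: {alpha_hat:.3f}")
```

The point of the toy is only that the fit is driven by an explicit historical model (here, a two-way mixture), so a poor fit can be detected, which is what distinguishes a formal test from clustering.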

Another recent paper, The History of African Gene Flow into Southern Europeans, Levantines, and Jews, directly compared the results from a formal mixture test to those from STRUCTURE. Note, for instance, the large discrepancy between the Sub-Saharan admixture scores for the Sardinian sample obtained from formal and STRUCTURE tests - 2.9% vs. 0.2% respectively.


Indeed, it seems we can’t really be sure of results from PCA and MDS plots either. Here’s a quote from David Reich’s latest article, Reconstructing Native American population history.

In the Saqqaq genome paper, the authors co-analyzed the data they collected with data from diverse present-day populations from Siberia and the America. Based on the patterns that they observed in Principal Component Analysis, they argued that the Saqqaq have ancestry from a different stream of gene flow into America than Eskimo-Aleut speakers, Na-Dene speakers, and Southern Native Americans. However, this is not a formal test: the failure to cluster together in the first few principal components does not necessarily imply that populations are unrelated; just that they do not share much genetic drift on their common ancestral lineage.

This brings me to the last point, which is that there are some major surprises on the way about the genetic origins and structure of Europeans, because it seems we’ve learned very little from non-formal mixture analyses to date.

That latest David Reich paper mentions that all Europeans carry East/Central Asian admixture, with Northern Europeans having more of it than Sardinians. Remarkably, it also says that unadmixed non-Arctic rather than unadmixed Arctic Native Americans are genetically closer to Europeans, and this is due to the aforementioned Asian admixture in Europeans. The quotes below come from the supplementary information to the study.

A complication in computing this statistic is that Native American, Siberian, and East Asian populations are not all equally genetically related to West Eurasian populations, as we can see empirically from 4 Population Tests of the proposed tree (Yoruba, (French, (East Asian, Native American))) failing dramatically whether the East Asian population is Han, Chukchi, Naukan and Koryak. The explanation for this is outside the scope of this study (it has to do with admixture events in Europe, as we explain in another paper in submission). In practice, however, it means that we cannot simply use a European population like French to represent West Eurasians in Equation S3.2, since if we do this, Equation S3.2 may have a non-zero value for a Native American population, even without recent European admixture.

To address this complication, we took advantage of the fact that east/central Asian admixture has affected northern Europeans to a greater extent than Sardinians (in our separate manuscript in submission, we show that this is a result of the different amounts of central/east Asian-related gene flow into these groups). To quantify this, we computed the statistic f4(San, West Eurasian; Pop1, Pop2) for West Eurasian = Sardinian and West Eurasian = French, and for 24 Siberian and Native American populations (Pop1 and Pop2) (Figure S3.2). Figure S3.2 shows a scatterplot for all 190=20×19/2 possible pairs of these populations. Within non-Arctic Native populations, and within Arctic populations (East Greenland Inuit, Chukchi, Naukan and Koryak), the statistics are close to zero, consistent with their being (approximate) clades relative to West Eurasians. In contrast, there are deviations from zero when the comparisons are between non-Arctic Native and Arctic populations, with non-Arctic Native populations showing consistent evidence of being genetically closer to West Eurasians.

The observation of non-zero statistics when one of the Native populations is Arctic and the other is a more southern Native American population is a complication, since we would like Ancestry Subtraction to work not just for southern Native American populations, but also for northern North Americans who have inherited genetic material from multiple streams of Asian migration. However, the fact that Sardinian statistics are smaller than the French statistics by a constant factor (0.75), allows us to adjust for this difference by regression. Specifically, we can compute a linear combination S2 of the French and Sardinian statistics that subtracts out the effect of central/east Asian gene flow into West Eurasians and has an expected value of zero.
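For readers who want to see the mechanics, here is a hedged Python sketch of the two ideas in those quotes: an f4 statistic that is near zero when two populations form a clade relative to the other pair, and a regression-based linear combination that cancels a constant ratio between two sets of statistics. Everything in it is a toy. The tree, the drift model, the 10% gene flow, the simulated statistics, and the form `sardinian - k * french` are my assumptions; only the 0.75 ratio and the 190 pairs come from the quoted text, and the paper's exact definition of S2 is in its supplementary information.

```python
import numpy as np


def f4(a, b, c, d):
    """f4(A, B; C, D): mean of (a - b) * (c - d) across SNPs."""
    return float(np.mean((a - b) * (c - d)))


rng = np.random.default_rng(2)
n_snps = 200_000
drift_sd = 0.03  # assumed size of per-branch drift, chosen for illustration


def drift(freqs):
    """Add independent random drift along one branch (toy normal model)."""
    return freqs + rng.normal(0.0, drift_sd, freqs.shape)


# Toy tree ((outgroup, west), (pop1, pop2)).
root = rng.uniform(0.3, 0.7, n_snps)
left, right = drift(root), drift(root)
outgroup, west = drift(left), drift(left)
pop1, pop2 = drift(right), drift(right)

# No gene flow: pop1 and pop2 form a clade relative to outgroup and west,
# so the statistic should be close to zero.
f4_tree = f4(outgroup, west, pop1, pop2)

# Assumption: 10% gene flow from pop1 into west breaks the tree.
west_admixed = 0.9 * west + 0.1 * pop1
f4_admixed = f4(outgroup, west_admixed, pop1, pop2)

print(f"f4 without gene flow: {f4_tree:.2e}")
print(f"f4 with gene flow:    {f4_admixed:.2e}")


def fit_ratio(french_stats, sardinian_stats):
    """Slope of a regression through the origin: sardinian ~ k * french."""
    f = np.asarray(french_stats, dtype=float)
    s = np.asarray(sardinian_stats, dtype=float)
    return float(np.dot(f, s) / np.dot(f, f))


# Simulated f4 values for 190 population pairs (invented numbers), with the
# Sardinian statistics set to 0.75 times the French ones plus small noise.
french = rng.normal(0.0, 1e-3, 190)
sardinian = 0.75 * french + rng.normal(0.0, 2e-5, 190)

k = fit_ratio(french, sardinian)
# One possible linear combination with an expected value of zero.
s2 = sardinian - k * french

print(f"Fitted ratio k: {k:.3f}")
print(f"Mean of adjusted statistic: {s2.mean():.2e}")
```

In this toy setup the first statistic hovers around zero while the second moves clearly away from it, and the adjusted statistic averages out to roughly zero once the fitted ratio is applied.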


Now, unless I’m misreading what’s being said there, it seems that Europeans mostly carry the type of East Asian ancestry that was present in the first human migration wave from Asia to the New World, which moved across the Bering Strait about 15,000 years ago.

So, did a migration wave from the same source also move into Europe at about the same time? If so, this would indicate that the East Asian admixture in Europeans found by David Reich is very old. Perhaps that’s why it’s not possible to measure it accurately using standard ancestry tools?


Citations…

David Reich et al., Reconstructing Indian population history, Nature, Vol 461, 24 September 2009, doi:10.1038/nature08365

Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, et al. (2011) The History of African Gene Flow into Southern Europeans, Levantines, and Jews. PLoS Genet 7(4): e1001373. doi:10.1371/journal.pgen.1001373

David Reich et al., Reconstructing Native American population history, Nature, 2012, doi:10.1038/nature11258


See also...

They had blond hair and light eyes, and came from the north…but they were racially impure


29 comments:

  1. "Another recent paper, The History of African Gene Flow into Southern Europeans, Levantines, and Jews, directly compared the results from a formal mixture test to those from STRUCTURE. Note, for instance, the large discrepancy between the formal and STRUCTURE scores obtained for the Sardinian sample - 2.9 vs. 0.2 respectively."

    David, that 2.9 figure of the Moorjani et al. paper (2011) was obtained based on the false assumption that CEU, and NW Europeans in general, are pure Caucasoids. But Dienekes and the soon to be published paper of Reich et al. all show that northern Europeans (whether NE or NW) are clearly more Mongoloid-admixed than southern Europeans. So the real Negroid admixture of Sardinians is much less than 2.9%.

  2. Polako, I have two questions:
    1. If this Sub-Saharan ancestry in Sardinians is bigger than previously thought, does that mean it is most likely present in Northern Europeans too, by analogy with the ancient Asian ancestry that is present in all Europeans but allegedly higher in Northern Europeans?
    2. Will you be able to make a calculator based on this formal tree method, and if so, when?

  3. 1. I haven't seen any evidence of autosomal Sub-Saharan African admixture north of the Alps and Carpathians. But there are Sub-Saharan mtDNAs floating around there, so maybe there is some autosomal stuff as well?

    2. Nope, I can't make a calculator out of a formal mixture test. But the software for formal tests will be available soon online, so if you learn to use the public datasets that are also online, you can probably run your own data in such a test.

  4. "If this Sub-Saharan ancestry in Sardinians is bigger than previously thought, does that mean it is most likely present in Northern Europeans too, by analogy with the ancient Asian ancestry that is present in all Europeans but allegedly higher in Northern Europeans?"

    The relatively high Negroid ancestry estimates of the Moorjani et al. paper (2011) for Sardinians and southern Caucasoids in general were shown to be false by Dienekes and, more importantly, by the recently published Patterson et al. paper (2012), which was written by some of the same authors as the Moorjani et al. paper (2011), including Moorjani herself.

  5. "The relatively high Negroid ancestry estimates of the Moorjani et al. paper (2011) for Sardinians and southern Caucasoids in general were shown to be false by Dienekes and, more importantly, by the recently published Patterson et al. paper (2012), which was written by some of the same authors as the Moorjani et al. paper (2011), including Moorjani herself."

    I am talking about things written on this page.
    Did Moorjani et al. use the method described by Reich as written on this page?

  6. Here's the only quote from the Patterson et al. paper about the Moorjani et al. paper.

    "There is some modest level of sub-Saharan (probably west African-related) gene flow from Africa into Sardinia as is shown by analyses in MOORJANI et al. (2011), but no evidence for gene flow from the San (Bushmen) which is indeed historically most unlikely."

    BTW, Moorjani was a co-author on the Patterson et al. paper. It doesn't seem to me that she was correcting herself there or anything.

  7. "I am talking about things written on this page."

    The relevant things written on this page depend on the false assumption that the conclusions of Moorjani et al. (2011) are valid and trustworthy.

    "Did Moorjani et al. use the method described by Reich as written on this page?"

    For detailed information about the method used by Moorjani et al. (2011), see:

    http://dienekes.blogspot.com/2011/04/sub-saharan-admixture-in-west-eurasian.html

    "Here's the only quote from the Patterson et al. paper about the Moorjani et al. paper."

    It is a passing reference and does not provide any numbers. If the conclusions of Moorjani et al. (2011) were trustworthy, Patterson et al. (2012) would almost certainly refer to them more openly and treat them accordingly. They don't, because those conclusions were clearly demonstrated to be invalid by the results of the Patterson et al. paper (2012). The results of Patterson et al. (2012) clearly demonstrate that Sardinians are much less Negroid-admixed than Moorjani et al. (2011) claim. If some of the authors of the Patterson et al. paper (2012) were not also co-authors of the Moorjani et al. paper (2011), I think they would openly declare that they had demonstrated the falsity of its conclusions, because that is what the results of the Patterson et al. paper (2012) clearly do. The subjects of the Moorjani et al. paper (2011) need a re-examination, and I expect such a re-examination in the near future from some of the same people who co-authored it and the Patterson et al. paper (2012).

  8. "The results of Patterson et al. (2012) clearly demonstrate that Sardinians are much less Negroid-admixed than Moorjani et al. (2011) claim."

    and also southern Caucasoids in general

  9. Thanks for the info, Onur, but it seems strange that a blogger can debunk the analysis of respected researchers.

  10. I am saying that because Reich is still a proponent of the tree method.

  11. Petar,

    As I have been trying to explain to you on this thread, it is not only Dienekes the blogger who debunked the conclusions of the Moorjani et al. paper (2011), but, more importantly, also the Patterson et al. paper (2012). In both papers Reich, Moorjani and Patterson are co-authors, so by co-authoring the Patterson et al. paper (2012) they actually debunked themselves. The methods applied in the Patterson et al. paper (2012) are much more appropriate for racial admixture analysis than the methods of the Moorjani et al. paper (2011). In the Moorjani et al. paper (2011) they made the mistake of treating NW Europeans as if they are pure Caucasoids. But in the Patterson et al. paper (2012), just like Dienekes, they demonstrated that all northern Europeans are actually ancient Mongoloid-admixed and clearly more so than southern Europeans and especially West Asians. This enabled much more accurate estimation of the amount of Negroid admixture and, as a result, Patterson et al. (2012) demonstrated that southern Caucasoid populations have much less Negroid admixture than the amounts claimed in the Moorjani et al. paper (2011). Please read both papers and see the differences for yourself, they are both free access.

  12. ^ So, Onur, what was the difference in Sub-Saharan African admixture for Sardinians reported by Moorjani et al. 2011 vs Patterson et al. 2012?

  13. "So, Onur, what was the difference in Sub-Saharan African admixture for Sardinians reported by Moorjani et al. 2011 vs Patterson et al. 2012?"

    Patterson et al. (2012) refrain from giving a certain value for the amount of Negroid admixture for Sardinians or any other population. But all the values that are found in their population tests that can be used as the amount of Negroid admixture in the Caucasoid populations tested are lower than the ones proposed by Moorjani et al. (2011). According to their results, southern Caucasoids clearly have much less Negroid admixture than the values proposed by Moorjani et al. (2011). Of course, the amounts of Negroid admixture in various Caucasoid populations must be further clarified for more precision. That is why in one of my above posts I wrote:

    "Subjects of the Moorjani et al. paper (2011) need a re-examination, and I expect such a re-examination in the near future from some of the same people who co-authored it and the Patterson et al. paper (2012)."

  14. Onur,
    Are you saying that the tree method is accurate, but that they got their results based on a wrong assumption?

  15. So the final conclusion is that Northern Europeans have 2-3% ancient East/South Asian admixture and that Sardinians do not have any.
    Is it possible that none of that is correct?

  16. "Are you saying that the tree method is accurate, but that they got their results based on a wrong assumption?"

    Moorjani et al. (2011) used the right programs but used them in a wrong way as a result of wrong assumptions.

    Patterson et al. (2012) used the right programs and used them in a right way as a result of right assumptions.

    "So the final conclusion is that Northern Europeans have 2-3% ancient East/South Asian admixture and that Sardinians do not have any. Is it possible that none of that is correct?"

    The ancient Mongoloid admixture in Europeans detected by both Patterson et al. (2012) and Dienekes comes from a northern Mongoloid (probably Siberian) source, according to both sets of results. This is perfectly in line with the fact that the ancient Mongoloid admixture peaks in NE Europe, diminishes only slightly as one moves to NW Europe, diminishes significantly as one moves to southern Europe, and completely or almost completely disappears when we arrive in West Asia and Sardinia.

  17. Northern Europeans have around 10% North Eurasian admixture. That's what the Patterson et al. paper basically found.

  18. "Northern Europeans have around 10% North Eurasian admixture. That's what the Patterson et al. paper basically found."

    More correctly, NE Eurasian admixture, as it is Mongoloid, and specifically Siberian/Amerindian type Mongoloid. This has huge implications for the estimations of Negroid admixture in Caucasoid populations. Northern Europeans (whether NW or NE) cannot be used as a proxy for pure Caucasoids anymore, and the relatively high Negroid admixture estimations of Moorjani et al. (2011) for southern Caucasoid populations cannot be true. BTW, I should add that the admixture estimation techniques have improved since the days of the Moorjani et al. paper (2011), and so Patterson et al. (2012) had the advantage of employing these newer techniques.

  19. "This is perfectly in line with the fact that the ancient Mongoloid admixture peaks in NE Europe and only slightly diminishes as one moves to NW Europe but significantly diminishes as one moves to southern Europe and completely or almost completely disappears when we arrive in West Asia and Sardinia."

    NW and Central Europe have so far shown 0% Mongoloid admixture when compared to NE Europe. Why do you say "only slightly", even taking Patterson et al. into consideration? What are the percentages according to Patterson et al.?

    "Northern Europeans have around 10% North Eurasian admixture. That's what the Patterson et al. paper basically found."

    Well, this is quite some news. So did Patterson et al. find that Northern Europeans have an additional 10% Siberian admixture? Does that mean that Finns, for example, are more than 7% Siberian?
    Didn't Dienekes say that it is around 2-3%?
    Does this mean that the calculators on Gedmatch are worthless?

  20. Will the clusters of European nations look different from now on?
    I know I ask silly questions, but please be kind enough to answer.

  21. Perhaps we should wait a year or two before they make another calculator which will show completely different data again :)

  22. "NW and Central Europe have so far shown 0% Mongoloid admixture when compared to NE Europe. Why do you say 'only slightly', even taking Patterson et al. into consideration? What are the percentages according to Patterson et al.?"

    Petar, from your words, I understand that your knowledge of population genetics is pretty poor. You should first learn how to read the results. Anyway, Patterson et al. (2012) and Dienekes' and Razib's blogs have all the answers to your questions.

  23. Onur, indeed I do not, and that is why I am asking all these questions that seem "obvious" to informed people.
    I am simply asking where the 10% figure comes from, because I thought we were talking about 2-3%, not 10%. Dienekes said that Northern Europeans are 2.5% shifted towards CHB.

  24. ^ Let me correct myself, Europeans have around 10% North Eurasian admixture, but that rises towards the North and Northeast, peaking in Finns and North Russians at over 20%.

    However, the results for Finns and North Russians are probably confounded by more recent Uralic admixture.

    In any case, I don't think precise figures for these sorts of ancient admixtures are important, because they're so old and hard to measure. Most important is the fact that we know they're there.

    Recent admixtures are easier to measure, and more relevant to how we view ourselves in terms of ethnicity. So that’s what I’d focus on as a personal genomics customer.

  25. Petar,

    The low Mongoloid admixture figures for Europeans that you are referring to were all obtained with less advanced tools like ADMIXTURE and STRUCTURE, which are not good at detecting ancient racial admixtures. More advanced tools like ADMIXTOOLS and TreeMix are much better at detecting ancient racial admixtures, and as a result, with these tools we are now estimating the amounts of racial admixtures much better than we previously did or thought we did with ADMIXTURE and STRUCTURE. With the advanced tools, we now know that Europeans have much more Mongoloid admixture than ADMIXTURE and STRUCTURE can detect.

  26. Onur,
    I guess we figured that out a long time ago. My question, which was related to your answer, was asked in relative terms, comparing the big differences in Eurasian admixture: I found it strange that Southern Europeans would have zero North Eurasian admixture while Northern Europeans have 10%. But as you can see, Polako corrected himself, and now things are starting to make sense to me.

    Polako,
    Taking your last post into consideration, it indeed does not change anything, because everything stays the same in relative terms. And after all, the good news is that Sardinians and other specific southern Europeans are much less admixed (or not admixed at all).

  27. ....less admixed with Sub-Saharan Africans, that is....:)

  28. ^ Yes, it's awesome, I'm actually planning to move to Sardinia eventually so I can enjoy being surrounded by almost pure Neolithic farmers.

  29. "I guess we figured that out a long time ago. My question, which was related to your answer, was asked in relative terms, comparing the big differences in Eurasian admixture: I found it strange that Southern Europeans would have zero North Eurasian admixture while Northern Europeans have 10%. But as you can see, Polako corrected himself, and now things are starting to make sense to me."

    David just gave the European Mongoloid admixture average, roughly speaking. But, as even he acknowledges, there is a significant difference between northern and southern Europeans in the amount of Mongoloid admixture. Southern Europeans have much less Mongoloid admixture than northern Europeans on average. As a general rule, southern Europeans have less than 10% Mongoloid admixture while northern Europeans have more than 10% (reaching more than 20% in some NE European populations). French and Bulgarians, as expected, are transitional in this regard and have a Mongoloid admixture of about 10%, the European average.
