Thursday, October 20, 2011

Erroneous results from Dodecad (aka. Dienekes)


A while back, Dienekes welcomed "peer review" of his work, which I thought was very commendable. I recently spotted a serious error in his analysis, and let him know about it over at his blog. I was hoping to see a correction, and also an admission that his methodology was faulty. Unfortunately, this hasn't happened to date, so I thought I'd describe the problem in detail here.

In the blog entry "Yunusbayev et al. (2011) data assessed with Dodecad v3", Dienekes analyzed samples with ADMIXTURE in "supervised" mode, using allele frequencies obtained from a run that didn't include these samples. He posted the results in a spreadsheet, which can be accessed here.

Obviously, my area of interest is the genetic ancestry of Poles, other Balto-Slavs, and nearby populations. So it only took me a matter of seconds to notice that something was off about the results for several of these groups. For instance, Poles are listed in the spreadsheet as 34.5% West European, and 44.3% East European. On the other hand, the more easterly Ukrainians show 38.5% West European, and only 31.5% East European. Also, the Mordvinian sample from near the Volga scores 38.1% West European, and only 32.5% East European.
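The mismatch is easiest to see as West/East ratios. Here's a quick Python sketch that uses only the three sets of figures quoted above. The dictionary layout and the function name are mine, purely for illustration, and have nothing to do with how the spreadsheet was produced.

```python
# West European and East European percentages quoted above from the
# supervised Dodecad v3 spreadsheet, listed roughly west to east.
results = {
    "Poles": (34.5, 44.3),
    "Ukrainians": (38.5, 31.5),
    "Mordvinians": (38.1, 32.5),
}


def west_east_ratio(west, east):
    """Return the West European share divided by the East European share."""
    return west / east


# Ratio per population, rounded to two decimals for readability.
ratios = {pop: round(west_east_ratio(w, e), 2) for pop, (w, e) in results.items()}

for pop, ratio in ratios.items():
    print(f"{pop}: West/East = {ratio}")
```

If geography were the main driver, the ratio should fall as we move east from Poland. Instead, only the Poles come out below 1, while both of the more easterly groups come out above it.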

The first port of call when checking the validity of such results is to see whether they gel with geography. Clearly these results don't. So either something isn't right, or there are factors that work against the general rule of genes = geography. When I alerted Dienekes to these seemingly implausible figures, he was in favor of the second scenario. His reply was as follows:

Ukrainians' higher West/east European ratio makes perfect sense as it is transitional to both the Caucasus (where there are even higher such ratios) and to the Balkans. Their ratio is exactly what one might expect from their geographical position vis-à-vis Russians, Belorussians, and Balts, i.e., populations with a high E/W ratio.

Mordvins are also in line with other Uralic populations (Finns, Selkups) in having an inverted European ratio relative to Balto-Slavs.

Err...no, the results don't make perfect sense. They make no sense at all. There's no way these Ukrainians can be described as transitional to the Balkans and the Caucasus compared to Poles, even if the term is used very loosely. Below are two MDS plots. The first one shows that the same Ukrainians (UA) used by Dienekes do not cluster closer to the Balkans than Poles (PL) do, and only barely closer, on average, than the Belorussians. The second plot shows that Ukrainians (UA), Poles (PL) and Belorussians (BY) are all about the same distance from the Caucasus.








In theory, it's possible to argue that the plots above produced different results to Dienekes' analysis because they used only the two most significant dimensions of genetic variation. On the other hand, ADMIXTURE works in a very different way, and so can reveal details past the first two dimensions. But that would be a stretch, because generally speaking, when a population appears to be transitional between two others in an ADMIXTURE run, such results are often very easily reproduced with MDS/PCA plots.

Moreover, I've actually analyzed the same and similar samples with ADMIXTURE and have been unable to reproduce Dienekes' results. In other words, as per geography, Ukrainians are less Western European than Poles, and more Eastern European. This shows up in my latest Eurasian K=10 run (see here), where, on the balance of all the components, the Ukrainians and Mordvinians are more Eastern than Poles.

Below are two PCAs. The first one shows the bizarre results produced using data from Dienekes' spreadsheet, with Mordvinians clustering with Ukrainians and Hungarians along Component 1. The result is more reliable along Component 2, because that seems to be picking up North Eurasian admixture in the Mordvinians and Russians, which is much lower in Hungarians, Poles and Belorussians. The second plot is based on my K=10, and shows a more expected result all round, with the Mordvinians lining up with both Russian samples (RU and North Russian) along Component 1, and also very close to the North Russians along Component 2. They also cluster with the same North Russians in Yunusbayev et al., rather than with the Ukrainians.








A whole range of PCA plots can be produced using the data from the supervised Dodecad V3 and my Eurasian K=10, in which the former results look at least a little out of whack with reality, while the latter appear as expected.

Interestingly, Dienekes' new euro7 analysis supports the results obtained by me. In this experiment, the same Ukrainians and Mordvinians were used in the initial run that set up the clusters, and came out amongst the most Northeastern European and least Northwestern + Southwestern European samples on the sheet. Now that makes perfect sense.

So what happened? Are these euro7 components different enough to make the results better match geography? Yes, they're a lot more in tune with reality due to a higher quality dataset, with more samples from key areas of Europe and the Caucasus. However, it's also clear that the supervised analysis produced erroneous results. It's obvious that it's not always possible to correctly analyze samples with allele frequencies from ADMIXTURE runs in which they were not included, especially when the results are compared against those of samples that were included.

Now that the sampling is better, Dienekes' euro7 shows the previously mentioned Uralic Selkups to have a higher level of membership in the cluster that peaks in Balto-Slavs than in those which peak in Northwestern and Southwestern Europeans. This is obviously a turn-around from his Dodecad V3 result. So which is correct? Strictly speaking, they're both correct, because the components that form in ADMIXTURE runs are dependent on the allele frequencies in the dataset used, and the number of clusters (K) set by the user. These clusters might peak in different groups depending on the dataset, but the results will usually make pretty good sense in relative terms. Indeed, on the balance of their overall results, across all the ancestral components in the V3 and euro7, the Selkups don't appear very different. They cluster in generally the same area relative to the other samples. See, for instance, their positions on two PCAs based on the V3 and euro7. So unlike the supervised results, it's not possible to outright declare the unsupervised Dodecad V3 results as erroneous.

However, I would say that the appearance of such a dominant Western European-based cluster as seen in the V3 is, at the very least, surprising. For instance, why would the Siberian Selkups carry more allele frequencies that appear Western European than Eastern European? The Uralic theory proposed by Dienekes really doesn't seem plausible. I don't know how many times Dienekes repeated his experiment to see if the results were stable, but scientists often run their experiments as many as 100 times each, and then publish the most consistent results.

If Dienekes obtained those results from multiple runs, and it was a stable effort, then that's fine. However, the Western European-based cluster still looks unusual enough to treat it with great caution. Suffice to say that it's not something that can be reliably used to theorize about the peopling of Europe, or the genetic ancestry of linguistic groups, like the Uralics. Dienekes did this, which I thought was very naive of him. But it was even more naive of many people to take his musings seriously. I don't believe that he'll ever be able to produce similar results with his updated dataset (like the higher West/East European ratio in the Ukrainians, Mordvinians and Selkups).

Obviously, there's nothing wrong with experimentation. That's what science and genome blogging are all about. We're not just here to provide a genetic ancestry service, but also to try and unravel mysteries that are taking scientists years to get around to via the convoluted peer review system in journals. Mistakes will happen, because boundaries are being pushed, but these mistakes have to be corrected.

Update: Dienekes attempts to strike back...and trips up again


15 comments:

  1. "Interestingly, Dienekes' new euro7 analysis supports the results obtained by me."

    euro7 appeared two weeks before your K=10, so it is rather backwards

    http://dodecad.blogspot.com/2011/09/euro7-calculator.html

    "However, it's also clear that the supervised analysis produced erroneous results. It's obvious that it's not always possible to analyze samples with allele frequencies from ADMIXTURE runs in which they were not included."

    Congratulations on discovering the difference between training and test data. Not a very exciting observation, however, that allele frequencies of inferred components will change when additional individuals are used to train the model: it would be surprising if they didn't.

    PS: I have already covered what happens and why when the model is trained on different sets of data in detail here:

    http://dienekes.blogspot.com/2011/10/further-caution-on-admixture-estimates.html

    Thanks for the free publicity!

  2. Make a correction to your spreadsheet.

    In fact, you should probably remove it, because you're just providing erroneous data. What's the point?

  3. What I'm seeing is that few people are really prepared to question Dienekes and test his work, no matter what he says or does. This really needs to change, because he now has a lot of readers across the globe. Many of these people consider him an authority on human genetics and physical anthropology, and take his word over that of scientists being published in major journals.

    Both Dienekes and you are amateurs in genetics and physical anthropology; both of you are no authority in those disciplines. What is valuable in your and Dienekes' blogs is that 1) both of you report results of recent academic papers on human genetics and physical anthropology, and 2) both of you do your own research on human genetics using a large set of samples and populations that we cannot find in academic papers so often and in very diverse topics and geographical areas. I myself do not take Dienekes' and your conclusions and comments - whether for academic publications or for your own research - at face value and I always read them with critical eyes. I have many times criticized Dienekes' methodology (including sampling) in his research.

  4. "In fact, you should probably remove it, because you're just providing erroneous data. What's the point?"

    The point is complete transparency.

  5. well, most people don't have the time/ability to replicate unfortunately :-( i had higher hopes that other genome bloggers would keep appearing, but that hasn't happened....

  6. whoever who messes with Dienekes the Spartan must die. Eurogenes is a tendentious tool to favour Poles...LAMEEEE!

  7. I am no expert on Admixture runs or the various datasets used in the runs. I just want to put this layman's opinion into the ring of contention.

    I severed my connection with Mr. D after his K=10 run, so I am not in his K=12 run, but I can see the results for the K=12 run are anomalous. You are an expert with Balto-Slavs, and a little with the "Finns". I consider the K=12 run anomalous for Greeks, South Italians/Sicilians and Ashkenazim Jews. Something smells off in the Plaka. A bit of genetic legerdemain to make Greeks less like South Italians/Sicilians, Anatolian Turks, Levantine Arabs and Ashkenazim Jews. More ryebread and less falafel. The Oracle usage of his K=12 runs is similarly all smoke and mirrors. I would prefer reality to artificially trying to blanch Greeks.

  8. This discussion isn't about whose ADMIXTURE runs are better, or more useful or whatever.

    It's about a problem that no one was aware of until recently, including me.

    It's a serious problem, and I think it's actually quite good that another genome blogger (ie. me) raised the flag, and not some scientific ethics committee somewhere. It'd be a bit crap if we all got shut down, or denied access to data, due to an honest mistake.

  9. Exposing that mule's stench about Ukrainians. Someone had to do it. Great job.

  10. Ukraine fell under the dominance of the Chernyakhov culture. Also it has been suggested by Jean Manco that the Basques came from the Cucuteni-Trypillians, most of which were based in western Ukraine.

    http://dna-forums.org/index.php?/blog/2/entry-186-basques-from-cucuteni/

    Also, didn't your past run also show Ukrainians to have unusually high "north European" and "Atlantic" scores compared to the surrounding groups?

    http://img189.imageshack.us/img189/4875/germanen.gif

  11. **ATTENTION**

    This is not about Ukrainians. It's about the fact that wrong methodology was used.

    Even if that wrong methodology showed correct results (which it actually didn't in this case), it still shouldn't be used, because it's wrong.

    What Dienekes did can only be done if there's a disclaimer present saying that unsupervised/supervised groups can't always be compared to each other. But Dienekes didn't provide such a disclaimer, and even denied there was any issue when I first alerted him about the strange results in his spreadsheet.

    How hard is that to understand?

  12. "Also it has been suggested by Jean Manco that the Basques came from the Cucuteni-Trypillians, most of which were based in western Ukraine"...

    Cucuteni-Trypillia was based in the Moldavias (and nearby parts of Ukraine, but not most of it, as well as Transylvania). CT was a Danubian culture on local Neolithic and Paleolithic substrate.

    In any case, Jean's conjecture is total nonsense. While she does a great job keeping the ancient DNA database, her prehistory speculations are untenable in almost all cases, both archaeologically and genetically.

  13. @ Maju

    Yeah, her website is great for the listing of ancient DNA. Her ideas and other parts of the site are quite shite, like your ideas on Basques.

  14. "Ukraine fell under the dominance of the Chernyakhov culture" - Actually if it wasn't for the Turkic nomads, likely the Goths would have withstood the Slavic expansion, like Romania/Moldova, and we would have a Germanic nation in eastern Europe.

    Also, didn't your past run also show Ukrainians to have unusually high "north European" and "Atlantic" scores compared to the surrounding groups?

    http://img189.imageshack.us/img189/4875/germanen.gif

  15. These Ukrainian samples are very Eastern European. This shows very clearly in all correctly run comparisons.

    Only a complete hack job on their genomes can make them look significantly western.

    Look at Dienekes' euro7 results. They're correct, because they're unsupervised.
