

Sunday, February 26, 2012

Genetic substructures within the HapMap CEU sample (and Eurogenes' Northwest Europeans)

In this experiment I attempt to characterize more precisely the origins of some of the individuals in the HapMap CEU cohort. The HapMap project describes these samples as Utah Americans of Western and Northern European descent. But this doesn't seem to be exactly true for at least two of them, who come out very Central European in all my tests. Moreover, it's obvious that some of the samples fit nicely into very specific areas of Western and Northern Europe. For instance, at this level of resolution, a few could pass as Irish, and others as Danes or even Swedes. Below is a quick and dirty ADMIXTURE analysis designed specifically for this experiment.

Key: Red = Sub-Saharan African, Yellow = Southern European, Green = North-Central European, Aqua = North Atlantic, Blue = Baltic, Pink = East Asian. See spreadsheet for details.

Based on the K=6 results it's fair to say that at least six of the CEU samples might pass for unmixed Scandinavians, most likely Danes or southern Swedes (NA12003, NA12057, NA12248, NA12249, NA12776 and NA12875). At least five could be mistaken for Irish or western British samples (NA10850, NA12005, NA12006, NA12386 and NA12812). The two Central European-like Utahns (NA11917 and NA12286) stick out from the CEU set due to their unusually high Baltic scores. From the little I know about the CEU samples, I'd say that these two are of eastern or southeastern German origin, but they might have fairly recent ancestry from further east than that. My own MDS analysis (first image below) and a PCA plot from Lao et al. 2008 (second image, slightly edited by me to remove article text) confirm that such Scandinavian-like, German-like and Irish-like individuals do exist in the CEU set.

I think this experiment is very useful for a number of reasons. Firstly, it shows that the CEU set is not homogeneous, and carries clear substructures that can be picked up via fairly basic means. This makes the CEU samples more valuable, not less, given the lack of publicly available samples from continental Northwestern Europe. Secondly, the test reveals some interesting information about the genetic substructures within Northwestern Europe. Here are some of my observations:

- Scandinavians often show very high levels of the North-Central European component, and moderately high levels of the North Atlantic component. Many also carry clear amounts of the Baltic component, but, as a rule, lower levels of the Southern European component.

- Germans mainly differ from the Scandinavians in that they carry the Southern European component in appreciable amounts. They show variable amounts of the Baltic component, with those from eastern Germany carrying the highest levels.

- Irish project members, especially those from western Ireland, show very high levels of the North Atlantic component, but low levels of the Southern European component.

- Western British samples, like those from Cornwall or western Scotland, are generally very similar to the Irish, mainly in that they carry the North Atlantic component at high levels. However, they often show somewhat higher levels of the Southern European component.
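
To make the four profiles above a little more concrete, here is a rough rule-of-thumb sketch in Python. It is only an illustration of the qualitative pattern, not the method used for this post. The function name `rough_label`, the cutoff value and the idea of comparing the North Atlantic score directly against the North-Central European score are all assumptions; the observations above don't come with numeric thresholds, and the Baltic component is ignored here for simplicity.

```python
# Hypothetical cutoff for an "appreciable" Southern European score.
# The post gives no numeric thresholds, so this value is a placeholder.
SOUTHERN_CUTOFF = 0.10


def rough_label(scores, southern_cutoff=SOUTHERN_CUTOFF):
    """Return a coarse label for one sample.

    scores maps component names from the K=6 key to proportions (0..1).
    Missing components are treated as zero.
    """
    north_central = scores.get("North-Central European", 0.0)
    north_atlantic = scores.get("North Atlantic", 0.0)
    southern = scores.get("Southern European", 0.0)

    if north_atlantic > north_central:
        # North Atlantic dominant: Irish-like, or western British-like
        # when the Southern European score is somewhat higher.
        if southern >= southern_cutoff:
            return "Western British-like"
        return "Irish-like"

    # North-Central European dominant: Scandinavian-like, or German-like
    # when the Southern European score is appreciable.
    if southern >= southern_cutoff:
        return "German-like"
    return "Scandinavian-like"
```

Any real classification would need thresholds tuned on reference samples, and probably the Baltic score as well, since that is what separates the two Central European-like individuals from the rest.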

I'm eventually going to test these classifications of the CEU samples with ChromoPainter, which is by far the most accurate tool for such things at the moment. Unfortunately, it's also labor-intensive and computationally demanding, so it might take a few weeks. I do have the allele frequencies from the above ADMIXTURE run, and it is possible to make a standalone test from them. However, I'm not certain that's a good idea at present, due to the small number of samples involved. It might be worth doing once the right samples swell in number, so I can run a more robust analysis. In particular, I need more people from Ireland, Scotland and Scandinavia.
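
For anyone wondering what a standalone test built from allele frequencies might look like, here is a minimal sketch in Python. It assumes the standard admixture likelihood, in which each genotype is a draw of two alleles with frequency equal to the ancestry-weighted mix of the component frequencies, and it uses a simple EM update to estimate one person's ancestry proportions while the component frequencies stay fixed. The function names, the data layout and the toy data are all hypothetical; this is not ADMIXTURE's own code or file format.

```python
import random


def estimate_ancestry(genotypes, freqs, n_iter=100, eps=1e-6):
    """Estimate ancestry proportions for one individual by EM.

    genotypes: list of M allele counts (0, 1 or 2), one per SNP.
    freqs: K lists of M allele frequencies, one list per component,
           held fixed throughout.
    Returns a list of K proportions that sum to 1.
    """
    k = len(freqs)
    m = len(genotypes)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for j, g in enumerate(genotypes):
            # Clamp frequencies away from 0 and 1 to avoid dividing by zero.
            p = [min(max(freqs[a][j], eps), 1.0 - eps) for a in range(k)]
            mix = sum(q[a] * p[a] for a in range(k))
            for a in range(k):
                # Expected share of this SNP's two alleles owed to component a.
                new_q[a] += (g * q[a] * p[a] / mix
                             + (2 - g) * q[a] * (1.0 - p[a]) / (1.0 - mix))
        q = [v / (2.0 * m) for v in new_q]
    return q


def simulate_genotypes(p_row, rng):
    """Draw 0/1/2 allele counts for one unmixed individual (toy data only)."""
    return [(rng.random() < p) + (rng.random() < p) for p in p_row]


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up frequencies for 3 components at 500 SNPs.
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(500)] for _ in range(3)]
    person = simulate_genotypes(freqs[1], rng)
    print([round(v, 2) for v in estimate_ancestry(person, freqs)])
```

The toy frequencies here are far more differentiated than those of real European components, so the sketch says nothing about how well such a test would separate, say, Irish from Scandinavian samples. That depends on the reference set, which is the concern raised above.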


Oscar Lao et al., Correlation between Genetic and Geographic Structure in Europe, Current Biology, Volume 18, Issue 16, 1241-1248, 26 August 2008, doi:10.1016/j.cub.2008.07.049


LeviZoe said...

As an actual "CEU" who is familiar with the history of many "CEU's", I can tell you with certainty that there are definitely Scandinavian, Scottish and Irish CEU's. I have examples of all of those genetic types in my DNA profile. I don't think it is a big secret, either. Most early Mormon converts were culled from Denmark and England. Others entered the gene pool by virtue of being early Americans. I'm aware of some early Italian conversions as well.

My husband is also ethnically a CEU, and also has ancestors from all over the UK (including Welsh) as well as Denmark. Looking at the people in our state, I would say the Denmark theme is not underrepresented. We both did the Dodecad test -- I test as 88.3% CEU & the remainder Balkan, as my best result. My husband tests at about 88.4% CEU and the remainder tests as Portuguese as the number one probability. Both of our families come from early Mormon stock. I am actually also related to the founder. My mother probably fits the ethnic profile somewhat, but her ancestors aren't so 'Mormon pioneer-y' as the rest.

What I'm curious about is the ethnic minorities which show in your graph. They are so tiny that one almost questions their legitimacy. Part of my quest in learning about my DNA has been to search out those small variables. In my own results, a definite African segment appears and must be associated with an original CEU, because it comes from my dad. Given the well-documented racial (and/or racist, depending on your point of view...) nature of the Mormon church, with a clear deference to whiteness, the little segments of minority ancestry become more significant and call for an explanation.

I'm particularly curious about Native American segments that would surface, due to the doctrinal assertion that Native Americans are descendants of Jews. Of course I'm not trying to place any causal link between the two groups myself. Just curious about what actually is present in real-life CEUs (as in, the ones who live in Utah).

Larry said...

I am curious as to why you have selected the CEU group in the US to study. Why are they of more interest than, say, Mennonites, Amish, Quakers or other socio-religious groups? Is it because the CEU are more diverse or mixed than these other groups and really represent a cross-section of American immigrants as they were distributed along the migration route from NY to Utah in the mid 19th century? It seems the selection of individuals into the movement was more or less randomly drawn from the general population of the times. However, there were probably some biases due to the ethnicities of the original group members.