One of my more enthusiastic project members has come up with an idea of putting us all on a map, so we can compare our results with geography in mind. Thanks NO1! You'll need a Gmail account to include your profile on the map. And please base the location on your ancestry (averaged out if you're mixed) rather than place of residence.
Wednesday, May 18, 2011
Friday, May 13, 2011
This week saw the release of a large paper on the genetics of the Sorb ethnic minority in Germany (Veeramah et al. 2011). It was an okay effort, and only okay, due to the patchy sampling strategy (see here). However, it included an interesting analysis of some of the HGDP groups, which are often used in ancestry projects as references to flesh out so-called "ancestral components". I've actually become very skeptical of using some of these HGDP sets for such purposes because of their quirky behavior in various comparisons. Some of them, like the French Basques, Sardinians, and Orcadians, often act as if they were extreme isolates within Europe, possibly suffering from a lack of outbreeding or from extreme genetic drift.
This can be a huge problem, because it can skew results for other groups. For instance, the isolates might create clusters that don't represent anything but their own peculiar allele frequencies, which can then dominate analyses. This is my observation anyway, but those who have a lot of experience with ADMIXTURE or PLINK will no doubt notice the same. Here are a few useful quotes from the Veeramah et al. paper.
As may be expected, the French Basque and Sardinians show evidence of substantial genetic isolation. The Sardinians, closely followed by the Basque, had extremely high pairwise FST values compared with all other populations, with the minimum FST with their geographic neighbors being 0.0053 (SD vs IT) and 0.0047 (BSQ vs ES), respectively (Supplementary Table 1).
The MAF spectra (Supplementary Figure 6), although highly distorted because of SNP ascertainment, also show the Sardinians and Basques to have a noticeable excess of monomorphic SNPs. This excess suggests that some SNPs that are polymorphic in Europe may have been driven to extinction/fixation at a higher rate or never existed at all in these populations, consistent with genetic isolation. They were also both clear outliers with regard to the number of homozygous segments detected (Figure 3c).
PCA analysis using the POPRES/HGDP merge (Figure 2b) showed both the Basque and Sardinians to be highly differentiated from all other Europeans, with clearly discrete median bootstrap distributions (Figure 2d).
The Orcadians are almost completely indistinguishable from individuals from the United Kingdom and Ireland.
Analysis of the Sorbs and Orcadians did indicate some subtle signals of genetic isolation, albeit not to the same extent as the Basque and Sardinians.
The Sorbs were relatively unremarkable with regard to ROH (Figures 3a and b), but the Orcadians had the highest median average length of individual ROH segments (Figure 3d), suggesting a very recent history of inbreeding.
I agree with all of this, except I would urge the authors to look at the behavior of most of the Orcadians on PCA-MDS plots in dimensions other than 1 and 2. They form an elongated cluster in their own space and are easily distinguishable from their neighbors from Britain and Ireland. Hence, I generally only use 5 or 6 of the Orkney samples: those that stick closest to the Irish and Brits across all dimensions. Anyway, below is the bootstrap analysis (figure d) mentioned in the quote above, along with the PCA it was based on. I'm not familiar with this methodology, but it appears to be an attempt to minimize the effects of a lack of samples from some groups, like those from Northern and Eastern Europe. BSQ = French Basques, SD = Sardinians, OR = Orcadians.
Krishna R Veeramah et al., Genetic variation in the Sorbs of eastern Germany in the context of broader European genetic diversity, European Journal of Human Genetics advance online publication 11 May 2011; doi: 10.1038/ejhg.2011.65
Sunday, May 1, 2011
Believe it or not, samples occasionally get mixed up in studies. Zack A. of Harappa recently listed good examples of such cases in the Behar et al. dataset (see Behar Paniya). I believe I found a similar problem in the Rasmussen et al. collection, with at least two fully European Russians being passed off as native Siberians. The samples are Koryak GSM558848 and Chukchi GSM558866. Based on detailed analyses of this pair, I suspect that the first is of Western Russian origin, and the second from a different part of Russia, possibly further north. For instance, let's take a look at some identity-by-state (IBS) Z scores, based on 238K SNPs, starting with the Koryak:
vs. Rasmussen Koryaks
Koryak GSM558852 -2.312
Koryak GSM558854 -3.164
Koryak GSM558845 -3.211
Koryak GSM558850 -3.281
Koryak GSM558849 -3.291
vs. Eurogenes and Behar Russians
RU12 RU12 0.6699
RU GSM536913 1.001
RU GSM536914 0.6719
RU18 RU18 0.9004
RU16 RU16 1
The putative Koryak sample is matching his top five Koryaks at 2 or 3 standard deviations below the Koryak mean. What this shows is that this individual is a genetic outlier from the group, and at the very least harbors significant non-Koryak ancestry. However, the same sample is also matching Russians better than the Russians are matching each other on average (note the positive Z scores). The alleged Chukchi sample shows similar behavior.
vs. Rasmussen Chukchi
Chukchi GSM558867 -2.716
Chukchi GSM558872 -2.583
Chukchi GSM558877 -2.557
Chukchi GSM558870 -2.981
Chukchi GSM558878 -2.978
vs. Eurogenes and Behar Russians
RU15 RU15 0.5433
RU17 RU17 0.9265
RU GSM536914 0.9759
RU11 RU11 0.6654
RU GSM536913 0.394
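For readers who want to try this kind of check themselves, here is a minimal sketch of how IBS Z scores like those in the tables above could be computed. It assumes genotypes coded as minor-allele counts (0, 1, 2, with None for a missing call), and it assumes the Z score is the query-versus-member IBS similarity standardized against the mean and standard deviation of all pairwise IBS values within the reference group. The post doesn't spell out the exact procedure or software used, so the function names and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def ibs_similarity(a, b):
    """Proportion of alleles shared identical-by-state between two samples.

    Each SNP contributes 2, 1 or 0 shared alleles; SNPs with a missing
    call (None) in either sample are skipped.
    """
    shared = [2 - abs(x - y) for x, y in zip(a, b)
              if x is not None and y is not None]
    if not shared:
        raise ValueError("no overlapping genotype calls")
    return sum(shared) / (2 * len(shared))


def ibs_z_scores(query, group):
    """Z score of the query's IBS with each group member.

    Assumption: scores are standardized against the distribution of
    pairwise IBS values among the group members themselves, so the
    group needs at least three members with non-identical pairwise IBS.
    """
    within = [ibs_similarity(group[i], group[j])
              for i, j in combinations(group, 2)]
    mu, sigma = mean(within), stdev(within)
    return {name: (ibs_similarity(query, g) - mu) / sigma
            for name, g in group.items()}


# Toy data only: three hypothetical reference samples and one query.
reference = {
    "REF1": [0, 0, 0, 0],
    "REF2": [0, 0, 0, 1],
    "REF3": [0, 0, 1, 1],
}
suspect = [2, 2, 2, 2]
for name, z in ibs_z_scores(suspect, reference).items():
    print(name, round(z, 3))
```

Under these assumptions, strongly negative scores against a sample's labeled group, together with positive scores against another group, are the pattern described for the two putative Siberians. A real run would use hundreds of thousands of SNPs rather than four.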
An MDS analysis of these individuals against French, Yakut and Buryat samples shows clearly that they're not of Siberian origin. They cluster with the French.
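As a rough illustration of what an MDS analysis does with pairwise distances, here is a small classical (metric) MDS sketch in pure Python. It assumes the input is a symmetric matrix of distances (for example 1 minus IBS similarity, which is an assumption here, not something stated in the post) and uses simple power iteration instead of a proper eigensolver, so it only suits toy-sized inputs. The function name and parameters are hypothetical; in practice a dedicated tool or numerical library would be the sensible choice.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed samples in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Deterministic start vector for power iteration
        v = [math.sin(i + k + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the current vector
        eig = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                  for i in range(n))
        if eig <= 1e-12:
            break  # no further positive dimensions to extract
        scale = math.sqrt(eig)
        for i in range(n):
            coords[i][k] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector
        b = [[b[i][j] - eig * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy distances only: samples 0 and 1 sit close together, as do 2 and 3.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dist = [[abs(p - q) for q in positions] for p in positions]
for row in classical_mds(toy_dist):
    print([round(x, 3) for x in row])
```

Samples that sit close together in the resulting coordinates have small pairwise distances, which is the sense in which the two putative Siberians cluster with the French rather than with the Yakut and Buryat samples.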
In other experiments using ADMIXTURE, I noticed that the Koryak showed considerable "Southern Baltic" influence, which is typical of Western Russians and Belorussians. Indeed, this sample even exhibited relatively high affinity to Northwestern Europe in some comparisons. I've co-opted both of the putative Siberians into my intra-North European analysis as RU21 and RU22 respectively (see here). Anyone looking for Russian samples significantly different from the Vologda set offered by the HGDP should also consider using these two. They're a valuable addition to any project focusing on genetic diversity in Eastern and Northern Europe.