Model-based genetic ancestry estimation algorithms, like ADMIXTURE and STRUCTURE, have some limitations. As I see it, their main problems are that they a) can't tell the difference between more recent admixture and very ancient clines in genetic diversity, b) can't pick up admixture that comes in the form of one or two very small segments, and c) don't show where in the genome the admixture is found.
So today I'd like to offer a solution: a simple and lightweight, but very clever, program combo called RHHcounter/RHHmapper (see the bottom of the post for details).
Imagine, for example, a white American carrying a couple of tiny segments of West African origin, from an ancestor who lived 250 years ago, and an eastern Finn with no Asian ancestors in the last 4000 years or more. If we run an inter-continental ADMIXTURE analysis with these two, it's very likely the American will score 100% European, while the eastern Finn will probably come out around 9% North and East Asian due to really old Uralic influence.
That sort of thing isn't an issue when comparing the genetic structure of populations, including their ancient admixtures. Eastern Finns are indeed genetically closer to North Asians than white Americans are, and that's basically what ADMIXTURE is picking up on. However, if the focus is on the individual, this is likely to be a problem. Our hypothetical American might be aware of that African ancestor, with solid paperwork backing up their genealogical connection, but he's pulling his hair out because nothing's showing up in genetic tests.
So let's take a look at a real life example of how RHHcounter can pick up segments of potentially recent Sub-Saharan African origin...
I put together a data set of over 350 samples that showed less than 2% non-West Eurasian influence in various ADMIXTURE analyses, and clustered in or very near Europe on MDS plots. I then let RHHcounter search these samples for genotypes with less than 0.005% frequency amongst them. The samples originating from north of the Alps and Carpathians scored 5-15 heterozygote hits each, usually widely dispersed around the genome. However, in a few Americans apparently of North European descent, the heterozygotes took the form of small segments.
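To make the counting idea concrete, here's a rough Python sketch of it. This is not RHHcounter's actual code or input format. The function name `rare_het_hits`, the dict-of-genotype-strings layout, and the `max_freq` parameter are all assumptions made up for illustration. It simply flags, for each sample, the SNPs where that sample carries a heterozygous genotype that is rare within the sample set being evaluated.

```python
from collections import Counter

def rare_het_hits(genotypes, max_freq=0.01):
    """Return {sample: [SNP indices]} of rare heterozygote hits.

    genotypes: {sample: [two-letter genotype per SNP]}, same SNP order for
    every sample; "--" marks a missing call (hypothetical layout).
    max_freq: genotype frequency below which a genotype counts as rare.
    """
    samples = list(genotypes)
    n_snps = len(genotypes[samples[0]])
    hits = {s: [] for s in samples}
    for i in range(n_snps):
        # Sort the two alleles so that "AG" and "GA" are counted together,
        # and leave out missing calls.
        calls = {
            s: "".join(sorted(genotypes[s][i]))
            for s in samples
            if genotypes[s][i] != "--"
        }
        if not calls:
            continue
        counts = Counter(calls.values())
        total = len(calls)
        for s, g in calls.items():
            is_het = g[0] != g[1]
            if is_het and counts[g] / total < max_freq:
                hits[s].append(i)
    return hits

# Toy data: five samples, three SNPs. With so few samples the threshold
# has to be loosened for anything to count as rare.
toy = {
    "s1": ["AA", "CC", "GG"],
    "s2": ["AA", "CC", "GG"],
    "s3": ["AA", "CT", "GG"],
    "s4": ["AA", "CC", "GG"],
    "s5": ["GA", "CC", "GG"],
}
print(rare_het_hits(toy, max_freq=0.25))
```

In a real run the threshold would be far stricter and the sample set far larger, so the toy value of 0.25 is only there to make the example produce hits.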
Don Conrad of Genomes Unzipped, whose raw data I recently co-opted into my project, also showed a couple of such tiny segments, despite coming out 100% European in an ADMIXTURE run. These were located on chromosomes 7 and 13, and marked by just four SNPs each. You can see them on his RHHmapper Chromosome Mosaic below.
Admittedly, these don't seem like much, but in the context of my analysis, with the particular samples and thresholds I used, they do look relatively unusual for someone of Northern European origin. Indeed, in Don's case, the SNPs they contain also show heterozygote genotypes that appear distinctly Sub-Saharan African. I checked their characteristics via the very handy tools at the SPSmart and dbSNP websites.
Just for comparison, below is another Genomes Unzipped Chromosome Mosaic. This one belongs to Daniel MacArthur, and it also shows a few hits. But these heterozygotes are spread around the genome in a fairly random way, and don't appear to form segments. So as far as I can tell, it's much harder to argue for relatively recent non-European admixture in Daniel's case.
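The difference between hits that bunch up into segments and hits that are scattered around the genome can also be sketched in a few lines of Python. Again, this is only an illustration and not how RHHmapper draws its mosaics. The function name `find_segments`, the `(chromosome, marker_index)` input, and the `max_gap` and `min_hits` settings are hypothetical choices of mine. The idea is to sort the hits, chain together those that sit within a few markers of each other on the same chromosome, and keep only the chains with enough hits.

```python
def find_segments(hits, max_gap=5, min_hits=4):
    """Group rare-genotype hits into candidate segments.

    hits: iterable of (chromosome, marker_index) pairs, where marker_index
    is the SNP's position in the array order for that chromosome.
    Returns a list of (chromosome, first_index, last_index, n_hits).
    """
    segments = []
    cluster = []

    def flush():
        # Keep the current chain only if it has enough hits.
        if len(cluster) >= min_hits:
            segments.append(
                (cluster[0][0], cluster[0][1], cluster[-1][1], len(cluster))
            )

    for chrom, idx in sorted(hits):
        # Start a new chain on a new chromosome or after too large a gap.
        if cluster and (chrom != cluster[-1][0] or idx - cluster[-1][1] > max_gap):
            flush()
            cluster = []
        cluster.append((chrom, idx))
    flush()
    return segments

# Toy data: four nearby hits on chromosome 7, the rest scattered.
toy_hits = [(7, 100), (7, 102), (7, 103), (7, 106),
            (1, 50), (3, 900), (13, 10), (13, 400)]
print(find_segments(toy_hits))
```

Raising `min_hits` is one way to insist on more sizeable segments before taking a result seriously, while scattered single hits never form a chain and drop out.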
Some of my European project members have already received their Chromosome Mosaics, and I plan to send out many more in the coming weeks. I recommend that everyone check their results carefully, without jumping to conclusions. Look for hits that form sizeable segments, ideally much more sizeable than Don's. If you do find any, study the genotypes within them at SPSmart and dbSNP, as described above, to possibly characterize their biogeographic origins. Also, keep in mind that genetic diversity might be a factor - for instance, Southern Europeans have a lot more genetic diversity than Northern Europeans, so they might show more hits on their Chromosome Mosaics.
We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify “ethnic outlier” subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of “rare” heterozygotes and/or homozygotes whose frequencies are low (less than 1%) within genotypes of the sample set being evaluated. This method also enables simple and striking visualization of non-Caucasian chromosomal DNA segments interspersed within the chromosomes of ethnically admixed individuals. We show that this visualization of the mosaic structure of admixed human chromosomes gives results similar to another visualization method (SABER) but with much less computational time and burden. We also show that other methods for detecting ethnic outliers are enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity implies that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species.
Ralph E. McGinnis, Visualizing Chromosome Mosaicism and Detecting Ethnic Outliers by the Method of “Rare” Heterozygotes and Homozygotes (RHH), Human Molecular Genetics, 2010, Vol. 19, No. 13, 2539–2553, doi:10.1093/hmg/ddq102