Wednesday, November 24, 2010

Taking a closer look at your "Chromosome Mosaic" results

I thought I'd reiterate in more detail how it's possible to get the most out of the RHHcounter/RHHmapper data I sent out this week. First of all, please read carefully the journal article I linked to, which has a thorough description of how the authors found previously unreported Sub-Saharan African admixture in two European American HapMap samples from Utah.

Secondly, it's very important to realize that these kinds of tests rely on more than just isolated matches to show something meaningful. In other words, one SNP hit with one, two or three other project members doesn't mean much in this context. You need multiple SNPs that show very similar genotype frequencies in tens or hundreds of samples. As mentioned previously, the SPSmart website is a very good resource for that sort of thing.

Tick the "CEPH U. Stanford HGDP" box and press "metasearch" > Tick all the continental boxes (i.e. Africa, America, etc.) > Paste your SNP (or SNPs) into the "SEARCH BY SNPS" text box > Press "next" > Press "search"

That should give you a set of pie charts and figures showing the frequency of the alleles for that SNP in 7 biogeographic zones. Like this...

And you can double check these results at the dbSNP website by clicking on the SNP in question @ SPSmart...

OK, so if you're GG or GT for that particular SNP, it's pretty obvious this could be a sign of African ancestry. But, as noted above, that doesn't mean much by itself. Check whether other SNPs flagged in the same area show similarly African-specific results, and indeed whether together they point to a possible segment of African origin there. The more SNPs, and the clearer the segment, the more reliable the result.
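If you'd rather not eyeball the positions, here's a rough sketch of how you might group flagged SNPs on one chromosome into candidate segments. To be clear, this isn't how RHHcounter/RHHmapper works internally; the function name, the 1 Mb maximum gap and the minimum of three SNPs per cluster are all my own illustrative assumptions, and the positions are made up.

```python
def cluster_flagged_snps(positions, max_gap_bp=1_000_000, min_snps=3):
    """Group flagged SNP positions (base pairs, one chromosome) into runs.

    Neighbouring SNPs closer than max_gap_bp go into the same run.
    Runs with fewer than min_snps SNPs are dropped as isolated hits.
    Both thresholds are illustrative assumptions, not recommended values.
    Returns a list of (start_bp, end_bp, snp_count) tuples.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than max_gap_bp closes the current run.
        if current and pos - current[-1] > max_gap_bp:
            if len(current) >= min_snps:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Don't forget the final run.
    if len(current) >= min_snps:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Made-up positions: one isolated hit at 5 Mb and two tighter groups.
flagged = [100, 200_000, 500_000, 5_000_000,
           9_000_000, 9_100_000, 9_300_000, 9_350_000]
segments = cluster_flagged_snps(flagged)
print(segments)
```

The isolated hit gets dropped, which is the whole point: a lone SNP isn't evidence of a segment, but a tight run of them is worth a closer look.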

It's also possible to get an idea of the percentage of your genome covered by this segment by checking its start and end points in base pairs, and then working out its size in Mb (FYI, the human genome is about 3000 Mb in size). But that would just be a rough guide, because the size of the segment might change with a different threshold for rare genotype detection. For example, if I specify that all genotypes with an incidence of less than 0.005% are flagged, then that might show up a smaller segment than if I raise the cutoff to 0.05%. Oh, and it might also pay to check the recombination rate for that area of the genome, because if it's in a so-called cold spot, then that segment could be old...really old.
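For anyone who wants the arithmetic spelled out, here's a minimal sketch of the segment-size calculation. The function name and the example coordinates are made up for illustration, and the 3000 Mb genome size is the same rough figure used above, so treat the output as a ballpark only.

```python
GENOME_SIZE_MB = 3000  # rough size of the human genome, as quoted above


def segment_coverage(start_bp, end_bp, genome_size_mb=GENOME_SIZE_MB):
    """Return (size in Mb, percent of genome) for a segment given in base pairs."""
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    size_mb = (end_bp - start_bp) / 1_000_000  # base pairs -> megabases
    percent = size_mb / genome_size_mb * 100
    return size_mb, percent


# Hypothetical segment running from 10,000,000 bp to 25,000,000 bp.
size_mb, percent = segment_coverage(10_000_000, 25_000_000)
print(f"{size_mb:.1f} Mb, roughly {percent:.2f}% of the genome")
```

Remember that the start and end points themselves shift with the rare genotype threshold, so it's worth running the numbers for a couple of different thresholds before reading too much into the percentage.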