Wednesday, November 24, 2010

Taking a closer look at your "Chromosome Mosaic" results

I thought I'd reiterate in more detail how it's possible to get the most out of the RHHcounter/RHHmapper data I sent out this week. First of all, please read carefully the journal article I linked to, which has a thorough description of how the authors found previously unreported Sub-Saharan African admixture in two European American HapMap samples from Utah.

Secondly, it's very important to realize that these kinds of tests rely on more than just isolated matches to show something meaningful. In other words, one SNP hit with one, two or three other project members doesn't mean much in this context. You need multiple SNPs that show very similar genotype frequencies in tens or hundreds of samples. As mentioned previously, the SPSmart website is a very good resource for that sort of thing.

Tick the "CEPH U. Stanford HGDP" box and press "metasearch" > Tick all the continental boxes (i.e. Africa, America, etc.) > Paste your SNP (or SNPs) into the "SEARCH BY SNPS" text box > Press next > Press search

That should give you a set of pie charts and figures showing the frequency of the alleles for that SNP in 7 biogeographic zones. Like this...

And you can double check these results at the dbSNP website by clicking on the SNP in question @ SPSmart...

OK, so if you're GG or GT for that particular SNP, it's pretty obvious this could be a sign of African ancestry. But, as noted above, that doesn't mean much by itself. Check whether there are other SNPs being flagged in the same area with similarly African-specific results, and indeed whether they line up into a possible segment of African origin there. The more SNPs, and the clearer the segment, the more reliable the result.
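To make the "more SNPs, clearer segment" idea a bit more concrete, here's a rough Python sketch that groups flagged SNP positions on one chromosome into candidate segments. This is not how RHHcounter/RHHmapper works internally. The function name `candidate_segments`, the 1 Mb maximum gap and the minimum of 5 SNPs are all assumptions I've made up for illustration, so adjust them to taste.

```python
def candidate_segments(positions, max_gap_bp=1_000_000, min_snps=5):
    """Group flagged SNP positions (in bp, single chromosome) into runs.

    A run ends when the next flagged SNP is more than max_gap_bp away.
    Runs with fewer than min_snps SNPs are dropped as isolated hits.
    Both thresholds are illustrative assumptions, not published values.
    Returns a list of (start_bp, end_bp, snp_count) tuples.
    """
    segments = []
    run = []
    for pos in sorted(positions):
        # Close the current run if this SNP is too far from the last one.
        if run and pos - run[-1] > max_gap_bp:
            if len(run) >= min_snps:
                segments.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    # Don't forget the final run.
    if len(run) >= min_snps:
        segments.append((run[0], run[-1], len(run)))
    return segments


# Hypothetical positions: five clustered hits plus two distant strays.
flagged = [100, 200_000, 500_000, 900_000, 1_200_000, 50_000_000, 50_100_000]
print(candidate_segments(flagged))
```

With these made-up positions, only the first five SNPs form a candidate segment. The two strays near 50 Mb are ignored because two hits on their own don't mean much.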

It's also possible to get an idea of the percentage of your genome covered by this segment by checking its start and end points in base pairs, and then working out its size in Mb (FYI, the human genome is about 3000 Mb in size). But that would just be a rough guide, because the size of the segment might change with a different threshold for rare genotype detection. For example, if I specify that all genotypes with an incidence of less than 0.005% are flagged, then that might show up a smaller segment than if I relax the cutoff to 0.05%. Oh, and it might also pay to check the recombination rate for that area of the genome, because if it's in a so-called cold spot, then that segment could be old...really old.
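The arithmetic for the segment size is simple enough to sketch in a few lines of Python. The function name `segment_coverage` and the example coordinates are hypothetical, and the 3000 Mb genome size is just the round figure quoted above, so treat the output as a rough guide only.

```python
GENOME_SIZE_MB = 3000.0  # approximate human genome size, as quoted above


def segment_coverage(start_bp, end_bp, genome_size_mb=GENOME_SIZE_MB):
    """Return (segment size in Mb, percent of genome covered)."""
    if end_bp < start_bp:
        raise ValueError("end_bp must not be smaller than start_bp")
    size_mb = (end_bp - start_bp) / 1_000_000  # base pairs -> megabases
    percent = 100.0 * size_mb / genome_size_mb
    return size_mb, percent


# Hypothetical segment running from 10,000,000 bp to 25,000,000 bp.
size_mb, percent = segment_coverage(10_000_000, 25_000_000)
print(f"{size_mb} Mb, about {percent}% of the genome")
```

So a 15 Mb segment works out to roughly half a percent of the genome, keeping in mind that the start and end points will move if the rare genotype threshold changes.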


Lechu said...

dear Davidski,

here are the SPSmart programmers, just wanting to thank you for using our tool for the exact purpose it was built for, but also to point out that the metasearch option lets you look not only into one dataset (you are using the ceph browser directly) but into the other ones too at the same time. this option may allow you not only to see what you are looking for, but also to compare the results across different reference repositories.

maybe the stanford ceph is dense enough for your query, but using hapmap too (we provide the latest #28 release) will probably add some extra insight to your queries. we are indeed currently waiting for a paper to be accepted in order to release the 1000 Genomes pilot 1 data too, and we plan to include the recently released 1000 Genomes SNP information in our browser by the end of the year.

kind regards,

lechu said...

just a quick note to let you know that we've finally included 1000 Genomes data on SPSmart. due to the recent release of part of the final data it took us a little longer than expected, but we've recently had a paper accepted in BMC Bioinformatics where you'll soon be able to read about it. again thank you for using the tool for what it was first conceived for.

by the way, if you are dealing with ancestry wouldn't you be interested in a set of markers that have been forensically studied for that purpose? there are only 34 of them, but they seem to show great granularity. you may access them through the same SPSmart interface: