Thursday, March 23, 2017

The Limits Of Consumer Genomic Ancestry Reports

Razib has a nice discussion of the limitations of ancestry assignment methods in consumer genomic products like 23andMe. He makes a couple of key points:

* South Asia has roughly ten times as much intra-regional genetic variation as Northern Europe, but genomics companies currently make little effort to disaggregate variation outside Europe.

* Consumer genomic companies are in a bind, because the genomic structure of Europe doesn't track modern national boundaries well.

Some of the genomic structure of Europe is clearer at the level of sub-national regions that may cross national boundaries, and some of it is related to events too deep in prehistory to be well known to the general public (e.g. the hunter-gatherer, first wave Neolithic, and early Bronze Age population shifts). There are also some minor or only locally significant matters, like admixture arising during European colonial empire periods and major population disruptions in Austria-Hungary. But, without a lot of context, those distinctions don't make much sense to typical personal genomics company customers.


Some excerpts from his fuller explanation, as a tl;dr (bold emphasis in original; italic emphasis and material in brackets are mine):
The genetic differences and distance between various South Asian groups are far higher than those between various Northern European groups. Depending on the statistic measure you use intra-South Asian variation is about one order of magnitude greater than intra-Northern European differences. This is due to geographic partitioning, the caste system, and differential admixture in South Asians between extreme diverged ancestral elements (about half of South Asian ancestry is very similar to Europeans and Middle Easterners, and half of it is extremely different, so how far you are from the 50 percent mark determines a lot). 
In Northern Europe there is very little genetic variation from the British Isles all the way [to] the Baltic. The reason for this is historical: massive population turnover in the region 4,500 years ago means that much of the genetic divergence between the groups dates to the Bronze Age. It is this the genetic divergence, the variation, that is the raw material for the inferences and proportions you see in ancestry calculators. There’s just not that much raw material for Northern Europeans. . . .
As I have stated many times, racial background is to various extents both biological and social. When it comes to the difference between Lithuanians and Nigerians the biological differences due to evolutionary history are straightforward, and clear and distinct. You can generate a phylogenetic history and perform a functional analysis of the differences. Additionally, you also have to note that the social differences exist, but are not straightforward. Like Lithuanians Nigerians of Igbo background are generally Roman Catholic, while most other Nigerians are not. 
. . .
If a direct-to-consumer genetic testing company tells you that you are 90 percent Northern European and 10 percent West African, that is a robust result that has a clear historical genetic interpretation. . . .  In contrast, notice how 23andMe . . . tells people they are “French-German,” and not French or German. What the hell is a “French-German”? Someone from Alsace-Lorraine? A German descendent of Huguenots? Obviously not. 
. . . 
“French-German” is a cluster almost certainly because there are no clear and distinct genetic differences between French and Germans. . . . France and Germany have a lot of local structure even among people of indigenous ancestry. Germans from the Rhineland are quite often genetically closer to French from Normandy than they are to Germans from eastern Saxony. . . . Germans from the eastern regions are Germanized Slavs. Some Germans from the north exhibit strong affinities to Scandinavians, while Germans from Bavaria and Austria are classically Central European (whatever that means). The average German is distinct from the average French person, but the genetic clustering of the two groups is not clear and distinct.  
. . .
The . . . evolutionary genetic history [of Northern Europeans] is one where there are far fewer differences. The data do not fit a model that makes much sense to the average consumer (e.g., “you descend from a mix of Bronze Age migrants from the west-central steppe of Eurasia and Mesolithic indigenous hunter-gatherers and Neolithic farmers”). 
There are, however, relatively isolated cases where assignments by 23andMe are just flat out dubious, for some pretty technical reasons relating to the sizes of the samples used to train the model.

Among the most notable is the tendency to treat a significant part of Korean ancestry as Japanese. From a historical perspective, any significant genomic component shared by Koreans and Japanese people almost certainly arises from the Yayoi migration from Korea to Japan about two thousand years ago, which contributed significantly to modern Japanese ethnogenesis, and not from Korean admixture with Japanese populations during Japanese rule (which could account for a small and regionally specific admixture signal, but not the massive one that 23andMe routinely suggests in its ancestry analysis). But, the model doesn't know that and automatically makes a different assumption.
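
The excerpt's point that genetic divergence is the "raw material" for ancestry calculators can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the allele frequencies are invented, the noise values are arbitrary, and the simple least-squares estimator is a stand-in rather than the method 23andMe or any other company actually uses. The sketch applies the same small measurement error to a mixed sample twice, once against two strongly diverged reference panels and once against two nearly identical ones.

```python
# Toy illustration only: all allele frequencies below are invented, and this
# least-squares estimator is a stand-in, not any company's actual method.

def estimate_fraction(observed, ref_a, ref_b):
    """Least-squares estimate of the ancestry share drawn from ref_a,
    modelling observed frequencies as f*ref_a + (1 - f)*ref_b."""
    num = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    return num / den

def mix(fraction, ref_a, ref_b, noise):
    """Expected frequencies for a mixed sample, plus fixed measurement noise."""
    return [fraction * a + (1 - fraction) * b + e
            for a, b, e in zip(ref_a, ref_b, noise)]

TRUE_FRACTION = 0.9
NOISE = [0.02, -0.01, -0.01, 0.02]  # the same small error in both scenarios

# Strongly diverged reference panels (hypothetical values).
far_a, far_b = [0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]
# Barely diverged reference panels (hypothetical values).
near_a, near_b = [0.52, 0.48, 0.51, 0.49], [0.48, 0.52, 0.49, 0.51]

far_estimate = estimate_fraction(
    mix(TRUE_FRACTION, far_a, far_b, NOISE), far_a, far_b)
near_estimate = estimate_fraction(
    mix(TRUE_FRACTION, near_a, near_b, NOISE), near_a, near_b)

print(f"diverged panels: {far_estimate:.3f}")
print(f"similar panels:  {near_estimate:.3f}")
```

With the diverged panels the estimate stays very close to the true 90 percent. With the nearly identical panels the same noise pushes the estimate past 100 percent, because the error is divided by a much smaller squared difference between the references. That is the sense in which there is "not that much raw material" for splitting, say, French from German.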
