Thursday, November 8, 2012

Neanderthal Admixture Revisited

The conclusion of the latest (open access) PLOS paper on the circumstances of Neanderthal admixture with modern humans states:

The date of 37,000–86,000 years BP [for Neanderthal admixture/genetic overlap with modern humans as measured by linkage disequilibrium genetic methods] is too recent to be consistent with the “ancient African population structure” scenario, and strongly supports the hypothesis that at least some of the signal of Neandertals being more closely related to non-Africans than to Africans is due to recent gene flow. These results are concordant with a recent paper by Yang et al that analyzed joint allele frequency spectra in Africans, non-Africans and Neandertals, to reject the ancient structure scenario.

After the present paper was accepted, Eriksson and Manica showed, using an Approximate Bayesian Computation approach, that models of ancient substructure can produce a signal of Neandertals sharing more derived alleles with non-Africans than with Africans (that is, they can account for the observation that D-statistics are significantly different from zero). The same observation was made in our earlier papers on the draft Neandertal and Denisovan genomes where we introduced D-statistics. However, the new statistics we focus on here as well as the statistics focused on by Yang et al show that ancient structure alone cannot explain these signals.

One possibility that we have not ruled out is that both ancient structure and gene flow occurred in the history of non-Africans. In the simulations reported in Table 1, we show that in this scenario, the ancient structure will tend to make the date estimate older than the truth but by not more than 15%, so that the date of 37,000–86,000 should still provide a valid bound while the less conservative estimate of 47,000–65,000 years should be interpreted as an upper bound on the date of gene flow.

Further, we have not been able to differentiate amongst variants of the recent gene flow scenario: a single episode or multiple episodes of gene flow or continuous gene flow over an extended period of time. Our date has a clear interpretation as the time of last gene exchange under a scenario of a single instantaneous gene flow event. In the other scenarios, the date is expected to represent an average over the times of gene flow and should be interpreted as an upper bound on the time of last gene exchange.

While recent gene flow from Neandertals into the ancestors of modern non-Africans is a parsimonious model that is consistent with our results, our analysis cannot reject the possibility that gene flow did not involve Neandertals themselves, but instead populations that were more closely related to Neandertals than any extant populations are today. Thus, the date should be interpreted as the last period of time when genetic material from Neandertals or an archaic population related to Neandertals entered modern humans.

Genetic analyses by themselves offer no indication of where gene flow may have occurred geographically. However, the date in conjunction with the archaeological evidence suggests that the two populations likely met somewhere in Western Eurasia. An attractive hypothesis is the Middle East, where archaeological and fossil evidence indicate that modern humans appeared before 100,000 years ago (as reflected by the modern human remains in Skhul and Qafzeh caves), Neandertals expanded around 70,000 years ago (as reflected for example by the Neandertal remains at Tabun Cave), and modern humans re-appeared around 50,000 years ago.

Our genetic date estimates, which have a mostly likely range of 47,000–65,000 years ago (and are confidently below 86,000 years ago), are too recent to be consistent with the appearance of the first fossil evidence of modern humans outside of Africa—that is, our date makes it unlikely that the Neandertal genetic material in modern humans today could arise exclusively due to the gene flow involving the Skhul/Qafzeh modern humans—and instead point to gene flow in a more recent period, possibly when modern humans carrying Upper Paleolithic technologies expanded out of Africa.

Note: I include almost the entire conclusion rather than selected excerpts, despite copyright, to capture the numerous important qualifiers in the original. I believe this constitutes fair use, particularly since this is an open access, basic science publication.

LD methods of dating are far less controversial than mutation rate dates. They rely on some basic and well established features of the recombination process at each generation, and essentially measure how well shuffled SNPs are as a result of that process, rather than the number of mutations against some baseline found in a genome.
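To make the "how well shuffled" idea concrete, here is a rough sketch under simplifying assumptions of my own (a single pulse of admixture, no noise, and LD that decays as exp(-t * d) for two SNPs d Morgans apart after t generations). The function names and the 2,000-generation figure are illustrative only; this is not the paper's actual statistic or pipeline.

```python
import math

def expected_ld(d_morgans, generations, ld0=1.0):
    """Idealized admixture LD between two SNPs d Morgans apart,
    `generations` after a single admixture pulse (assumed model)."""
    return ld0 * math.exp(-generations * d_morgans)

def estimate_generations(distances, ld_values):
    """Fit a straight line to log(LD) versus genetic distance.
    Under the assumed model the slope is -t, so return its negation."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances)
    return -sxy / sxx

# Hypothetical SNP spacings from 0.02 cM to 0.2 cM, expressed in Morgans.
distances = [0.0002 * k for k in range(1, 11)]
true_generations = 2000  # illustrative value, not taken from the paper
ld_values = [expected_ld(d, true_generations) for d in distances]
estimate = estimate_generations(distances, ld_values)
```

With noise-free input the fit simply recovers the number of generations that went in. The point is only that the rate at which LD falls off with genetic distance carries the date, with no mutation count involved.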

This study, together with other recent research, establishes that:

1. Neanderthal admixture with Eurasian modern humans (or admixture with archaic hominins more closely related to Neanderthals than to modern humans or Denisovans) did take place sometime in the Middle Stone Age or Upper Paleolithic era, rather than simply being an artifact of ancient population structure, even though there may be some contribution from ancient population structure.

It is clear from ancient DNA and other evidence that there was indeed significant population structure among both Neanderthals and modern humans in the Middle Stone Age and Upper Paleolithic. Ancient DNA shows some level of regional differentiation in Neanderthal genetics. Similarly, Eurasian uniparental mtDNA and Y-DNA phylogenies are derived from only a small subset of Africa's population genetic diversity (unsurprisingly, the subset likely to have origins in the vicinity of the geographic areas in Africa where modern humans first left that continent).

It is not at all clear that Eurasian modern humans had much internal population structure prior to a schism between West Eurasian and East Eurasian populations, although uniparental genetic phylogenies are not inconsistent with the possibility that there could have been two or three subgroups of Eurasians with significant population structure.

It is also not clear how much of the current internal genetic diversity in West Eurasians and in East Eurasians that is not clearly attributable to mutations arising in situ in the first populations present in a geographic area was present from the start, as opposed to arising from subsequent waves of migration. A material part of the genetic diversity in modern populations, at least, is derived from later migration waves and was not present when the first modern humans arrived at a particular location in Europe and Asia. Likewise, some of the genetic diversity present in the earliest Eurasian populations must have involved genetic features found only in populations that have since gone extinct.

2. Most of the admixture evident in modern humans alive today took place before modern humans arrived in Europe, but after they left Africa.

3. At least some of the admixture took place at a time closer to an "Out of Arabia" date than to an "Out of Africa" date, given the increasing archaeological evidence for an Out of Africa date before 100kya. (No one seriously argues that modern Eurasians are descended predominantly from initial Out of Africa migrants via Spain or Italy, rather than via Israel or southern Arabia.)

4. The study does not resolve whether admixture happened before, after, or both before and after, the ancestors of modern Eurasians split into West Eurasian and East Eurasian populations.

There is relatively little overlap between the Neanderthal SNPs found in West Eurasians and the Neanderthal SNPs found in East Eurasians. This suggests that the most likely source of the distinction is either parallel admixture events (as the source of at least some admixture in these populations) or founder effects at the time of the West Eurasian-East Eurasian schism, before Neanderthal admixture had reached fixation. This schism is nearly complete no further east than the India-Burma border.

5. Neither this study nor previous ones provide much insight into whether admixture was a punctuated event or a gradual process over millennia of co-existence. The absolute number of admixture events and the effective population size of the early Eurasians at the time of Neanderthal admixture are not very tightly constrained. We do know that there is no Neanderthal mtDNA or Y-DNA in any modern human now living (out of more than a hundred thousand people tested in a way that oversamples potentially significant outliers) or in any ancient DNA from a modern human. We do know that East Eurasian Neanderthal admixture preceded Denisovan admixture in the proto-populations that gave rise to Papuans and aboriginal Australians.

6. There is considerable reason to believe from ancient DNA evidence and circumstantial evidence that indigenous European hunter-gatherer populations (and populations with large demographic contributions from these populations that transitioned in food production methods) had much higher levels of Neanderthal admixture than modern European or Asian populations, as a result of additional Neanderthal admixture taking place upon arrival in Europe. This persisted in parts of Europe at least until the Copper Age (ca. 3500 BCE).

The most plausible explanation for why this is no longer the case is that subsequent waves of migration after Neanderthals went extinct a little less than 30,000 years ago, by people who lacked this elevated level of Neanderthal admixture into Europe because their ancestors were from places where Neanderthals ceased to be present much earlier, have diluted the admixture levels seen in modern European populations. This suggests that there were very significant demic contributions from outside Europe (or at least from the European far fringe) to European population genetics within the last 5,500 years or so.
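The dilution argument is simple mixing arithmetic. As a hedged illustration, the snippet below uses placeholder numbers of my own (4% and 2% Neanderthal ancestry, a 50% migrant share); none of them are measurements from this or any other study, and the function names are hypothetical.

```python
def mixed_admixture(resident_frac, migrant_frac, migrant_share):
    """Neanderthal ancestry fraction after residents and migrants mix.
    migrant_share is the migrants' share of the resulting gene pool."""
    return (1.0 - migrant_share) * resident_frac + migrant_share * migrant_frac

def migrant_share_needed(resident_frac, migrant_frac, observed_frac):
    """Migrant share required to dilute resident_frac down to observed_frac."""
    return (resident_frac - observed_frac) / (resident_frac - migrant_frac)

# Placeholder values, chosen only to show the arithmetic.
after = mixed_admixture(0.04, 0.02, 0.5)
needed = migrant_share_needed(0.04, 0.02, 0.03)
```

The second function shows why a large drop in admixture implies a large demographic contribution: the closer the observed level sits to the migrants' level, the closer the required migrant share gets to 100%.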

Source: Sriram Sankararaman, Nick Patterson, Heng Li, Svante Pääbo, David Reich
"The Date of Interbreeding between Neandertals and Modern Humans," PLOS Genetics, October 2012 link.

7 comments:

Maju said...

Link?

In any case it's clear they are misreading the "molecular clock" with the typical systematic bias that gives them too recent dates. If their MC algorithms made any sense, they'd come up with a much earlier date and not those.

Andrew Oh-Willeke said...

Thanks for asking. I screwed up the coding of the link so it didn't show up, and in penance, I have put the full reference in the body text.

As to the systematic bias, the molecular clock's relevance is not the same in LD methods as it is in mutation rate dating. Effective population size upon conclusion of admixture, generation length, and population history (e.g. with or without bottlenecks) are pretty much the only working parts. The generation length assumed is 29 years, which is very standard. The other assumptions vary by model tested and are disclosed in detail at S2. As to the impact of the mutation rate, they tested the entire range of plausible values and found that the indirect role that this plays in their method did not materially impact the result.

"S2.5 Effect of the mutation rate[:] Mutation rate has an indirect effect on our estimates – the mutation rate affects the proportion of ascertained SNPs that are likely to be introgressed. We varied the mutation rate to 1 × 10^-8 and 5 × 10^-8 in the RGF II model with no European bottleneck and again obtained consistent estimates (Table S1)."

Maju said...

Thanks for the link. But as regards age estimates, everything is subject to the same kind of bias in calibration: if you imagine that human and chimp separated 5 or 7 Ma ago and the real date is 8 to 13 Ma... or if you imagine that the Out of Africa is 50 Ka ago and the real date is 125 to 80 Ka instead... The mutation rate does not exist in a vacuum but in the context of assumptions like these.

But I'll tell you more when I read it... tomorrow.

Maju said...

Well, I've been reading it more in depth and all I can say is that the word "calibration" is not anywhere to be found. For any kind of age estimate you need calibration points such as the Pan-Homo split or the first Neanderthal bone ever (and there are assumptions in each of these choices). One should ideally have several calibration points to be able to estimate a realistic C.I.

But without any calibration, it does not matter how good the mutation rate (or LD rate if you prefer) estimate is, because we have no idea of how fast mutations or recombination "butchering" of the genome goes in real life.

terryt said...

"In any case it's clear they are misreading the 'molecular clock'"

Molecular-clockology is irrelevant to the main point: hybridism between Neanderthal and modern. The paper basically claims that it was not a single event, a concept Maju seems unable to grasp.

"One possibility that we have not ruled out is that both ancient structure and gene flow occurred in the history of non-Africans".

I think that 'ruling it out' would be a huge mistake. It is surely a very likely scenario.

"Further, we have not been able to differentiate amongst variants of the recent gene flow scenario: a single episode or multiple episodes of gene flow or continuous gene flow over an extended period of time".

The second seems more likely to me.

"The most plausible explanation for why this is no longer the case is that subsequent waves of migration after Neanderthals went extinct a little less than 30,000 years ago, by people who lacked this elevated level of Neanderthal admixture into Europe because their ancestors were from places where Neanderthals ceased to be present much earlier, have diluted the admixture levels seen in modern European populations. This suggests that there were very significant demic contributions from outside Europe (or at least from the European far fringe) to European population genetics within the last 5,500 years or so".

I agree completely.

andrew said...

The "molecular clock" in LD is the same, from first principles, at every generation for any given number of SNPs.

Linkage disequilibrium "is the occurrence of some combinations of alleles or genetic markers in a population more often or less often than would be expected from a random formation of haplotypes from alleles based on their frequencies. It is not the same as linkage, which is the presence of two or more loci on a chromosome with limited recombination between them. The amount of linkage disequilibrium depends on the difference between observed and expected (assuming random distributions) allelic frequencies. Populations where combinations of alleles or genotypes can be found in the expected proportions are said to be in linkage equilibrium.

The level of linkage disequilibrium is influenced by a number of factors, including genetic linkage, selection, the rate of recombination, the rate of mutation, genetic drift, non-random mating, and population structure."

In this model, they are assuming random allelic recombination, selectively neutral SNPs (even if the total package is not random), random mating within the specified populations, population structure as specified in S2, and mutation rates within the specified ranges.

There is nothing to "calibrate" other than the generation length and the population structure. The former they assume (but this only influences the final conversion from number of generations to number of years); the latter they treat as independent variables to see how they influence the dependent variables, which they compare to the actual data.
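To show how little work the generation length does, here is a small sketch of that final conversion. The 29-year figure and the 47,000–65,000 year range are from the paper; the function names and the 25-year alternative are my own illustrative choices.

```python
GENERATION_YEARS = 29  # generation length assumed in the paper

def years_to_generations(years, generation_years=GENERATION_YEARS):
    """Convert a date in years back to the underlying generation count."""
    return years / generation_years

def generations_to_years(generations, generation_years=GENERATION_YEARS):
    """Convert a generation count (what LD actually measures) to years."""
    return generations * generation_years

# The paper's most likely range, expressed in generations.
low_gen = years_to_generations(47_000)
high_gen = years_to_generations(65_000)

# The same 2,000 generations under two different generation lengths
# (25 years is a hypothetical alternative, used only for comparison).
years_at_29 = generations_to_years(2000)
years_at_25 = generations_to_years(2000, generation_years=25)
```

Changing the generation length rescales the answer in years proportionally, but it does not change the estimated number of generations.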

LD methods are pretty much pure math. You aren't calibrating against any particular archaeological date in the way that you do with mutation rate dating.

Essentially, what LD dating is doing is equivalent to estimating how many times several dozen decks of cards, all shuffled the same number of times, were shuffled, when the original order of the cards is known, the shuffling process is random, and the current arrangement of the cards is known.
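Here is a toy version of the card analogy, with heavy caveats: it is not a real riffle shuffle. I simply assume each "shuffle" separates any given pair of neighboring cards with a fixed 1% probability, independently of every other pair, and all the numbers (50 decks, 100 shuffles, the random seed) are arbitrary choices for illustration.

```python
import math
import random

def surviving_fraction(n_decks, pairs_per_deck, break_prob, n_shuffles, rng):
    """Fraction of originally adjacent card pairs that are still together
    after n_shuffles, when each shuffle separates a pair with break_prob."""
    total = n_decks * pairs_per_deck
    intact = 0
    for _ in range(total):
        # A pair survives only if every single shuffle leaves it alone.
        if all(rng.random() >= break_prob for _ in range(n_shuffles)):
            intact += 1
    return intact / total

def estimate_shuffles(fraction_intact, break_prob):
    """Invert fraction = (1 - break_prob) ** shuffles for the shuffle count."""
    return math.log(fraction_intact) / math.log(1.0 - break_prob)

rng = random.Random(2012)  # fixed seed so the run is repeatable
fraction = surviving_fraction(50, 51, 0.01, 100, rng)
estimated = estimate_shuffles(fraction, 0.01)
```

No single pair says much, but pooled over a few thousand pairs the surviving fraction pins down the number of shuffles to within a few percent, provided the per-shuffle break probability is known. That known probability plays the role of the recombination rate.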

Maju said...

Shuffling cards is made precisely so their order cannot be predicted. The more you shuffle the less you can know. Get an accomplice from your family or friends and try to find out how many times the deck was shuffled only from the position of the cards in the end: you'll fail miserably.

Unless you do a magician trick, i.e. you cheat. But cheating is not acceptable in science so... that would bring us to the wrong conclusion.

Estimating age based on LD, if it can be done at all, requires, as with mutation rate, making a number of assumptions such as population size. Actually the most common use of LD in population genetics is to infer the level of inbreeding (greater LD = greater inbreeding = smaller effective population size), so you can never avoid the effective population size problem, as happens with other methods.

Those assumptions should be explained properly and accepted for what they are (educated guesses), and the results should therefore also be accepted for what they are: educated guesses based on other educated guesses.

A cynic would say: wild speculation based on other wild speculation masked with a good deal of fancy looking algorithms, which are mere operations through which the speculations go and provide guarantee of nothing at all.

Even if the educated guesses are all made in good faith and with good knowledge of the matter at hand, they will always be lucky hunches after all, and therefore a calibration (or better, several) is necessary to confirm or reject the assumptions injected into the algorithmic machine.