The People Who Went Down the Rivers: Origin of the Sino-Tibetan Language Family

I recently received a question from Quas Lacrimas:

“What (if anything) do you make of the fact that Proto-Tibetan and Proto-Sinitic were sister languages, but Tibetans and Han are so genetically disparate?”

My first response was that, assuming the question itself was correct, then one group must have conquered the other group, imparting its language but not its DNA.

On further reflection, though, I decided it’d be best to check whether the question’s initial premises were correct.

Sino-Tibetan, it turns out, is a legit language family:

The Sino-Tibetan languages, in a few sources also known as Tibeto-Burman or Trans-Himalayan, are a family of more than 400 languages spoken in East Asia, Southeast Asia and South Asia. The family is second only to the Indo-European languages in terms of the number of native speakers. The Sino-Tibetan languages with the most native speakers are the varieties of Chinese (1.3 billion speakers), Burmese (33 million) and the Tibetic languages (8 million). Many Sino-Tibetan languages are spoken by small communities in remote mountain areas and as such are poorly documented.

Map of the Sino-Tibetan language family
Red: Chinese; Yellow: Tibetan; Brown: Karen; Green: Lolo-Burmese; Orange: Other

But the claim that Tibetans and Chinese people are genetically disparate looks more questionable. While the Wikipedia page on Sino-Tibetan claims that, “There is no ethnic unity among the many peoples who speak Sino-Tibetan languages,” in the next two sentences it also claims that, “The most numerous are the Han Chinese, numbering 1.4+ billion (in China alone). The Hui (10 million) also speak Chinese but are officially classified as ethnically distinct by the Chinese government.”

But the Chinese government claiming that a group is an official ethnic group doesn’t make it a genetic group. “Hui” just means Muslim, and Muslims of any genetic background can get lumped into the group. I actually read some articles about the Hui ages ago, and as far as I recall, the category didn’t really exist in any official way prior to the modern PRC declaring that it did for census purposes. Today (or recently) there are some special perks for being an ethnic minority in China, like exceptions to the one-child policy, which lead more people to embrace their “Hui” identity and start thinking about themselves in this pan-Chinese-Muslim way rather than in terms of their local ethnic group, but none of this is genetics.

So right away I am suspicious that this claim is more “these groups see themselves as different” than “they are genetically different.” And I totally agree that Tibetan people and Chinese people are culturally distinct and probably see themselves as different groups.

For genetics, let’s turn back to Haak et al.’s representation of global genetics:

Haak et al.’s full dataset

Just in case you’re new around here, the part dominated by bright blue is sub-Saharan Africans, the yellow is Asians, and the orange is Caucasians. I’ve made a map to make it easier to visualize the distribution of these groups:

Asian, Australian, and Melanesian ethnic groups (including Indian, Middle Eastern, and Chinese) from Haak et al.’s dataset

This dataset doesn’t have a Tibetan group, but it does have the Nepalese Kusunda, the Tu (a Mongolic-language-speaking people in China), and the Burmese Lahu. So it’s a start.

The first thing that jumps out at me is that the groups in the Sino-Tibetan language family do not look all that genetically distinct, at least not on a global scale. They’re more similar than Middle Easterners and Europeans, despite the fact that Anatolian farmers invaded Europe several thousand years ago.

The Wikipedia page on Sino-Tibetan notes:

J. A. Matisoff proposed that the urheimat of the Sino-Tibetan languages was around the upper reaches of the Yangtze, Brahmaputra, Salween, and Mekong. This view is in accordance with the hypothesis that bubonic plague, cholera, and other diseases made the easternmost foothills of the Himalayas between China and India difficult for people outside to migrate in but relatively easily for the indigenous people, who had been adapted to the environment, to migrate out.[68]

The Yangtze, Brahmaputra, Salween and Mekong rivers, as you might have already realized if you took a good look at the map at the beginning of the post, all begin in Tibet.

Since Tibet was recently conquered by China, I was initially thinking that perhaps an ancient Chinese group had imposed their language on the Tibetans some time in the remote past, but Tibetans heading downstream and possibly conquering the people below makes a lot more sense.

Oh look, it’s our friends the Ainu

According to About World Languages, Proto-Sino-Tibetan may have split into its Tibeto- and Sinitic- branches about 4,000 BC. This is about the same time Proto-Indo-European started splitting up, so we have some idea of what a language family looks like when it’s that old; much older, and the languages start becoming so distinct that reconstruction becomes more difficult.

But if we look at the available genetic data a little more closely, we see that there are some major differences between Tibetans and their Sinitic neighbors–most notably, many Tibetan men belong to Y-Chromosome haplogroup D, while most Han Chinese men belong to haplogroup O with a smattering of Haplogroup C, which may have arrived via the Mongols.

According to Wikipedia:

The distribution of Haplogroup D-M174 is found among nearly all the populations of Central Asia and Northeast Asia south of the Russian border, although generally at a low frequency of 2% or less. A dramatic spike in the frequency of D-M174 occurs as one approaches the Tibetan Plateau. D-M174 is also found at high frequencies among Japanese people, but it fades into low frequencies in Korea and China proper between Japan and Tibet.


It is found today at high frequency among populations in Tibet, the Japanese archipelago, and the Andaman Islands, though curiously not in India. The Ainu of Japan are notable for possessing almost exclusively Haplogroup D-M174 chromosomes, although Haplogroup C-M217 chromosomes also have been found in 15% (3/20) of sampled Ainu males. Haplogroup D-M174 chromosomes are also found at low to moderate frequencies among populations of Central Asia and northern East Asia as well as the Han and Miao–Yao peoples of China and among several minority populations of Sichuan and Yunnan that speak Tibeto-Burman languages and reside in close proximity to the Tibetans.[5]

Unlike haplogroup C-M217, Haplogroup D-M174 is not found in the New World…

Haplogroup D-M174 is also remarkable for its rather extreme geographic differentiation, with a distinct subset of Haplogroup D-M174 chromosomes being found exclusively in each of the populations that contains a large percentage of individuals whose Y-chromosomes belong to Haplogroup D-M174: Haplogroup D-M15 among the Tibetans (as well as among the mainland East Asian populations that display very low frequencies of Haplogroup D-M174 Y-chromosomes), Haplogroup D-M55 among the various populations of the Japanese Archipelago, Haplogroup D-P99 among the inhabitants of Tibet, Tajikistan and other parts of mountainous southern Central Asia, and paragroup D-M174 without tested positive subclades (probably another monophyletic branch of Haplogroup D) among the Andaman Islanders. Another type (or types) of paragroup D-M174 without tested positive subclades is found at a very low frequency among the Turkic and Mongolic populations of Central Asia, amounting to no more than 1% in total. This apparently ancient diversification of Haplogroup D-M174 suggests that it may perhaps be better characterized as a “super-haplogroup” or “macro-haplogroup.” In one study, the frequency of Haplogroup D-M174 without tested positive subclades found among Thais was 10%.

Haplogroup D’s sister clade, Haplogroup E, (both D and E are descended from Haplogroup DE), is found almost exclusively in Africa.

Haplogroup D is therefore very ancient, estimated at 50-60,000 years old. Haplogroup O, by contrast, is only about 30,000 years old.

On the subject of Han genetics, Wikipedia states:

Y-chromosome haplogroup O3 is a common DNA marker in Han Chinese, as it appeared in China in prehistoric times. It is found in more than 50% of Chinese males, and ranging up to over 80% in certain regional subgroups of the Han ethnicity.[100] However, the mitochondrial DNA (mtDNA) of Han Chinese increases in diversity as one looks from northern to southern China, which suggests that male migrants from northern China married with women from local peoples after arriving in modern-day Guangdong, Fujian, and other regions of southern China.[101][102] … Another study puts Han Chinese into two groups: northern and southern Han Chinese, and it finds that the genetic characteristics of present-day northern Han Chinese was already formed as early as three-thousand years ago in the Central Plain area.[109]

(Note that 3,000 years ago is potentially some three thousand years after the first expansion of Proto-Sino-Tibetan, if the split really dates to about 4,000 BC.)

The estimated contribution of northern Hans to southern Hans is substantial in both paternal and maternal lineages and a geographic cline exists for mtDNA. As a result, the northern Hans are the primary contributors to the gene pool of the southern Hans. However, it is noteworthy that the expansion process was dominated by males, as is shown by a greater contribution to the Y-chromosome than the mtDNA from northern Hans to southern Hans. These genetic observations are in line with historical records of continuous and large migratory waves of northern China inhabitants escaping warfare and famine, to southern China.

Interestingly, the page on Tibetans notes, “It is thought that most of the Tibeto-Burman-speakers in Southwest China, including the Tibetans, are direct descendants from the ancient Qiang.”[6]

On the Qiang:

The term “Qiang” appears in the Classic of Poetry in reference to Tang of Shang (trad. 1675–1646 BC).[14] They seem to have lived in a diagonal band from northern Shaanxi to northern Henan, somewhat to the south of the later Beidi. They were enemy of the Shang dynasty, who mounted expeditions against them, capturing slaves and victims for human sacrifice. The Qiang prisoners were skilled in making oracle bones.[15]

This ancient tribe is said to be the progenitor of both the modern Qiang and the Tibetan people.[16] There are still many ethnological and linguistic links between the Qiang and the Tibetans.[16] The Qiang tribe expanded eastward and joined the Han people in the course of historical development, while the other branch that traveled southwards, crosses over the Hengduan Mountains, and entered the Yungui Plateau; some went even farther, to Burma, forming numerous ethnic groups of the Tibetan-Burmese language family.[17] Even today, from linguistic similarities, their relative relationship can be seen.

So here’s what I think happened (keeping in mind that I am in no way an expert on these subjects):

  1. About 8,000 years ago: neolithic people lived in Asia. (People of some sort have been living in Asia since Homo erectus, after all.) The ancestors of today’s Sino-Tibetans lived atop the Tibetan plateau.
  2. About 6,000 years ago: the Tibetans headed downstream, following the course of local rivers. In the process, they probably conquered and absorbed many of the local tribes they encountered.
  3. About 4,000 years ago: the Han and Qiang are ethnically and linguistically distinct, though the Qiang are still fairly similar to the Tibetans.
  4. The rest of Chinese history: Invasion from the north. Not only did the Mongols invade and kill somewhere between 20 and 60 million Chinese people in the 13th century, but there were also multiple invasions/migrations by people who were trying to get away from the Mongols.

Note that while the original proto-Sino-Tibetan invasion likely spread Tibetan Y-Chromosomes throughout southern China, the later Mongol and other Chinese invasions likely wiped out a large percent of those same chromosomes, as invaders both tend to be men and to kill men; women are more likely to survive invasions.

Most recently, of course, the People’s Republic of China conquered Tibet in 1951.

I’m sure there’s a lot I’m missing that would be obvious to an expert.

Let’s Talk Genetics (Polish and German)

source: Big Think: Genetic map of Europe

Continuing with our discussion of German/Polish history/languages/genetics, let’s look at what some actual geneticists have to say.

(If you’re joining us for the first time, the previous two posts summarize to: due to being next door to each other and having been invaded/settled over the millennia by groups which didn’t really care about modern political borders, Polish and German DNA are quite similar. More recent events, however, like Germany invading Poland and trying to kill all of the Poles, and ethnic Germans subsequently fleeing or being expelled from Poland at the end of the war, have created the conditions necessary for genetic differentiation between the two populations.)

So I’ve been looking up whatever papers I can find on the subject.

In “Contemporary paternal genetic landscape of Polish and German populations: from early medieval Slavic expansion to post-World War II resettlements,” Rebala et al. write:

The male genetic landscape of the European continent has been shown to be clinal and influenced primarily by geography rather than by language.1 One of the most outstanding phenomena in the Y-chromosomal diversity in Europe concerns the population of Poland, which reveals geographic homogeneity of Y-chromosomal lineages in spite of a relatively large geographic area seized by the Polish state.2 Moreover, a sharp genetic border has been identified between paternal lineages of neighbouring Poland and Germany, which strictly follows a political border between the two countries.3 Massive human resettlements during and shortly after the World War II (WWII), involving millions of Poles and Germans, have been proposed as an explanation for the observed phenomena.2, 3 Thus, it was possible that the local Polish populations formed after the early Slavic migrations displayed genetic heterogeneity before the war owing to genetic drift and/or gene flow with neighbouring populations. It has been also suggested that the revealed homogeneity of Polish paternal lineages existed already before the war owing to a common genetic substrate inherited from the ancestral Slavic population after the Slavs’ early medieval expansion in Europe.2 …

We used high-resolution typing of Y-chromosomal binary and microsatellite markers first to test for male genetic structure in the Polish population before massive human resettlements in the mid-20th century, and second to verify if the observed present-day genetic differentiation between the Polish and German paternal lineages is a direct consequence of the WWII or it has rather resulted from a genetic barrier between peoples with distinct linguistic backgrounds. The study further focuses on providing an answer to the origin of the expansion of the Slavic language in early medieval Europe. For the purpose of our investigation, we have sampled three pre-WWII Polish regional populations, three modern German populations (including the Slavic-speaking Sorbs) and a modern population of Slovakia. …

AMOVA in the studied populations revealed statistically significant support for two linguistically defined groups of populations in both haplogroup and haplotype distributions (Table 2). It also detected statistically significant genetic differentiation for both haplogroups and haplotypes in three Polish pre-WWII regional populations (Table 2). The AMOVA revealed small but statistically significant genetic differentiation between the Polish pre-war and modern populations (Table 2). When both groups of populations were tested for genetic structure separately, only the modern Polish regional samples showed genetic homogeneity (Table 2). Regional differentiation of 10-STR haplotypes in the pre-WWII populations was retained even if the most linguistically distinct Kashubian speakers were excluded from the analysis (RST=0.00899, P=0.01505; data not shown). Comparison of Y chromosomes associated with etymologically Slavic and German surnames (with frequencies provided in Table 1) did not reveal genetic differentiation within any of the three Polish regional populations for all three (FST, ΦST and RST) genetic distances. Moreover, the German surname-related Y chromosomes were comparably distant from Bavaria and Mecklenburg as the ones associated with the Slavic surnames (Supplementary Figure S2). MDS of pairwise genetic distances showed a clear-cut differentiation between German and Slavic samples (Figure 2). In addition, the MDS analysis revealed the pre-WWII populations from northern, central and southern Poland to be moderately scattered in the plot, on the contrary to modern Polish regional samples, which formed a very tight, homogeneous cluster (Figure 3).

Nicolaus Copernicus, Polish astronomer famous for developing the heliocentric model of the solar system

This all seems very reasonable. Modern Poland is probably more homogenous than pre-war Poland in part because modern Poles have cars and trains and can marry people from other parts of Poland much more easily than pre-war Poles could, and possibly because the war itself reduced Polish genetic diversity and displaced much of the population.

Genetic discontinuity along the Polish-German border also makes sense, as national, cultural, and linguistic boundaries all make intermarriage more difficult.
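For readers who (like me) want a feel for what an FST-style distance measures, here is a minimal sketch. It is not the AMOVA procedure the paper actually used; it is a simplified Nei-style calculation that assumes two equally weighted populations and haploid markers, and the haplogroup frequencies in it are invented purely for illustration. The function names are my own.

```python
def gene_diversity(freqs):
    """Chance that two randomly drawn Y chromosomes differ: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst(pop_a, pop_b):
    """Simplified Nei-style F_ST for two equally weighted populations.

    0 means the two populations have identical haplogroup frequencies;
    1 means they share no haplogroups at all.
    """
    haplogroups = set(pop_a) | set(pop_b)
    # Frequencies in the pooled (combined) population.
    pooled = {h: (pop_a.get(h, 0.0) + pop_b.get(h, 0.0)) / 2 for h in haplogroups}
    h_within = (gene_diversity(pop_a) + gene_diversity(pop_b)) / 2
    h_total = gene_diversity(pooled)
    if h_total == 0:
        return 0.0  # both populations fixed for the same haplogroup
    return (h_total - h_within) / h_total


# Hypothetical frequencies, NOT taken from Rebala et al. or any real dataset.
pop_x = {"R1a": 0.6, "R1b": 0.1, "I1": 0.1, "other": 0.2}
pop_y = {"R1a": 0.1, "R1b": 0.5, "I1": 0.2, "other": 0.2}

print(round(fst(pop_x, pop_y), 3))
```

The intuition: if pooling two populations makes the combined sample noticeably more diverse than either one alone, the populations differ, and the score moves away from zero.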

The Discussion portion of this paper is very interesting; I shall quote briefly:

Kayser et al3 revealed significant genetic differentiation between paternal lineages of neighbouring Poland and Germany, which follows a present-day political border and was attributed to massive population movements during and shortly after the WWII. … it remained unknown whether Y-chromosomal diversity in ethnically/linguistically defined Slavic and German populations, which used to be exposed to intensive interethnic contacts and cohabit ethnically mixed territories, was clinal or discontinuous already before the war. In contrast to the regions of Kaszuby and Kociewie, which were politically subordinated to German states for more than three centuries and before the massive human resettlements in the mid-20th century occupied a narrow strip of land between German-speaking territories, the Kurpie region practically never experienced longer periods of German political influence and direct neighbourhood with the German populations. Lusatia was conquered by Germans in the 10th century and since then was a part of German states for most of its history; the modern Lusatians (Sorbs) inhabit a Slavic-speaking island in southeastern Germany. In spite of the fact that these four regions differed significantly in exposure to gene flow with the German population, our results revealed their similar genetic differentiation from Bavaria and Mecklenburg. Moreover, admixture estimates showed hardly detectable German paternal ancestry in Slavs neighbouring German populations for centuries, that is, the Sorbs and Kashubes. However, it should be noted that our regional population samples comprised only individuals of Polish and Sorbian ethnicity and did not involve a pre-WWII German minority of Kaszuby and Kociewie, which owing to forced resettlements in the mid-20th century ceased to exist, and also did not involve Germans constituting since the 19th century a majority ethnic group of Lusatia. 
Thus, our results concern ethnically/linguistically rather than geographically defined populations and clearly contrast the broad-scale pattern of Y-chromosomal diversity in Europe, which was shown to be strongly driven by geographic proximity rather than by language.1 …

Two main factors are believed to be responsible for the Slavic language extinction in vast territories to the east of the Elbe and Saale rivers: colonisation of the region by the German-speaking settlers, known in historical sources as Ostsiedlung, and assimilation of the local Slavic populations, but contribution of both factors to the formation of a modern eastern German population used to remain highly speculative.8 Previous studies on Y-chromosomal diversity in Germany by Roewer et al17 and Kayser et al3 revealed east–west regional differentiation within the country with eastern German populations clustering between western German and Slavic populations but clearly separated from the latter, which suggested only minor Slavic paternal contribution to the modern eastern Germans. Our ancestry estimates for the Mecklenburg region (Supplementary Table S3) and for the pooled eastern German populations, assessed as being well below 50%, definitely confirm the German colonisation with replacement of autochthonous populations as the main reason for extinction of local Slavic vernaculars. The presented results suggest that early medieval Slavic westward migrations and late medieval and subsequent German eastward migrations, which outnumbered and largely replaced previous populations, as well as very limited male genetic admixture to the neighbouring Slavs (Supplementary Table S4), were likely responsible for the pre-WWII genetic differentiation between Slavic- and German-speaking populations. Woźniak et al18 compared several Slavic populations and did not detect such a sharp genetic boundary in case of Czech and Slovak males with genetically intermediate position between other Slavic and German populations, which was explained by early medieval interactions between Slavic and Germanic tribes on the southern side of the Carpathians. Anyway, paternal lineages from our Slovak population sample were genetically much closer to their Slavic than German counterparts. …

Note that they are discussing paternal ancestry. This does not rule out the possibility of significant Slavic maternal ancestry. Finally:

Our coalescence-based divergence time estimates for the two isolated western Slavic populations almost perfectly match historical and archaeological data on the Slavs’ expansion in Europe in the 5th–6th centuries.4 Several hundred years of demographic expansion before the divergence, as detected by the BATWING, support hypothesis that the early medieval Slavic expansion in Europe was a demographic event rather than solely a linguistic spread of the Slavic language.

Marian Rejewski, Polish mathematician and cryptologist who reconstructed the Nazi German military Enigma cipher machine sight-unseen in 1932

I left out a lot of interesting material, so I recommend reading the complete discussion if you want to know more about Polish/German genetics.

But what about the maternal contribution? Luckily for us, Malyarchuk et al have written Mitochondrial DNA analysis in Poles and Russians:

Mitochondrial DNA (mtDNA) sequence variation was examined in Poles (from the Pomerania-Kujawy region; n = 436) and Russians (from three different regions of the European part of Russia; n = 201)… The classification of mitochondrial haplotypes revealed the presence of all major European haplogroups, which were characterized by similar patterns of distribution in Poles and Russians. An analysis of the distribution of the control region haplotypes did not reveal any specific combinations of unique mtDNA haplotypes and their subclusters that clearly distinguish both Poles and Russians from the neighbouring European populations. The only exception is a novel subcluster U4a within subhaplogroup U4, defined by a diagnostic mutation at nucleotide position 310 in HVS II. This subcluster was found in common predominantly between Poles and Russians (at a frequency of 2.3% and 2.0%, respectively) and may therefore have a central-eastern European origin. …

The analysis of mtDNA haplotype distribution has shown that both Slavonic populations share them mainly with Germans and Finns. The following numbers of the rare shared haplotypes and subclusters were found between populations analyzed: 10% between Poles and Germans, 7.4% between Poles and Russians, and 4.5% between Russians and Germans. A novel subcluster U4-310, defined by mutation at nucleotide position 310 in HVS II, was found predominantly in common between Poles and Russians (at frequency of 2%). Given the relatively high frequency and diversity of this marker among Poles and its low frequency in the neighbouring German and Finnish populations, we suggest a central European origin of U4-310, following by subsequent dispersal of this mtDNA subgroup in eastern European populations during the Slavonic migrations in early Middle Ages.

In other words, for the most part, Poles, Russians, Germans, and even Finns(!) (who do not speak an Indo-European language and are usually genetic outliers in Europe) all share their maternal DNA.

Migrants, immigrants, and invaders tend disproportionately to be male (just look at any army) while women tend to stay behind. Invading armies might wipe each other out, but the women of a region are typically spared, seen as booty similar to cattle to be distributed among the invaders rather than killed. Female populations therefore tend to be sticky, in a genetic sense, persisting long after all of the men in an area were killed and replaced. The dominant Y-chromosome haplogroup in the area (R1a) hails from the Indo-European invasion (except in Finland, obviously), but the mtDNA likely predates that expansion.

These data allow us to suggest that Europeans, despite their linguistic differences, originated in the common genetic substratum which predates the formation of the most modern European populations. It seems that considerable genetic similarity between European populations, which has been revealed by mtDNA variation studies, was further accelerated by a process of gene redistribution between populations due to the multiple migrations occurring in Europe during the past millennia…

It is interesting, though, that recent German invasions of Poland left very little in the way of a genetic contribution. I’d wager that WWII was quite a genetic disaster for everyone involved.

If you want more information, Khazaria has a nice list of studies plus short summaries on Polish DNA.

On Germanic and Polish DNA

Distribution of Y-chromosomal haplogroup I1a in Europe.

Commentator Unknown123 asks what we can tell about the differences between German and Polish DNA. Obviously German is here referring to one of the Germanic peoples who occupy the modern nation of Germany and speak a Germanic language. But as noted before, just because people speak a common language doesn’t necessarily mean they have a common genetic origin. Germans and English both speak Germanic languages, but Germans could easily share more DNA with their Slavic-language speaking neighbors in Poland than with the English.

According to Wikipedia, the modern Germanic peoples include Afrikaners, Austrians, Danes, Dutch, English, Flemish, Frisians, Germans, Icelanders, Lowland Scots, Norwegians, and Swedes.[225][226]

And here is a map that is very suggestive of Viking raiders:

(It’s also not a bad map of the distribution of Germanic peoples in 750 BC.)

Wikipedia states:

It is suggested by geneticists that the movements of Germanic peoples has had a strong influence upon the modern distribution of the male lineage represented by the Y-DNA haplogroup I1, which is believed to have originated with one man, who lived approximately 4,000 to 6,000 years ago somewhere in Northern Europe, possibly modern Denmark … There is evidence of this man’s descendants settling in all of the areas that Germanic tribes are recorded as having subsequently invaded or migrated to.[220][v] However, it is quite possible that Haplogroup I1 is pre-Germanic, that is I1 may have originated with individuals who adopted the proto-Germanic culture, at an early stage of its development or were co-founders of that culture. Should that earliest Proto-Germanic speaking ancestor be found, his Y-DNA would most likely be an admixture of the aforementioned I1, but would also contain R1a1a, R1b-P312 and R1b-U106, a genetic combination of the haplogroups found among current Germanic speaking peoples.[221] …

Haplogroup I1 accounts for approximately 40% of Icelandic males, 40%–50% of Swedish males, 40% of Norwegian males, and 40% of Danish Human Y-chromosome DNA haplogroups. Haplogroup I1 peaks in certain areas of Northern Germany and Eastern England at more than 30%. Haplogroup R1b and haplogroup R1a collectively account for more than 40% of males in Sweden; over 50% in Norway, 60% in Iceland, 60–70% in Germany, and between 50%–70% of the males in England and the Netherlands depending on region.[222]

Note, though, that this map has some amusing results; clearly it’s a more Nordic distribution than specifically German, with “Celtic” Ireland just as Nordic as much of England and Germany.

Wikipedia also states:

According to a study published in 2010, I-M253 originated between 3,170 and 5,000 years ago, in Chalcolithic Europe.[1] A new study in 2015 estimated the origin as between 3,470 and 5,070 years ago or between 3,180 and 3,760 years ago, using two different techniques.[2] It is suggested that it initially dispersed from the area that is now Denmark.[8]

A 2014 study in Hungary uncovered remains of nine individuals from the Linear Pottery culture, one of whom was found to have carried the M253 SNP which defines Haplogroup I1. This culture is thought to have been present between 6,500 and 7,500 years ago.[12]


In 2002 a paper was published by Michael E. Weale and colleagues showing genetic evidence for population differences between the English and Welsh populations, including a markedly higher level of Y-DNA haplogroup I in England than in Wales. They saw this as convincing evidence of Anglo-Saxon mass invasion of eastern Great Britain from northern Germany and Denmark during the Migration Period.[13] The authors assumed that populations with large proportions of haplogroup I originated from northern Germany or southern Scandinavia, particularly Denmark, and that their ancestors had migrated across the North Sea with Anglo-Saxon migrations and Danish Vikings. The main claim by the researchers was:

“That an Anglo-Saxon immigration event affecting 50–100% of the Central English male gene pool at that time is required. We note, however, that our data do not allow us to distinguish an event that simply added to the indigenous Central English male gene pool from one where indigenous males were displaced elsewhere or one where indigenous males were reduced in number … This study shows that the Welsh border was more of a genetic barrier to Anglo-Saxon Y chromosome gene flow than the North Sea … These results indicate that a political boundary can be more important than a geophysical one in population genetic structuring.”

In 2003 a paper was published by Christian Capelli and colleagues which supported, but modified, the conclusions of Weale and colleagues.[14] This paper, which sampled Great Britain and Ireland on a grid, found a smaller difference between Welsh and English samples, with a gradual decrease in Haplogroup I frequency moving westwards in southern Great Britain. The results suggested to the authors that Norwegian Viking invaders had heavily influenced the northern area of the British Isles, but that both English and mainland Scottish samples all have German/Danish influence.

But the original question was about Germany and Poland, not England and Wales, so we are wandering a bit off-track.

source: Big Think: Genetic map of Europe

Luckily for me, Wikipedia helpfully has a table of European Population Genetic Substructure based on SNPs[48][59]. We’ll be extracting the most useful parts.

A score of “1” on this graph means that the two populations in question are identical–fully inter-mixing. The closer to 1 two groups score, the more similar they are. The further from one they score, (the bigger the number,) the more different they are.

Why isn't it in English? Oh, well. We'll manage.
Here is a potentially relevant map of the neolithic cultures of Europe

For example, the most closely related peoples on the graph are Austrians and their neighbors in southern Germany and Hungary (despite Hungarians speaking a non-Indo-European language brought in by recent steppe invaders.) Both groups scored 1.04 relative to Austrians, and a 1.08 relative to each other.

Northern and southern Germans also received a 1.08–so southern Germans are about as closely related to northern Germans as they are to Hungarians, and are more closely related to Austrians than to northern Germans.

This might reflect the pre-Roman empire population in which (as we discussed in the previous post) the Celtic cultures of Hallstatt and La Tene dominated a stretch of central Europe between Austria and Switzerland, with significant expansion both east and west, whilst the proto-Germanic peoples occupied northern Germany and later spread southward.

The least closely related peoples on the graph are (unsurprisingly) the Sami (Lapp) town of Kuusamo in northeastern Finland and Spain, at 4.21. (Finns are always kind of outliers in Europe, and Spaniards are kind of outliers in their own, different way, being the part of mainland Europe furthest from the Indo-European expansion starting point and so having received fewer invaders.)

So what does the table say about Germans and their neighbors?

source: Big Think: Genetic map of Europe

Northern Germany:
South Germany 1.08
Austria 1.10
Hungary 1.11
Sweden 1.12
Czech Repub 1.15
Poland 1.18
France 1.25
Bulgaria 1.32
Switzerland 1.36

Southern Germany:
Austria 1.04
North Germany 1.08
Hungary 1.08
France 1.12
Czech Repub 1.16
Switzerland 1.17
Bulgaria 1.19
Latvia 1.20
Sweden 1.21
Poland 1.23


Poland:
Czech Repub 1.09
Hungary 1.14
Estonia 1.17
North Germany 1.18
Russia 1.18
Austria 1.19
Lithuania 1.20
South Germany 1.23
Latvia 1.26
Bulgaria 1.29
Sweden 1.30
Switzerland 1.46

Obviously I didn’t include all of the data in the original table; all of the other sampled European groups, such as Italians, Spaniards, and Finns are genetically further away from north and south Germany and Poland than the listed groups.

So northern Germany and Poland are quite closely related–even closer than northern Germans are to the French (whose country is named after a Germanic tribe, the Franks, who conquered it during the Barbarian Migrations at the Fall of the Roman Empire,) or the Swiss, many of whom speak German. By contrast, southern Germany is more closely related to France and Switzerland than to Poland, but still more closely related to the Poles than to Italians or Spaniards.
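For readers who would rather check these comparisons than eyeball them, here is a minimal Python sketch. The numbers are copied from the lists above; the names `SCORES`, `ranked`, and `closer_to` are my own, and taking the third list to be Poland's row is an assumption based on how its German values (1.18 and 1.23) mirror the two German lists.

```python
# Scores copied from the lists above: 1 means identical, larger means more different.
# Assumption: the third list is Poland's row (its German values mirror the other two lists).
SCORES = {
    "North Germany": {
        "South Germany": 1.08, "Austria": 1.10, "Hungary": 1.11, "Sweden": 1.12,
        "Czech Repub": 1.15, "Poland": 1.18, "France": 1.25, "Bulgaria": 1.32,
        "Switzerland": 1.36,
    },
    "South Germany": {
        "Austria": 1.04, "North Germany": 1.08, "Hungary": 1.08, "France": 1.12,
        "Czech Repub": 1.16, "Switzerland": 1.17, "Bulgaria": 1.19, "Latvia": 1.20,
        "Sweden": 1.21, "Poland": 1.23,
    },
    "Poland": {
        "Czech Repub": 1.09, "Hungary": 1.14, "Estonia": 1.17, "North Germany": 1.18,
        "Russia": 1.18, "Austria": 1.19, "Lithuania": 1.20, "South Germany": 1.23,
        "Latvia": 1.26, "Bulgaria": 1.29, "Sweden": 1.30, "Switzerland": 1.46,
    },
}


def ranked(population):
    """Return (neighbor, score) pairs for one population, most similar first."""
    return sorted(SCORES[population].items(), key=lambda pair: pair[1])


def closer_to(population, a, b):
    """Return whichever of a or b has the lower score relative to population."""
    row = SCORES[population]
    return a if row[a] <= row[b] else b


if __name__ == "__main__":
    print(closer_to("North Germany", "Poland", "France"))
    print(closer_to("South Germany", "France", "Poland"))
    print(ranked("Poland")[:3])
```

A lookup like `closer_to("North Germany", "Poland", "France")` simply compares 1.18 with 1.25, which is all the paragraph above is doing by hand.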

To be continued…

Race: The Social Construction of a Biological Reality, pt 2

Note: This post still contains a lot of oversimplification for the sake of explaining a few things.

Welcome back to our discussion of the geographic dispersion of humanity. On Tuesday, we discussed how two great barriers–the Sahara desert and the Himalayas + central Asian desert–have impeded human travelers over the millennia, resulting in three large, fairly well-defined groups of humans, the major races: Sub-Saharan Africans (SSA), Caucasians, and east Asians.

Of course, any astute motorist, having come to a halt at the Asian end of our highway, might observe that there is, in fact, a great deal of land in the world that we have not yet explored. So we head to the local shop and pick up a better map:


Our new map shows us navigational directions for getting to Melanesia and Australia–in ice age times, it instructs us, we can drive most of the way. If there isn’t an ice age, we’ll have to take a boat.

The people of Melanesia and Australia are related, the descendants of one of the first groups of humans to split off from the greater tribe that left Africa some 70k years ago.

As the name “Melanesian” implies, they are quite dark-skinned–a result of never having ventured far from the equatorial zone.

Today, they live in eastern Indonesia, Papua New Guinea, Australia, and a smattering of smaller islands. (Notably, the Maori of New Zealand are Polynesians like the Hawaiians, not Melanesians, descendants of a different migration wave that originated in Taiwan.)

Fijian mountain warrior with curly, “African” style hair

There is some speculation that they might have once been wider-spread than they currently are, or that various south-Asian tribes might be related to them, (eg, “A 2009 genetic study in India found similarities among Indian archaic populations and Aboriginal people, indicating a Southern migration route, with expanding populations from Southeast Asia migrating to Indonesia and Australia,”) but I don’t think any mainland group would today be classed as majority Melanesian by DNA.

They may also be related to the scattered tribes of similarly dark-skinned, diminutive people known as the Negritos:

Males from the Aeta (or Agta) people of the Philippines are of great interest to genetic, anthropological and historical researchers, as at least 83% of them belong to haplogroup K2b, in the form of its rare primary clades K2b1* and P* (a.k.a. K2b2* or P-P295*).[7] Most Aeta males (60%) carry K-P397 (K2b1), which is otherwise uncommon in the Philippines and is strongly associated with the indigenous peoples of Melanesia and Micronesia. Basal P* is rare outside the Aeta and some other groups within Maritime South East Asia. …

Natural blond hair
Two Melanesian girls from Vanuatu (blond hair is common in Melanesian children.)

A 2010 study by the Anthropological Survey of India and the Texas-based Southwest Foundation for Biomedical Research identified seven genomes from 26 isolated “relic tribes” from the Indian mainland, such as the Baiga, which share “two synonymous polymorphisms with the M42 haplogroup, which is specific to Australian Aborigines”. These were specific mtDNA mutations that are shared exclusively by Australian aborigines and these Indian tribes, and no other known human groupings.[12]

A study of blood groups and proteins in the 1950s suggested that the Andamanese were more closely related to Oceanic peoples than African Pygmies. Genetic studies on Philippine Negritos, based on polymorphic blood enzymes and antigens, showed they were similar to surrounding Asian populations.[13]

Negrito peoples may descend from Australoid Melanesian settlers of Southeast Asia. Despite being isolated, the different peoples do share genetic similarities with their neighboring populations.[13][14] They also show relevant phenotypic (anatomic) variations which require explanation.[14]

In contrast, a recent genetic study found that unlike other early groups in Malesia, Andamanese Negritos lack the Denisovan hominin admixture in their DNA. Denisovan ancestry is found among indigenous Melanesian and Australian populations between 4–6%.[15][16]

Australian Aboriginal man

However, the Negritos are a very small set of tribes, and I am not confident that they are even significantly related to each other, rather than just some short folks living on a few scattered islands. We must leave them for another day.

The vast majority of Aborigines and Melanesians live in Australia, Papua New Guinea, and nearby islands. They resemble Africans, because they split off from the rest of the out-of-Africa crew long before the traits we now associate with “whites” and “Asians” evolved, and have since stayed near the equator, but they are most closely related to–sharing DNA with–south Asians (and Indians.)

So we have, here, on the genetic level, a funny situation. Melanesians are–relatively speaking–a small group. According to Wikipedia, there are about 12 million Melanesians and 606,000 Aborigines. By contrast, Tokyo prefecture has 13 million people and the total Tokyo metro area has nearly 38 million. Meanwhile, the Han Chinese–not a race but a single, fairly homogeneous ethnic group–number around 1.3 billion.

Of all the world’s peoples, Melanesians/Aborigines are most closely related to other Asians–but this is a distant relationship, and those same Asians are more closely related to Caucasians than to Aborigines.

As I mentioned on Tuesday, the diagram, because it is one-dimensional, can only show the distance between two groups at a time, not all groups. The genetic distance between Caucasians and Aborigines is about 50k to 60k years, while the distance between Asians and Caucasians is around 40k, but the distance between Sub-Saharan Africans and ALL non-SSAs is about 70k, whether they’re in Australia, Patagonia, or France. Our map is not designed to show this distance, only the distances between individual pairs.
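To see why a single line can’t hold all of these distances at once, here is a small Python sketch (the function name `fits_on_a_line` is mine, not from any source). Three points can sit on one line only if the largest of their three pairwise distances equals the sum of the other two. The sketch plugs in the rough figures above, in thousands of years.

```python
def fits_on_a_line(d_ab, d_ac, d_bc, tol=1e-9):
    """True if three points with these pairwise distances can lie on one line.

    On a line, one point sits between the other two, so the longest
    distance must equal the sum of the two shorter ones.
    """
    shortest, middle, longest = sorted((d_ab, d_ac, d_bc))
    return abs(longest - (shortest + middle)) <= tol


# Rough figures from the text, in thousands of years:
# SSA-Caucasian 70, SSA-Asian 70, Caucasian-Asian 40.
if __name__ == "__main__":
    print(fits_on_a_line(70, 70, 40))  # the three big groups
    print(fits_on_a_line(30, 70, 40))  # a made-up triple, for contrast
```

The first call comes back `False`, because 70 is not 40 + 70; the made-up second triple comes back `True`, because 70 = 30 + 40. That is the whole problem with a one-dimensional road map: it can honor any one pair of distances, but not all three.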

Some anthropologists refer to Bushmen as “gracile,” which means they are a little shorter than average Europeans and not stockily built

Now if we hopped back in our car and zoomed back to the beginning of our trip, pausing to refuel in Lagos, we’d note another small group that has been added to the other end of the map: the Bushmen, aka the Khoi-San people. Wikipedia estimates 90,000 San and doesn’t give an estimate for the Khoi people, but their largest group, the Nama, has about 200,000 people. We’ll estimate the total, therefore, around 500,000 people, just to be safe.

The Bushmen are famous for being among the world’s last hunter-gatherers; their cousins the Khoi people are pastoralists. There were undoubtedly more of them in the past, before both Europeans and Bantus arrived in southern Africa. Some people think Bushmen look a little Asian, because their complexions are lighter than those of their more equatorial African cousins.


Various Y chromosome studies show that the San carry some of the most divergent (oldest) human Y-chromosome haplogroups. These haplogroups are specific sub-groups of haplogroups A and B, the two earliest branches on the human Y-chromosome tree.[48][49][50]

Mitochondrial DNA studies also provide evidence that the San carry high frequencies of the earliest haplogroup branches in the human mitochondrial DNA tree. This DNA is inherited only from one’s mother. The most divergent (oldest) mitochondrial haplogroup, L0d, has been identified at its highest frequencies in the southern African San groups.[48][51][52][53]

I loved that movie
The late Nǃxau ǂToma, (aka Gcao Tekene Coma,) Bushman star of “The Gods Must be Crazy,” roughly 1944-2003

In a study published in March 2011, Brenna Henn and colleagues found that the ǂKhomani San, as well as the Sandawe and Hadza peoples of Tanzania, were the most genetically diverse of any living humans studied. This high degree of genetic diversity hints at the origin of anatomically modern humans.[54][55]

Recent analysis suggests that the San may have been isolated from other original ancestral groups for as much as 100,000 years and later rejoined, re-integrating the human gene pool.[56]

A DNA study of fully sequenced genomes, published in September 2016, showed that the ancestors of today’s San hunter-gatherers began to diverge from other human populations in Africa about 200,000 years ago and were fully isolated by 100,000 years ago … [57]

So the total distance between Nigerians and Australian Aborigines is 70k years; the distance between Nigerians and Bushmen is at least 100k years.

When we zoom in on the big three–Sub-Saharan Africans, Caucasians, and Asians–they clade quite easily and obviously into three races. But when we add Aborigines and Bushmen, things complicate. Should we have a “race” smaller than the average American city? Or should we just lump them in with their nearest neighbors–Bushmen with Bantus and Aborigines with Asians?

I am fine with doing both, actually–but wait, I’m not done complicating matters! Tune in on Monday for more.

Ethnic Groups of India, Pakistan, Asia, and Australia


Source: Haak et al., Massive migration from the steppe was a source for Indo-European languages in Europe.

Note: There is a territorial dispute between India and Pakistan. I am not trying to wade into that dispute or pass judgment on who really controls what. Also, I don’t know what distinguishes the 4 Gujarati samples, so they’re just in ABC order.

And finally, greater Asia (plus Australia):


Note that I had to leave off some groups from this map that appeared on earlier maps, like most of the Caucasian ethnicities. (Note that central Siberia is not actually as badly sampled as it looks, because this is a Mercator projection which makes Siberia look bigger than it actually is. Yes, I know, I don’t like Mercator projections, either, but it’s hard to find a nice, blank map with Asia on the left and Alaska on the right, and a cylindrical projection allows me to just switch the two halves without messing up the angles of the continents.)

And we’re done!

So who is White?

“White” is a nebulous category. “Black” is actually easier to define, because there’s a pretty hard boundary (the Sahara) between black Africa and everywhere else. To be fair, there are also groups like the Bushmen (who are more tawny brownish,) and the Pygmies who are genetically separate from other sub-Saharan Africans by over 100,000 years, but these are pretty small on the global scale. But “whites” and “Asians” occupy the same continent, and thus shade into each other.

If we use a strictly skin tone definition (as the word “white” implies) we can just pull up a map of global skin tone variation:

source: Wikipedia

Of course, this implies that either Spaniards and Finns aren’t white, or Chinese and Eskimos are. Either way is fine, of course, though this would contradict most people’s usage. (And I kind of question that data on the Finns:

credit: The Postnational Monitor)

These composites of faces from around the world offer us some more data, though depending on how they were made, they may not accurately reflect skin tone in all countries (ie, if the creator relied on pictures of famous people available on the internet, then these will reflect local beauty norms rather than group averages.)

(Plus, I wonder why the Romanians are pink.)

J. B. Huang has taken some of the Eurasian faces from this set and gone through the effort of trying to quantify their shapes, as displayed in this graph (at least, that’s what I think they’re doing):

Interestingly, while some of the faces cluster together the way you might expect–China, Taiwan, Korea, and Japan are all near each other, as are Belgium and the Netherlands–many of the groupings are near random, eg, Mongolia, Turkey, and the Philippines. Hungary and Austria are closer to India and Japan than to Poland or Finland. The European faces are all over the map.

Maybe this doesn’t mean anything at all, or maybe it means that there’s a lot of variation in European faces.

This is actually not too surprising, given that modern Europeans are genetically descended from three different groups who conquered the peninsula in successive waves, leaving more or less of their DNA in different areas: the hunter gatherers who were there first, followed by farmers who spread out from Anatolia (modern Turkey,) followed by the “Indo-Europeans” aka the Yamnaya, who were part hunter gatherer (by DNA, not profession) and part another group whose origins have yet to be located, but which I call the “teal people” because their DNA is teal on Haak’s graph.

Oh yes, we are getting to Haak.

From Haak et al.

This isn’t the full graph, but it’s probably enough for our purposes. The European countries show a characteristic profile of Orange, Dark Blue, and Teal. (By contrast, the east Asian countries, which cluster closely together on the facial map, are mostly yellow with only a bit of red.)

Obviously DNA isn’t actually colored. It’s just a visual aid.

Haak’s graph makes it fairly easy to rule out the groups that are definitely different (at least genetically.) The American Indians, Inuit, West Africans, Chinese, and Aborigines are distinctly out. This leaves us with Europe, the Middle East, North Africa, India, and parts of central Asia/Siberia:


The Orange-centric region, which Haak et al arranged to display the movements of the Anatolian farmer people.


The heavily teal Indian section (the middle part, from Hazara to Tlingit, is obviously not Indian).

And finally some Siberian DNA.

Now, I could stare at these all day; I love them. They tell so many fascinating stories about people and where they went. Of the three ancestries found in Europeans, the oldest, the dark blue (hunter-gatherers,) is found throughout India, Siberia, and even the Aleutian islands (though I caution that some of this could just be because of Russians raping the Aleuts back in the day.) The dark blue appears to hit a particular low point in the Caucasus region, which of course is about where the teal got its start.

The orange–Anatolian farmers–shows up throughout the Middle East and Europe, but is near totally absent in India and Siberia. (Not much farming in Siberia!)

At a lower resolution (not pictured,) India, central Asia, and Siberia appear to have a mix of–broadly speaking–“European” and “Asian” ancestry. (Not too surprising, since they are in the middle of the continent.) Obviously the middle of Asia is a big crossroads between different groups–red (Siberian) yellow (east Asian) teal and dark blue, and bits of the same DNA that shows up in the Eskimo (Inuit) and Aleuts.

But this is all kind of complicated. Luckily for us, this is only one way to visualize DNA–I’ve got others!

Credit Robert Lindsay, Beyond Highbrow

If you’re not familiar with these sorts of trees, the basic story is that geneticists gathered DNA samples (from spit, I think, which is pretty awesome,) from ethnic groups from all over the world, and then measured how many genes they have in common. More genes in common = groups more closely related to each other. Fewer genes in common = more genetic distance from each other.

Since different teams use different genetic samples and computer models, they have produced slightly different genetic trees.
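As a rough illustration of how pairwise similarity gets turned into a tree, here is a Python sketch of average-linkage clustering. This is one simple textbook method, not necessarily what any of these teams used. It reuses five populations from the European scores quoted in the Germany and Poland post, treating each score as a distance (an assumption on my part); the function and variable names are my own.

```python
from itertools import combinations

# Pairwise scores quoted earlier for five European populations
# (assumption: a higher score can be treated as a larger distance).
PAIRS = {
    ("North Germany", "South Germany"): 1.08,
    ("North Germany", "Austria"): 1.10,
    ("North Germany", "Hungary"): 1.11,
    ("North Germany", "Poland"): 1.18,
    ("South Germany", "Austria"): 1.04,
    ("South Germany", "Hungary"): 1.08,
    ("South Germany", "Poland"): 1.23,
    ("Austria", "Hungary"): 1.04,
    ("Austria", "Poland"): 1.19,
    ("Hungary", "Poland"): 1.14,
}


def distance(a, b):
    """Look up the score for a pair of populations in either order."""
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def average_linkage(c1, c2):
    """Mean distance over every cross-cluster pair of populations."""
    return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def build_tree(populations):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(name,) for name in populations]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2))
    return merges


if __name__ == "__main__":
    names = ["North Germany", "South Germany", "Poland", "Austria", "Hungary"]
    for left, right in build_tree(names):
        print(left, "+", right)
```

With these numbers, Poland is the last to join, after the Austrian, Hungarian, and German samples have grouped together. Austria is tied at 1.04 with both southern Germany and Hungary, so which of those pairs merges first depends on how ties are broken, which is a small-scale version of why different teams get slightly different trees.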

Note that since the tree is constructed by comparing the number of genes two groups have in common, a group could end up in a particular spot because it is descended from a common ancestor with other nearby groups, or because of mixing between two groups. Ashkenazi Jews, for example, cluster with southern Europeans because they’re about half Italian (and obviously half ancient Israeli.) Here’s another chart, giving us another perspective:

I totally stole this from Razib Khan, didn't I?
I totally stole this from Razib Khan–though he got it from here.

This chart also shows us genetic differences between groups, with strong clustering among Africans and East Asians, respectively, and then a sort of scattered group of Europeans and Indians (South Asians.)

Also credit Robert Lindsay

Neither of these graphs shows Siberians or central Asians in great detail, because they are tiny groups, but I think it’s safe to say the Siberians at least cluster near their neighbors, the other Asians and far-north Americans.

The central and south Asians, though, are quite the interesting case!

Between archaeology and genetics, we’ve been able to trace the path of human expansion, from central Africa to the world:

I think this map came from that recent article I discussed in the post about possibly finding traces of the first out-of-Africa event in Papuans.

Since this post is already image heavy, here is a graph showing finer detail on European and North African groups, Moroccans, (Berbers), Aleut woman, Sardinians, Sami (Lapps), Iranians, Gujarati, (another), Dravidian, Brahmin, Dalits, Altai, Uyghur, Selkup. (Look at the pictures!)

Well, ultimately, there’s no hard division between most ethnic groups or races–you can draw dividing lines where you want them. The term “white” implies dermal paleness, of course, so you may prefer a narrower definition for “white” than “Caucasian.” Greater minds than mine have already covered the subject in more authoritative detail, of course. I merely offer my thoughts for entertainment.

Why Geneticists get touchy about Epigenetics

Disclaimer: I am not a geneticist. For those of you who are new here, this is basically a genetics fan blog. I am trying to learn about genetics, and you know what?

Genetics is complicated.

I fully admit that there’s a lot of stuff that I don’t know yet, nor fully understand.

Luckily for me, there are a few genetics basics that are easy enough to understand that even a middle school student can master them:

  1. “Evolution” is the theory that species change over time due to some individuals within them being better at getting food, reproducing, etc., than other individuals, and thereby passing on their superior traits to their children.
  2. “Genes,” (or “DNA,”) are the biological code for all life, and the physical mechanism by which traits are passed down from parent to child.
  3. “Mendel squares” work for modeling the inheritance of simple traits.
  4. More complicated traits are modeled with more complicated math.
  5. Lamarckism doesn’t work.
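Point 3 is easy to make concrete. Here is a minimal Python sketch of a Mendel (Punnett) square for a single gene with two alleles; the function name `punnett` and the “A”/“a” allele letters are just illustrative choices of mine.

```python
from collections import Counter
from itertools import product


def punnett(parent1, parent2):
    """Cross two single-gene genotypes such as 'Aa' x 'Aa'.

    Each parent passes on one of its two alleles with equal chance,
    so the square has four equally likely cells. Returns a count of
    how many cells hold each offspring genotype.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so 'aA' and 'Aa' count as the same genotype (uppercase sorts first).
        offspring["".join(sorted(allele1 + allele2))] += 1
    return offspring


if __name__ == "__main__":
    for genotype, cells in sorted(punnett("Aa", "Aa").items()):
        print(genotype, cells, "of 4")
```

Crossing two “Aa” parents fills the four cells with one “AA”, two “Aa”, and one “aa”, the familiar 1:2:1 ratio. Traits shaped by many genes (point 4) need more than a four-cell grid, which is where the more complicated math comes in.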

Lamarck was a naturalist who, in the days before genes were discovered, theorized that creatures could pass on “acquired” characteristics. For example, an animal with a relatively normal neck in an area with tall trees might stretch its neck in order to reach the tastiest leaves, and then pass on this longer-neck to its children, who would also stretch their necks and then pass on the trait to their children, until you get giraffes.

A fellow with similar ideas, Lysenko, was a Soviet scientist who thought he could make strains of cold-tolerant wheat simply by exposing wheat kernels to the cold.

We have the luxury of thinking that Lysenko’s ideas sound silly. The Soviet peasants had to actually try to grow his wheat, and scientists who pointed out that this was nonsense got sent to the gulag.

The problem with Lamarckism is that it doesn’t work. You can’t make wheat grow in Antarctica by sticking it in your freezer for a few months, and animals don’t have taller babies just because you stretch their necks.

So what does this have to do with epigenetics?

Pop science articles talk about epigenetics as if it were Lamarckism. Through the magic of epigenetic markers, acquired traits can supposedly be passed down to one’s children and grandchildren, infinitely.

Actual epigenetics, as scientists actually study it, is a real and interesting field. But the effects of epigenetic changes are not so large and permanent as to substantially change most of the way we model genetic inheritance.


Epigenetics is, in essence, part of how you learn. Suppose you play a disturbing noise every time a mouse smells cherries. Pretty soon, the mouse would learn to associate “fear” and “cherry smell,” and according to Wikipedia, this gets encoded at the epigenetic level. Great, the mouse has learned to be afraid of cherries.

If these epigenetic traits get passed on to the mouse’s children–I am not convinced this is possible but let’s assume it is–then those children can inherit their mother’s fear of cherries.

This is pretty neat, but people take it too far when they assume that as a result, the mouse’s fear will persist over many generations, and that you have essentially just bred a new, cherry-fearing strain of mice.

You see, you learn new things all the time. So do mice. Your epigenetics therefore keep changing throughout your life. The older you are, the more your epigenetics have changed since you were born. This is why even identical twins differ in small ways from each other. Sooner or later, the young mice will figure out that there isn’t actually any reason to be afraid of cherries, and they’ll stop being afraid.

If people were actually the multi-generational heirs of their ancestors’ trauma, pretty much everyone in the world would be affected, because we all have at least one ancestor who endured some kind of horrors in their life. The entire continent of Europe should be a PTSD basket case due to WWI, WWII, and the Depression.

Thankfully, this is not what we see.

Epigenetics has some real and very interesting effects, but it’s not Lamarckism 2.0.

Updated Tentative map of Neanderthal DNA

Picture 1

Based on my previous tentative map of archaic DNA, plus recent findings, eg Cousins of Neanderthals left DNA in Africa, Scientists Report. As usual, let me emphasize that this is VERY TENTATIVE.

Basically: Everyone outside of Africa has some Neanderthal DNA. It looks like the ancestors of the Melanesians interbred once with Neanderthals; the ancestors of Europeans interbred twice; the ancestors of Asians interbred three times.

Small amounts of Neanderthal DNA also show up in Africa, probably due to back-migration of people from Eurasia.

Denisovan DNA shows up mainly in Melanesians, but I think there is also a very small amount that shows up in south east Asia, some (or something similar) in Tibetans, and possibly a small amount in the Brazilian rainforest.

Now some kind of other archaic DNA has been detected in the Hadza, Sandawe, and Pygmies of Africa.

Native Americans and Neanderthal DNA

Since “Do Native Americans have Neanderthal DNA?” (or something similar) is the most popular search that leads people to my blog, I have begun to suspect that a clarification is in order.

Native Americans (Indians) are not Neanderthals. They are not half or quarter or otherwise significantly Neanderthal. If they were, they would have very noticeable fertility problems in mixed-race relationships.

They may have slightly higher Neanderthal admixture than other groups, but that is extremely speculative, and I don’t know of any scientists who have said so. We’re talking here about quite small amounts, like 0.5%, most of which appears to code for things like immune response and possibly some adaptations for handling long, cold winters. None of this appears to code for physical traits like skull shape, which have been under different selective pressures over the past 40,000 years.

As much as I would love to discover a group with significant Neanderthal DNA, that’s just not something we’ve found in anyone alive today.

Sorry, guys.