Ancestry-Constrained Phylogenetic Analysis Supports the Indo-European Steppe Hypothesis

Friday, March 11th, 2005, 04:18 AM
Discussion of Indo-European origins and dispersal focuses on two hypotheses. Qualitative evidence from reconstructed vocabulary and correlations with archaeological data suggest that Indo-European languages originated in the Pontic-Caspian steppe and spread together with cultural innovations associated with pastoralism, beginning c. 6500–5500 BP. An alternative hypothesis, according to which Indo-European languages spread with the diffusion of farming from Anatolia, beginning c. 9500–8000 BP, is supported by statistical phylogenetic and phylogeographic analyses of lexical traits. The time and place of the Indo-European ancestor language therefore remain disputed.

Here we present a phylogenetic analysis in which ancestry constraints permit more accurate inference of rates of change, based on observed changes between ancient or medieval languages and their modern descendants, and we show that the result strongly supports the steppe hypothesis.

Positing ancestry constraints also reveals that homoplasy is common in lexical traits, contrary to the assumptions of previous work. We show that lexical traits undergo recurrent evolution due to recurring patterns of semantic and morphological change.

This article has three main goals. First, we show that statistical phylogenetic analysis supports the traditional steppe hypothesis about the origins and dispersal of the Indo-European language family. We explain why other similar analyses, some of them widely publicized, reached a different result.

Second, for skeptics about phylogenetic methodology, we suggest that the agreement between our findings and the independent results of other lines of research confirms the reliability of statistical inference of reconstructed chronologies.

Finally, for linguistic phylogenetic research, we argue that analyses grounded in the evolutionary properties of the traits under study yield more reliable results. Our discussion makes reference to ancestry relationships, for example between Old Irish and two modern languages descended from it, Irish and Scots Gaelic, and draws on what can be learned from direct observation of changes over historical time.

In our phylogenetic analyses, we introduce ancestry constraints and show that they result in more realistic inferences of chronology. Our article is organized as follows.

We first give background information about the steppe and Anatolian hypotheses, and about earlier phylogenetic analyses (1), and discuss lexical traits (2) and linguistic ancestry relationships (3).

We then describe our data and some measurements made directly on the data (4), explain our phylogenetic methods (5), and summarize our experimental results (6). Finally, we discuss the effects of advergence (7) and ancestry constraints in phylogenetic modeling (8), followed by conclusions (9) and appendices with details about methods and results.

