- Linguistic and genetic scholars team up to study Australian settlement.
- Computational linguistics mines a 780,000-word database to confirm genetic research.
- Modern scientific tools are revealing Aboriginal population movement.
Unearthing the prehistory of a language is like doing archaeology in language instead of dirt.
So says Claire Bowern, associate professor of linguistics at Yale. Bowern is a pioneer in the field of Bayesian phylogenetic linguistics: treating the way languages split over time much like the branching of a growing tree, she traces the lineage of languages.
Originally from Australia, Bowern has a home-grown interest in preserving endangered languages, such as the languages of the Pama-Nyungan family that are her current focus.
“I’m interested in how languages can help us figure out the past. Language is an important and reliable way to find out about prehistory, but many of the languages that are useful for this type of work are also highly endangered.”
Her most recent project marries a computational linguistic approach with analyses from an international team of geneticists. In research recently published in Nature, her archaeology has unearthed an uncanny match between the spread of languages and the genomic record of Australian settlement analyzed by Anna-Sapfo Malaspinas et al.
To study the history of language, Bowern notes, one could opt for an analog, pen-and-paper method. But for quicker, deeper analysis, scientists now have the advantage of combining traditional methods with algorithms and supercomputers.
Funded in part by the US National Science Foundation (NSF), her work starts with the assumption that languages, like many other human systems, change systematically, and so can be investigated much as we would investigate a biological system.
To arrive at her conclusions about the development of the Pama-Nyungan language family, she turned to CHIRILA (Contemporary and Historical Reconstruction in the Indigenous Languages of Australia), a database her lab began compiling in 2007 and continues to expand. It holds 780,000 words from languages all around Australia.
Bowern examined shared word roots (cognates) among the 28 Pama-Nyungan languages represented in Malaspinas’ genetic survey. She first coded the words for ‘cognacy’: words are cognate if they are assumed to descend from the same common source. The resulting word patterns were then fed into a computer model that used a Markov Chain Monte Carlo algorithm to evaluate which language trees most likely produced those patterns.
She then compared the resulting language tree with genomic data from the survey led by her colleague Eske Willerslev, whose genetic data were combined and analyzed on the UBELIX cluster at the University of Bern. The structure of Bowern’s Pama-Nyungan tree surprised her geneticist colleagues: in large measure, it mirrored the general pattern of migration across the Australian continent seen in the genetic data.
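The coding-and-search steps just described can be illustrated with a toy sketch in Python. It is not Bowern’s model or software: the four languages (A to D) and their cognate classes are hypothetical, the tree shape and branch lengths are held fixed, and each character evolves under a simple two-state model. Every (meaning, cognate class) pair becomes a presence/absence character, Felsenstein’s pruning algorithm gives the likelihood of those characters on a tree, and a Metropolis-Hastings sampler (one kind of Markov Chain Monte Carlo) swaps languages between positions on the tree, counting how often each grouping is visited.

```python
import math
import random
from collections import Counter

# Hypothetical toy data (not from CHIRILA): the cognate class of each
# language's word for each meaning. Same number = same cognate set.
COGNATE_CLASSES = {
    "A": {"water": 1, "fire": 1, "hand": 1, "eye": 1, "tree": 1, "dog": 1},
    "B": {"water": 1, "fire": 1, "hand": 1, "eye": 1, "tree": 1, "dog": 2},
    "C": {"water": 2, "fire": 2, "hand": 2, "eye": 2, "tree": 2, "dog": 3},
    "D": {"water": 2, "fire": 2, "hand": 2, "eye": 2, "tree": 3, "dog": 3},
}


def binary_matrix(classes):
    """Code each (meaning, cognate class) pair as a 0/1 presence character."""
    chars = sorted({(m, c) for words in classes.values() for m, c in words.items()})
    return {lang: [int(words.get(m) == c) for m, c in chars]
            for lang, words in classes.items()}


def log_likelihood(order, data, t=0.2):
    """Log-likelihood of the tree ((order[0], order[1]), (order[2], order[3])).

    Simplifying assumptions: every branch has length t, and each character
    flips between 0 and 1 under a symmetric two-state model.
    """
    p_same = 0.5 + 0.5 * math.exp(-2 * t)
    P = [[p_same, 1 - p_same], [1 - p_same, p_same]]

    def up(partial):
        # Likelihood of a subtree given each possible state of its parent.
        return [sum(P[x][y] * partial[y] for y in (0, 1)) for x in (0, 1)]

    def join(a, b):
        # Felsenstein pruning: combine two child subtrees at their parent.
        return [u * v for u, v in zip(up(a), up(b))]

    total = 0.0
    for k in range(len(data[order[0]])):
        tips = [[1.0 - data[lang][k], float(data[lang][k])] for lang in order]
        root = join(join(tips[0], tips[1]), join(tips[2], tips[3]))
        total += math.log(0.5 * root[0] + 0.5 * root[1])
    return total


def split_of(order):
    """Which languages share a cherry; this fixes the unrooted topology."""
    return frozenset([frozenset(order[:2]), frozenset(order[2:])])


def mcmc(data, steps=5000, seed=1):
    rng = random.Random(seed)
    order = sorted(data)
    rng.shuffle(order)  # random starting tree
    current = log_likelihood(order, data)
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: swap two languages' positions on the tree.
        i, j = rng.sample(range(4), 2)
        proposal = order[:]
        proposal[i], proposal[j] = proposal[j], proposal[i]
        new = log_likelihood(proposal, data)
        # Metropolis acceptance; a uniform prior cancels from the ratio.
        if rng.random() < math.exp(min(0.0, new - current)):
            order, current = proposal, new
        visits[split_of(order)] += 1
    return visits
```

Calling `mcmc(binary_matrix(COGNATE_CLASSES)).most_common()` lists the groupings by how often the chain visited them. With these made-up data, ten of the fourteen characters favour pairing A with B and C with D, so the chain should spend nearly all its time on that tree. Real analyses also sample branch lengths, rates and tree shapes across many more languages, which is part of why they lean on substantial computing power.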
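The article does not say how the language and gene trees were compared. As one illustration only, the Robinson-Foulds distance is a standard way to measure how differently two trees group the same taxa: it counts the bipartitions (splits) present in one tree but not the other. The sketch below assumes both trees cover the same groups, labeled with hypothetical names A to F, and ignores branch lengths and root position.

```python
# Trees are nested tuples of taxon names; these two are hypothetical examples.
language_tree = ((("A", "B"), "C"), (("D", "E"), "F"))
gene_tree = ((("A", "B"), "C"), (("D", "F"), "E"))


def leaves(tree):
    """All taxon names under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the taxa, ignoring where the root is."""
    taxa = leaves(tree)
    ref = min(taxa)  # store each split as the side without this taxon
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else taxa - clade
        if 2 <= len(side) <= len(taxa) - 2:  # skip single-taxon splits
            found.add(side)
        return clade

    walk(tree)
    return found


def robinson_foulds(t1, t2):
    """Number of splits found in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must share the same taxa")
    return len(splits(t1) ^ splits(t2))
```

Here the two invented trees disagree only on whether D pairs with E or with F, so `robinson_foulds(language_tree, gene_tree)` is 2; identical groupings give 0 wherever the root is placed. For n taxa on binary trees the maximum is 2(n - 3), so the distance is often reported divided by that value.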
“I was very surprised that the language and gene trees matched so closely,” says Bowern. “I had no prior information about genetic patterns within Australia, so any result would have been interesting, but the closeness of the match was surprising. I was surprised how clear it was.”
Looking closely at hundreds of genomic sequences from Aboriginal Australians and Papua New Guineans, Willerslev’s team found evidence of only one migration into Australia.
In sum, the genetic and linguistic surveys agree that within the last 10,000 years, Aboriginal populations moved from the northeast to the southwest of Australia. Bowern’s analysis suggests the migration occurred in waves, with new languages overlaying older ones as populations grew and moved.
“Ten years ago, we didn’t really know the fine detail of how Pama-Nyungan languages were related to one another,” Bowern summarizes. “The recent genetics research has revealed a lot about the history of Aboriginal people, and we now know a lot more about how Australian languages are related to one another, and how language change in general works.”
Bowern’s computational research is proving to be a strong ally for genetic researchers as they piece together the story of our past.