New MIT algorithm automatically deciphers lost languages

A new AI system can automatically decipher a lost language that’s no longer understood — without knowing its relationship to other languages.

Researchers at MIT CSAIL developed the algorithm in response to the rapid disappearance of human languages. Most of the languages that have existed are no longer spoken, and at least half of those remaining are predicted to vanish in the next 100 years.

The new system could help recover them. More importantly, it could preserve our understanding of the cultures and wisdom of their speakers.

[Read: What audience intelligence data tells us about the 2020 US presidential election]

The 💜 of EU tech

The latest rumblings from the EU tech scene, a story from our wise ol' founder Boris, and some questionable AI art. It's free, every week, in your inbox. Sign up now!

The algorithm works by harnessing key principles from historical linguistics, such as the predictable ways in which languages use sound substitutions. The researchers give the example of a word with a “p” in a parent language possibly changing to a “b” in its descendent, but probably not to a “k” due to the difference in pronunciation.

These kinds of patterns are then turned into computational constraints. This allows the model to segment words from an ancient language and map them to a related language.

The algorithm can also identify different language families. For example, their method suggested that Iberian is not related to Basque, supporting recent scholarship.

The project was led by MIT Professor Regina Barzilay, who last month won a $1 million award from the world’s largest AI society for her pioneering work on drug development and breast cancer detection.

She now wants to expand the work to identifying the semantic meaning of words — even if we don’t know how to read them.

“For instance, we may identify all the references to people or locations in the document which can then be further investigated in light of the known historical evidence,” Barzilay said in a statement.

“These methods of ‘entity recognition’ are commonly used in various text processing applications today and are highly accurate, but the key research question is whether the task is feasible without any training data in the ancient language.”

You can read the full research study here.

Story by Thomas Macaulay

Managing editor

Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he e (show all) Thomas is the managing editor of TNW. He leads our coverage of European tech and oversees our talented team of writers. Away from work, he enjoys playing chess (badly) and the guitar (even worse).

Get the TNW newsletter

Get the most important tech news in your inbox each week.

New MIT algorithm automatically deciphers lost languages

Get the TNW newsletter

Also tagged with

UK’s answer to DARPA backs synthetic muscles and e-skin in new robotics project

Surging European defence stocks signal ‘huge potential’ for military tech startups

Discover TNW All Access

‘Sorry, I didn’t get that’: AI misunderstands some people’s words more than others

ASML rebounds from DeepSeek hit, expects AI advance to boost demand for chips