Using AI to reclaim Native languages
Imagine putting on a virtual reality headset and entering a world where you can explore communities, like Missoula, except your character, and everyone you interact with, speaks Salish, Cheyenne or Blackfoot.
Imagine having a device like Amazon’s Alexa that understands and speaks exclusively in Indigenous languages.
Or imagine a digital language playground in Facebook’s Metaverse, where programmers create interactive games to enhance Indigenous language learning.
Michael Running Wolf, a Northern Cheyenne man who is earning his Ph.D. in computer science, wants to make these dreams a reality.
Running Wolf grew up in Birney, a town with a population of 150 just south of the Northern Cheyenne Reservation. He spent most of his childhood living without electricity.
Running Wolf can speak some Cheyenne, but he wants Indigenous language learning to be more accessible, immersive and engaging.
And he believes artificial intelligence is the solution.
Running Wolf is one of a handful of researchers worldwide who are studying Indigenous languages and AI. He works with a small team of linguists and data scientists, and together they analyze Indigenous languages and work to translate them into something a computer can interpret.
If his team can accomplish this, Running Wolf reasons, then perhaps AI can be used to help revitalize Indigenous languages everywhere.
Alexa, can you speak Cheyenne?
Running Wolf worked for Amazon’s Alexa project for four years and became deeply familiar with the product.
As he worked on the software, he wondered, “Could something like Alexa speak Cheyenne?”
But time and again, Running Wolf said, the initiative would die because “it takes a lot.”
Google Assistant, for example, relies on tens of thousands of contractors, and the technology requires vast amounts of data, including millions of hours of annotated audio spoken in different languages.
“It’s a huge, monumental effort,” Running Wolf said. “And if you’re looking at Montana tribes, like the Northern Cheyenne, no one has that (data). We’re looking at millions of hours of audio, and at best, we have maybe around 100 or so, and some tribes don’t have anything.”
Aside from the lack of data, it didn’t take long for Running Wolf to run into another barrier — the nature of Indigenous languages themselves.
European languages, Running Wolf explained, differ from most Indigenous languages in syntax and morphology. Indigenous languages in Montana, he said, are generally polysynthetic, meaning words are composed of many word parts that each carry an independent meaning. In English, which is considered a relatively isolating language, “the red cars” is three separate words, each with its own meaning. But in a polysynthetic language, “the red cars” can be expressed in a single word.
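To make the contrast concrete, here is a minimal Python sketch. It is a toy only: the “polysynthetic” word is built from English glosses joined by hyphens (DEF for “the,” PL for plural) and is not a real word in Cheyenne or any other language, and the function names are invented for illustration. It shows how software that splits text on spaces sees three familiar pieces in the English phrase but only one long, unfamiliar piece in the single-word version, even though that one word carries several meaningful parts.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split text on spaces, the way simple word-based software does."""
    return text.split()


def toy_morphemes(word: str) -> list[str]:
    """Split the toy word on hyphens to expose its meaningful parts.

    Real morphological analysis is far harder; the hyphens are an
    assumption made purely so this illustration stays short.
    """
    return word.split("-")


english_phrase = "the red cars"
# Hypothetical stand-in for a single polysynthetic word (NOT real language data).
toy_word = "DEF-red-car-PL"

print(whitespace_tokens(english_phrase))  # three separate words
print(whitespace_tokens(toy_word))        # one word, as far as the software can tell
print(toy_morphemes(toy_word))            # four meaningful parts inside that one word
```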
The highly polysynthetic nature of Indigenous languages, Running Wolf said, “is an unsolved challenge in artificial intelligence.”
The problem relates to automatic speech recognition technology. Running Wolf explained that when one speaks to a voice recognition device, like Alexa or Siri, the technology usually uploads the audio file, converts it to text, interprets the meaning of the text and then responds by performing a task or answering a question.
The current technology assumes all languages are a type of English. It’s common, he said, for voice AI to translate a French command into English, interpret the command in English and then translate the response back to French. This process works for languages like Spanish, French and German, which have similar morphologies to English, but it doesn’t work for Indigenous languages.
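The steps Running Wolf describes can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how Alexa, Siri or any real assistant is built: every function, lookup table and the language code “xx” below is hypothetical, and the “audio” is simply text encoded as bytes so the example can run on its own. The point is the shape of the process: speech becomes text, text is routed through English, and the reply is translated back. A language with no path into English stops the whole chain.

```python
# Hypothetical lookup tables standing in for real translation systems.
TO_ENGLISH = {("fr", "quelle heure est-il"): "what time is it"}
FROM_ENGLISH = {("fr", "it is noon"): "il est midi"}


def speech_to_text(audio: bytes) -> str:
    """Stand-in for speech recognition: the 'audio' is just encoded text."""
    return audio.decode("utf-8")


def translate_to_english(text: str, language: str) -> str:
    """Route a command into English, failing if no translation path exists."""
    if language == "en":
        return text
    key = (language, text.lower())
    if key not in TO_ENGLISH:
        raise LookupError(f"no English translation path for language {language!r}")
    return TO_ENGLISH[key]


def interpret_in_english(text: str) -> str:
    """Stand-in for the step that works out what the user wants."""
    if text == "what time is it":
        return "it is noon"
    return "sorry, I did not understand"


def translate_from_english(text: str, language: str) -> str:
    """Translate the English reply back into the user's language."""
    if language == "en":
        return text
    key = (language, text)
    if key not in FROM_ENGLISH:
        raise LookupError(f"no translation from English for language {language!r}")
    return FROM_ENGLISH[key]


def handle_request(audio: bytes, language: str) -> str:
    """Audio -> text -> English -> interpretation -> reply in the original language."""
    text = speech_to_text(audio)
    english = translate_to_english(text, language)
    reply = interpret_in_english(english)
    return translate_from_english(reply, language)


print(handle_request("Quelle heure est-il".encode("utf-8"), "fr"))
```

Calling `handle_request` with the placeholder language code “xx” raises a `LookupError` at the translation step, which mirrors the gap the article describes: the pipeline has nothing to offer a language it cannot route through English.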
Running Wolf remains determined to solve these problems. His research now focuses on the Wakashan language family, whose languages he said are the “most highly polysynthetic.”
If Running Wolf can figure out how to get these languages to communicate with AI, his efforts could open doors for revitalizing Indigenous languages across the country.
Why does this matter?
From the 1800s to the 1970s, many Native children were forced to attend government-funded Christian boarding schools, where they were emotionally, physically and sexually abused. The explicit mission of these schools was cultural genocide, historically summed up in the phrase “kill the Indian, save the man,” and tribes have suffered language and culture loss as a result.
“It was illegal to speak our language, and our culture was underground for a long time,” Running Wolf said. “It’s typical for children of my generation, for our grandparents to restrict us from speaking our language because they didn’t want us to suffer the humiliation of having the language ripped from our throats.”
Running Wolf said that reclaiming Indigenous languages can help Native people and their communities connect with culture and maintain their identity.
But there are other benefits as well. Running Wolf said that, on average, people who speak more than one language make more money than those who speak only one. He also said studies have found that bilingual people are happier and make healthier life choices. Learning languages, he said, is a useful life skill, and the same skill sets apply when learning to play music or do math.
While Running Wolf envisions a virtual reality where people can immerse themselves entirely in Native languages, he also envisions a world where Native communities exercise sovereignty over their language reclamation efforts.
He thinks AI language programs will inspire community members to learn how to program. He imagines that tribal governments will want to invest in AI. And he thinks a growing body of people will spend time learning lucrative technology skills as well as their Native languages.
“Sometimes, anthropologists want to preserve Native languages so other anthropologists can learn it,” Running Wolf said. “(This work) is centered around growing the language and reinforcing the language.”
This story originally appeared on Missoulian.com