Background Content

Language and Culture (13-17)

How AI can help Indigenous language revitalization, and why data sovereignty is important

August 12, 2024

‘It’s just going to be like a pencil,’ says one AI language camp co-ordinator

Michael Running Wolf is a former engineer for Amazon’s Alexa and co-founder of Lakota AI Code Camp. (Submitted by Michael Running Wolf)

CBC Indigenous: Indigenous language experts working in computer science say artificial intelligence (AI) is a useful tool in language revitalization, but communities must prioritize ownership of their data.

“It’s just going to be like a pencil. It’s useful but it’s not going to save our language,” said Michael Running Wolf, a former engineer for Amazon’s Alexa and co-founder of Lakota AI Code Camp, a summer program for high school students where they gain experience developing mobile apps that incorporate Indigenous knowledge and methods.

Running Wolf is Lakota, Cheyenne and Blackfeet and grew up on the Northern Cheyenne reservation in Montana. Despite the low-tech home he was raised in — often without running water or electricity — his mother, who engineered microchips for Hewlett-Packard, taught him math and physics by kerosene lamp.

“It was their [parents] perspective that technology was not incompatible with Indigenous ways of knowing,” he said.

This, along with being surrounded by speakers of his traditional language while growing up, encouraged his current work using AI to help support Indigenous language revitalization.

There are limitations, Running Wolf said, like sparse data and the polysynthetic nature of many Indigenous languages.

Even an efficient AI, for example, can need 50,000 hours of English audio to build automatic speech recognition. Most Indigenous languages have so few speakers that there is not enough data to train AI, he said, and AI cannot recognize or understand things it has never seen before; it requires information to replicate.

Also, languages such as Cheyenne and Blackfeet are polysynthetic and fusional, meaning prefixes and suffixes blend into words so the roots are not apparent.

He said he intends to overcome these limitations by working with communities to develop a manageable data set that will train AI.

“We generate 500 phrases in Makah and Kwak’wala, defined by the community and also the rules of the language, obviously, and we trained the AI to recognize those 500 phrases and those 500 phrases are used in curricula,” he said.

“So the goal here is that, when they go to a classroom, they get their exercise in person and then they can go home and practise using the AI.”
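
The closed-set idea Running Wolf describes, where practice is limited to a fixed list of community-defined phrases, can be illustrated with a minimal Python sketch. This is not his team's system, which works on speech audio. It assumes a learner's attempt has already been turned into text, and the phrase list, the function name `closest_phrase` and the 0.8 threshold are all hypothetical. The English phrases are stand-ins only; a real list would hold the community's own phrases and remain under its control.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for a community-defined phrase list (assumption:
# English placeholders are used here instead of real language data).
PHRASES = ["good morning", "how are you", "thank you", "see you tomorrow"]


def closest_phrase(attempt, phrases=PHRASES, threshold=0.8):
    """Return (phrase, score) for the best match, or (None, score) if none is close enough."""
    attempt = attempt.strip().lower()
    best, best_score = None, 0.0
    for phrase in phrases:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(None, attempt, phrase).ratio()
        if score > best_score:
            best, best_score = phrase, score
    if best_score < threshold:
        # The attempt is outside the known list, so no phrase is reported.
        return None, best_score
    return best, best_score


if __name__ == "__main__":
    print(closest_phrase("thank yu"))
    print(closest_phrase("zzzz"))
```

Because only phrases on the list can ever be returned, anything unfamiliar is rejected rather than guessed at, which is one reason a small, curated data set can be workable where open-ended recognition is not.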

Running Wolf emphasized the importance of the community’s agency in their language revitalization, particularly when it comes to AI.

“We have to have our own engineers. We need to have our own computer scientists using the software … We need to have sovereignty over our own data, set the terms and that’s the only way to build this AI,” he said.

He pointed to a recent dispute between the Standing Rock Sioux and a corporation that copyrighted their language materials. He said he’s mindful of the reciprocal nature of giving back to the communities he’s working with and said they’re working with lawyers to create contracts that ensure any data collected remains with the community.

Running Wolf said AI requires a lot of data to improve, and there are companies and academics who want a community’s data because of the potential revenue from selling it to companies like Google, Microsoft and Meta.

‘Sweat equity’

Robbie Jimerson, who has a PhD in computing and information sciences and speaks Seneca, developed a Seneca and Oneida speech recognition system as part of his dissertation. He agrees that people need to be guardians of their own data and said he is grateful for his time spent listening to the first language speakers in his community.

As a part of his dissertation, Robbie Jimerson developed a Seneca and Oneida speech recognition system. (Submitted by Shannon Hill)

“To me, there’s nothing better than, you know, having a conversation in Seneca with somebody,” Jimerson said.

“There was a lot of sweat equity that went into it to train these models. You need a data set, right? So who’s going to create those data sets…. For me, being a speaker, I was able to do both of those things.”

ABOUT THE AUTHOR

Candace Maracle, Reporter

Candace Maracle is Wolf Clan from Tyendinaga Mohawk Territory. She has a master’s degree in journalism from Toronto Metropolitan University. She is a laureate of The Hnatyshyn Foundation REVEAL Indigenous Art Award. Her latest film, a micro short, Lyed Corn with Ash (Wa’kenenhstóhare’) is completely in the Kanien’kéha language.
