Computer scientist helps preserve endangered language for future generations

Gyalrong textbook. Credit: University of Sheffield

A Chinese language at risk of extinction is being kept alive for future generations with the help of Department of Computer Science research.

Using natural language processing (NLP), a set of computational processes designed to understand speech and text as humans can, researchers are preserving the Gyalrong language and the rich cultural history it carries.

Gyalrong, which is spoken by a very limited population in China’s Sichuan Province, is estimated to date back over 1,000 years but is now thought to have fewer than 33,000 speakers.

Most native speakers are elderly, and with many young people leaving the villages in which it’s spoken to seek work in urban areas, fewer and fewer people have the opportunity to learn the language from elders.

It’s estimated that the decline of the language, which has little in the way of written records and is considered very difficult to learn, will become irreversible over the next few decades.

Xutan Peng, a Ph.D. student at the University’s Department of Computer Science, is using his research to speed up the production of a textbook to teach the endangered language to local school children.

“Many people say language is the DNA of a culture,” said Xutan.

“If the language dies, the memory of this rich culture is in danger of being lost forever. Things such as old stories passed to their children and grandchildren by elders will be no more, and it will be impossible for future generations to learn the culture and traditions.”

His technique takes Gyalrong texts and summarizes them into Mandarin using an automated process. As such, language documentation work that would take a linguist months or years of immersion in the culture can be done far more quickly.

“One way to think of it is that there are two libraries, side by side, with the same layout and structure but with one only supplying Mandarin texts, and the other Gyalrong,” said Xutan.

“If two similar books, covering similar subject matter, are in the corresponding location in both libraries and you move both buildings into one location, you can align the two to identify patterns.

“So, as long as we’re able to grasp certain frequently used words, we can use this technique to make educated guesses to piece the jigsaw together.”

You can read more about the process, known as cross-lingual word embedding (CLWE), in the papers “Cross-Lingual Word Embedding Refinement by ℓ1 Norm Optimization” and “Understanding Linearity of Cross-Lingual Word Embedding Mappings.” The technique used in documenting Gyalrong also draws on research from Xutan’s earlier paper, “Summarizing Historical Text in Modern Languages.”
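
The two-libraries analogy can be made concrete with a small sketch. The Python example below is an illustration only and is not the method from the papers named above: it assumes synthetic random vectors in place of real Gyalrong and Mandarin embeddings, and it uses a standard orthogonal Procrustes alignment learned from a handful of known word pairs (the "frequently used words" in the analogy). All function and variable names are hypothetical.

```python
import numpy as np


def learn_orthogonal_map(src_seed: np.ndarray, tgt_seed: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimising ||src_seed @ W - tgt_seed||_F.

    This is the classic orthogonal Procrustes solution: W = U @ Vt,
    where U, S, Vt is the SVD of src_seed.T @ tgt_seed.
    """
    u, _, vt = np.linalg.svd(src_seed.T @ tgt_seed)
    return u @ vt


def nearest_neighbour(query: np.ndarray, candidates: np.ndarray) -> int:
    """Index of the candidate row with the highest cosine similarity to query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return int(np.argmax(c @ q))


# Synthetic stand-ins (assumption): row i in both matrices is "the same word".
rng = np.random.default_rng(0)
dim, vocab, n_seed = 8, 50, 20
tgt_vectors = rng.normal(size=(vocab, dim))  # plays the role of Mandarin
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
# The "other library": same layout, but rotated and slightly noisy.
src_vectors = tgt_vectors @ true_rotation.T + 0.01 * rng.normal(size=(vocab, dim))

# Learn the alignment from the first n_seed known word pairs only.
W = learn_orthogonal_map(src_vectors[:n_seed], tgt_vectors[:n_seed])

# "Educated guesses" for the remaining words: map, then take the nearest neighbour.
mapped = src_vectors @ W
hits = sum(nearest_neighbour(mapped[i], tgt_vectors) == i for i in range(n_seed, vocab))
accuracy = hits / (vocab - n_seed)
print(f"held-out translation accuracy: {accuracy:.2f}")
```

In this toy setting the two spaces differ only by a rotation plus a little noise, so a linear map learned from a few seed pairs carries over to unseen words. Real embedding spaces for a low-resource language are far less tidy.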

The results of Xutan’s work are already bearing fruit, with a small group of Chinese school children, whose families can speak at least some Gyalrong, learning from and providing feedback on a textbook. It’s hoped this first version will be followed by further volumes as more data is collected.

Its success has even caught the attention of documentary makers, who’ve featured the story on China Central Television.

“It’s a unique and very satisfying project to work on,” Xutan added.

“And although it may be limited in scope, we’re making a real impact on society. It also suggests a very bright future for this type of technique in helping to preserve endangered languages.”

Xutan plans to explore how the technique could be adapted to help document other endangered languages.

Dr. Mark Stevenson, a senior lecturer in the natural language processing research group, said, “Endangered languages, like Gyalrong, face a real risk of extinction. This project shows how NLP, including work carried out within Sheffield’s NLP research group, can help preserve them for future generations.”

Provided by University of Sheffield

Citation: Computer scientist helps preserve endangered language for future generations (2023, January 12) retrieved 12 January 2023 from

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
