Editor's note: From April 13-19, 2021, Blessing Sibanda will be hosting the @DigiAfricanLang rotating Twitter account, which explores how technology can be used to revitalize African languages. Read more about the campaign here.
Shona, an official language of Zimbabwe, is one of the most spoken Bantu languages, with an estimated 10.8 million speakers. But while there are a number of established historical and literary resources in Shona, the visibility of the language online is far from encouraging.
Blessing Sibanda, a software engineer, is at the forefront of the effort to have the Shona language recognized in the field of natural language processing, the area of artificial intelligence concerned with making computers “understand” and decode written and spoken language.
Adéṣinà Ọmọ Yoòbá of Rising Voices spoke with Blessing to find out more about his work on gaining recognition for the Shona language in machine learning contexts and online.
Adéṣinà Ọmọ Yoòbá (AOY): Could you please tell us about yourself?
Blessing Kudzaishe Sibanda (BKS): My name is Blessing Kudzaishe Sibanda. I am a software engineer and researcher. I am currently pursuing a Master’s degree in Computer Science at Namibia University of Science and Technology. My research interests are in computer vision and natural language processing (NLP).
I am part of Masakhane, where I am conducting NLP research on low-resource languages, specifically concentrating on my native language Shona. I have been particularly involved in the machine translation and named-entity recognition (NER) initiatives for Shona.
AOY: What is the current status of your language offline, as well as online?
BKS: Shona is very active and vibrant offline. According to the website mustgo, it is one of the most spoken Bantu languages, with an estimated 10.7 million speakers. It has a rich history and literature with a lot of documentation. Although it has great resources offline, there is still a lot of potential to increase its presence online.
AOY: What do you think are the biggest challenges facing your language community with regard to digital communications or creating digital content in their mother language?
BKS: I think in most African countries, including my own, the biggest challenge is that the language of the former colonisers is the first language. Although this has been an advantage when communicating on an international stage, it has meant that our own native languages are not appreciated as they should be. To some extent, those who speak English, for example, are thought of as more educated or of a higher social class, while our own native languages are looked down upon. So you find people preferring the foreign language over the native one.
With this, I think the biggest issue that needs to be dealt with is how the native language is perceived. That perception needs to shift so that the language becomes more appealing to use, whether in business or on the internet, and people can better embrace it. From a technological standpoint, the language is lagging behind in terms of tools and software that make it easier and more accessible to use online. This is mostly because of the lack of resources that can be used to produce those tools. Tools and services like dictionaries, translation, keyboards (for languages with diacritics), speech, amongst others, will pay off in the long run as they make it easier to communicate in our own language online.
AOY: In your opinion, what are some of the steps that could be taken in the short term to encourage increased use of the language on the internet?
BKS: I believe the integration of local languages into the everyday online tools and technologies that we use can encourage increased use of the language on the internet. For example, with Google Search, you could switch the main language to Shona and have everything displayed in Shona. This should also be an option on social media sites, where most African languages are not available as display languages.
Another example is with Wikipedia. People can contribute pages using their own native languages which in a way increases the discoverability of these languages. I believe another step, as earlier noted, is investment in research and products which utilize native languages.
From a technological perspective, there is a lot of ground to cover to be on par with the most widely spoken languages in terms of research and products. This is an impediment: the internet has made the world a global village, yet we cannot fully express ourselves in our own language while making it understandable to those unfamiliar with it. With these concerted efforts I believe the use of native languages online can be encouraged.
AOY: What is your primary motivation for working to see your language and culture available on the internet?
BKS: My motivation is to see a lot of activity when it comes to NLP technologies for Shona, and to have tools that are comparable in performance to those available for the most widely used languages on the internet. I believe competent tools will inevitably encourage the use of the language on the internet, and for this to happen native speakers should be at the forefront of researching and producing these tools, as they better understand their own language.
This is why I adore and am part of the Masakhane community. It is bringing native speakers together to work on and research their own languages. It is lowering the barrier to entry for working with these technologies, providing a community and mentors to help along this journey of producing these tools. In addition to this, it is publishing research which will make it easier for others to continue this work and make sure native languages are represented in technology.