The Welsh Language’s Digital Toolbox · Global Voices
Delyth Prys

This post is part of a special Global Voices series on Welsh language and digital media in collaboration with Hacio'r Iaith.
When I go to international meetings nowadays, I’m amazed at how English has become the common language of communication. It has become almost everyone’s second language, and in some domains it’s difficult for any other language to be heard. This is especially hard for minority languages such as Welsh.
One domain where English has been dominant from the beginning is computing and the Internet. But new digital media also provide opportunities for all languages to be used, heard and seen.
The author of this post, Delyth Prys (pictured right, with the Welsh Minister for Education) is the Head of the Language Technology Unit (@techiaith) based at Canolfan Bedwyr, Bangor University, Wales. An experienced terminologist, she leads a mixed team of linguists and software developers working with Welsh in a multilingual environment.
Developing language resources
Part of the challenge for Welsh has been to develop language resources so that people can use Welsh easily on the web and in digital media. Major languages get plenty of funding, but minority languages have to work smarter, recycle resources, and make a little go a long way.
We started working on Welsh terminology in 1993, and developed a Welsh spelling and grammar checker early on. We’ve reused a lot of the resources developed in the early days, building them up and taking them with us as the technology got better and the internet became an important part of our lives.
One of the key early decisions was to store and develop terminology in a computer database, rather than working on paper in the traditional way. This has meant that we’ve been able to keep developing dictionaries and to publish them quickly and cheaply: on CD, on the web and, more recently, as mobile apps.
The latest dictionary doesn’t have a paper edition at all, and Termiadur Addysg is evolving to include online games and other materials for students, teachers and parents. It has also been the starting point for Termau.org, a national terminology portal, where over twenty terminology dictionaries can be accessed together.
The recycling doesn’t end there, though. We used the wordlists to improve our spelling and grammar checkers, and to produce them in multiple formats for both open source and proprietary word processors. Our own Cysgliad package, which includes both Cysill (a spelling and grammar checker) and Cysgeir (a compendium of electronic dictionaries), has proved very popular.
The free online version of Cysill is widely used by bloggers and tweeters, as well as by school kids, students and even journalists. Such tools are a great help to writers in minority languages, who often lack confidence in writing their native tongue because it is excluded from so much of official and public life.
The 1990s were good years for the Welsh language: a new Welsh Language Act in 1993 greatly improved the status of Welsh, and the Government of Wales Act 1998 established the National Assembly for Wales, giving Wales a limited form of self-government.
A third important event came in 1995, with the publication of the Welsh Academy Dictionary. This comprehensive English-Welsh dictionary has been crucial in helping translators and administrators deliver bilingual policies in Wales. However, revising and updating it in the traditional way would have been slow and expensive. Instead, we digitized it, creating both an online version that is freely available on the web and a master database where changes and additions are easy to make.
Speech technology & translation
Speech technology has been another area where we’ve tried to develop resources for Welsh. It was difficult to get commercial backing for such a venture because the Welsh market is so small. However, by working with Irish colleagues at three Dublin universities, and with the help of a European grant, we were able to develop resources for both Welsh and Irish.
We called this project WISPR (Welsh and Irish Speech Processing Resources), and it led to useful text-to-speech resources for both languages. The Welsh resources have been freely licensed so that other providers can incorporate them in their software. For example, Ivona, a Polish text-to-speech company, has used these resources to help create natural-sounding Welsh voices, and RNIB Cymru, an association supporting blind and partially sighted people, is working with them to develop technology that makes the web more accessible to Welsh speakers with sight loss.
More recently, we’ve started looking at translation technology between English and Welsh. One new project we are taking part in is a collaboration between the National Library of Wales and other institutions to put archives from the First World War on the web. Some of this material is in Welsh, and until now it has been difficult for non-Welsh speakers to use it. We’re developing aids for cross-lingual searching, so that if you look up a particular battle, for example, you will find material about it written in either language. We will also include gist translation from Welsh to English and an embedded bilingual dictionary. We foresee many other uses for these aids, helping people feel that they can use their own language on the web and still reach a wider audience.
There is much more we would like to do to help Welsh flourish in the new multimedia world. With technology moving so fast, we need to make sure that Welsh is not left behind. Finding funding, and people with the right skills within our community, is still a challenge, but sharing and reusing resources lets a little go a long way. Being part of a global community helps all minority languages, and our next challenge will be to build closer cooperation with other communities in a similar position.