Eight Challenges Indian-Language Wikipedias Need to Overcome · Global Voices
Subhashish Panigrahi

From language input to Unicode standards, Indian-language Wikipedias need a sustained effort from the community. Credit: Johann Dréo CC BY 2.0/Flickr
A version of this post was previously published on The Wire.
Even after a decade of existence, Indian-language Wikipedias are still unknown to many Indian-language speakers. Wikipedia, the largest encyclopedia in human history, is what it is today because of hundreds of thousands of volunteer editors. But while native-language Wikipedias are becoming game-changers in other corners of the world, the picture in India is uneven. Here, from my point of view, are some of the challenges that Indian-language Wikipedias currently face.
Many speakers of Indian languages do not know how to search for information online using text typed in their own script. Some communities even believe that because Google's home page does not display their script, their language does not exist on the Internet. Google started with five Indian languages as interface options and now offers nine. But this does not stop a Santali or Manipuri user from searching in Unicode Ol Chiki (the script for Santali) or in Unicode Meetei Mayek (the script for Manipuri). Google, or any other search engine, will return content in any script that is available on the Internet. Yet the mistaken belief that it cannot is keeping many people from going online, and from Wikipedia in particular.
Wikipedia is written by people like you and me. Everything, from writing to editing, is done by volunteers, and anybody can correct the mistakes and inaccuracies that exist in many Wikipedia articles. Although several Indian languages are spoken by millions of people, the Wikipedia editor communities for these languages are very small, with only a handful of editors contributing to each language version. In January this year, for instance, Hindi Wikipedia had only 89 editors, while Hindi has over 550 million speakers.
The vast majority of people in India do not know how to type in their own language, and there is little documentation to teach them. Even though computer use and Internet access are spreading in many government-run schools, native-language input and other basic computing skills are not widely taught in schools across all states. This is unfortunate, because a wide variety of free software for native-language input is available, and the difficulties of typing in Indian languages that existed a few years ago have now almost disappeared.
India has over 1 billion mobile phone users, so its Internet penetration rate, currently about 15%, is likely to start growing faster soon. This growth, along with stiff competition that will push telecom service providers (TSPs) to lower data charges, will help many more Indians get online. If these new users are not taught native-language input, they will be left at the mercy of an English-centric Internet and miss out on its benefits. Many Indians who own smartphones need full Indian-language support, especially built-in input methods, to be able to contribute to Wikipedia in their own languages.
The relative lack of native-language content on the Internet is another major factor in the low adoption of Indian-language Wikipedias. According to a 2012 survey by the Internet and Mobile Association of India, over 6% of the population stays offline simply because there is too little content in their languages. Take my home state of Odisha: while the Kerala state government's official tourism portal is available in Odia and other Indian languages, at the time of writing the Odisha government's own tourism portal had no information in Odia. It is unfortunate that our languages are neglected even within our own states.
Instead of adopting the Unicode standard, many traditional media outlets continue to use non-standard variants of the ASCII/ISCII encoding systems. Unicode, a global standard, has been available for Indian languages for almost 25 years now, but most of India's vernacular print media has failed to adopt it. As a result, many popular Indian-language newspapers are not available in Unicode on the open Internet, which makes their content hard to search, copy or reuse.
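To make the encoding point concrete: in Unicode, each script has its own range of code points, so text in Odia, Ol Chiki or Meetei Mayek can be recognized and indexed as that script. Text produced with a legacy font-based encoding, by contrast, is typically stored as ordinary Latin characters that only look like Odia when displayed in one particular font. The minimal Python sketch below checks which of these Unicode blocks a string uses. The block ranges come from the Unicode Standard. The function name `scripts_in` and the sample strings are illustrative assumptions, not part of any search engine or Wikipedia tool.

```python
# Unicode block ranges (inclusive) for three scripts mentioned in this article,
# as assigned by the Unicode Standard.
SCRIPT_BLOCKS = {
    "Odia": (0x0B00, 0x0B7F),
    "Ol Chiki": (0x1C50, 0x1C7F),      # script for Santali
    "Meetei Mayek": (0xABC0, 0xABFF),  # script for Manipuri
}


def scripts_in(text):
    """Return the names of the scripts above whose Unicode block contains
    at least one character of `text` (a hypothetical helper for illustration)."""
    found = set()
    for ch in text:
        cp = ord(ch)
        for name, (lo, hi) in SCRIPT_BLOCKS.items():
            if lo <= cp <= hi:
                found.add(name)
    return found


if __name__ == "__main__":
    # The word "Odia" written in Unicode Odia script: every character falls in the Odia block.
    print(scripts_in("\u0b13\u0b21\u0b3c\u0b3f\u0b06"))
    # Legacy font-encoded text is stored as Latin bytes, so no Indic script is detected.
    print(scripts_in("Odia"))
```

With these inputs, the Unicode string is identified as Odia, while plain Latin characters match none of the blocks. This is roughly why text in a legacy encoding does not show up when someone searches in their own script.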
Most of the information published on the Internet, and by the Indian government in particular, is copyrighted. This walled garden of copyright restrictions limits access to that information and prevents people from sharing it and learning more. Wikipedia, on the other hand, is distributed under a Creative Commons Attribution-ShareAlike license, which allows anyone to use the content and even to distribute it commercially. Releasing information under a free license could make it easily accessible to millions of people.
Many people in India cannot read, write or speak, and the country has over 60 million people with some form of hearing impairment. There is a pressing need for high-quality text-to-speech and speech-to-text engines for people with disabilities. These tools should also be freely available, so that people who cannot afford expensive proprietary software such as the JAWS screen reader can contribute to Wikipedia in their own languages. Many of the text-to-speech engines available today for Indian languages sound so mechanical that they are difficult for the average user to follow.
Subhashish Panigrahi is an educator and free knowledge evangelist. He currently works for Communications, Program Capacity & Learning at the Wikimedia Foundation, and for Access to Knowledge at the Centre for Internet and Society. Portions of this article are taken from a speech Subhashish gave at BHASHA: Indian Languages Digital Festival in New Delhi.