Pakistan: Internet and the challenge of language · Global Voices
Ivan Sigal

This post is part of our special coverage Languages and the Internet.
Pakistan today would seem primed for rapid growth in internet use. Regulatory changes allowing non-state ownership of mass media have set off explosive growth in FM radio and in satellite and cable TV. Cell phone use has also skyrocketed, with more than 90 million subscribers. With a growing middle class of some 30-40 million people in a country of about 180 million, internet use should see similar growth.
However, several constraints limit that expansion. Some are structural, such as chronic electricity shortages, and some are social, particularly around language. Literacy hovers at around 50% in Pakistan, and while most people understand Urdu, Pakistan’s national language, less than 10% of the population are native speakers of it. Provincial languages such as Sindhi, Punjabi, Pashto, and Balochi, as well as regional languages such as Seraiki and Kashmiri, are the native languages of the majority of the population, and English is the official language of governance.
This language fragmentation has consequences for internet use. No single Pakistani language effectively serves both the reading and the content-creation needs of Pakistan’s netizens, and as a result English remains the preferred choice online. In an interview, Adnan Rehmat of Intermedia Pakistan says that English is an “aspirational” language, a marker of education and access to resources, and that it provides access to a global linguistic community. Additionally, several regional Pakistani languages, such as Punjabi, are primarily oral languages without strong literary cultures.
Fouad Bajwa, writing on Internet’s Governance, describes the problem further:
A key pressing issue with relevance to both the local Internet and Mobile Technology scenario in Pakistan has been availability of local content and making the local content widely accessible to the community at large across Pakistan and the entire world using a variety of currently available technology platforms.
There have been few concerted efforts to create Unicode fonts for Pakistani language scripts. Nastaliq, the calligraphic script style most popular for Urdu, is not yet widely supported in Unicode fonts. Most Urdu writing online either uses an Arabic-script font, as with the relatively popular BBC and Google fonts, or relies on image files pasted into the text.
There is not yet a broadly accepted font for either mass media or citizen media production. Many mainstream media outlets still publish their text as image files, as in a recent online issue of the Daily Jang. This requires composing the text on another platform, discourages hyperlinking, and leaves the text unsearchable.
Screenshot of Daily Jang ePaper, May 2010
The Pakistani government has provided little policy guidance on language use. In an interview, Ahmad Shahzad of Bytes for All notes that the National Language Authority of Pakistan lacks resources, knowledge of digital issues, and any sense of urgency or policy priorities for expanding Pakistani languages online.
A number of projects have been working on this problem over the past decade. Perhaps the most comprehensive comes out of the Centre for Research in Urdu Language Processing (CRULP) at Lahore’s National University of Computer & Emerging Sciences. The Centre’s director, Professor Sarmad Hussain, has been working to support Nastaliq in Unicode since 2002. The Centre describes its objective as to “conduct research for the evolution of computational models of Urdu and Pakistan’s other regional languages.” Its projects develop standard character sets, localize popular software and online applications such as Microsoft Word, Firefox, and OpenOffice, and build script processing for fonts that can support all Pakistani languages.
The Centre is also working on optical character recognition; speech processing tools, such as screen readers for illiterate and blind users; language processing tools, such as spell checkers and machine translation; and content management system (CMS) platforms in Nastaliq, as well as scripts for mobile devices.
Additionally, CRULP’s PAN Localization project is working to develop local-language computing capacity in a dozen Asian languages, including Urdu, Pashto, and Bangla. The project seeks to develop tools that facilitate the localization and use of advanced applications.
These scripts and their wider promotion, along with the availability of content management systems and language processing tools in Urdu, have gone some way toward making Urdu a functional language of content creation.
Other tools now available facilitate the shift from English to Urdu, including Google’s Urdu transliteration tool and the Dynamic Language Tools Bookmarklet, which supports transliteration of Urdu to both English and Hindi. Syed Ghulam Akbar, the bookmarklet’s creator, describes his motivation in a post on the Pakistani science blog STEP:
The main inspiration behind this tool development was not actually Urdu writing. In fact, there are many existing tools and applications which let users type Urdu either using a special keyboard layout or by using roman script transliteration. What actually inspired me to develop this tool was to provide a way to easily convert the roman content on all the existing web-pages to Urdu script so that it is more readable.
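The kind of conversion Akbar describes, turning romanized Urdu into Urdu script, can be illustrated with a minimal sketch. It assumes a hypothetical, deliberately tiny mapping table invented for this example; it is not the scheme used by Google’s tool or by the Dynamic Language Tools Bookmarklet. The sketch scans the roman text and greedily matches the longest known letter group (such as “sh” or “aa”) before shorter ones. Real transliterators must handle far more ambiguity, for instance short vowels, which Urdu script usually leaves unwritten.

```python
# A minimal sketch of roman-to-Urdu transliteration by greedy longest match.
# ROMAN_TO_URDU is a hypothetical, heavily simplified table for illustration only;
# it is not the mapping used by Google's tool or by the bookmarklet.
ROMAN_TO_URDU = {
    "aa": "\u0627",  # alef (long a)
    "a": "",         # short vowels are usually left unwritten in Urdu script
    "i": "",
    "u": "",
    "ee": "\u06cc",  # farsi yeh (long i)
    "o": "\u0648",   # waw
    "ch": "\u0686",  # tcheh
    "sh": "\u0634",  # sheen
    "b": "\u0628",
    "p": "\u067e",
    "t": "\u062a",
    "j": "\u062c",
    "d": "\u062f",
    "r": "\u0631",
    "z": "\u0632",
    "s": "\u0633",
    "k": "\u06a9",   # keheh, the form of kaf used in Urdu
    "g": "\u06af",
    "l": "\u0644",
    "m": "\u0645",
    "n": "\u0646",
    "w": "\u0648",
    "h": "\u06c1",   # heh goal, the usual Urdu "h"
    "y": "\u06cc",
}

# Longest key length, so the scanner tries digraphs like "sh" before "s".
MAX_KEY_LEN = max(len(key) for key in ROMAN_TO_URDU)


def transliterate(roman: str) -> str:
    """Convert romanized Urdu to Urdu script using ROMAN_TO_URDU.

    Characters with no mapping (spaces, digits, punctuation) pass through unchanged.
    """
    text = roman.lower()
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible letter group first, then shorter ones.
        for size in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in ROMAN_TO_URDU:
                out.append(ROMAN_TO_URDU[chunk])
                i += len(chunk)
                break
        else:
            # No mapping found: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("paakistaan"))  # Pakistan
    print(transliterate("laahor"))      # Lahore
```

Under this made-up scheme, `transliterate("paakistaan")` returns پاکستان and `transliterate("laahor")` returns لاہور. Greedy longest match keeps the sketch short, but it will mis-split words where letters such as “s” and “h” are meant to be read separately, which is one reason practical tools rely on larger rule sets and dictionaries.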
Together, advances in Urdu scripts, applications, and platforms will go some way toward fostering a culture of online production in Urdu. The relative lag in their availability, however, highlights the general sense that English will continue to be the language of choice for many in Pakistan’s online world.
This lag can be addressed in several ways: wide promotion of the available tools and their applications; support for mass media and citizen media communities to discover, learn about, and make creative use of these tools; and support for building bridges and networks among communities. With this in mind, Fouad Bajwa is seeking to build an Online Urdu Encyclopedia:
It will create a converged environment overtime for presenting updated knowledge that is usable through reading, listening and visuals for both social and economic awareness, education, knowledge application in various fields, higher education, competitive exams, expert resources and endless Urdu language options.
At present, there is no Urdu Wikipedia community and there are few Urdu-language blog aggregators, such as http://urdublogs.co.cc/. Mainstream media have little capacity to produce searchable, Unicode-based online text, and there is a lack of mobile telephony platforms and applications for Urdu.