Automatic Speech Recognition Project Seeks to Revitalize the Quechua Languages

Screen capture from a YouTube video, published by engineer Luis Camacho Caballero.

Kuélap isn't just an important Peruvian pre-Inca archaeological site of the Chachapoya culture, located in the Amazonas region. It is also the name of a data collection tool for the QuechuaASR project, which aims to create an automatic speech recognition (ASR) system for the Quechua language.

Quechua is a whole family of languages spoken by indigenous populations living mainly in the South American Andes, and organizations like UNESCO consider it endangered. The actual number of speakers is difficult to calculate, and the dominance of Spanish in the region, especially in formal education, makes it difficult for Quechua speakers to develop their language. The discrimination that indigenous populations suffer in its various forms is among the most important causes of this vulnerability. Some parents avoid teaching Quechua to their children to ensure they can integrate into society, and others who speak it can lose it when they migrate to big urban areas. As recounted by Lorenzo Colque Arias, president of the Academy of Quechua Language in Arequipa:

El habitante arequipeño es muy agresivo cuando escucha a una persona hablar en quechua, lo margina, lo discrimina, y lo peor de todo es que esa misma persona sabe hablar y entiende perfectamente el idioma, es un migrante ya radicado en la ciudad y ahora ya discrimina.

The inhabitants of Arequipa become very aggressive when they hear someone speaking Quechua. They marginalize or discriminate against [those speaking Quechua]. The worst part is that those same people [who reject Quechua speakers] speak and understand the language perfectly. They're migrants residing in the city, and they're now the ones who discriminate.

Luis Camacho is the electronics engineer behind this project, which was driven by his sense of urgency about preventing certain indigenous languages from disappearing entirely before the end of this century. He expressed this on his Facebook page, called Atuq Kamachikuq (“atuq” means fox in Quechua):

Voy detrás de mi mayor sueño: la portabilidad computacional de todas las lenguas andino amazónicas.

I am following my biggest dream: the computational portability of all the Andean Amazonian languages.

In a 2015 post on the same social network, Luis Camacho asked Quechua speakers to participate in recording at least a hundred thousand words, all spoken by at least one hundred people. Moreover, the hundred people in question had to be native Quechua speakers, rather than people who had learned Quechua as a second language.

To achieve this, there was an open call for volunteers who are native speakers of Quechua, regardless of where they live. First, volunteers read the compiled texts aloud.

Second, volunteers transcribe the audio. Finally, focus groups bring people together to talk about different topics of everyday life, and those conversations are recorded so that they can be transcribed later.

The content of the recordings is not the main subject of the research. The central point will be compiling the lexicon as a sort of dictionary of voices. The research seeks to record the existing lexicon of indigenous languages and build a database that can be implemented in the development of computer tools.
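As a rough illustration of what such a database might hold, here is a minimal Python sketch of a sentence-level corpus record and a word count built from it. Everything in it is an assumption made for illustration: the `Utterance` fields, the function names, the file names and the sample tokens (`w1`, `w2`) are hypothetical placeholders, not the project's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """One recorded sentence paired with its transcription (hypothetical schema)."""
    speaker_id: str
    audio_path: str
    start_s: float  # start of the sentence within the recording, in seconds
    end_s: float    # end of the sentence within the recording, in seconds
    text: str       # transcription of the sentence


def build_lexicon(utterances):
    """Count how often each transcribed word form occurs across the corpus."""
    counts = Counter()
    for utt in utterances:
        counts.update(utt.text.lower().split())
    return counts


def corpus_summary(utterances):
    """Report distinct speakers, total recorded hours and total words."""
    speakers = {utt.speaker_id for utt in utterances}
    hours = sum(utt.end_s - utt.start_s for utt in utterances) / 3600
    words = sum(len(utt.text.split()) for utt in utterances)
    return {"speakers": len(speakers), "hours": hours, "words": words}


# Placeholder data: "w1" and "w2" stand in for real transcribed words.
sample = [
    Utterance("spk01", "a.wav", 0.0, 3.0, "atuq w1 w2"),
    Utterance("spk02", "b.wav", 0.0, 1.5, "Atuq w1"),
]
lexicon = build_lexicon(sample)
summary = corpus_summary(sample)
```

Keeping the speaker, the time span and the text together in one record is one simple way to track both how many people have contributed and how many words have been recorded.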

Global Voices spoke briefly with Luis Camacho about his project’s progress.

GV: What's the status of the project so far?

Luis Camacho (LC): Ya hemos reunido 100 horas de corpus de voz y texto alineados a nivel de frases. Esto lo hemos conseguido gracias a donaciones de audio de empresas de radio difusión del sur del Perú y también gracias a la contribución de más de 1000 voluntarios.

Luis Camacho (LC): We have already compiled 100 hours of voice [recordings] and [transcribed] text aligned at the sentence level. We've done this thanks to audio donations from radio broadcasting companies in southern Peru and also thanks to the contributions of more than 1000 volunteers.

GV: What is the ultimate goal of the project?

LC: El traductor automático es la meta final. Actualmente, estamos trabajando en la primera etapa, que es el conversor de voz a texto. Estamos comprometidos a lanzar esto a comienzos de 2018 un par de videos.

LC: The automatic translator is the ultimate goal. We are currently working on the first stage, which is the voice-to-text converter. We are committed to launching a couple of videos in early 2018.

GV: What plans do you have after this?

LC: Continuar hasta terminar el traductor. También espero este año con la recopilación de corpus de otros idiomas, aymara y ashaninka en primer lugar. Entre mis planes a largo plazo están realizar portabilidad computacional completa de la mayor parte de nuestros idiomas e incluso de algunos otros idiomas sudamericanos. Para eso se necesita financiación, y estoy en búsqueda permanente de fondos.

LC: To keep going until the translator is finished. I also hope to start compiling corpora for other languages this year, beginning with Aymara and Ashaninka. My long-term plans include full computational portability of most of our languages and even some other South American languages. Funding is needed for that, and I am constantly seeking funds.

As if all of that weren't enough, Camacho has also proposed developing an automatic translator from Quechua and Aymara into Spanish, English and Chinese, and vice versa. In this Spanish-language video, he explains how the Quechua audio transcription tool works.

To be part of the project, you can e-mail engineer Luis Camacho at
