Voice data and donations for open-source software to understand and speak Catalan

10 February 2020
Voice assistant

The major internet companies marginalise minority languages, but open-source software offers the opportunity of extending the use of Catalan in speech technologies.

It is becoming increasingly commonplace for us to speak to machines. We say “OK Google” to our mobile phones and ask Alexa or Siri whatever questions come to mind. These devices are here to stay, to change our lives and, let’s not be fooled, to generate revenue for the major technology companies that offer them. The more often we use them, the more voice data we give them, which serves to improve the products that the so-called GAFA companies (Google, Amazon, Facebook and Apple, a group to which Microsoft is usually added) will subsequently sell us. The motivation is purely commercial and, as such, works against languages such as Catalan. This is why Siri, for example, can only be spoken to in around twenty languages and a few additional dialects. In response, an initiative based on voice donations and validations has arisen: Common Voice, promoted by the Mozilla Foundation and dedicated to open-source software. Alexander Klepel, who is part of the Mozilla project, explains to puntCAT why they launched it.

Klepel is explicit: “Machines don’t understand everyone, they only understand a fraction of us. This means that only a fraction of their potential users can benefit from this change in technology”. This is not a minor matter, argues the spokesperson for Common Voice: these machines can be very useful for people who cannot read or write, or for people with functional diversity who are unable to touch a screen, for example. And the market leaves all these audiences out if they do not speak one of the languages offered by the software. This is why the Common Voice project, a tool to democratise speech technologies, was created in June 2017. Mozilla’s proposal is to provide a public-access database of tagged audio (short phrases with their transcriptions) that anyone can use to train speech applications. This is particularly useful for languages with few speakers that the GAFA companies do not offer.

At present, Common Voice includes 4,200 hours of recordings in 40 different languages, twice the number Siri offers, the result of contributions from almost 259,000 people around the world. Eleven of these languages have been added since June 2019: Abkhazian, Arabic, Hong Kong Cantonese, Indonesian, Japanese, Latvian, Portuguese, Romansh, Tamil, Votic, and even an international auxiliary language. According to Klepel, Catalan is among the five languages with the most contributions. Through this project, says the spokesperson for the initiative, Mozilla seeks to contribute to an ecosystem of innovation across the most diverse speech technologies. Among the more tangible results is DeepSpeech, an engine trained by machine learning that converts speech into text.

Translator Pelin Doğan, researcher Özgür Güneş Öztürk Okumuş, data scientist Federica Capranico, data engineer Baybars Külebi, and computational linguist Dr Alp Öktem are all members of the language services cooperative Col·lectivat. The cooperative is devoted to translation, to training other Turkish translators to translate Catalan works, and to Turkish classes; it also works on developing a Catalan speech corpus to improve open-source speech technologies. Col·lectivat and Softcatalà met in 2017 at the Social and Solidarity Economy Fair, which led to the proposal to create a voice recognition system in our language. The former was responsible for the technological part and for entering the rules of Catalan. Once this had been done, they trained the software with speech data from the public TV station TV3. They then used data generated during the plenary sessions of the Catalan Parliament, with help from the Department of Culture of the Generalitat de Catalunya.

Col·lectivat is now working on software for speech synthesis, that is, the artificial reproduction of human speech, in Catalan. This will allow the product to be integrated into open-source applications such as OpenStreetMap. For now, similar applications, such as Google Maps by the Silicon Valley giant, do not offer this service in Catalan, which also affects other products from the company. The product in question is based on an existing proposal, a speech synthesis system in Catalan trained using open-source data. That project was subsidised by the Generalitat in 2008 and 2009, and was implemented by the Universitat Politècnica de Catalunya (UPC). Because this software is over a decade old, it sounds very robotic. Col·lectivat has applied new methods based on neural networks to make it sound much more natural.

The cooperative believes open-source technologies to be an essential part of technological sovereignty. Baybars Külebi explains: “The business model of the GAFA companies involves collecting data from people and using it for profit. Col·lectivat and, in general, the social and solidarity economy seek to support technological sovereignty against the internet giants”. It is a battle in which we are not alone, as the Common Voice project shows.