The Start of Catalan Internet: From Softcatalà to the Google Assistant in Catalan

Platforms such as Apple and Google are preparing Catalan versions of their voice assistants, and Google Assistant already understands Catalan. Are we late?

23 October 2020
The use of Catalan on the internet has grown considerably in recent years

Google and Apple are putting the finishing touches to their voice assistants in Catalan

Catalan has been alive on the internet since the internet reached Catalonia in 1995, even though the earliest software could not be configured in our language. Twenty years later, platforms such as Apple and Google are preparing Catalan versions of their voice assistants, while Google Assistant already understands Catalan. Are we late?

According to Web Technology Surveys (W3Techs), which provides information on internet usage around the world, less than 0.1% of all websites are in Catalan. Catalan ranks 40th (of 180) among the most used languages on the internet. Don’t worry! Don’t pull your hair out over this percentage. Remember that over 60% of all websites in the world are in English. Moreover, for a language with ten million speakers, Catalan has more websites than languages with far more speakers, such as Tagalog (from the Philippines) or Urdu (spoken in India and Pakistan).

The Independent web creators of websites in Catalan (WICCAC) indicate on their monthly barometer that the overall percentage of Catalan usage across the websites they track is high, standing at 65.79%. Jordi Mas, founding member of Softcatalà, the digital community fostering the use of Catalan on the internet, summarises the current situation as follows: “The Catalan story is one of success. In comparison with most minority languages, and without the support of a State, Catalan is extremely healthy on the internet.”

The major platforms have progressively adopted Catalan in their interfaces. Of the 10 most visited websites on the internet, Google, YouTube, Facebook, Twitter and Wikipedia are available in Catalan.

Furthermore, according to data from the 2019 Survey of Cultural Participation in Catalonia by the Department of Culture, 63.8% of the population who have used the internet have visited websites in Catalan, 92.2% have done so in Spanish, and 42.2% in other languages.

A COMMUNITY OF USERS AND CREATORS

Data, numbers and percentages are cold. They help us understand that, although Catalan speakers are a relatively small group within the network of networks, the health and use of Catalan keep improving year after year. But we must look back to the third era of the internet, which wasn’t too long ago, to find out where it all comes from and how our language got started on the internet.

Let’s go back to the turn of the 21st century. For Àlex Hinojo, the first Catalan Wikipedian and current digital coordinator at Institut Ramon Llull, what has evolved most significantly since then are the communities and the use of the platforms: “Catalan society, which has always had extremely active civic values, progressed gradually and created content, blogs, podcasts and platforms in Catalan, which is something that didn’t happen on the institutional websites.”

A good example is the news portal VilaWeb, which started in 1996 and became the first information and blogging project in Catalan on the internet. Jordi Mas remembers when Softcatalà was created in 1997: “It was a time when a lot of initiatives were coming up to standardise the use of Catalan on the internet. The Catalan digital community has always been extremely active; people immediately became excited about blogs, Wikipedia and freeware. If we’re given the tools, we’ll grab them with both hands.”

At the end of the last century, before social media took off, you could already do everything on the internet. The scene was made up of many communities, with little control and a great deal of enthusiasm for making the most of the possibilities of the internet. What was lacking was institutional support. So, with the aim of making progress, the Softcatalà project started in 1997. It is a non-profit organisation that has developed and translated around 150 programs into Catalan over 20 years: “We started translating freeware into Catalan. There were four of us and we spent one year translating”, explains Mas.

A giant feat that still goes on to this day. For Àlex Hinojo, the great success of Softcatalà lies beyond the 150 programs translated: “It has been an essential group in creating the awareness that you can, and should, have a file explorer, a browser and an office suite in Catalan. In raising awareness”, says Àlex Hinojo, “of the linguistic rights of the Catalan-speaking community regarding the internet [and software].”

Softcatalà’s work on Catalan on the internet was recognised by the Generalitat with the Premi Nacional d’Internet 2004. SOURCE: Wikimedia Commons.

Over the years, mostly due to the appearance of social networks, and regardless of the fact that the platforms were not in Catalan, people became used to posting their content in Catalan: “Commercially and institutionally, it was harder for the platforms to adapt their infrastructure to adopt Catalan”. Its use is gradually becoming mainstream in more commercial content such as video. In the audiovisual sector, in the case of platforms such as Netflix, the situation is terrible: only 0.5% of the entire catalogue is recorded, dubbed or subtitled in Catalan, as indicated by the Consell de l’Audiovisual de Catalunya (CAC) in a report published last May.

The problem with these major platforms and the use of Catalan is regulation. If they’re not forced to offer it, it’s more difficult. In Denmark, Netflix subtitles and dubs everything into Danish because it has to do so to enter the market, and because the regulatory body forces it to. This isn’t the case in Catalonia. Because Catalan isn’t an official language of the European Union, the EU does not regulate its presence on these platforms.

“Apart from this”, indicates Àlex Hinojo, “the financial cost of dubbing an audiovisual product is higher than that of translating a book into Catalan, for example. But the problem with these platforms isn’t the market, it’s the regulations”. In short, if instead of 24 official languages in the European Union there were 25 (or more, which would be most desirable), the distributor would be forced to subtitle or dub the content into Catalan in order to operate in the Catalan-speaking region. Until this is the case, the presence of products in Catalan on these major platforms will remain a question of political will. Of course, as Àlex Hinojo indicates, “recognising the minority languages of the EU is not a neutral matter.”

Let’s go back to the monthly barometer by the WICCAC. It shows the evolution, month by month since 2002, of the use of Catalan by a series of websites on the internet, classified by sector. A quick analysis provides two important pieces of data. The websites in the cultural, educational or institutional sectors use mostly Catalan. However, websites from commercial sectors such as electrical appliances, perfumery or cars barely reach 17% in their use of Catalan. Why is that? “I think that Catalan consumers are sometimes not demanding enough, and we don’t discriminate positively enough on the internet as we would in a bar or a shop when we aren’t served or understood in Catalan”, replies Àlex Hinojo. “There’s still a lot left to do, and consumer regulations should also include the right to be served in Catalan online.”

SIRI AND ALEXA WILL SPEAK CATALAN

The major platforms are adapting their interfaces to Catalan. Companies such as Apple and Google have already announced that they are working on it, and Google Assistant already understands Catalan, although it doesn’t yet speak it. It might seem funny today, but for these giants to be preparing a voice recognition engine in Catalan is a significant step.

It is increasingly obvious that voice will be the common interface for communicating through technology and the internet: “When the use of voice assistants is more widespread, we will see the significance”, says Àlex Hinojo. “All the neural and deep learning technologies being developed for robot-assisted and translation-assisted learning lead one to believe that the progress this technology is making in languages such as English, Spanish or Russian could be applied to Catalan. Then everything will depend on the political decision whether or not to adopt it. Technology isn’t, and shouldn’t be, the stumbling block.”

But not everything has to go through the major companies. Along these lines, it is worth noting the Common Voice project, on which Softcatalà is working with Mozilla to create a voice corpus in Catalan: “We have already recorded over 700 hours of Catalan speakers of different genders, ages and accents”, explains Jordi Mas. Creating a voice recognition system requires data above all, and data is extremely expensive to obtain. The #CommonVoiceCAT project, which has already used Facebook, records thousands of voice clips collaboratively, which are then reviewed anonymously. Participation remains open and over 5,100 people are already taking part.

Nationwide, only 3% of all texts on the internet are in Catalan, whereas the figure should be between 15% and 18%, depending on the measure of language use. The reason for this is a minor detail of which we are unaware: people have their devices configured in Spanish. If you use your voice or your keyboard to search for the name of a person or a place and you have your mobile phone or Google configured in Spanish, the result will be Wikipedia in Spanish. This limits the reach of websites in Catalan unless access is from a platform configured in Catalan.

Whether typing or speaking, the use of Catalan on the internet and its evolution over the past twenty years reflects the natural act of expressing oneself and receiving knowledge in one’s mother tongue, although it is also a case of activism to a certain extent: “As Maria del Carme Junyent said, ‘Catalan is the marked language’”, quotes Àlex Hinojo. “It’s as if we always have to do a bit more to speak in our language. And we do, because we have this language-based activism built in. It would be great if it weren’t a marked language, and if speaking Catalan were natural and not a question of activism. In the street and on the internet.”

As can be seen, the way of producing software and creating content has evolved a great deal over the past twenty years, and software is now geared more towards mobile apps. However, thanks to the work of the projects mentioned in this article and many others, we now have a collective awareness of the normalised use of Catalan on the internet. Even more importantly, the entire generation aged from 15 to 25 has grown up with software in Catalan and sees this as normal.
