Linguistics

Estonian and big tech: future-proofing small languages

In a bold effort to secure the future of its language, Estonia recently shared nearly four billion words of linguistic data with Meta, aiming to integrate the Estonian language into AI models. This move is designed to improve chatbots, voice assistants, and translation tools, ensuring seamless digital experiences for Estonian speakers.

For a country of just over 1.3 million people, preserving its language is not only about technology but also about safeguarding national identity. Small languages like the Baltic and some Nordic ones (Faroese, Sami, and Greenlandic) face significant struggles, ranging from declining numbers of speakers to limited digital resources. "Small" here describes only the size of the population that speaks a language, not its worth or importance. As we’ve reported before, platforms like ChatGPT have been struggling to learn small languages, as there is simply not enough data available for the AI to learn from.

Speakers of the small (and rare) languages across Northern Europe, aware of this constant challenge, have taken matters into their own hands. Greenland’s Language Secretariat, for example, has been developing a Kalaallisut (Greenlandic) spellchecker since 2005. It uses pre-defined grammatical and morphological rules to recognise and correct words, rather than relying solely on machine learning models that require large datasets.

Similarly, the Sami Language Centre created Sami Voice, a freely available AI speech recognition system for the Northern Sami language that improves digital interactions such as voice assistants and translation systems. In Latvia, the government supported an AI-powered chatbot named Signe, which assists users in Latvian.

Estonia’s decision to share its language data with Meta has faced criticism, with concerns that reliance on large tech companies could lead to the country losing control over its language and culture. Critics argue that, without clear protections, global corporations rather than Estonia itself might be the primary beneficiaries.

Experts also agree that AI alone will not solve the challenges facing small languages in the digital age. Both linguists and AI entrepreneurs warn that beyond translation and syntax, AI must capture the unique cultural context of each language. While technology can support the practical use of these languages in digital spaces, true preservation requires integrating cultural nuance and engaging directly with people who speak the languages to maintain linguistic richness and depth.
