Lanfrica: A Database for African Languages developed by a Student of Jacobs University

Bonaventure is a final-year MSc Student in Data Engineering at Jacobs University Bremen. (Source: Jacobs University)


April 27, 2022
An estimated 2,000 languages are spoken on the African continent, more than in any other region of the world. The digital world, however, does not reflect their diversity; it is dominated by English. To change this, Bonaventure Dossou, a Master's student in Data Engineering at Jacobs University Bremen, together with his friend Chris Emezue developed translation and speech recognition software for Fon, a language spoken in his home country of Benin. They also developed MMTAfrica, a multilingual machine translation system for six African languages. Now they are following up with Lanfrica, a central database for African languages. The basic idea behind Lanfrica already won an award in a UNESCO startup competition last year.

"We want to improve the visibility and representation of African languages on the Internet," explained Bonaventure. Discoverability is limited not only because English dominates machine learning technologies and voice assistants from Google or Apple barely support African languages, but also because many African languages are not written languages. Often, only a few texts and sources exist as a data basis for natural language processing (NLP) technologies such as machine translation.
Lanfrica is intended to remedy this situation. It sees itself as a catalog, a research tool that provides easy and clear access to existing research, data packages or archives. And it aims to bring together existing initiatives dealing with the machine readability of African languages.

The idea is catching on. In June 2021, the duo presented Lanfrica at the ViVaTech-UNESCO Challenge, an international startup event, and took first place in the category "Overcoming Language Barriers through Data and Artificial Intelligence". Through this competition, the United Nations Educational, Scientific and Cultural Organization promotes technology-based solutions that contribute to linguistic diversity and multilingualism. "It was exciting", Bonaventure said. "Winning has motivated us to further stay on the topic".

And so he does, even though he is just returning from a study visit of several months in Montreal, Canada. There, as part of his Master's degree at Jacobs University, Bonaventure was enrolled at the Mila - Quebec Artificial Intelligence Institute, a world-renowned deep learning research center. He was also an active student researcher at Google AI during his time in Canada. "I was able to pursue my second passion in Canada besides languages, which are health and biology related topics like developing new drugs using deep learning".

His stay in Canada also has to do with the pharmaceutical company Roche. The 25-year-old is supported within the framework of a cooperation between Roche Germany and Jacobs University and has gained practical industry experience at Roche Canada. He is also working on a health topic in his upcoming Master's thesis at Jacobs University, in which he uses deep learning to analyze disease-associated mutations, publicly available from genome-wide association studies, in their chromosomal context.

He will graduate from Jacobs University in June. What comes next, whether a PhD or a first job in the industry, remains to be seen. Already, however, Bonaventure has seen and done a lot. He grew up in Benin, studied mathematics in Russia at the Kazan Federal University, moved to Bremen, Germany, for his Master's degree at Jacobs University, and expanded his knowledge in Canada. He said, "I want to do meaningful and impactful research. I'm looking forward to seeing, achieving, and creating more".

