Kiswahili Machine Learning Fellow Remote, International

Company: Mozilla

The Mozilla Foundation is a California nonprofit public benefit organization. Our mission is to ensure the Internet is a global public resource, open and accessible to all. We believe that open and free is better than closed and controlled. Join us and become part of our mission to promote openness, innovation, and opportunity online!

Twenty years ago, this meant building a browser and protecting the open web. Today, it also includes making sure AI and data-driven technology are more trustworthy. In order to focus on places where Mozilla can impact the next era of our work, we are increasingly working on the topic of trustworthy AI. For us, this means two things: human agency is a core part of how AI is built and integrated, and corporate accountability is real and enforced. To foster this effort, we are exploring projects that invite the public to make ‘data donations’ that help create trustworthy AI.

Most significantly, this includes the Common Voice project, which seeks to shape the future of voice AI by building openly available data sets for training machine-learning-driven voice technologies. The core of this project is a platform that supports language communities and individual volunteers to “donate their voice” to an open data set, which anyone can then download for commercial or non-commercial use. To date, 350,000 people have donated or validated voice samples for Common Voice!

This role will provide an opportunity to build out the next phase of the project, which includes:

  • Developing partnerships and improving the platform to help us scale the number of languages contributing to Common Voice
  • Developing new communities for new languages (prioritizing Kiswahili)
  • Playing a key role in developing and planning our long-term aspirations in the data donation space


All of this work will have a tie-in to Mozilla’s emerging Data Futures Lab initiative.


As our Kiswahili Machine Learning Fellow, you will work closely with Mozilla’s Fellow Local Lead (Under-resourced Languages) and the Mozilla Foundation Common Voice project team, focusing on:

  • Building an open voice data set in Kiswahili (1,000+ hours) suitable for training a speech-to-text (STT) engine for seven specific use cases, including the domain-specific subsets those use cases need
  • Training a speech-to-text model in Kiswahili based on Mozilla’s DeepSpeech open source technology, with technical assistance provided for implementation into use cases such as a local product or initiative
  • Distributing the STT model in whatever ways the community needs: as a package, as an API, or through other methods
  • Establishing and supporting a Kiswahili language and tech community, with the goal of encouraging adoption and implementation of voice technology

This requires some technical expertise: writing code, advancing standards and methods for data collection, and working with external project collaborators to support their efforts in collecting data and using speech recognition models. You will also support the Community Lead in growing an ecosystem around voice technology specifically, and AI more generally.

Experience and Qualifications:

  • Academic and some professional experience in Machine Learning (specifically TensorFlow and Python) and voice technology
  • Creative and original ideas for building technologies and SDG-relevant product use cases
  • Passion for under-resourced languages, with exposure to languages and linguistics
  • Experience adding value to open source projects and mentoring or supporting contributors
  • Ability to work in a self-directed manner to achieve agreed goals and outcomes
  • Strong interpersonal, communication, and relationship-building skills, given that this work crosses teams and involves outside collaborators
  • Ability to work on diverse and geographically distributed teams, and in particular across cultures
  • Some experience in international development is a plus
  • Previous exposure to an STT toolkit such as DeepSpeech, Kaldi, or CMU Sphinx is a plus

Mozilla Foundation Hiring Practices:

Mozilla understands that valuing diverse creative practices and forms of knowledge is crucial to, and enriches, the company’s core mission. We encourage applications from everyone, including members of all equity-seeking communities, such as (but certainly not limited to) women, racialized and Indigenous persons, persons with disabilities, and persons of all sexual orientations and gender identities and expressions.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please contact us to request accommodation.
