After finishing her long day of work, Baby Rajaram Bokale settles into bed in Kharadi, Maharashtra, India, to perform one final task: she opens an app on her smartphone and begins to read a story aloud in Marathi, her native language. This nighttime routine serves a dual purpose:
Firstly, Bokale's voice will be used to train artificial intelligence (AI) applications in the Marathi language, which is spoken by around 80 million people but is underrepresented in the digital world. Secondly, the story that Bokale reads is designed to convey practical financial knowledge in an engaging manner. As she reads, she is learning important lessons on how to use banking services, how to save, and how to avoid scammers. "Now I'm able to do more interesting things with my smartphone," she says.
Even as Bokale gains knowledge, she is also making a valuable contribution to a wider project. Her narration work is part of a social impact initiative from Karya, an organization that aims to create high-quality datasets in India. These datasets are assembled by unconventional workforces, often in rural areas, and the work is meant both to introduce an often overlooked demographic to the benefits of the digital economy and to help alleviate poverty.
Karya has ambitious targets. It aims to reach 100 million people by 2030 so that everyone can benefit from digital technologies, regardless of their native language. Language is central to this mission: the digital world is largely dominated by English and Hindi, which risks alienating hundreds of millions of potential users. Karya's language data is therefore a crucial tool in a broader digital inclusion campaign.
Karya's work also shines a spotlight on unrecognized languages. It creates language datasets to train AI models in a multitude of Indian languages, including those spoken in rural areas. By paying workers above the minimum wage and providing continued support, Karya intends to help them achieve long-term prosperity. Workers are also eligible to receive royalties if the data they help create is resold.
The organization began as a Microsoft Research project in 2017 and has since been spun off as an independent entity. It now works with more than 30,000 workers across 24 of India's 28 states, making it a significant player in the data sector. Its CEO, Manu Chopra, believes that rural India can be both an excellent builder of AI and an excellent beneficiary of its technologies.
Kalika Bali, a language technologist and researcher at Microsoft Research Lab in Bengaluru, uses data gathered by Karya for her own studies. Regarding the importance of the organization's work, Bali emphasizes that "no one should be excluded from using technology because of their language." She also believes in the potential of AI, noting that it can speed up the work of preserving languages and putting them to use in large language models.
The intersection of income, education, and meaningful work is the cornerstone of Karya’s mission. One aim going forward is for many Karya workers to eventually join the organization in various roles, working as organizers and local administrators.
Bokale and her friends, who have both worked for and benefited from Karya, are a testament to the project's success. From learning how to use their smartphones to becoming more financially savvy, the women appreciate the opportunities that have been unlocked for them. They have used the money they earned through Karya’s data collection to invest in professional tools and in education for their children.