AI for Good blog

Harnessing AI to deliver content in local languages across India

Culture | Digital Economy | Inclusivity

Internet use is skyrocketing in rural India, thanks to more affordable smartphones, some of the cheapest mobile data rates in the world, and a steady supply of online videos to consume.

Nine of every ten new Internet users are expected to be speakers of at least one of India’s local languages, which number in the hundreds.

Technology company VerSe Innovation is betting on artificial intelligence (AI) and rising demand for localized content to help it bridge the country's digital and linguistic divides.

The company claims to be India's first "unicorn" in local-language technology, having reached a valuation of USD 1 billion after raising the equivalent of USD 200 million from investors.

Its Dailyhunt news aggregator serves over 300 million users with content in 14 local languages. Its new Josh short-video app supports 12 languages and is one of many launched in the country after last year's ban on TikTok. Josh draws over 85 million monthly active users, according to VerSe.

Mobile content creation

“Mobile is the Internet, and the Internet is mobile,” says Dailyhunt co-founder Umang Bedi. He points to a major shift in India’s mobile Internet landscape as mobile phones evolved from consumption devices into content creation devices. “The second big shift that we’ve seen is the smartphone becoming the primary entertainment device. And as speed and content proliferate, attention spans get shorter and shorter.”

COVID-19 has also fast-tracked the adoption of mobile devices across age groups in the country, says fellow founder and CEO Virendra Gupta. Beyond satisfying user demand for discovery and instant gratification, the company believes in the enabling power of local languages.

“One billion people speak, read and write in India’s local languages,” Gupta says. “That is how they want to consume information.”

The numbers of people speaking Marathi (83 million), Telugu (81 million), and Tamil (69 million) are higher than the entire populations of Turkey, France, and the United Kingdom respectively, one report observes.

When VerSe started up nearly 15 years ago, local languages were difficult or impossible to use online. “A local language font would not be rendered on a mobile phone because it was not a Unicode [format] as per global standards,” Gupta recalls. “But the Internet should not be created just for English-speaking users.”

Feeding the machine

Content from the VerSe creator base is fed into the platform in multiple formats. “With over 80 AI and ML algorithms that we built across various technologies, from content filtering and collaborative filtering to reinforcement learning, the machine tries to understand the content and the context across languages,” Gupta explains.
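Gupta names collaborative filtering as one of the techniques involved. The article does not describe how VerSe implements it. The sketch below shows only the general idea of item-based collaborative filtering: articles are treated as similar when the same users engage with them, and a user is recommended unseen articles that resemble what they already read. The users, article IDs and engagement data are invented for illustration, and the cosine-similarity scoring is one common choice, not necessarily VerSe's.

```python
from __future__ import annotations

from math import sqrt

# Hypothetical toy data: user -> set of article IDs the user engaged with.
# All users and IDs are invented for illustration.
interactions: dict[str, set[str]] = {
    "user_a": {"cricket_score", "cricket_interview"},
    "user_b": {"cricket_score", "cricket_interview", "bollywood_wedding"},
    "user_c": {"bollywood_wedding", "film_review"},
    "user_d": {"cricket_score", "film_review"},
}


def item_users(data: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert user -> items into item -> users who engaged with that item."""
    by_item: dict[str, set[str]] = {}
    for user, items in data.items():
        for item in items:
            by_item.setdefault(item, set()).add(user)
    return by_item


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of two binary engagement vectors stored as sets of users."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user: str, data: dict[str, set[str]], k: int = 3) -> list[str]:
    """Rank unseen items by summed similarity to items the user already engaged with."""
    by_item = item_users(data)
    seen = data[user]
    scores: dict[str, float] = {}
    for candidate, candidate_users in by_item.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(candidate_users, by_item[s]) for s in seen)
    # Sort by descending score, breaking ties alphabetically for deterministic output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:k] if score > 0]


print(recommend("user_a", interactions))  # ['bollywood_wedding', 'film_review']
```

Here user_a only read cricket stories. The Bollywood item still ranks first because user_b, who shares user_a's cricket reading, also engaged with it. No understanding of the article text is needed for this signal.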

For instance, an article about Indian cricketer Virat Kohli, which would typically be associated with sports, might instead turn out to be about Bollywood, given his marriage to actress Anushka Sharma. VerSe systems can analyze text, images, and audio to better understand premise and context. “The technology creates hundreds of thousands of tags that contextualize the content and the taxonomy,” says Bedi.
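To make the Kohli example concrete, here is a deliberately simplified sketch of tag-based topic inference. It extracts tags from the text and lets each tag vote for topics, so a story that mentions Kohli alongside Bollywood-related tags is classified as Bollywood rather than sports. The tag lexicon, weights and substring matching are hypothetical assumptions made for illustration. The article says VerSe's systems also draw on images and audio, which this text-only sketch ignores.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical tag lexicon: tag -> topic weights. A production system would learn
# these associations from data; the values here are invented for illustration.
TAG_TOPICS: dict[str, dict[str, float]] = {
    "virat kohli": {"sports": 1.0, "bollywood": 0.5},
    "century": {"sports": 0.8},
    "test match": {"sports": 1.0},
    "anushka sharma": {"bollywood": 1.0},
    "wedding": {"bollywood": 0.8},
    "film": {"bollywood": 1.0},
}


def extract_tags(text: str) -> list[str]:
    """Return lexicon tags found in the text (naive case-insensitive substring match)."""
    lowered = text.lower()
    return [tag for tag in TAG_TOPICS if tag in lowered]


def infer_topic(text: str) -> str | None:
    """Sum topic weights over all extracted tags and return the top-scoring topic."""
    scores: Counter[str] = Counter()
    for tag in extract_tags(text):
        for topic, weight in TAG_TOPICS[tag].items():
            scores[topic] += weight
    if not scores:
        return None
    return scores.most_common(1)[0][0]


print(infer_topic("Virat Kohli scores a century in the Test match"))  # sports
print(infer_topic("Virat Kohli and Anushka Sharma share wedding photos"))  # bollywood
```

The same name, "Virat Kohli", contributes to both topics. It is the surrounding tags that tip the balance, which is the kind of contextual disambiguation Bedi describes.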

Applying these algorithms to local languages has been challenging, but it has also become the company’s biggest competitive advantage. “The local language association of words and lines is very different from English,” Gupta says.

“Over the last five years, we put people and engineers together to train our models on huge datasets to eventually understand the context of words in each language.”

“Today, we have over 100,000,000 historic pieces of content for training,” explains Bedi. “A five-petabyte dataset is growing by five terabytes a day because we are feeding three to four million pieces of content across languages daily.”
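Bedi's figures allow a quick back-of-the-envelope check. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), which the article does not specify. It also assumes all 5 TB of daily growth comes from the 3 to 4 million new pieces of content. Under those assumptions, each piece averages roughly 1.25 to 1.67 MB. The dataset grows by about 0.1% a day, so adding another 5 PB would take about 1,000 days at a constant rate.

```python
# Assumption: decimal (SI) units; the article does not say whether TB/PB are decimal or binary.
TB = 10**12
PB = 10**15

dataset_bytes = 5 * PB          # "A five-petabyte dataset"
daily_growth_bytes = 5 * TB     # "growing by five terabytes a day"
items_per_day_range = (3_000_000, 4_000_000)  # "three to four million pieces of content"


def avg_item_size_mb(daily_bytes: int, items_per_day: int) -> float:
    """Average size per new piece of content, in decimal megabytes."""
    return daily_bytes / items_per_day / 10**6


def days_to_add(total_bytes: int, daily_bytes: int) -> float:
    """Days needed to add total_bytes at a constant daily growth rate."""
    return total_bytes / daily_bytes


for items in items_per_day_range:
    print(f"{items:,} items/day -> ~{avg_item_size_mb(daily_growth_bytes, items):.2f} MB per item")
print(f"Daily growth as a share of the dataset: {daily_growth_bytes / dataset_bytes:.1%}")
print(f"Days to add another 5 PB at this rate: {days_to_add(dataset_bytes, daily_growth_bytes):.0f}")
```

This is only an average across all formats. Text articles would sit well below it and short videos well above it.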

A newly acquired VerSe subsidiary aims to set up an AI lab to “aggressively pursue neuroscience-inspired reinforcement learning experiments to understand user taste profiles, using non-intrusive, implicit behavioural signs.”

Democratizing content creation

Local language users may lack high-end smartphones or strong network coverage, limiting their access to video.

With these constraints in mind, the company has designed its offering to work over slow and weak network connections, says Gupta.

Building and monetizing a content creator ecosystem took a great deal of effort, he adds, recalling how his team had “gone to the street to source great content creators and built partnerships with them over the years.”

Today, VerSe works with large news organizations, professional content creators, independent creators, hyper-local stringers, and influencers. “When we started off, we had 800 content partners. Today we have nearly 100,000,” Bedi notes. “We have democratized that content creation process via tools and technologies.”

Exporting the model

VerSe apps now come preloaded on most handsets in the Indian market. “A consumer who’s coming online for the first time and buys a smartphone sees our app on the phone [with] content in one’s own language,” explains Bedi.

The founders intend to export their localization model to international markets. Local language demand applies not just to developing markets but also to some developed markets, according to Bedi.


Image credit: AMIT RANJAN via Unsplash
