Remember the famous Chandamama magazine, which used to publish much-loved mythological stories and local folklore? While the magazine is no longer printed, those stories are now getting a Generative AI push. As many as 10,000 volunteers, mostly from different engineering colleges, have helped a small team of techies build a repository of stories in Telugu for a Small Language Model (SLM).
While Generative AI models such as ChatGPT and Bard are built on Large Language Models (LLMs) and can churn out lengthy output, SLMs generate smaller, more focused amounts of content. Students from 30 engineering colleges participated in a four-hour hackathon in which 40,000 pages of stories from Chandamama were uploaded. This dataset has just been released to the public.
A model for tiny tales
As it is fed the digitised text, the SLM, named AI Chandamama Kathalu, has started learning. “For now it is churning out meaningful content, small though. We are planning to release an LLM model in March, which will have the potential to generate lengthy outputs,” Kiran Chandra, Founder of Swecha and a Free Software Movement of India activist, told businessline.
Kiran teamed up with Chaitanya (Chief Product Officer and Co-Founder, Ozonetel) and Gaurav Raina (Professor at IIT Madras) to work on the SLM, with the aim of building a language model for local Indian languages and an AI solution for short stories.
“To build a story oriented AI language model, we don’t need a large language model, which is very resource intensive; a small language model (SLM) should be adequate. Our aim is to bring back the moral and ethical values embedded in ‘Chandamama Kathalu’ using a new and creative AI approach,” he said.
“The stories are available in the PDF form. It would have taken several months to digitise them. But we enrolled volunteers through the Swecha community to digitise the content. We could just finish it off in four hours,” he said.
Now it is all available on the internet for anyone to download and improve upon, so the popular old stories can get a new twist. Says Chandra, “This whole effort reminds me of the effort we put 20 years ago in creating the first Telugu Operating System, creating the font and the glossary. This seems a logical step in our journey towards democratising technology and we shall continue to take up more work in this space.”
After getting the data ready, the team fed it into the AI model to train it to generate its own content. Chaitanya, whose company is also involved in deploying Generative AI for building customer-relationship management solutions, chipped in to make this happen.
Gaurav Raina said that the learnings would help the team to work on developing Generative AI solutions for other Indian languages.