Any academic researcher worth his salt has, in all likelihood, spent countless hours taking down old books from musty libraries and handling copies of old newspapers and documents that crumble at the slightest touch.

Thanks to the digital revolution, libraries found a way to protect these pages of history from eager beavers working on their PhDs by digitising archived information and making it available electronically. Helping that effort is Ninestars Information Technologies Ltd, a digitisation firm in Chennai.

Danish deal

Ninestars recently won a three-year contract from the State and University Library in Aarhus, Denmark, to digitise 32 million pages of Danish newspaper archives. The deal ranks among the largest newspaper digitisation initiatives in the world, but the scale does not intimidate the company. Since its inception in 1999, Ninestars has digitised over 300 million newspaper pages, including those of The New York Times and The Wall Street Journal.

V. Gokulkrishnan and his brother Gopalkrishnan, who founded the company in 1999, stumbled into the world of digitisation. They converted their then newly launched typesetting company into a digitisation facility after winning a contract from ProQuest, an electronic publisher that holds extensive archives of newspapers, magazines and research papers.

“Two companies made it very big after the dotcom bubble burst – Amazon, for its rich content, and ProQuest, which was digitising everything,” recalls Gokulkrishnan of their early days in the business. The fact that ProQuest has stayed faithful to them has boded well for this low-profile company. A decade after winning that contract, Ninestars has 1,400 employees who work round the clock in seven cities and towns, and it operates sales offices across the globe.

Before the digital age, newspaper and magazine archives were preserved as hard copies and on microfilm. Pages were preserved by photographing them with a special camera and then transferring the images to a black and white roll of film. The images were accessed when needed by passing the film through a microfilm reader.

How it works

At the heart of Ninestars’ digitisation process is a technology called OCR – optical character recognition. The software converts letters in an image (for instance, the logo of The New York Times) into text. This way, the digitised files can be accessed by entering keywords in search engines operated by libraries and academic publishers like ProQuest and JSTOR.

For example, if you’re looking for a news report from The Hindu’s archives of 1948 on Mahatma Gandhi’s assassination, you can narrow the search using suitable keywords. OCR makes this search possible by converting the photograph of the archived report into text for a search engine.
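
For readers curious about the mechanics, the keyword lookup can be sketched in a few lines of Python. This is only an illustration, not Ninestars’ or any library’s actual system: it assumes the OCR step has already produced plain text for each page, and the page IDs, sample text and function names (`tokenize`, `build_index`, `search`) are all invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page IDs whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the IDs of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the pages matching the first word, then narrow the set
    # with each further keyword.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output: invented placeholder text, not real archive content.
pages = {
    "page-001": "Mahatma Gandhi assassinated in New Delhi",
    "page-002": "Budget presented in Parliament",
}
index = build_index(pages)
print(search(index, "Gandhi assassinated"))
```

Each extra keyword shrinks the set of matching pages, which is what narrowing a search amounts to. A production archive would also have to cope with OCR misreadings and far larger volumes, which this sketch ignores.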

The oldest paper Ninestars has digitised is the Hartford Courant (based in Connecticut, US), with editions dating back to 1770. The company has digitised almost all leading Indian dailies, including The Hindu, Business Line, The Times of India, Malayala Manorama, Eenadu and Amar Ujala. It has also digitised newspapers in almost every European language, besides a few in Japanese, Chinese and Korean.

Gokulkrishnan says that Ninestars won the Danish library contract because it already runs a scanning centre in Hamburg, Germany. Libraries worry about the safety of their microfilms, which need to be stored in dry, temperature-controlled rooms. “We won the contract because we were able to convince them that they wouldn’t have to send their microfilms across the globe. We scan the films in Hamburg and use the cloud to access them here.”

At Ninestars’ office in Chennai, floor after floor is filled with operators rapidly trawling Web sites and blogs and documenting the content for media monitoring firms around the world. Gokulkrishnan predicts that technology giants such as Google, Amazon and Facebook will become hungrier for content, and he wants Ninestars to satisfy that appetite.

>tanya.et@thehindu.co.in