Data is turning out to be more valuable than we thought. Google and Facebook’s ad revenues exceeded $200 billion last year. They can hope for an even bigger source of income soon: the Artificial Intelligence (AI) businesses built on the data of billions of individuals.
No wonder that paying top dollar for data is the new game in the digital world. This may explain the sudden investments by Google, Facebook, Intel, and many others in India, one of the largest data generators in the world. AI apps are estimated to add about $15 trillion, or 17 per cent, to global GDP in the next 10 years. By then, over 70 per cent of firms are expected to use AI in some form. AI systems are set to transform the healthcare, finance, security, education, industry, agriculture, entertainment, and research sectors.
Google and Facebook are frontrunners in developing commercially viable AI apps. But such apps will remain private property. Users will have no claim even though their data fuels these apps. This is a pity, as data is the most critical input for the AI economy. Data is not merely the raw material for AI apps; it also allows the apps to improve themselves based on the insights generated from new data. So, data is both the raw material and the mind of an AI app. As such, it becomes the intellectual property that creates competitive advantage.
As AI becomes mainstream in the next few years, data will also become our most critical national resource, more precious than gold or oil. Not claiming ownership of data will amount to handing over our most valuable resource for private gain. This will have far-reaching consequences.
Imagine a firm getting access to billions of transaction-level records of Indian bank-account holders and mobile and data users, along with the detailed medical histories of patients. We cannot rule out the sale of the medical histories and hospital records of millions of patients to Big Pharma. Data in private hands may not make healthcare or education inexpensive.
Targeted AI apps can identify political opinions or the next location of terrorist activity. The possibility of selective use or misuse of data cannot be ruled out. Taking action against firms located overseas becomes almost impossible. How can data generated by billions of users become private property? It is a natural public good. With terabytes of local data generated every day and a world-class IT workforce, India has both the raw material and the brainpower to become a significant data player.
Playing the data game
How can India break into the data game? Here’s a four-step plan.
First, declare data a national resource. Private firms, Indian or foreign, will not own any data. They may capture or use it for pre-specified purposes only, and cannot share, transfer, or sell it. Set up a National Data Authority (NDA), whose authorisation would be needed for all subsequent use. The NDA may create a cloud platform where all private and government data reside. All entities will have to surrender data to this platform after the specified use. The NDA can allow the sharing of specified data with firms or countries based on clear principles. The European Union’s General Data Protection Regulation (GDPR) can be a good reference point on data-sharing.
Second, create a National AI Network, which will oversee the use of national data. It will collaborate with Indian and global private firms and institutions for development, adoption, and monetisation of AI apps.
Having India’s own AI apps will offer many strategic advantages. Imagine an Indian health app, backed by the medical records of millions of patients, accurately detecting the extent of the spread of cancer cells, kidney problems, or the possibility of a heart attack. And imagine India giving it away free, making healthcare inexpensive and accessible in developing countries.
India’s Cipla supplied AIDS drugs to African countries at $1 a day when comparable western costs were 50 times higher. India faced resistance from Big Pharma then. Resistance from the digital giants in the case of data will be greater.
Third, we must become the global data labelling hub. Raw data is of no use for AI applications. A computer can see an X-ray of a patient but cannot identify the disease. Data labelling provides the connection between raw data and its meaning to human beings. Making unstructured data AI-worthy is a labour-intensive and time-consuming task. For most AI tasks, data cleaning and labelling take 80-90 per cent of project time. Most global firms outsource this task. The scale of operation is significant. For example, one Chinese firm alone employs 300,000 data labellers.
To get into this business, we need to incentivise the opening of data labelling centres, which will label Indian and global data for use in AI. India may sell the processed data or use it for AI-related developments.
Fourth, introduce high-quality master’s and doctoral programmes in AI at top science and engineering colleges. The world has barely 10,000 high-quality AI experts.
The writer is an Indian Trade Service officer. Views are personal.