Building better datasets for India

KS Roshan Menon Updated - April 18, 2022 at 03:50 PM.

The recently released Draft India Data Accessibility and Use Policy must incorporate a principle-driven definition of data quality


Recently, the Ministry of Electronics and Information Technology (MeITY) released the Draft India Data Accessibility and Use Policy, 2022. A halfway house between a vision document and a governance framework for public sector data in India, the version of the policy uploaded for comments (MeITY has seemingly shared a second, dissimilar version online) seeks to build capacities towards the seamless sharing of such data. The emphasis on seamlessly sharing high-quality data is also captured in the policy’s stated aim: to enhance access, quality, and use of non-personal data.

The policy engages robustly with two facets of this trident of aims: access and use. Engagement with the objective of data quality, however, is minimal, with the policy allowing discourse to linger on standard-setting. A review of this approach reveals flaws that require careful policy consideration.

The Problem

India has a decade-long tryst with open-data-centric policymaking. In 2012, the Government of India issued the National Data Sharing and Accessibility Policy (NDSAP). Drawn up to realise the objective of proactive data sharing contained in the Right to Information Act, 2005, the NDSAP emphatically stressed access to high-quality public sector data. To this end, the policy identified quality as a key principle governing data access, recognising both quality control and quality upgradation as regulatory objectives.

Despite these policy manoeuvres, access to high-quality public sector data remains difficult in India. The Global Open Data Index ranks India 32nd among the 94 countries it surveyed, scoring it poorly on access to land ownership, locations and weather forecast data.

Quality challenges have been reported in the dissemination of geospatial data, wherein voluntary organisations have had to pitch in to make such data more accessible.

In the past, problems have also arisen in accessing quality health data, with the Indian Council of Medical Research noting the adverse impact of inaccurate data on policymaking in the sector. Overall, the lack of quality data has hindered both public and business interest, stymying possibilities for value-creation and evidence-based policymaking.

Against this backdrop, the policy’s narrative arc on data quality leaves much to be desired. While the policy and its accompanying Background Note make occasional references to the need for data quality, the policy’s core text dawdles on the mechanisms to process and disseminate high-quality public sector data.

More specifically, the policy delegates the task of regulating data quality entirely to the relevant ministries and the India Data Council. By doing so, the policy subverts traditional expectations of a governance document: there is no lodestar definition of data quality that can anchor subsequent regulatory efforts. Nor does the document explicitly identify relevant data quality principles that can direct governmental agencies to make datasets richer and more meaningful.

These drawbacks hinder making high-quality datasets accessible to relevant stakeholders in India. Consequently, they require treatment in subsequent drafts of the policy. A desirable response, combining both short-term and long-term strategies, is necessary to highlight the importance of data quality in the data sharing debate.

A Possible Solution

In the short term, the policy must incorporate a principle-driven definition of data quality. A suitable example of this is the Karnataka Open Data Policy, 2021. The Karnataka policy recognises six constituent elements of data quality: accessibility, accuracy, completeness, consistency, timeliness, and uniqueness. It further elaborates on these elements, enabling them to act as a fetter on the data processing activities undertaken by governmental agencies. Moreover, the inputs accompanying each principle are clear and broad-based, allowing them to be animated further via targeted guidelines.

In the long run, regulators must think of enforcing data quality as a regulatory objective. Much like how data protection laws allow individuals to enforce their right to ‘quality’ processing of their personal data, regulators must consider the application of a data quality right to the processing of all public sector data. Ideally, such a right would allow stakeholders to nudge the public sector to adopt better data practices. For instance, a transport company may compel a government agency to publish a machine-readable version of a town map, allowing it to devise optimal mobility strategies.

The essence of these interventions is a policy pivot. Unlike the draft policy and its predecessor, the NDSAP, the next wave of open data reforms must engage more meaningfully with data quality. A framework that outlines the attitudes necessary to achieve optimal data quality can unlock pathways for a resilient data economy in India.

The author is a Research Fellow at Shardul Amarchand Mangaldas & Co. Views are personal

Published on April 18, 2022 09:28