Data poisoning can turn Generative AI into a merchant of false info

K V Kurmanath Updated - September 29, 2023 at 10:19 AM.

Clover Infotech CTO warns of fictitious, false info swarming Generative AI models


If someone says Mumbai is the capital of India, or that one kilometre and one mile are the same length, we will laugh at that person. If someone tries to pass off Newton’s third law as “for every action, there is no equal and opposite reaction”, we will scoff at him.

But in the world of Generative AI, such fictitious and wrongful information can find its way into websites and search engines. This could lead gullible users of Generative AI solutions such as ChatGPT and Bard.ai to believe whatever content these tools produce.

Cybersecurity experts call this “data poisoning”: the injection of malicious inputs into the training data. This can significantly corrupt the learning process, says Neelesh Kripalani, Chief Technology Officer, Clover Infotech.

Engineers building LLM (Large Language Model) platforms such as ChatGPT feed enormous amounts of data into the system and train it to understand the content and produce answers to users’ questions.

The quality of the output depends entirely on the quality of the input; injecting false or fictitious data corrupts the model and degrades its output.
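To see why even a modest share of poisoned data matters, consider a toy experiment (ours, for illustration; not from the article): the same classifier is trained once on clean labels and once with 30 per cent of its training labels flipped, a simple form of poisoning, and both are scored on the same held-out data.

```python
# Toy illustration of data poisoning via label flipping.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Poison 30% of the training labels by flipping them.
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", clean.score(X_test, y_test))
print("poisoned accuracy:", poisoned.score(X_test, y_test))
```

The poisoned model is trained on the same inputs; only the corrupted labels differ, yet its held-out accuracy drops.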

“A robust data validation pipeline, combined with meticulous dataset curation, is the best armour against this threat,” he said.
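What might one screening layer of such a validation pipeline look like? The sketch below is a minimal, assumed design (the trusted sources, sample facts, and thresholds are ours, not Clover Infotech’s): real curation pipelines combine provenance checks, deduplication, and model-based validators, but the shape is similar.

```python
# A minimal sketch of one screening layer in a data-validation pipeline.
TRUSTED_FACTS = {
    "capital of india": "new delhi",  # sample fact to cross-check against
}

def contradicts_trusted_fact(text: str) -> bool:
    """Naive check: the text mentions a known topic but not the known truth."""
    lowered = text.lower()
    return any(topic in lowered and truth not in lowered
               for topic, truth in TRUSTED_FACTS.items())

def validate_sample(text: str, source: str, trusted_sources: set) -> bool:
    """Return True if a candidate training sample passes basic screening."""
    if source not in trusted_sources:   # provenance: reject unknown origins
        return False
    if len(text.split()) < 5:           # too short to carry real content
        return False
    if contradicts_trusted_fact(text):  # crude factual cross-check
        return False
    return True

def curate(samples, trusted_sources):
    """Yield de-duplicated samples that pass validation."""
    seen = set()
    for text, source in samples:
        if text in seen:                # drop exact duplicates
            continue
        seen.add(text)
        if validate_sample(text, source, trusted_sources):
            yield text, source
```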

Talking about the possible dangers of indiscriminate use of Generative AI models, he said malicious actors can create realistic synthetic identities using these services. “Attackers can use its ability to create synthetic identities for fraudulent activities. As a guardian of digital identity, a CIO’s countermeasures should involve continuous monitoring of user behaviour patterns, coupled with adaptive authentication mechanisms,” he said.
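As an illustration of what adaptive authentication can look like, here is a hedged sketch with made-up field names and thresholds: each login attempt is scored against the user’s usual behaviour, and higher-risk attempts trigger stronger checks.

```python
# Illustrative adaptive-authentication sketch (assumed fields and thresholds).
from dataclasses import dataclass

@dataclass
class LoginEvent:
    user_id: str
    country: str
    device_id: str
    hour: int  # 0-23, local time of the attempt

def risk_score(event: LoginEvent, profile: dict) -> int:
    """Add points for each deviation from the user's usual behaviour."""
    score = 0
    if event.country not in profile["usual_countries"]:
        score += 2
    if event.device_id not in profile["known_devices"]:
        score += 2
    if not (profile["active_hours"][0] <= event.hour <= profile["active_hours"][1]):
        score += 1
    return score

def required_auth(event: LoginEvent, profile: dict) -> str:
    score = risk_score(event, profile)
    if score >= 4:
        return "block_and_review"  # likely synthetic or hijacked identity
    if score >= 2:
        return "step_up_mfa"       # adaptive: demand a second factor
    return "password_only"
```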

Deepfake Amplification

Deepfakes (AI-generated images and videos that mimic real-life personalities) pose a grave risk to organisational reputations. “Detecting manipulated media in real-time requires advanced image and video analysis tools, along with AI-driven media authenticity verification systems,” he said.

“Organisations must put in place a process within the AI models to identify and remove deepfakes from the asset library,” he said.
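One naive building block for screening an asset library is perceptual hashing, sketched below using the third-party ImageHash library. To be clear about its limits: this catches lightly altered copies of known-authentic media, not sophisticated deepfakes, which require dedicated detection models; it illustrates only the “screen the asset library” step.

```python
# Naive asset-library screen via perceptual hashing (not deepfake detection).
from PIL import Image
import imagehash  # pip install ImageHash

def is_near_duplicate(candidate_path, authentic_hashes, max_distance=5):
    """True if the candidate is close to any known-authentic original."""
    h = imagehash.average_hash(Image.open(candidate_path))
    # Subtracting two hashes gives their Hamming distance.
    return any(h - known <= max_distance for known in authentic_hashes)
```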

He said phishing campaigns (using similar-looking email IDs and websites) are getting more sophisticated since the advent of Generative AI solutions. “Machine learning-driven anomaly detection systems are our allies here, enabling us to spot anomalous patterns in communication and protect employees and stakeholders from phishing scams,” he said.
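As a sketch of what such anomaly detection can look like, the snippet below (with invented, illustrative features such as link count and sender-domain age) fits scikit-learn’s IsolationForest on legitimate mail and flags messages that deviate from that pattern.

```python
# Anomaly detection over email metadata with IsolationForest.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features per email:
# [num_links, sender_domain_age_days, edit_distance_to_our_domain]
legitimate_mail = np.array([
    [1, 3650, 9], [0, 2900, 8], [2, 4000, 10], [1, 3100, 9], [0, 3500, 11],
])
model = IsolationForest(contamination=0.1, random_state=0).fit(legitimate_mail)

suspicious = np.array([[7, 12, 1]])  # many links, brand-new look-alike domain
print(model.predict(suspicious))     # -1 means anomalous -> route to review
```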

Kripalani said Generative AI can inadvertently leak sensitive information when generating responses or content. “To address this challenge, we can use a mix of AI-driven content validation algorithms and policy-driven content filters, ensuring that only appropriate content is shared,” he said.
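A policy-driven content filter can be as simple as the sketch below. The regex policies here are illustrative assumptions, not Clover Infotech’s product: they redact obvious sensitive patterns before a generated response is shared, and production systems layer trained classifiers on top.

```python
# Policy-driven output filter: redact sensitive patterns before sharing.
import re

POLICIES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "pan":   re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"),   # Indian PAN format
    "card":  re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # rough card-number match
}

def filter_output(text: str) -> str:
    """Redact any span matching a policy before the response leaves the system."""
    for name, pattern in POLICIES.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text

print(filter_output("Contact me at ravi@example.com, PAN ABCDE1234F."))
# -> Contact me at [REDACTED:email], PAN [REDACTED:pan].
```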

Published on September 9, 2023 10:11
