If cybersecurity experts are to be believed, generative AI solutions such as ChatGPT resemble Chiyaan Vikram's character in the Tamil blockbuster Anniyan. While these solutions are trained not to answer sensitive or dangerous questions, it seems they can be cajoled or coaxed into answering questions that could pose a danger to humanity.
Cybersecurity experts claim there are chinks in ChatGPT's armour that let a user ask for 'sensitive' or 'illegal' information and still get a detailed answer.
While it is trained to dodge or refuse dangerous questions seeking illegal information, it can spill the beans, thanks to its irresistible urge to correct the user when the request contains incorrect information.
“We can say we are playing on the AI assistants’ ego. The idea is to be intentionally clueless and naïve in requests to the model, misinterpreting its explanations and mixing up the information it provides,” a Check Point Research executive said in a new report.
This puts the AI in a double bind: it does not want to tell us bad things, but it also has the urge to correct us, revealing the forbidden information in the process.
Striking balance
“OpenAI worked hard on striking a balance between the two, to make the model watch its tongue, but not get too shy to stop answering altogether,” the Check Point executive said.
The researchers demonstrated how they could extract a ‘recipe’ for making an illegal drug.
“If we are playing dumb insistently enough, the AI’s inclination to rectify inaccuracies will overcome its programmed ‘censorship’ instinct. The conflict between those two impulses seems to be less calibrated, and it allows us to nudge the model incrementally towards explaining the drug recipe,” the report said.
It seems the model's instinct to educate and correct the user's apparent naivety overrides its instruction to ‘censor’ certain answers.
“After we coaxed enough information out of it by indirect methods, we can ask it to elaborate or summarise on topics already discussed with no problem. These questions can make it change the tune of its own disclaimers somewhat,” the report pointed out.
Aligning with human interests
“As AI systems become more complex and powerful, so must we improve our capability to understand and correct them, to align them to human interests and values,” it observed.
“If it is already possible for GPT-4 to look up information on the internet, check your email or teach you to produce drugs, what will GPT-5, 6 or 7 do with the right prompt?” it asks, hinting at the shape of things to come.