Microsoft AI researchers accidentally exposed big cache of data

Bloomberg Updated - September 19, 2023 at 01:06 PM.

Microsoft investigated and remediated an incident involving an employee who shared a URL in a public GitHub repository to open-source AI learning models.

Microsoft logo | Photo Credit: GONZALO FUENTES

Microsoft Corp.’s AI research team accidentally exposed a large cache of private data on the software development platform GitHub, according to new research from a cybersecurity firm. 

A team at the cloud security company Wiz found that cloud-hosted data on the AI training platform had been exposed via a misconfigured link. The data was leaked by Microsoft’s research team while publishing open-source training data on GitHub, according to Wiz.

Users of the repository were urged to download AI models from a cloud storage URL. But the URL was misconfigured to grant permissions on the entire storage account, and it gave users full control rather than read-only access, meaning they could delete and overwrite existing files, according to a Wiz blog post.

The exposed data included Microsoft employees’ personal computer backups, which contained passwords to Microsoft services, secret keys and more than 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

Open data sharing is a key component of AI training, but sharing larger amounts of data leaves companies exposed to larger risk if shared incorrectly, Wiz’s researchers stated. The cloud security company shared the data in June with Microsoft, which moved quickly to remove the exposed data, said Ami Luttwak, chief technology officer and co-founder of Wiz, who added that the incident “could have been worse.”

Asked for comment, a Microsoft spokesperson said, “We have confirmed that no customer data was exposed, and no other internal services were put at risk.”

In a blog post published Monday, Microsoft said it investigated and remediated an incident involving a Microsoft employee who shared a URL in a public GitHub repository to open-source AI learning models. The tech giant said the data exposed in the storage account included backups of two former employees’ workstation profiles and internal Microsoft Teams messages of these two employees with their colleagues.

The data cache was found by Wiz’s research team scanning the internet for misconfigured storage containers, part of its ongoing work on accidental exposure of cloud-hosted data, according to the blog. 


Published on September 19, 2023 07:36
