Microsoft accidentally leaks 38TB of private data on GitHub

19 Sep 2023

Illustration of a tap on a wall made of code, with liquid code pouring out from the nozzle. Used as a concept for a data breach.

Image: © the_lightwriter/Stock.adobe.com

The tech giant claimed a URL link contained an ‘overly-permissive’ token, which gave access to private employee data.

Microsoft’s AI researchers accidentally exposed 38TB of data on GitHub when publishing open-source training data, according to a report by Wiz Research.

The report claims that this leak included a disk backup of two employee workstations, which contained secrets, private keys, passwords and more than 30,000 internal Microsoft Teams messages.

Microsoft confirmed the accidental leak after it was informed by Wiz, but said no customer data was exposed and no other internal services were put at risk.

Wiz claims the issue was caused from a GitHub repository that belongs to Microsoft’s AI research division, which provides open-source code and AI models through a URL.

“However, this URL allowed access to more than just open-source models,” Wiz said in a blogpost. “It was configured to grant permissions on the entire storage account, exposing additional private data by mistake.

“In addition to the overly permissive access scope, the token was also misconfigured to allow ‘full control’ permissions instead of ‘read-only’. Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well.”

Microsoft said the issue happened because the URL included an “overly-permissive” Shared Access Signature token for an internal storage account. After being informed of the issue by Wiz on 22 June, Microsoft said it prevented all external access to the account and mitigated the issue on 24 June.

Wiz said the issue is an example of the new risks organisations face when they leverage the power of AI more broadly, as “more of their engineers now work with massive amounts of training data”.

“As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards,” Wiz said.

The tech giant faced criticism earlier this year after suspected China-based hackers managed to infiltrate the emails of various government officials by impersonating Microsoft Azure AD users.

10 things you need to know direct to your inbox every weekday. Sign up for the Daily Brief, Silicon Republic’s digest of essential sci-tech news.