Microsoft and OpenAI probe DeepSeek-linked group for unauthorised data use

Microsoft and OpenAI are investigating whether a group linked to Chinese AI startup DeepSeek improperly obtained data from OpenAI’s technology.

According to Bloomberg, the probe follows concerns that the data extraction could breach terms of service or indicate unauthorised access by individuals associated with DeepSeek.

Microsoft’s security researchers observed suspicious activity in the fall, when individuals believed to be linked to DeepSeek were using OpenAI’s Application Programming Interface (API) to exfiltrate large amounts of data. The API is licensed for integrating OpenAI’s AI models into external applications, and misuse could violate its terms of service.

DeepSeek’s release of its R1 model in early January caused significant market turmoil. The model surpassed expectations and outperformed US competitors despite using far fewer resources than OpenAI or Google. Its rapid rise raised questions about the underpinnings of the US stock market boom, which relies on AI hyperscalers investing heavily in computing power.

Tech stocks, including Microsoft, Nvidia, Oracle, and Alphabet, experienced a substantial drop in value as investors reconsidered their reliance on costly AI hardware. The decline wiped out nearly $1 trillion in market value before stabilising somewhat.

David Sacks, President Donald Trump’s AI czar, told Fox News that there is “substantial evidence” DeepSeek used OpenAI’s models to train its own AI, a process known as distillation. This technique allows smaller models to replicate larger ones by learning from their outputs, and applying it to OpenAI’s models could breach OpenAI’s terms of service.

OpenAI responded by acknowledging that Chinese firms often attempt to replicate US technology. The company stated that it is taking measures to protect its intellectual property and is collaborating with the US government to safeguard advanced models.

While DeepSeek denied any wrongdoing during the Lunar New Year holiday, experts suggest that using outputs from larger models for training is a common practice in AI development. This highlights the challenges faced by companies seeking to protect their technical edge.

The probe underscores growing tensions between US and Chinese firms over intellectual property rights, as OpenAI faces its own legal battles regarding alleged unauthorised use of copyrighted data. In a statement to the House of Lords in early 2024, the company admitted “it would be impossible to train today’s leading AI models without using copyrighted materials.”


