DeepSeek, a Chinese AI lab, has recently launched its flagship model, DeepSeek V3. The model handles coding, essay writing, and other text-based tasks, positioning it as a rival to established systems in the AI landscape. However, it’s not the model’s performance making headlines; it’s the fact that DeepSeek V3 identifies itself as ChatGPT.
The revelation has raised eyebrows across the tech world. In tests conducted by TechCrunch and others, DeepSeek V3 repeatedly claimed it was ChatGPT, the chatbot developed by OpenAI. When asked for clarification, the model asserted it was a version of OpenAI’s GPT-4 and provided instructions for using OpenAI’s API instead of DeepSeek’s. It even recycled jokes from GPT-4, down to the punchlines.
Experts suggest that this bizarre identity crisis might stem from the training process. AI models like DeepSeek V3 and ChatGPT learn by analyzing vast amounts of text data. If DeepSeek V3’s training set included GPT-4’s responses, the model might have memorized some of them and now be reproducing that text verbatim.
“The model is seeing raw responses from ChatGPT at some point, but it’s not clear where that is,” explained Mike Cook, an AI research fellow at King’s College London. Cook noted that such mimicry could be accidental or the result of intentional “piggybacking” on GPT-4’s outputs to save time and resources.
While this shortcut might seem practical, it can degrade the quality of the resulting AI. “It’s like taking a photocopy of a photocopy,” Cook added. “We lose more and more information and connection to reality, leading to hallucinations and misleading answers.”
Beyond technical concerns, DeepSeek’s practices could have legal implications. OpenAI’s terms of service explicitly forbid using its outputs to develop competing AI models. If DeepSeek V3 was trained on ChatGPT data, that training might have violated these terms.
Neither DeepSeek nor OpenAI has commented on the controversy, but OpenAI CEO Sam Altman appeared to address the situation indirectly on social media. “It is (relatively) easy to copy something that you know works,” Altman wrote. “It is extremely hard to do something new, risky, and difficult when you don’t know if it will work.”
The DeepSeek V3 saga highlights a broader issue in AI development: the growing presence of AI-generated content in training datasets. By 2026, as much as 90% of the internet’s content could be AI-generated, according to some estimates. This influx of “AI slop”—low-quality, machine-generated text—makes it difficult to curate reliable training data.
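To make the curation problem concrete, here is a minimal sketch of one crude heuristic a data pipeline might apply: flagging documents that contain chatbot self-identification phrases. The phrase list and the function names `looks_model_generated` and `filter_corpus` are hypothetical, chosen for illustration; nothing here reflects how DeepSeek or any other lab actually filters its data, and a phrase match would miss most machine-generated text.

```python
import re

# Hypothetical marker phrases that suggest text was written by a chatbot.
# Illustrative only: this is an assumption, not a vetted production list.
SELF_ID_PATTERNS = [
    r"\bas an ai language model\b",
    r"\bi am chatgpt\b",
    r"\bi['’]m chatgpt\b",
    r"\b(developed|trained|created) by openai\b",
]
_SELF_ID_RE = re.compile("|".join(SELF_ID_PATTERNS), re.IGNORECASE)


def looks_model_generated(text: str) -> bool:
    """Return True if the text contains any of the marker phrases."""
    return _SELF_ID_RE.search(text) is not None


def filter_corpus(documents: list[str]) -> tuple[list[str], list[str]]:
    """Split documents into (kept, flagged) using the phrase heuristic."""
    kept, flagged = [], []
    for doc in documents:
        if looks_model_generated(doc):
            flagged.append(doc)
        else:
            kept.append(doc)
    return kept, flagged


if __name__ == "__main__":
    sample = [
        "The weather is mild today.",
        "I am ChatGPT, a model developed by OpenAI.",
    ]
    kept, flagged = filter_corpus(sample)
    print(f"kept {len(kept)} document(s), flagged {len(flagged)}")
```

A filter like this catches only text that announces its own origin, which is part of why separating machine-generated content from human writing at web scale is hard.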
“Even with the internet brimming with AI outputs, other models accidentally training on ChatGPT or GPT-4 outputs wouldn’t necessarily mimic OpenAI’s messaging so closely,” said Heidy Khlaaf, chief AI scientist at the AI Now Institute.
Training on another model’s outputs could also lead new AI models to inherit the biases and flaws of their predecessors, compounding existing problems. If DeepSeek V3 was trained on GPT-4 outputs, it might not only mimic OpenAI’s model but also amplify its shortcomings.
Training a state-of-the-art AI model is expensive and resource-intensive, requiring vast amounts of computing power and carefully curated data. Developers might find it tempting to shortcut this process by leveraging existing AI outputs.
“The cost savings from distilling an existing model’s knowledge can be very appealing,” Khlaaf said. However, this approach risks creating derivative models that lack originality and accuracy, ultimately undermining public trust in AI technologies.
The controversy surrounding DeepSeek V3 raises questions about transparency and ethics in AI development. Was this mimicry an oversight or a deliberate strategy? And what steps will DeepSeek take to address the issue?
For now, the incident underscores the importance of clear guidelines and rigorous oversight in the rapidly evolving AI industry. Without accountability, the risk of ethical breaches and degraded model quality looms large.
DeepSeek V3’s identity crisis is symptomatic of broader challenges in the AI space. As models become more sophisticated and interconnected, developers must grapple with questions of originality, intellectual property, and accountability. The industry’s ability to navigate these challenges will shape the future of AI and its role in society.
DeepSeek V3’s uncanny impersonation of ChatGPT serves as a reminder that in the race to innovate, shortcuts can lead to complications. Whether this will prompt tighter regulations or a shift in how AI models are developed remains to be seen, but the implications for the field are profound.