+0.48 OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole from Us (www.404media.co S:+0.31 )
1361 points by latexr 396 days ago | 16 comments on HN | Moderate positive Contested Editorial · v3.7 · 2026-02-28 10:42:19 0
Summary Intellectual Property & Equitable Compensation Advocates
404 Media investigates OpenAI's corporate hypocrisy: the company complains about DeepSeek's alleged unauthorized use of its training data while OpenAI itself built its systems through extensive, largely uncompensated scraping of creator-generated content. The article strongly advocates for intellectual property rights, fair compensation for creative work, and uniform corporate accountability, positioning data and creative labor as property deserving legal protection and economic return.
Article Heatmap
Preamble: +0.30
Article 1 (Freedom, Equality, Brotherhood): ND
Article 2 (Non-Discrimination): ND
Article 3 (Life, Liberty, Security): ND
Article 4 (No Slavery): ND
Article 5 (No Torture): ND
Article 6 (Legal Personhood): ND
Article 7 (Equality Before Law): +0.20
Article 8 (Right to Remedy): +0.48
Article 9 (No Arbitrary Detention): ND
Article 10 (Fair Hearing): ND
Article 11 (Presumption of Innocence): ND
Article 12 (Privacy): +0.25
Article 13 (Freedom of Movement): ND
Article 14 (Asylum): ND
Article 15 (Nationality): ND
Article 16 (Marriage & Family): +0.64
Article 17 (Property): +0.56
Article 18 (Freedom of Thought): ND
Article 19 (Freedom of Expression): +0.62
Article 20 (Assembly & Association): ND
Article 21 (Political Participation): +0.20
Article 22 (Social Security): +0.30
Article 23 (Work & Equal Pay): +0.44
Article 24 (Rest & Leisure): ND
Article 25 (Standard of Living): ND
Article 26 (Education): ND
Article 27 (Cultural Participation): +0.50
Article 28 (Social & International Order): ND
Article 29 (Duties to Community): +0.40
Article 30 (No Destruction of Rights): +0.38
Negative Neutral Positive No Data
Aggregates
Editorial Mean +0.48 Structural Mean +0.31
Weighted Mean +0.45 Unweighted Mean +0.41
Max +0.64 Article 16 Min +0.20 Article 7
Signal 13 No Data 18
Volatility 0.14 (Medium)
Negative 0 Channels E: 0.6 S: 0.4
SETL +0.47 Editorial-dominant
FW Ratio 53% 26 facts · 23 inferences
Evidence 20% coverage
2H 5M 6L 18 ND
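The aggregates above can be reproduced from the per-channel scores listed further down the page. The sketch below is an inference from the published numbers, not the site's documented method: it assumes each provision's score is a 0.6/0.4 blend of the editorial and structural channels (falling back to the editorial score when the structural channel is ND), that Volatility is the population standard deviation of those blended scores, and that FW Ratio is facts divided by facts plus inferences. The names `SCORES` and `combine` are hypothetical. The Weighted Mean and SETL figures are not reproduced because their formulas are not stated here.

```python
from statistics import mean, pstdev

# Channel weights as listed under "Channels" (E: 0.6, S: 0.4).
W_EDITORIAL, W_STRUCTURAL = 0.6, 0.4

# (editorial, structural) per scored provision; None marks "ND".
# Values are copied from the Editorial and Structural Channel cards on this page.
SCORES = {
    "Preamble": (0.30, None),
    "Article 7": (0.20, None),
    "Article 8": (0.60, 0.30),
    "Article 12": (0.25, None),
    "Article 16": (0.80, 0.40),
    "Article 17": (0.70, 0.35),
    "Article 19": (0.70, 0.50),
    "Article 21": (0.20, None),
    "Article 22": (0.30, None),
    "Article 23": (0.60, 0.20),
    "Article 27": (0.70, 0.20),
    "Article 29": (0.40, None),
    "Article 30": (0.50, 0.20),
}


def combine(editorial, structural):
    """Assumed blend: weighted sum when both channels exist, else editorial alone."""
    if structural is None:
        return editorial
    return W_EDITORIAL * editorial + W_STRUCTURAL * structural


# Blended score per provision, rounded to the two decimals the page displays.
combined = {name: round(combine(e, s), 2) for name, (e, s) in SCORES.items()}
values = list(combined.values())

editorial_mean = mean(e for e, _ in SCORES.values())
structural_mean = mean(s for _, s in SCORES.values() if s is not None)
unweighted_mean = mean(values)
volatility = pstdev(values)   # population standard deviation (assumption)
fw_ratio = 26 / (26 + 23)     # facts / (facts + inferences)

print(combined["Article 16"], round(unweighted_mean, 2), round(volatility, 2))
```

Under these assumptions the blended values match the heatmap entries (for example Article 16 at 0.64 and Article 8 at 0.48), and the two channel means, the unweighted mean, the volatility and the FW Ratio round to the figures shown.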
Theme Radar
Foundation: 0.30 (1 article)
Security: 0.00 (0 articles)
Legal: 0.34 (2 articles)
Privacy & Movement: 0.25 (1 article)
Personal: 0.60 (2 articles)
Expression: 0.41 (2 articles)
Economic & Social: 0.37 (2 articles)
Cultural: 0.50 (1 article)
Order & Duties: 0.39 (2 articles)
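The radar values are consistent with a plain average of the scored provisions in each theme. The following is a minimal sketch under that assumption. The grouping of provisions into themes is inferred from the article counts and values on this page rather than taken from any published mapping, and `COMBINED`, `THEMES` and `theme_average` are hypothetical names.

```python
from statistics import mean

# Per-provision scores as shown in the Article Heatmap (scored provisions only).
COMBINED = {
    "Preamble": 0.30, "Article 7": 0.20, "Article 8": 0.48,
    "Article 12": 0.25, "Article 16": 0.64, "Article 17": 0.56,
    "Article 19": 0.62, "Article 21": 0.20, "Article 22": 0.30,
    "Article 23": 0.44, "Article 27": 0.50, "Article 29": 0.40,
    "Article 30": 0.38,
}

# Inferred grouping: only provisions that carry a score are listed.
THEMES = {
    "Foundation": ["Preamble"],
    "Security": [],  # no scored provisions on this page
    "Legal": ["Article 7", "Article 8"],
    "Privacy & Movement": ["Article 12"],
    "Personal": ["Article 16", "Article 17"],
    "Expression": ["Article 19", "Article 21"],
    "Economic & Social": ["Article 22", "Article 23"],
    "Cultural": ["Article 27"],
    "Order & Duties": ["Article 29", "Article 30"],
}


def theme_average(members):
    """Mean of the scored members, or 0.0 when a theme has none."""
    scored = [COMBINED[m] for m in members if m in COMBINED]
    return round(mean(scored), 2) if scored else 0.0


radar = {theme: theme_average(members) for theme, members in THEMES.items()}

for theme, value in radar.items():
    print(f"{theme}: {value:.2f} ({len(THEMES[theme])} scored)")
```

Reporting 0.00 for a theme with no scored provisions mirrors the Security entry above; a stricter variant might return no value at all so that an empty theme is not mistaken for a neutral one.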
HN Discussion 8 top-level · 4 replies
roshin 2025-01-29 15:43 UTC link
I hate clickbait articles that try to make the bad guys seem like they're angry.

> Both Bloomberg and the Financial Times are reporting that Microsoft and OpenAI have been probing whether DeepSeek improperly trained the R1 model

The company openai is not angry, or furious, or enraged. They simply suspect that deepseek broke their usage agreement and are trying to verify that.

csallen 2025-01-29 16:54 UTC link
There is nothing in this article to suggest that OpenAI is "furious" or even upset. Zero evidence. It's total clickbait.

And it's embarrassing that so many commenters on Hacker News who want to believe this storyline are just pretending that it's true despite the lack of evidence.

dang 2025-01-29 18:14 UTC link
Comments moved to https://news.ycombinator.com/item?id=42861475, which has the more informative of the two articles that this one was lifted from.

Submitters: "Please submit the original source. If a post reports on something found on another site, submit the latter." - https://news.ycombinator.com/newsguidelines.html

Please especially don't submit knock-off articles that jack up the linkbait and indignation. That's what we're trying to avoid on Hacker News. There are enough places to get that hit elsewhere.

SergeAx 2025-01-30 05:38 UTC link
randalflagged 2025-01-30 06:42 UTC link
[flagged] of course. This place is starting to drift in one direction.
AlpineG 2025-01-30 08:01 UTC link
OpenAI's model is closed source. IDK if distilling can be done via the API effectively? DeepSeek already has distilled models from other open source models like Qwen which have been done by 3rd party researchers, and I assumed that happened rapidly because they are all open source.
adultSwim 2025-01-31 07:00 UTC link
Spot on headline. OpenAI itself uses distillation to launder its own ill-gotten data.
csallen 2025-01-29 16:53 UTC link
I came here to say just this.

Can we change the headline of this article to something more accurate and less clickbaity?

The article unjustifiably labels OpenAI as "furious" despite surfacing zero evidence that that's how they actually feel, obviously in an attempt to paint them as hypocrites who are okay with copying others but are upset at being copied.

This is a very dishonestly-framed and -advertised story.

sebastiennight 2025-01-29 16:59 UTC link
Such a perfect article title, but wasted on clickbait.
ZeroTalent 2025-01-29 19:24 UTC link
As I understand from Twitter, the issue explained in this article is not the actual case at hand. The issue is that they suspect them of stealing the o1 model with the weights via corporate espionage and optimizing it with Matrix Multiplication and other upgrades. That would explain why the outputs are nearly identical in some cases.

I don't know how much of any of this is true. This is what I'm reading on Twitter today.

tim333 2025-01-29 22:05 UTC link
It's funny though. There seem to be a lot of commenters on Hacker News who don't really get the sense of humor thing.
Editorial Channel
What the content says
+0.80
Article 16 Marriage & Family
High Advocacy Framing Coverage
Editorial
+0.80
SETL
+0.57

CENTRAL FOCUS: Article strongly advocates for intellectual property rights of data creators; frames unauthorized use as property violation; emphasizes lack of compensation as core injustice; well-sourced critical coverage

+0.70
Article 17 Property
Medium Advocacy Framing
Editorial
+0.70
SETL
+0.49

Article advocates for creators' right to own and control intellectual property; frames data and model outputs as property subject to ownership disputes and legal protection

+0.70
Article 19 Freedom of Expression
High Advocacy Practice Coverage
Editorial
+0.70
SETL
+0.37

Article exercises free expression through critical journalism; publicly investigates and critiques major corporations and government officials without apparent fear; demonstrates robust press freedom

+0.70
Article 27 Cultural Participation
Medium Advocacy Framing
Editorial
+0.70
SETL
+0.59

Article advocates for creators' rights to share in benefits of their scientific and creative contributions; frames fair compensation as essential protection of intellectual rights

+0.60
Article 8 Right to Remedy
Medium Advocacy Framing
Editorial
+0.60
SETL
+0.42

Article advocates for effective remedy and accountability; critiques systemic gap where OpenAI has faced no remedies despite similar violations; frames investigation as holding power accountable

+0.60
Article 23 Work & Equal Pay
Medium Advocacy Framing
Editorial
+0.60
SETL
+0.49

Article frames data creation and intellectual work as labor deserving just compensation; advocates that creators should be economically compensated for work underlying AI systems

+0.50
Article 30 No Destruction of Rights
Medium Advocacy Framing
Editorial
+0.50
SETL
+0.39

Article opposes corporate practices that violate creators' rights; advocates for prevention of abuse through transparency and public accountability

+0.40
Article 29 Duties to Community
Low Framing
Editorial
+0.40
SETL
ND

Article frames corporate duties and community obligations; implies companies have responsibilities toward communities and individuals whose work/data they use

+0.30
Preamble Preamble
Low Framing
Editorial
+0.30
SETL
ND

Article implicitly affirms human dignity of creators; frames data exploitation as disrespect for rights-holders' intellectual contributions

+0.30
Article 22 Social Security
Low Framing
Editorial
+0.30
SETL
ND

Implicitly addresses economic and social rights; frames lack of compensation for data creation as denial of economic rights to creators

+0.25
Article 12 Privacy
Low Framing
Editorial
+0.25
SETL
ND

Implicitly addresses privacy and informational integrity; frames unauthorized data collection as violation of creators' privacy interests

+0.20
Article 7 Equality Before Law
Low Framing
Editorial
+0.20
SETL
ND

Implicitly addresses equal protection; suggests corporate data practices should be governed fairly regardless of which company commits them

+0.20
Article 21 Political Participation
Low Framing
Editorial
+0.20
SETL
ND

Implicitly addresses democratic participation; journalism enables informed public discourse on corporate governance and AI regulation

ND
Article 1 Freedom, Equality, Brotherhood

Not addressed

ND
Article 2 Non-Discrimination

Not addressed

ND
Article 3 Life, Liberty, Security

Not addressed

ND
Article 4 No Slavery

Not addressed

ND
Article 5 No Torture

Not addressed

ND
Article 6 Legal Personhood

Not addressed

ND
Article 9 No Arbitrary Detention

Not addressed

ND
Article 10 Fair Hearing

Not addressed

ND
Article 11 Presumption of Innocence

Not addressed

ND
Article 13 Freedom of Movement

Not addressed

ND
Article 14 Asylum

Not addressed

ND
Article 15 Nationality

Not addressed

ND
Article 18 Freedom of Thought

Not addressed

ND
Article 20 Assembly & Association

Not addressed

ND
Article 24 Rest & Leisure

Not addressed

ND
Article 25 Standard of Living

Not addressed

ND
Article 26 Education

Not addressed

ND
Article 28 Social & International Order

Not addressed

Structural Channel
What the site does
+0.50
Article 19 Freedom of Expression
High Advocacy Practice Coverage
Structural
+0.50
Context Modifier
ND
SETL
+0.37

404 Media platform enables free expression; independent ownership and investigative mission demonstrate structural commitment to public speech and press freedom

+0.40
Article 16 Marriage & Family
High Advocacy Framing Coverage
Structural
+0.40
Context Modifier
ND
SETL
+0.57

404 Media's investigative practice demonstrates structural commitment to protecting IP rights through public accountability journalism

+0.35
Article 17 Property
Medium Advocacy Framing
Structural
+0.35
Context Modifier
ND
SETL
+0.49

404 Media platform protects property rights through investigative accountability journalism

+0.30
Article 8 Right to Remedy
Medium Advocacy Framing
Structural
+0.30
Context Modifier
ND
SETL
+0.42

404 Media's investigative journalism contributes to accountability mechanisms by exposing corporate violations and forcing public reckoning

+0.20
Article 23 Work & Equal Pay
Medium Advocacy Framing
Structural
+0.20
Context Modifier
ND
SETL
+0.49

404 Media's investigative labor exemplifies recognition of creative work's value

+0.20
Article 27 Cultural Participation
Medium Advocacy Framing
Structural
+0.20
Context Modifier
ND
SETL
+0.59

404 Media's investigative practice protects this right through accountability journalism

+0.20
Article 30 No Destruction of Rights
Medium Advocacy Framing
Structural
+0.20
Context Modifier
ND
SETL
+0.39

404 Media's investigative practice functions as an abuse-prevention mechanism through exposure and public reckoning

ND
Preamble Preamble
Low Framing

No structural signals

ND
Article 1 Freedom, Equality, Brotherhood

Not applicable

ND
Article 2 Non-Discrimination

Not applicable

ND
Article 3 Life, Liberty, Security

Not applicable

ND
Article 4 No Slavery

Not applicable

ND
Article 5 No Torture

Not applicable

ND
Article 6 Legal Personhood

Not applicable

ND
Article 7 Equality Before Law
Low Framing

No structural signals

ND
Article 9 No Arbitrary Detention

Not applicable

ND
Article 10 Fair Hearing

Not applicable

ND
Article 11 Presumption of Innocence

Not applicable

ND
Article 12 Privacy
Low Framing

No structural signals

ND
Article 13 Freedom of Movement

Not applicable

ND
Article 14 Asylum

Not applicable

ND
Article 15 Nationality

Not applicable

ND
Article 18 Freedom of Thought

Not applicable

ND
Article 20 Assembly & Association

Not applicable

ND
Article 21 Political Participation
Low Framing

No structural signals

ND
Article 22 Social Security
Low Framing

No structural signals

ND
Article 24 Rest & Leisure

Not applicable

ND
Article 25 Standard of Living

Not applicable

ND
Article 26 Education

Not applicable

ND
Article 28 Social & International Order

Not applicable

ND
Article 29 Duties to Community
Low Framing

No structural signals

Supplementary Signals
How this content communicates, beyond directional lean.
Epistemic Quality
How well-sourced and evidence-based is this content?
0.74 medium claims
Sources
0.8
Evidence
0.8
Uncertainty
0.6
Purpose
0.8
Propaganda Flags
3 manipulative rhetoric techniques found
loaded language
Use of strong language: 'surreptitiously and indiscriminately sucking up whatever data it can find' to describe data harvesting practices
repetition
Repeated emphasis on 'without permission or compensation' throughout article for rhetorical reinforcement
exaggeration
Extended satirical 'Hahahaha' passage exaggerates corporate hypocrisy through laughter for emotional emphasis
Emotional Tone
Emotional character: positive/negative, intensity, authority
cynical
Valence
-0.3
Arousal
0.7
Dominance
0.8
Transparency
Does the content identify its author and disclose interests?
0.92
✓ Author
More signals: context, framing & audience
Solution Orientation
Does this content offer solutions or only describe problems?
0.31 problem only
Reader Agency
0.3
Stakeholder Voice
Whose perspectives are represented in this content?
0.35 3 perspectives
Speaks: journalists, government_official
About: corporation, individuals, workers
Temporal Framing
Is this content looking backward, at the present, or forward?
present short term
Geographic Scope
What geographic area does this content cover?
global
United States, China
Complexity
How accessible is this content to a general audience?
moderate medium jargon general
Longitudinal · 6 evals
Audit Trail 19 entries
2026-02-28 10:42 model_divergence Cross-model spread 0.63 exceeds threshold (5 models) - -
2026-02-28 10:42 eval Evaluated by claude-haiku-4-5-20251001: +0.45 (Moderate positive) +0.16
2026-02-28 07:27 model_divergence Cross-model spread 0.63 exceeds threshold (5 models) - -
2026-02-28 07:27 eval Evaluated by claude-haiku-4-5-20251001: +0.29 (Mild positive)
2026-02-28 01:40 dlq Dead-lettered after 1 attempts: OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole from Us - -
2026-02-28 01:38 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-28 01:37 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-28 01:36 dlq_replay DLQ message 97673 replayed to LLAMA_QUEUE: OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole from Us - -
2026-02-28 00:04 eval_success Light evaluated: Moderate positive (0.50) - -
2026-02-28 00:04 eval Evaluated by llama-3.3-70b-wai: +0.50 (Moderate positive)
2026-02-27 21:32 eval_success Light evaluated: Strong positive (0.80) - -
2026-02-27 21:32 eval Evaluated by llama-4-scout-wai: +0.80 (Strong positive)
2026-02-27 21:30 eval_success Evaluated: Mild positive (0.17) - -
2026-02-27 21:30 eval Evaluated by deepseek-v3.2: +0.17 (Mild positive) 10,901 tokens
2026-02-27 21:09 dlq Dead-lettered after 1 attempts: OpenAI Furious DeepSeek Might Have Stolen All the Data OpenAI Stole from Us - -
2026-02-27 21:07 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-27 21:05 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-27 21:04 rate_limit OpenRouter rate limited (429) model=llama-3.3-70b - -
2026-02-27 20:54 eval Evaluated by claude-haiku-4-5: +0.68 (Strong positive)