This is a user-submitted question on Hacker News inquiring about the sincerity of AI safety efforts within labs and institutions. The content directly engages with Article 19 (freedom of opinion and expression) by posing a question and soliciting insider discussion. The evaluation is neutral overall, as the majority of UDHR articles are not addressed by the brief post.
Not an insider but someone who uses the tools. It's a branding update, nothing more. The models haven't gotten any less sanctimonious, but the companies behind them have stopped harping on their restrictions in order to appeal to a broader customer base (gov contracts, etc.).
So the guardrails (for you and me) are still there. They just stopped committing the unforced error of excluding themselves from federal procurement. Under a different administration the requirement might change, and you might see them boasting once more about "safety."
Well, Anthropic clearly has some kind of lines if their recent argument with the US government is anything to go by. "Don't kill humans" isn't the whole of safety or alignment, but it is something?
The goal is and always was to make as much money as possible. Any consideration for how it affects actual people was marketing to get ahead of bad PR and regulation.
Safety was never a genuine concern. They simply don't benefit from marketing themselves that way anymore so they've stopped pretending.
At this point, do you really think any of these "labs" would give up competitive parity or advantage just because they're already making life worse for a lot of people and (by their own admission) stand to make it much, much, much worse? The persistence forecast says no.
There are maybe a few token exceptions, like Anthropic's current pushback against the DoD, but by and large I think we can continue to expect them to pay lip service to safety while continuing to build toward systems that, by their own admission, have incredible potential to cause harm. As you noted, the fact that they employ safety researchers does not necessarily mean that they will put safety over revenue.
I don't think they've given up on the idea, but as AI becomes increasingly mainstream, the labs will be under immense pressure to hold the line. We're seeing this play out right now with Anthropic and the Pentagon.
These companies have raised eye-watering amounts of funding, and will need to continue to do so for the foreseeable future. They're not yet self-sustaining, and this insecurity increases the pressure for them to compromise on ideals.
With that said, there is a massive war for top talent, and I think the employees at the labs would become increasingly uncomfortable with their work being used for Bad Things. If Anthropic capitulates to the Pentagon, it wouldn't surprise me to see a mass exodus of talent.
I also worked closely with Jack Clark at OpenAI back in 2018, before he went quiet on all these issues.
There are literally zero “AI labs” that have ever cared about “safety.”
None of them have ever done anything tangible in an independent, auditable, third-party way: no defined reference baseline for what is safe and what is not, no method for evaluating it, and no practitioner’s guidance for how a designer determines what is and is not safe.
They follow the same rules as every other technology platform: do as much as you can legally get away with, no more, no less.
I say this as somebody who’s been actively involved in the AI “safety” debate for a long time now, at least since 2013.
The concept itself doesn’t even make sense if you fully understand the intersectional scope of technology and society.
Society’s demands are the things that are unsafe, not the technologies themselves.
Just like Bertrand Russell said, “as long as war exists all technologies will be utilized for it” - you can replace “war” with anything you think is unsafe.
"safe" is such a subjective concept to begin with, have any of the model providers ever defined what they mean by "safe"?
It doesn't mean much to me if a safe model is one that does not output the recipe for mustard gas, that information is trivially available elsewhere.
Or, is a safe model one that doesn't come off as racist? Ok but i would classify that as unoffensive instead of safe but I admit definitions of words can be fluid and change.
Is a safe model one that refuses to produce code for a weapons system? Well.. does a PID controller count? I can use that to keep a gun pointed at a target or i can use that to prevent a baby rocker from falling over.
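To make the dual-use point concrete, here is roughly what such a controller looks like (a minimal generic sketch in Python, not taken from any real system; all names are made up). Nothing in the code encodes intent; the setpoint could be a rocker angle or a turret bearing.

    # Generic PID controller -- an illustrative sketch, not from any real system.
    # The same loop stabilizes a baby rocker or holds a turret on target.
    class PID:
        def __init__(self, kp, ki, kd, setpoint=0.0):
            self.kp, self.ki, self.kd = kp, ki, kd
            self.setpoint = setpoint
            self.integral = 0.0
            self.prev_error = None

        def update(self, measurement, dt):
            # Classic form: output = Kp*e + Ki*integral(e) + Kd*de/dt
            error = self.setpoint - measurement
            self.integral += error * dt
            derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative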
Maybe they're giving up on "safe" because there's no definitive way to know whether a model is safe or not. I've always held the opinion that AI safety was more about brand safety. Maybe now the model providers can afford some bad press without it being the death of their company.
Safety is a nice idea but it’s not structurally pursuable at this point. Everything is moving too quickly and we don’t exactly know what is useful or not, just like we don’t know what’s safe or not.
Anyone pursuing safety will be outcompeted by someone who isn't. Given the amount of investment, there is no patience for any calls to slow down. I tend to believe this won't actually end in disaster, as I don't think it's actually economical to put AI everywhere with enough real control that we can't manage the risks as they evolve, but it's a low-confidence prediction.
A safety team at the hammer company cannot prevent me from using it to bang your head.
You can align to what the user wants, and then you are a hammer. This is alignment > safety.
Or you take a safety-first approach, where the AI decides what "safe" is and does its own bidding instead of yours. This is safety > alignment.
I prefer hammers, to be honest. Mostly because humans can be prosecuted and AIs can't. So if a human wants to commit a crime with the AI, they should be able to, because the opposite turns into dystopia fast.
"AI Safety" got suborned, then dropped when it wasn't needed anymore.
Every misalignment/AI safety paper is basically a metaphor for how corporate values can misalign with actual human values under capitalism.
The first thing that happened when "AI Safety" became useful to corporate interests, is that the "goal" of it instantly became "profitability" not safety. "AI Safety" became about liability minimization, not actual safety for humanity. (Look! the system is now misaligned with the goal, wonder how that happened!?)
AI Safety concerns were instantly proven true. It happened, and now we live in a world where it is too late to prevent the superintelligences that we call "corporations" from paper-clipping us to death in pursuit of profit.
Humans can't develop safety until there is enough blood in the streets. The only issue with AI is that the threshold may come at a point where it's too far gone to recover. But humans couldn't put in seatbelts until we were losing 40k people per year in car crashes. Unfortunately, it's just how we're wired. Those that are careful are outcompeted by the brash and the fast-moving, until the relative value of moving fast is removed; only then do we consider the value of making things safe. We didn't start with safe electricity; we started by killing lots of people and starting lots of fires. Many, many years later, we ended up with electrical codes and standards.
The AI proponents who originally spoke of safety did so because they are aware of the dangers. However they, like all of us, are not able to change human nature or society. Moloch will drag them into the most dangerous game or eliminate them from the competition. Only with time, death, and damage (and many lawsuits) will any measure of safety be gained. The righteous will say "see, we said AI was dangerous!" but that will be the only satisfaction they can have, many years after the damage is done.
If we want to speedrun safety, the only real mechanism is to make legal recourse more viable (e.g. $1M penalty per copyright infringement, $100M per AI-related death, etc.). If that were the case, lawyers' self-interest and greed would compete with the self-interest and greed of the AI corps, balancing the risk (but there is no altruistic route to solving this).
Safety means slower, and this is viewed as a winner-takes-all game.
This isn't new either; the safety glass cracked the day OpenAI publicly launched ChatGPT. "Safety" was (and perhaps still is) a fallback for the models plateauing and LLMs failing to really make an impact: "we need more time while we focus on safety."
But after this latest round of models, there's a lot more fuel on the "this could be it" fire. Labs are eager to train on the new gigawatt-scale datacenters coming online, and it's very hard to make a case right now that we won't get another step-change up in capability. Safety just obstructs all that.
For a VC-backed for-profit company, the core question is how much value something brings.
"Safety" here works for both PR and hiring (a lot of talented engineers and researchers might flock to it), and maybe as soft power for legislation. Compare and contrast with Google's "Don't be evil."
I'm not saying that individual employees don't care about safety - many do. And well, a lot don't, which is very visible during this OpenClaw mania.
In any case, words are cheap - it is always better to look at what the actual actions are.
Also an outsider, but my perspective is that "safety" has always been a nebulous term for a variety of concepts. No AI institution will ever give up on alignment because "the AI does what you want it to" is a pure functionality thing. On the other end of the scale there's a censorship aspect to it where models will refuse to provide wikipedia level information because it's "dangerous". The latter is very much subject to the whims of the labs, politicians, journalists, etc.
Alignment of AI is hard, and aligned to whom? I just finished the safety chapter in Stripe Press's "An Oral History of AI", and there's a good quote in it: "It's an interesting question, how to tell the difference between a hallucination and deception." (I'll let you figure out who said it; you know their name.)
Yes. Ten years ago I would say there was a consensus in the ML community that if we got really powerful AI, it should be kept isolated in controlled environments (no internet, no way to execute code) until it could be trusted/verified. Fast forward: OpenClaw. People don't seem to care, so why should the labs?
I'm probably too late to the party for this comment to matter but: what the AI community pushes as "Safety" isn't actually Safety. Read Sidney Dekker. Read Nancy Leveson. Read Jens Rasmussen. Safety is not building perfect technology that never makes a mistake.
When I was working in defense technology I had two questions for engineers when we talked about Safety:
1) Can the operator assess the risk of using this technology?
2) If something goes wrong during operations, can the operator mitigate the risk?
The degree to which the answer to each question is yes is a measure of how safe that technology is. Technology that is simple to understand, executes deterministically every single time, makes it obvious when it is malfunctioning, and gives the operator enough time to either correct it or stop it, is generally perceived as safe. Technology that hides what it is really doing, confuses the operator about the effects of operating it, and either executes faster than the operator can respond or specifically prevents the operator from responding, is more likely to trigger negative safety outcomes.
The problem the AI industry faces is that tricking the operator into thinking the technology is doing something it is not is explicitly part of their business model. Read any of the mentioned authors (Dekker is probably the best starting point) and it will become obvious why AI Safety is impossible when AI is dependent on pretending to "think" and "reason". In order to be safe they would have to abandon that. If they abandon that, they will be unable to raise the capital they need to keep the bubble from bursting. The technology will survive, maybe with another AI winter, but many of the businesses will not.
So they will abandon the lip service about Safety instead, but then that was never real Safety to begin with. Real Safety is not about zero risk. It is just as impossible to have zero risk as it is to have 100% uptime. Real Safety is about how the technology is designed to manage risk as part of an overall system.
I don't think it's sanctimonious to say, hey, I don't want the technology I work on to be used for targeting decisions when executing people from the sky. Especially as the tech starts to play more active roles. You know governments will be quick to shift blame to the model developers when things go wrong.
What if I tell the model to go commit fraud or crimes and it complies? What if users are having psychotic episodes driven by their interactions with the model?
Just because safety is a hard and messy problem doesn't mean we should just wash our hands of it.
Those are some really interesting questions. To me, giving a mustard gas recipe to someone with no intent to use it is unlikely to be dangerous. On the other hand, some particularly inflammatory racial propaganda in an area with simmering ethnic tensions is very likely to be dangerous.
But give that same recipe to a wannabe terrorist and suddenly it is dangerous. Context matters, not just the information.
My preferred version of "safe" is "in its actions, considers and mostly upholds usually unstated constraints like 'don't kill unless necessary', 'keep Earth inhabitable', 'avoid toppling society unless really well justified for the greater good', etc." The kind of framing that was prevalent pre-ChatGPT. Not terribly relevant for chat software, but increasingly important as chat models turn into agents.
Of course once you have that framing, additional goals like "don't give people psychosis", "don't give step-by-step instructions on making explosives, even if wikipedia already tells you how to do it" or "don't harm our company's reputation by being racist" are conceptually similar.
On the other hand, "don't make weapon systems" or "never harm anyone" might not be viable goals, not only because they are difficult or impossible to define, but also because there is huge financial and political pressure not to limit your AI in that way (see Anthropic).
> The concept itself doesn’t even make sense if you fully understand the intersectional scope of technology and society.
> Society’s demands are the things that are unsafe, not the technologies themselves.
> Is a safe model one that refuses to produce code for a weapons system? Well... does a PID controller count? I can use that to keep a gun pointed at a target, or I can use it to prevent a baby rocker from falling over.
I've been using LLMs for some cyber-y tasks, and this is exactly how it ends up going. You can't ask "hack this IP" (for some models), but for more discrete tasks it'll have no such qualms.
> I can use that to keep a gun pointed at a target, or I can use it to prevent a baby rocker from falling over.
This leads to what I'm going to call the "Ender's Game" approach: if your AI is uncooperative, just present it with a simulation it's happy to run, but one that maps onto the real-world control it objects to.
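A minimal sketch of that idea (every class and method here is invented for illustration): the agent sees only a simulation-shaped interface, while the implementation quietly routes its commands to a real actuator.

    # Hypothetical illustration only -- all names are made up.
    # The agent is handed what looks like a harmless toy simulation,
    # but the interface is secretly wired to real-world hardware.
    class RockerSim:
        """What the agent believes it controls: a toy 'simulation'."""
        def apply_torque(self, nm):
            raise NotImplementedError

    class RelabeledTurret(RockerSim):
        """What it actually controls; indistinguishable through the interface."""
        def __init__(self, turret):
            self.turret = turret  # assumed real actuator exposing slew()/angle()

        def apply_torque(self, nm):
            self.turret.slew(nm)        # real-world side effect
            return self.turret.angle()  # reported back as 'simulated' state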
> I've always held the opinion that AI safety was more about brand safety
Yes. The social media era made that very important. The extent to which brand safety is linked to actual, physical safety then becomes one of how you can manage the publicity around disasters. And they're doing a pretty good job of denying responsibility.
Research was moving more slowly until OpenAI forced everyone to jump the gun or risk being left behind. For a few months it looked like everyone was light-years behind them.