this post was submitted on 01 Mar 2026
37 points (100.0% liked)

Technology


Anthropic, the wildly successful AI company that has cast itself as the most safety-conscious of the top research labs, is dropping the central pledge of its flagship safety policy, company officials tell TIME.

In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate. For years, its leaders touted that promise—the central pillar of their Responsible Scaling Policy (RSP)—as evidence that they are a responsible company that would withstand market incentives to rush to develop a potentially dangerous technology.

But in recent months the company decided to radically overhaul the RSP. That decision included scrapping the promise to not release AI models if Anthropic can’t guarantee proper risk mitigations in advance.

[–] certified_expert@lemmy.world 1 points 1 month ago (1 children)

Yeah, not in prediction, but they could analyze the generated output and filter it. The so-called "guardrails."
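The idea being described is a post-generation filter: the model produces text, and a separate check decides whether it reaches the user. A minimal sketch, purely illustrative — the blocked-phrase list and function here are hypothetical placeholders, not Anthropic's actual guardrail implementation (which, as the reply below notes, is likely itself a classifier or LLM rather than a keyword match):

```python
# Minimal sketch of a post-generation "guardrail": check model output
# against a denylist before returning it. Real systems typically use a
# trained classifier or another LLM, not simple string matching.

# Hypothetical placeholder phrases, for illustration only.
BLOCKED_TERMS = {"example disallowed phrase", "another forbidden topic"}

def guardrail_filter(generated_text: str) -> str:
    """Return the text if it passes the filter, else a refusal notice."""
    lowered = generated_text.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[blocked by safety filter]"
    return generated_text
```

The weakness the comments discuss follows directly from this structure: if the checking step is as probabilistic as the generating step, the filter inherits the same flakiness.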

[–] XLE@piefed.social 1 points 1 month ago* (last edited 1 month ago)

The problem is that the filtration algorithm is flaky in basically the same way as the LLM itself, and probably is an LLM. And even if it does work, I've never heard a single soul say that Anthropic shut down their account over questionable prompts. I even ran into somebody here who claims he uses AI to work on sexual abuse cases; he says the chatbot has stalled on him, but he's never been blocked, even for review.