This post was submitted on 30 Jan 2025
153 points (94.7% liked)

Technology

all 17 comments
[–] Redacted@lemmy.world 78 points 10 months ago (1 children)

AI is clearly no match for little Bobby Tables.

[–] PrivacyDingus@lemmy.world 9 points 10 months ago

the boy is all grown up

[–] lowleveldata@programming.dev 73 points 10 months ago

Did OpenAI and Microsoft ask for my permission? I don't think so

[–] MagicShel@lemmy.zip 42 points 10 months ago (2 children)

both OpenAI and Microsoft are probing whether DeepSeek used OpenAI's application programming interface (API) without permission to train its own models on the output of OpenAI's systems, an approach referred to as distillation.

That would definitely show up in the quality of responses. Surely they have better and cheaper training sources...

[–] monotremata@lemmy.ca 4 points 10 months ago

I think it's reasonably likely. There was a research paper about how to do basically that a couple of years ago. If you need a basic LLM trained on a specialized form of input and output, getting the expensive existing LLMs to generate that text for you is pretty efficient and inexpensive, so it's a reasonable way to get a baseline model. Then you can add things like chain-of-thought reasoning and mixture of experts to bring the performance back up to where you need it. It's not going to push the state of the art forward, but it's certainly a cheap way to catch up to the models that have done that pushing.
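For anyone curious what that looks like in practice, here's a rough, self-contained Python sketch of sequence-level distillation: collect a larger "teacher" model's answers to some prompts, then fine-tune a small student to imitate that text with ordinary next-token loss. Everything here (the `query_teacher` placeholder, the byte-level tokenizer, the tiny GRU student) is illustrative only, not any vendor's actual API or DeepSeek's pipeline.

```python
# Rough sketch of sequence-level distillation, under the assumptions above:
# a big "teacher" model answers prompts, and a small student is fine-tuned
# to imitate that text with ordinary next-token cross-entropy.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

VOCAB = 256  # toy byte-level vocabulary so the sketch has no external dependencies


def query_teacher(prompt: str) -> str:
    """Placeholder for a call to the large teacher model (e.g. a hosted API)."""
    return "teacher-generated answer to: " + prompt


def tokenize(text: str) -> list[int]:
    """Toy byte-level 'tokenizer'; a real pipeline would use the student's tokenizer."""
    return list(text.encode("utf-8"))


class DistillDataset(Dataset):
    """Pairs each prompt with the teacher's output; the student learns to imitate it."""

    def __init__(self, prompts, max_len=128):
        self.examples = []
        for p in prompts:
            ids = tokenize(p + "\n" + query_teacher(p))[:max_len]
            self.examples.append(torch.tensor(ids, dtype=torch.long))

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        ids = self.examples[idx]
        return ids[:-1], ids[1:]  # inputs and shifted next-token targets


# A tiny GRU stands in for the student LLM purely to keep the example runnable.
embed = nn.Embedding(VOCAB, 128)
rnn = nn.GRU(128, 128, batch_first=True)
head = nn.Linear(128, VOCAB)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.AdamW(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

prompts = ["Explain SQL injection.", "What is an exposed database port?"]
loader = DataLoader(DistillDataset(prompts), batch_size=1)

for inputs, targets in loader:
    hidden, _ = rnn(embed(inputs))
    logits = head(hidden)
    loss = loss_fn(logits.reshape(-1, VOCAB), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"student loss: {loss.item():.3f}")
```

In a real setup the teacher output would be full responses or reasoning traces and the student a proper transformer, but the training loop is the same imitation objective; that's what makes it such a cheap way to catch up.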

[–] coherent_domain 28 points 10 months ago (1 children)

LOL, their code is probably written by AI.

[–] dabaldeagul@feddit.nl 11 points 10 months ago

Considering that they actively recruit young and inexperienced people to work for them, there's a good chance, yeah.

[–] autonomoususer@lemmy.world 13 points 10 months ago* (last edited 10 months ago) (2 children)

After removing ChatGPT, which is anti-libre software, my data never leaves my control.

[–] WhyJiffie@sh.itjust.works 6 points 10 months ago (1 children)

If only it were so easy. Think about the data that's collected about you that you can't refuse: healthcare, home ownership, if you're still studying then a bunch of data about your progress, and maybe even your handwriting.

[–] autonomoususer@lemmy.world 1 points 10 months ago (2 children)
[–] TseseJuer@lemmy.world 1 points 10 months ago

The only solution to not having your data harvested is to never have been born. YW

[–] WhyJiffie@sh.itjust.works 1 points 10 months ago

Unfortunately I don't have one, other than a long-term plan of eating the rich. But the issue is there and we shouldn't ignore it.

[–] Mac@mander.xyz 0 points 10 months ago (1 children)

Lemmy.world admins have your data right here; what are you on about?

[–] autonomoususer@lemmy.world 4 points 10 months ago* (last edited 10 months ago)

Tell us, how many of my posts here are not public?

[–] Cyber@feddit.uk 8 points 10 months ago* (last edited 10 months ago)

I smell politics here over ethical hacking

Normally, when vulnerabilities are found, the responsible step is to disclose them privately to the site owner and give them time to fix the issue (e.g. 90 days) before publishing.

I didn't see that mentioned in Wiz's article, which shows their data and links to the vulnerabilities.

[–] Boomkop3@reddthat.com 2 points 10 months ago

That was quick