Unless someone has released something new while I haven't been paying attention, all the gen AIs are essentially frozen: your use of them can't change the actual weights inside the model.
If it seems like the model is remembering things, that's because the actual input to the LLM is larger than the input you typically type yourself.
For instance, let's say the max input for a particular LLM is 9096 tokens. The first part of that will be instructions from the owners of the LLM to keep the model from being used for things they don't like; say that eats the first 2000 tokens. That leaves 7k or so for a conversation that will be 'remembered'.
Now, if someone were really savvy, they'd have the model generate summaries of the conversation and stick them into another chunk of the budget, maybe another 2000 tokens' worth. That way it would seem to remember more than just the current thread, and you'd still have 5000 tokens left for a running conversation.
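Roughly what that looks like in code, as a minimal sketch using the numbers from this example. Everything here (count_tokens, build_input, roll_summary, the llm callable) is a hypothetical placeholder, not any actual vendor's API:

```python
# Sketch of the token-budget scheme described above. All numbers come
# from the example in this thread; the helper names are made up.

MAX_CONTEXT = 9096      # total context window in the example
SYSTEM_BUDGET = 2000    # provider instructions prepended to every request
SUMMARY_BUDGET = 2000   # rolling summary of older conversation
CHAT_BUDGET = MAX_CONTEXT - SYSTEM_BUDGET - SUMMARY_BUDGET  # ~5000 for live chat


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(text.split())


def build_input(system: str, summary: str, turns: list[str]) -> tuple[str, list[str]]:
    """Pack the newest turns into the chat budget; return the assembled
    model input plus any older turns that no longer fit."""
    kept, used = [], 0
    for turn in reversed(turns):          # newest turns get priority
        cost = count_tokens(turn)
        if used + cost > CHAT_BUDGET:
            break
        kept.append(turn)
        used += cost
    kept.reverse()
    evicted = turns[: len(turns) - len(kept)]
    return "\n".join([system, summary, *kept]), evicted


def roll_summary(llm, summary: str, evicted: list[str]) -> str:
    """The 'savvy' trick: have the model fold evicted turns into the
    running summary so the chat appears to remember them."""
    if not evicted:
        return summary
    prompt = (
        f"Condense into at most {SUMMARY_BUDGET} tokens:\n"
        f"{summary}\n" + "\n".join(evicted)
    )
    return llm(prompt)  # llm: any callable taking a prompt string
```

A real system would use the provider's tokenizer and prompt format, but the budget arithmetic is the same.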
Your general understanding is entirely correct, but:
Microsoft is almost certainly recording these summarization requests for QA and future training runs; that’s where the leakage would happen.
That is kind of assuming the worst-case scenario, though. You wouldn't assume that QA can read every email you send through their mail servers "just because".
This article sounds a bit like engagement bait based on the idea that any use of LLMs is inherently a privacy violation. I don’t see how pushing the text through a specific class of software is worse than storing confidential data in the mailbox though.
That assumes they don't feed the data into training, though, and the article doesn't address that either way.
Always assume the worst; I guarantee it usually is that bad in reality. Companies absolutely hate spending money on IT, and security is always an afterthought. API logs for the production systems that contain your full legal name, DOB, SSN, and home address? Yeah, wide open and accessible by anyone. Production databases with employee SSNs, addresses, and salary information? Same thing; look up how much the worthless management is making and cry.
Booz Allen just got shit on because of the dude they hired who specifically sought out consulting work with the IRS so he could steal Trump's IRS records.
https://home.treasury.gov/news/press-releases/sb0371
https://en.wikipedia.org/wiki/Charles_E._Littlejohn