this post was submitted on 16 Apr 2024
82 points (95.6% liked)
Apple
20410 readers
1 users here now
Welcome
to the largest Apple community on Lemmy. This is the place where we talk about everything Apple, from iOS to the exciting upcoming Apple Vision Pro. Feel free to join the discussion!
Rules:
- No NSFW Content
- No Hate Speech or Personal Attacks
- No Ads / Spamming
Self promotion is only allowed in the pinned monthly thread
Communities of Interest:
Apple Hardware
Apple TV
Apple Watch
iPad
iPhone
Mac
Vintage Apple
Apple Software
iOS
iPadOS
macOS
tvOS
watchOS
Shortcuts
Xcode
Community banner courtesy of u/Antsomnia.
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
So it will very slowly find some results on the web for you.
you’d be surprised how fast a model can be if you narrow the scope, quantize, and target specific hardware, like the AI hardware features they’re announcing.
not a 1-1, but a quantized Mistral 7B runs at ~35 tokens/sec on my M2. that’s not even as optimized as it could be. it can write simple scripts and do some decent writing prompts.
they could get really narrow in scope (super simple RAG, limited responses, etc), quantize down to even something like 4 bit, and run it on custom accelerated hardware. it doesn’t have to reproduce Shakespeare, but i can imagine a PoC that runs circles around Siri in semantic understanding and generated responses. being able to reach out on Slack to the engineers that built the NPU stack ain’t bad neither.