This is an automated archive made by the Lemmit Bot.
The original was posted on /r/selfhosted by /u/badhiyahai on 2025-01-05 09:05:19+00:00.
Hello self-hosters!
We are working on a self-hostable, open-source alternative to Computer Use. We have recently had success controlling phones with OpenAI, Gemini and Molmo (less so with Llama).
It can draft a Gmail email to a friend asking them out for lunch, find bus stops using the Google Maps app or browser, start a 3+2 game on lichess, etc. Demos are in the GitHub repository.
The goal is to make everything work with local models; we are about halfway there.
We use a Planner to sketch out the plan of action. Then the Finder finds the coordinates of the elements, and the Executor clicks on the element, navigates, etc.
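A minimal sketch of how the Planner, Finder and Executor roles could fit together. Everything here is hypothetical and illustrative, not the project's actual code: the function names, the one-step-per-line "action: target" plan format, and the choice of `adb shell input tap X Y` as the Executor's tap command are all assumptions. The LLM and the element locator are plain callables, so any provider (or a local Molmo) could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical plan step; the real project's data model may differ.
@dataclass
class Step:
    action: str   # e.g. "tap"
    target: str   # natural-language description of a UI element

def plan(task: str, llm: Callable[[str], str]) -> List[Step]:
    """Planner: ask an LLM for steps, assuming one 'action: target' per line."""
    reply = llm(f"Break this phone task into steps, one 'action: target' per line: {task}")
    steps = []
    for line in reply.splitlines():
        if ":" in line:
            action, target = line.split(":", 1)
            steps.append(Step(action.strip().lower(), target.strip()))
    return steps

def find(step: Step, locate: Callable[[str], Tuple[int, int]]) -> Tuple[int, int]:
    """Finder: map an element description to (x, y) screen coordinates."""
    return locate(step.target)

def execute(step: Step, xy: Tuple[int, int]) -> List[str]:
    """Executor: build (but do not run) an adb tap command for the step."""
    x, y = xy
    if step.action == "tap":
        return ["adb", "shell", "input", "tap", str(x), str(y)]
    raise ValueError(f"unsupported action: {step.action}")

def run(task: str, llm: Callable[[str], str],
        locate: Callable[[str], Tuple[int, int]]) -> List[List[str]]:
    """Plan the task, locate each target, and return the commands in order."""
    return [execute(step, find(step, locate)) for step in plan(task, llm)]

if __name__ == "__main__":
    # Stubs standing in for a real LLM and a real Finder model.
    fake_llm = lambda prompt: "tap: Gmail icon\ntap: Compose button"
    fake_locate = {"Gmail icon": (120, 840), "Compose button": (900, 1900)}.__getitem__
    for cmd in run("draft an email", fake_llm, fake_locate):
        print(" ".join(cmd))
```

Keeping the three roles as separate functions mirrors the split described in the post: the Planner only needs text reasoning, the Finder only needs to ground a description on a screenshot, and the Executor only needs to talk to the device.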
For the Finder, we can use a local model, Molmo, and for the Planner, you can bring your own API keys.
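When a local Molmo acts as the Finder, its answer has to be turned into pixel coordinates the Executor can tap. The sketch below assumes Molmo replies to a pointing prompt with text like `<point x="61.5" y="40.6" alt="Compose">Compose</point>`, where `x` and `y` are percentages (0 to 100) of the image width and height, and that `x` appears before `y`. The helper name `molmo_point_to_pixels` is hypothetical; check the model's actual output before relying on this.

```python
import re
from typing import Optional, Tuple

# Assumed Molmo output format: <point x="61.5" y="40.6" alt="...">...</point>
# with x and y given as percentages of the image size.
POINT_RE = re.compile(r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"')

def molmo_point_to_pixels(reply: str, width: int, height: int) -> Optional[Tuple[int, int]]:
    """Convert the first <point> in a Molmo reply to integer pixel coordinates."""
    match = POINT_RE.search(reply)
    if match is None:
        return None  # the model did not return a point
    x_pct, y_pct = float(match.group(1)), float(match.group(2))
    # Scale percentages to the screenshot size, clamping to the last pixel.
    x = min(width - 1, round(x_pct / 100 * width))
    y = min(height - 1, round(y_pct / 100 * height))
    return x, y

if __name__ == "__main__":
    reply = '<point x="50.0" y="25.0" alt="Compose">Compose</point>'
    print(molmo_point_to_pixels(reply, 1080, 2400))  # (540, 600)
```

Returning `None` instead of raising lets the caller decide whether to re-prompt the Finder or ask the Planner for a different step.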
For the Planner, you can use Gemini Flash for now, since its free tier allows 15 calls/min, which should be enough to automate just about anything. In my testing, though, GPT-4o and Gemini Pro outperform Gemini Flash.
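A limit of 15 calls/min works out to one call every 4 seconds on average. One way to stay under it is a sliding-window rate limiter placed in front of each Planner request. This `RateLimiter` class is a hypothetical helper, not part of the project, and the 15 calls/min figure is the free-tier limit quoted above, which may change. The clock and sleep functions are injectable so the limiter can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Deque

class RateLimiter:
    """Sliding-window limiter: at most max_calls calls in any window_s seconds."""

    def __init__(self, max_calls: int = 15, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def _evict(self, now: float) -> None:
        # Forget calls that have fallen out of the current window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()

    def wait(self) -> None:
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.window_s - (now - self.calls[0]))
            now = self.clock()
            self._evict(now)
        self.calls.append(now)

if __name__ == "__main__":
    limiter = RateLimiter(max_calls=15, window_s=60.0)
    for i in range(3):
        limiter.wait()  # call this right before each Planner request
        print("planner call", i)
```

A sliding window is used here instead of a fixed "reset every minute" counter so that a burst at the end of one minute cannot be followed immediately by another full burst.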
https://github.com/BandarLabs/clickclickclick
Will be happy to hear your thoughts!