We, at NonBioS.ai [AI Software Dev], built something like this from scratch for Linux VMs, and it was a heavy lift. Could have used you guys if we had known about it. But I can see this being immediately useful at a ton of places.
I reckon I could run this for buying fashion drops. Is this a use case y'all have seen?
I don’t know if this is a problem you’ve faced, but I’m curious: how do LLM tool devs handle authn/authz? Do host apps normally forward a token or something? Is there a standard commonly used? What if the tool needs some permissions to act on the user’s behalf?
I'm also working on a blog post that touches on this - particularly in the context of giving agents long-term and episodic memory. Should be out next week!
The LLM interacts with the VM through a structured virtual computer interface (cua-computer and cua-agent). It’s a high-level abstraction that lets the agent act (e.g., “open Terminal”, “type a command”, “focus an app”) and observe (e.g., current window, file system, OCR of the screen, active processes) in a way that feels a lot more like using a real computer than parsing raw data.
So under the hood, yes, screen+metadata are used (especially with the Omni loop and visual grounding), but what the model sees is a clean interface designed for agentic workflows - closer to how a human would think about using a computer.
If you're curious, the agent loops (OpenAI, Anthropic, Omni, UI-Tars) offer different ways of reasoning and grounding actions, depending on whether you're using cloud or local models.
https://github.com/trycua/cua/tree/main/libs/agent#agent-loo...
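For a concrete sense of that act/observe surface, here's a minimal sketch of driving a VM through cua-computer. The constructor arguments and method names below are assumptions modeled on the PyAutoGUI-compatible interface described above, so check the repo for the exact API.

    import asyncio
    from computer import Computer  # cua-computer

    async def main():
        # Constructor arguments are assumptions; see the repo for the real ones.
        computer = Computer(os="macos")
        try:
            await computer.run()  # boot the VM and wait for its interface

            # Act: high-level primitives instead of raw input events
            await computer.interface.left_click(300, 400)
            await computer.interface.type_text("echo hello")
            await computer.interface.press_key("enter")

            # Observe: a screenshot the agent loop can ground actions against
            png = await computer.interface.screenshot()
            with open("screenshot.png", "wb") as f:
                f.write(png)
        finally:
            await computer.stop()

    asyncio.run(main())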
Second, as a user, you’d want to handle the case where some or all of these have been fully compromised. Surreptitiously, super-intelligently, and partially or fully autonomously, one container or many may gain access to otherwise isolated networks within homes, corporate networks, or some device in a high-security area with access to nuclear weapons, biological weapons, the electrical grid, our water supply, our food supplies, manufacturing, or even some other key vulnerability we’ve discounted, like a toy.
While providing more isolation is good, there is no amount of caution that can prevent calamity when you give everyone a Pandora’s box. It’s like giving someone a bulletproof jacket to protect them from fox tapeworm cancer or hyper-intelligent, time-traveling, timespace-manipulating super-Ebola.
That said, it’s the world we live in now, where we’re in a race to our demise. So, thanks for the bulletproof jacket.
First time: it opened a macOS VM and started to do stuff, but it got ahead of itself and started typing things in the wrong place. So now that VM has a Finder window open, with a recent file that's called
plt.ylabel('Price(USD)').sh
The second and third times, it launched the VM but failed to do anything, showing these errors:
INFO:cua:VM run response: None
INFO:cua:Waiting for VM to be ready...
INFO:cua:Waiting for VM macos-sequoia-cua_latest to be ready (timeout: 600s)...
INFO:cua:VM status changed to: stopped (after 0.0s)
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
INFO:cua:VM status changed to: running (after 12.4s)
INFO:cua:VM macos-sequoia-cua_latest got IP address: 192.168.64.2 (after 12.4s)
INFO:cua:VM is ready with IP: 192.168.64.2
INFO:cua:Initializing interface for macos at 192.168.64.2
INFO:cua.interface:Logger set to INFO level
INFO:cua.interface.macos:Logger set to INFO level
INFO:cua:Connecting to WebSocket interface...
INFO:cua.interface.macos:Waiting for Computer API Server to be ready (timeout: 60s)...
INFO:cua.interface.macos:Attempting WebSocket connection to ws://192.168.64.2:8000/ws
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 10.0s, attempts: 11)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 20.0s, attempts: 21)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 30.0s, attempts: 31)
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 40.0s, attempts: 41)
INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 50.1s, attempts: 51)
ERROR:cua.interface.macos:Could not connect to 192.168.64.2 after 60 seconds
ERROR:cua:Failed to connect to WebSocket interface
DEBUG:cua:Computer initialization took 76856.09ms
ERROR:agent.core.agent:Error in agent run method: Could not connect to WebSocket interface at 192.168.64.2:8000/ws: Could not connect to 192.168.64.2 after 60 seconds
WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
This was using the Gradio interface, with the agent loop provider as OMNI and the model as gemma3:4b-it-q4_K_M. These versions:
cua-agent==0.1.29
cua-computer==0.1.23
cua-core==0.1.5
cua-som==0.1.3
Stay tuned - we're also releasing support for UI-Tars-1.5 7B this week! It offers excellent speed and accuracy, and best of all, it doesn't require bounding box detection (Omni) since it's a pixel-native model.
Feel free to ping me on Discord (I'm francesco there) - happy to hop on a quick call to help debug: https://discord.com/invite/mVnXXpdE85
thank you, and go Cua!
We're designing with that in mind: think fine-grained permissioning, auditability, and minimizing surface area. But it’s still early, and a lot of it depends on how teams end up using CUAs in practice.
- Open-source from the start. Cua’s built under an MIT license with the goal of making Computer-Use agents easy and accessible to build. Cua's Lume CLI was our first step - we needed fast, reproducible VMs with near-native performance to even make this possible.
- Native macOS support. As far as we know, we’re the only ones offering macOS VMs out of the box, built specifically for Computer-Use workflows. And you can control them with a PyAutoGUI-compatible SDK (cua-computer) - so things like click, type, scroll just work, without needing to deal with any inter-process communication.
- Not just the computer/sandbox, but the agent too. We’re also shipping an Agent SDK (cua-agent) that helps you build and run these workflows without having to stitch everything together yourself. It works out of the box with OpenAI and Anthropic models, UI-Tars, and basically any VLM if you’re using the OmniParser agent loop (see the sketch after this list).
- Not limited to Linux. The hosted version we’re working on won’t be Linux-only - we’re going to support macOS and Windows too.
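To show how the pieces fit together, here's a rough sketch of wiring cua-agent to a local model via the Omni loop (matching the Gradio setup mentioned earlier in this thread). The class and enum names, like AgentLoop.OMNI and LLMProvider.OLLAMA, are assumptions based on the loops and providers named here, not a verbatim copy of the SDK.

    import asyncio
    from computer import Computer  # cua-computer
    from agent import ComputerAgent, AgentLoop, LLM, LLMProvider  # cua-agent

    async def main():
        # Names below are assumptions modeled on the loops/providers
        # mentioned in this thread; check the agent README for the real API.
        computer = Computer(os="macos")
        try:
            await computer.run()
            agent = ComputerAgent(
                computer=computer,
                loop=AgentLoop.OMNI,  # OmniParser grounding, works with any VLM
                model=LLM(provider=LLMProvider.OLLAMA, name="gemma3:4b-it-q4_K_M"),
            )
            # The loop alternates observe -> reason -> act until the task completes.
            async for step in agent.run("Open Terminal and list the home directory"):
                print(step)
        finally:
            await computer.stop()

    asyncio.run(main())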
In the meantime, I’ll give this a shot on macOS tonight. Congrats!
Also, let us know on Discord once you’ve tried out c/ua locally on macOS: https://discord.com/invite/mVnXXpdE85
(I am not affiliated)
Also, is the project still active? No commits for two months is odd for a YC startup in the current batch :)
Would love to chat sometime!
Feel free to join our Discord so we can chat more: https://discord.com/invite/mVnXXpdE85
Also built something on top of Browser Use (Nanobrowser) and Docker.
https://github.com/reindent/nanomachine
Just finished planning and shell capabilities.
Let's chat: @reindentai (X)