> we're adding browser-level guardrails (think IAM for agents)
This sounds interesting, but where would I go to see these guardrails and their implementation? I tried searching in the repository and couldn't find them.
At the Chromium level, you have access to every single DOM element and the coordinate space around it. So when a click happens, whether from the user or the agent, we have a neat way of enforcing the required action (either allow the click or nullify it).
We are still at an early version, and mostly targeting enterprise sites (like SAP), which don't change that often.
What would be great is if it could work in the browser like Claude in Chrome and communicate (with my control) back to things on my desktop, like my IDE for example, or really anything.
Ohh, interesting. Technically this should already be possible, because we already package gemini-cli into the sidecar (bun) binary. We just have to create a good UX.
What angle are you looking at this from? Is it for convenience? Or do you not like terminal UI and need a web-friendly UI for these agents?
Good question. We think the browser is becoming the new OS. It doesn’t really matter anymore if you’re on Windows, macOS, or Linux—the browser is where most work already happens.
We see a future where it’s the main gateway to everything, and where agents live and work alongside you inside the browser. That’s why we call it BrowserOS. :)
Is this really true? Mobile device users are all mostly forced to use apps rather than the browser for most stuff, and people on desktop PCs/laptops are probably either using them for gaming (all desktop apps), or work where a lot of stuff is desktop apps.
Sure, regular consumer stuff like social media is webapps (if they're not mobile only), and if you're interacting with something like Salesforce, a customer support tracker, or an issue tracker, you're likely using a webapp. But the move to mobile devices for most consumer stuff means that people still using PCs are largely power users.
I didn't hear back there, but huzzah, it looks like this is in there. I'm glad to see it!
Yes, we expose BrowserOS as an MCP server that you can use from Claude Code, Cursor, opencode, etc.: https://docs.browseros.com/features/use-with-claude-code
The MCP server works out of the box (unlike Chrome DevTools MCP, which requires tricky setup).
I still don't buy the "we needed it to be a whole Browser and not a Chrome Extension" argument:
- your interface is still literally a chrome extension side panel
- none of the agentic browsers from the bigger players like Atlas and Comet really took off either
I do think the server-side integration is required:
- With rtrvr.ai, a ton of users are integrating our web agent Chrome extension via Remote MCP from chatgpt.com, as well as triggering it remotely as an API endpoint. Your implementation is limited to local connections, as I understand it.
- The biggest unlock for users is running at scale: being able to launch a hundred cloud browsers, run a task, and get results back while you do other things. So we see hybrid cloud/local execution as the key unlock for this year.
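As a rough illustration of the fan-out pattern in the last bullet, the sketch below runs many tasks with a cap on how many are in flight and collects every result, failures included. `runWithLimit` and `fakeBrowserTask` are hypothetical names; the fake task only simulates a remote browser session with a timer and does not call rtrvr.ai or any real cloud browser API.

```typescript
// Outcome of one task: either a value or an error message.
type Outcome<T> = { ok: true; value: T } | { ok: false; error: string };

// Run all tasks with at most `limit` running at once (limit assumed >= 1).
// Results come back in the same order as the input tasks.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<Outcome<T>[]> {
  const results: Outcome<T>[] = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker keeps claiming the next unclaimed task until none are left.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      try {
        results[i] = { ok: true, value: await tasks[i]() };
      } catch (e) {
        results[i] = { ok: false, error: String(e) };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}

// Hypothetical stand-in for "open a cloud browser, do the task, return a result".
// Sessions whose id ends in 7 fail, to show that one failure does not stop the rest.
function fakeBrowserTask(id: number): () => Promise<string> {
  return () =>
    new Promise<string>((resolve, reject) => {
      setTimeout(() => {
        if (id % 10 === 7) {
          reject(new Error(`session ${id} failed`));
        } else {
          resolve(`result ${id}`);
        }
      }, 1);
    });
}
```

Capturing failures per task instead of rejecting the whole batch is a design choice here: with a hundred sessions, a few are likely to fail, and the caller usually wants the other results anyway.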
Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Last year was a lot of technical builders exploring the capabilities, and I am excited for this year of making these agentic browsers useful!
One simple example: an extension can't see cross-origin iframes. This means it could never do something like fill out a payment form for you.
Limited computation and action space is another, as are bot detection systems.
For example, a JavaScript method trying to automate something like Microsoft Word in an iframe will have a tough time, because the second you inject code in there they will block you.
Sounds like a skill issue; our web agent is able to interact with cross-origin iframes, for example to solve captchas: https://www.youtube.com/watch?v=LD3afouKPYc
We honestly haven't faced any bot detection or blocking issues. Owning the browser layer exposes you to much more detection; just look at Comet getting blocked on Amazon, etc.
> whole Browser and not a Chrome Extension argument
Both of us are definitely biased to think our own approach is better :)
But without owning the binary, we couldn't have shipped today's feature -- an agent with access to your filesystem that can run shell commands, like Claude Cowork.
> your interface is still literally a chrome extension side panel
Yep, our interface is a Chrome extension to make iterating on the UX faster. But it uses a ton of C++ APIs that we expose under `chrome.browseros.*`.
> Your workflow pipeline is really cool! Any blog post/summary on how you set it up?
Thanks! We'll look into publishing a blog soon!
A Chrome extension can also access local files and execute LLM-generated code in sandboxes.
This sounds interesting, but where would I go to see these guardrails and their implementation? I tried searching in the repository and couldn't find them.
What use case did you have? Happy to show a demo of the current version we have (you can hit me up on Discord or Slack -- links are available in our repo).
> how is it reliably enforced?
At the Chromium level, you have access to every single DOM element and the coordinate space around it. So when a click happens, whether from the user or the agent, we have a neat way of enforcing the required action (either allow the click or nullify it).
We are still at an early version, and mostly targeting enterprise sites (like SAP), which don't change that often.
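To make the allow-or-nullify idea concrete, here is a minimal TypeScript sketch of the decision step only. It is not BrowserOS code: the `Rule`, `Actor`, and `decideClick` names, the rectangle-based regions, and the example rule are all assumptions made for illustration, and how a real build intercepts clicks inside Chromium is not shown.

```typescript
// Hypothetical types for illustration; none of these names come from BrowserOS.
type Rect = { x: number; y: number; width: number; height: number };
type Actor = "user" | "agent";
type Rule = { label: string; region: Rect; blockedFor: Actor[] };
type Decision = { allow: boolean; reason: string };

// True if the point (x, y) falls inside the rectangle.
function contains(r: Rect, x: number, y: number): boolean {
  return x >= r.x && x < r.x + r.width && y >= r.y && y < r.y + r.height;
}

// Decide whether a click at (x, y) by `actor` goes through or is nullified.
// The first rule that covers the point and blocks this actor wins.
function decideClick(rules: Rule[], actor: Actor, x: number, y: number): Decision {
  for (const rule of rules) {
    if (contains(rule.region, x, y) && rule.blockedFor.indexOf(actor) !== -1) {
      return { allow: false, reason: `blocked by rule "${rule.label}"` };
    }
  }
  return { allow: true, reason: "no matching rule" };
}

// Example policy (assumed): the agent may not click a "delete record" button,
// but the user still can.
const exampleRules: Rule[] = [
  {
    label: "delete-record button",
    region: { x: 100, y: 200, width: 80, height: 30 },
    blockedFor: ["agent"],
  },
];
```

With `exampleRules`, `decideClick(exampleRules, "agent", 120, 210)` is nullified, while the same click from `"user"` is allowed. A sketch like this only works when element positions are known and stable, which fits the point about targeting sites that don't change often.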
What use case did you have in mind?
Sure, regular consumer stuff like social media is webapps (if they're not mobile only), and if you're interacting with something like Salesforce, a customer support tracker, or an issue tracker, you're likely using a webapp. But the move to mobile devices for most consumer stuff means that people still using PCs are largely power users.
Precisely. I think most knowledge work (especially in business) still happens in the browser. That is the workflow we want to target!