How can AI use websites?

How Browserbase built their viral demo of Open Operator (in 24 hrs!) using Vercel's v0 and the AI SDK.

Thanks to our sponsor

This post is part of a series called “AI, it’s not that Complicated,” where we cover 1) how AI products are built and 2) what they can help you get done. Thanks to Vercel for sponsoring the series.

For most readers, browsing the internet is something you could do in your sleep. And yet, for AI models, it’s actually a highly complex, nuanced task. Newer GenAI models are getting better and better at “thinking” and complex reasoning, yes; but without the ability to navigate and use websites, they’ll be pretty much useless at automating our work for us. So how does an AI model actually use the internet?

Browserbase builds and maintains software that helps AI models do just that: browse the web. This post is going to walk through how their tech works and why it’s important.

I’ll also cover how Browserbase uses the full Vercel stack – including v0 (which I’ve already written about) and the AI SDK (which I will one day write about) – to build Open Operator, their viral public demo that lets anyone use AI to browse the web.

How we got here: better models and browsing the web

If you rewind the time machine a few years, LLMs from providers like OpenAI and Anthropic were really good at one specific thing: predicting the next word (or, more precisely, the next token). And thus, people most commonly used them for things like summarizing text, helping you finish paragraphs (or haikus), and other things that required text generation.

Fast forward to today, and thanks mostly to advances in post-training, these models have developed sophisticated, sometimes freaky capabilities that go way past word prediction. They can “think” (although not in the biological sense) and do quite complex stuff, like deep research, generating entire codebases, and completing full-on workflows for you. Perhaps most importantly, they can use tools.

And it’s those very tools that are starting to become the problem, or more accurately, getting AI models and tools to play nicely together. Web search, which most major AI model providers now support, is the simpler case: **search is an API**. You send it a request and you get a deterministic response, even if it’s not a good one.

🤔 Undefined term

Deterministic, strictly speaking, means the same input always gives you the same output. Here it’s used a bit more loosely to mean predictable: you make a request to the API and you get a response in a known format. The results won’t always be the same, but they’ll always be text. Non-deterministic would mean that sometimes the API returns text, sometimes it returns numbers, and sometimes it returns a birthday cake.
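To make “search is an API” concrete, here’s a minimal TypeScript sketch of the contract a model sees when it calls a search tool: a query string goes in, and a list of results with a fixed shape comes out. `searchWeb` and the `SearchResult` fields are hypothetical stand-ins, not any real provider’s API, and the tiny in-memory index replaces what would really be an HTTP call.

```typescript
// A search tool has a fixed contract: query string in, results of a known shape out.
// The field names are hypothetical, not any specific provider's API.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

// Stand-in for a real search API; a real one would make an HTTP request.
function searchWeb(query: string): SearchResult[] {
  const index: SearchResult[] = [
    { title: "Food stamp reimbursement FAQ", url: "https://example.gov/faq", snippet: "How to file a claim..." },
    { title: "Browser automation 101", url: "https://example.com/automation", snippet: "Headless browsers explained..." },
  ];
  const q = query.toLowerCase();
  return index.filter(
    (r) => r.title.toLowerCase().includes(q) || r.snippet.toLowerCase().includes(q),
  );
}

// However good or bad the results are, the model always knows how to read them.
const results = searchWeb("headless");
for (const r of results) {
  console.log(`${r.title} -> ${r.url}`);
}
```

A website offers no such contract: the “response” is whatever HTML, layout, and popups the site happens to serve that day.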

But browsing the web is surprisingly hard!

  1. Web browsing is non-deterministic. Most websites don’t have an API, and they’re all completely different.

  2. Websites change over time. Sometimes every day. Sometimes every second.

  3. There are billions of websites out there. Yes, billions!

So for an AI model to really understand and use a website like a human would, it needs a deeper level of understanding, one that we’re only really scratching the surface of right now.

And then there are the logistical problems: how do you actually run a browser that an AI model can use?

Remember, when you use a model like Claude, the code that…is Claude is running somewhere on a server. So for the model to use a browser, it would need to be on a server too. But you can’t just put a regular browser on a server:

  • Astute readers will probably have noticed that your browser on your laptop uses like 1,000GB of memory, which would be waaaaay too resource-intensive to run at scale on servers.

  • We need security measures: what’s stopping the model from downloading a virus, or clicking on one of those poorly made Viagra popups?
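One simple kind of guardrail is to pass every action the model requests through a filter before the browser executes it. The TypeScript sketch below is purely illustrative and is not how Browserbase implements security: the action types, the domain allowlist, and the list of blocked file extensions are all hypothetical.

```typescript
// Hypothetical action types an AI model might request from a browser.
type BrowserAction =
  | { kind: "navigate"; url: string }
  | { kind: "click"; target: string }
  | { kind: "download"; url: string };

// Assumption: the operator configures which sites the agent may visit.
const ALLOWED_HOSTS = ["example.com", "example.gov"];
const BLOCKED_EXTENSIONS = [".exe", ".dmg", ".msi", ".scr"];

// Pull the hostname out of an http(s) URL without relying on DOM or Node types.
function hostnameOf(url: string): string | null {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

function isAllowedHost(host: string): boolean {
  // Allow the listed domains and their subdomains (e.g. www.example.com).
  return ALLOWED_HOSTS.some((h) => host === h || host.endsWith("." + h));
}

// Returns a reason string if the action should be blocked, or null if it may run.
function checkAction(action: BrowserAction): string | null {
  // This sketch only vets URLs; a real guard would also check where a click navigates.
  if (action.kind === "click") return null;
  const host = hostnameOf(action.url);
  if (host === null || !isAllowedHost(host)) {
    return `blocked: ${action.url} is not on the allowlist`;
  }
  if (action.kind === "download") {
    const path = action.url.toLowerCase().split(/[?#]/)[0];
    if (BLOCKED_EXTENSIONS.some((ext) => path.endsWith(ext))) {
      return `blocked: executable download ${action.url}`;
    }
  }
  return null;
}

console.log(checkAction({ kind: "navigate", url: "https://www.example.com/forms" })); // null
console.log(checkAction({ kind: "navigate", url: "https://cheap-pills.biz/" })); // blocked: ...
```

The design choice is that the model never touches the browser directly: it can only propose actions, and the infrastructure decides which ones actually run.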

This, in a nutshell, is what Browserbase does. They develop a browser for AI models that’s fast, secure, and can help models actually do your work for you.

Browserbase and the AI-first browser

Most people using Browserbase today are companies building AI products, where part of their product requires reading or writing data from the internet.

A good example is Benny, a startup that helps people process food stamp reimbursements. There’s a complicated and lengthy form process (shocker) you need to fill out to get these reimbursements, but with Benny, you just upload your receipts and they take care of the rest. Behind the scenes, they use AI models and Browserbase to intelligently handle those forms for you.

OG web manipulation frameworks

If there are any QA engineers reading this, you’re probably wondering, “what about Selenium?” There are a bunch of existing (perhaps old-school) frameworks that developers use to automate work in the browser: think web scraping and product testing.

  • Puppeteer

  • Playwright

  • Selenium

But these are designed for browser testing. They let you pick a specific website and script automations like:

  • Scroll this much down, and click this exact link

  • Click this exact button

  • Input this exact information into this field in a form

These things work great when it’s your site, or a specific site you know well (like all of you LinkedIn scrapers out there). On their own, these tools don’t work as a base layer for AI models, because models need to be able to browse every site and do anything on each of those sites. That said, AI models will sometimes write scripts for these tools that target a particular site.
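To see why exact-step scripts are brittle, here’s a toy TypeScript simulation of the classic approach. It doesn’t use Playwright, Puppeteer, or Selenium: a `Map` stands in for the page, and the selectors and site versions are made up. The point is the failure mode: a script that targets exact selectors breaks the moment a site renames a button, even though a human would never notice the change.

```typescript
// A toy "page": a map from CSS-style selectors to the element's visible label.
// Real frameworks query a live DOM; this is only a stand-in.
type Page = Map<string, string>;

// A scripted automation in the classic style: exact selectors, fixed order.
const script: string[] = ["#email", "#password", "#login-btn"];

function runScript(page: Page, steps: string[]): string[] {
  const log: string[] = [];
  for (const selector of steps) {
    const label = page.get(selector);
    if (label === undefined) {
      // The script has no idea what to do when the page changes.
      throw new Error(`selector not found: ${selector}`);
    }
    log.push(`interacted with "${label}"`);
  }
  return log;
}

// Version 1 of the site: the script works.
const siteV1: Page = new Map([
  ["#email", "Email"],
  ["#password", "Password"],
  ["#login-btn", "Sign in"],
]);

// Version 2: a redesign renames the button's id. Same button to a human.
const siteV2: Page = new Map([
  ["#email", "Email"],
  ["#password", "Password"],
  ["#submit", "Sign in"],
]);

console.log(runScript(siteV1, script)); // three steps succeed
try {
  runScript(siteV2, script);
} catch (err) {
  console.log((err as Error).message); // selector not found: #login-btn
}
```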

How to build your own Browserbase (or, how they built it)

Put yourself in the inventor’s chair. You want to design a browser built for AI models, not humans. And it needs to be a much, much, much smaller and pared down version of the browsers you already use. What would you do?

The obvious place to start is the GUI, or graphical user interface. Tabs, extensions, bookmarks, back buttons, reload buttons (really any buttons)...all of that is GUI. And AI models don’t need that stuff. They also don’t need history, password managers, incognito mode, zooming in…basically anything built for a human looking at a screen.

This is what’s called a headless browser: it has no frontend. The way you control it is through one of those browser automation frameworks we mentioned earlier, like Playwright or Selenium. You can still control the mouse, take screenshots, etc., but you do that using code, not by being a human moving a mouse. They also open sourced a big piece of this as Stagehand, a framework built on top of Playwright that lets you (or a model) control the browser with plain-English instructions.
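Here’s a rough illustration of how an AI layer differs from a hardcoded script: instead of naming an exact selector, you describe what you want, and something picks the matching element from what’s actually on the page. In a real system that something is an AI model reading the page; in this TypeScript sketch a crude word-overlap score stands in for the model. `PageElement` and `pickElement` are hypothetical names, not Stagehand’s or Playwright’s API.

```typescript
// An element described the way a model might see it:
// what it is and what it says, not just where it lives in the DOM.
interface PageElement {
  selector: string;
  role: "button" | "textbox" | "link";
  text: string;
}

// Stand-in for an LLM call. A real system would send the instruction and the
// element list to a model; here we just count shared words.
function pickElement(instruction: string, elements: PageElement[]): PageElement | null {
  const words = instruction.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  let best: PageElement | null = null;
  let bestScore = 0;
  for (const el of elements) {
    const haystack = `${el.role} ${el.text}`.toLowerCase();
    const score = words.filter((w) => haystack.includes(w)).length;
    if (score > bestScore) {
      best = el;
      bestScore = score;
    }
  }
  return best;
}

const before: PageElement[] = [
  { selector: "#email", role: "textbox", text: "Email" },
  { selector: "#login-btn", role: "button", text: "Sign in" },
];
// After a redesign the id changed, but the button still says "Sign in".
const after: PageElement[] = [
  { selector: "#email", role: "textbox", text: "Email" },
  { selector: "#submit", role: "button", text: "Sign in" },
];

console.log(pickElement("click the sign in button", before)?.selector); // #login-btn
console.log(pickElement("click the sign in button", after)?.selector); // #submit
```

Because the instruction describes intent rather than location, the same request keeps working after the redesign. Once the element is chosen, the actual click would still go through an automation framework like Playwright.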

Open Operator

In January of this year, OpenAI introduced Operator: an agent that lets ChatGPT use its own browser to do stuff on the internet. But it’s proprietary, so it only works with OpenAI models. Within 24 hours of that release, Browserbase built their own, open version of it called Open Operator. Anyone can use it to browse the web using AI.

And behind the scenes, it’s a template on GitHub you can use to build your own version with whatever AI models you want (not just OpenAI’s).

browserbase's open operator

Open Operator is built and powered by none other than Vercel, my favorite company and the sponsor of this series. It’s full-stack Vercel, from the initial designs to the app that’s deployed and running today.

open operator in action

Using v0 to design the first version of Open Operator

After OpenAI released Operator, the Browserbase team wanted to follow up quickly with something. But they’re backend people, and design and frontend aren’t exactly their specialty. So they used v0 to generate an initial version of what the Open Operator UI might look like. They combined the v0 output with the Vercel AI SDK and were able to release Open Operator within 24 hours of the OpenAI announcement (not bad).

open operator's initial design with v0

Using the AI SDK to make switching between models easy

Models are getting better (seemingly) every day, so every company building AI apps needs to be able to switch between models and keep up with the state of the art.

Vercel’s AI SDK helps developers build AI apps quickly and easily: if you’ve heard of frameworks like LangChain, this is in that cinematic universe. Browserbase used the AI SDK to build Open Operator without needing to constantly update their code to use the latest models. So if you build your own version on the Open Operator template, you can choose whichever model you want to use (and then change it frequently).
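The pattern that makes this possible is a single interface in front of many model providers, so app code is written once and the model is just configuration. The sketch below is a hypothetical, dependency-free TypeScript illustration of that pattern, not the AI SDK’s actual API; the stub providers only echo the prompt. With the real SDK, switching models amounts to passing a different `model` value (for example `openai("gpt-4o")` from `@ai-sdk/openai`) to the same `generateText` call.

```typescript
// Hypothetical unified interface: every provider looks the same to app code.
interface ChatModel {
  id: string;
  generate(prompt: string): string;
}

// Stub providers; real ones would call each vendor's HTTP API.
function fakeProvider(vendor: string) {
  return (modelName: string): ChatModel => ({
    id: `${vendor}:${modelName}`,
    generate: (prompt) => `[${vendor}/${modelName}] response to: ${prompt}`,
  });
}

const providers: Record<string, (modelName: string) => ChatModel> = {
  openai: fakeProvider("openai"),
  anthropic: fakeProvider("anthropic"),
};

// Resolve a "vendor:model" string, e.g. from config or an environment variable.
function getModel(spec: string): ChatModel {
  const [vendor, modelName] = spec.split(":");
  const make = providers[vendor];
  if (!make || !modelName) throw new Error(`unknown model spec: ${spec}`);
  return make(modelName);
}

// The app logic is written once against ChatModel...
function nextBrowserStep(model: ChatModel, goal: string): string {
  return model.generate(`Goal: ${goal}. What should the browser do next?`);
}

// ...and switching models is a one-string change.
console.log(nextBrowserStep(getModel("openai:gpt-4o"), "find flights"));
console.log(nextBrowserStep(getModel("anthropic:claude-3-5-sonnet"), "find flights"));
```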

switching models with the AI SDK

Beyond v0 and the AI SDK, Open Operator is running entirely on Vercel. The code generated in v0 is running in the Vercel cloud, in a Next.js app. Instead of holding state in a database, it uses Vercel Functions to tell the model what the current step is (e.g. scrolling down) and ask what to do next. And it uses the Vercel Firewall to make sure only authorized people can access the code.
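Here’s a rough sketch of what “no database, just functions” can look like. Each call to a (hypothetical) step function carries the goal and every action taken so far, and gets back the next action; the function itself remembers nothing between calls. The request and response shapes, and the fixed plan standing in for the model, are assumptions for illustration, not Open Operator’s actual code.

```typescript
// Hypothetical shape of one request to a serverless "next step" function.
// All state travels with the request, so nothing needs to live in a database.
interface StepRequest {
  goal: string;
  history: string[]; // actions taken so far, e.g. "scroll down"
}

interface StepResponse {
  action: string;
  done: boolean;
}

// Stand-in for asking a model "given the goal and what we've done, what next?"
// A real handler would call an LLM here; this one follows a fixed plan.
function decideNextAction(req: StepRequest): StepResponse {
  const plan = ["open search page", "type query", "scroll down", "click first result"];
  const next = plan[req.history.length];
  return next === undefined ? { action: "finish", done: true } : { action: next, done: false };
}

// The client (e.g. the browser UI) loops, sending the full history each time.
function runAgent(goal: string, maxSteps = 10): string[] {
  const history: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const { action, done } = decideNextAction({ goal, history });
    if (done) break;
    history.push(action); // in reality: execute the action in the browser first
  }
  return history;
}

console.log(runAgent("find the cheapest flight")); // logs the four planned actions
```

Because each request is self-contained, any instance of the function can handle any step, which suits serverless platforms where instances come and go.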

“I feel like a Vercel billboard right now”

– Paul Klein IV

According to Paul (Browserbase’s CEO), whom I tricked into talking to me for an hour, the reception for Open Operator has been surprisingly positive so far. They’ve got 1.5K GitHub stars, people are talking about it on YouTube…not bad for something that had no marketing behind it. And, fun fact, he built a lot of this on the plane traveling to his honeymoon. Dedication, folks.