How a sandbox works: dynamically creating infrastructure at runtime
OK, so a sandbox is just isolated infrastructure. How is that different from a virtual machine (VM), the foundational building block of every cloud provider like AWS? And how do you actually use a sandbox?
There are many answers, but I will focus on 3 here because they’re the most important, and because I’ve grown tired of writing.
- Sandboxes are at a different scale than VMs
For internal coding agents, the number of sandboxes you need is pretty manageable. An engineer might have a few agents, maybe even dozens, working on different features, and each one needs a sandbox.
But for RL, we are talking about a completely different scale. To quote myself from another post:
“At scale, RL training becomes a problem of orchestrating massive numbers of sandboxes. One of Modal’s customers, a major AI lab, is already running on the order of 100,000 concurrent sandboxes for RL workloads, with a stated goal of reaching 1 million.”
In RL, your goal is for the young coding agent to explore different ways of writing code and find the best ones. The more sandboxes you can spin up, the more “shots on goal” your agent has to figure that out, ergo the better your result will be and the faster you can get there.
Now. 1 million might be on the higher end of what demand looks like now. But even for more pedestrian numbers like 100K, you are already way outside of what you can realistically do with VMs. AWS will not let you create one hundred thousand virtual machines in one fell swoop.
This is why in practice, sandbox providers like Modal will put multiple sandboxes on one VM. And in many cases, the underlying cloud provider will have multiple VMs on one server. And then multiple servers in a rack…and multiple racks in a data center…it’s all coming together baby.
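To put rough numbers on the packing idea, here is a back-of-the-envelope sketch. The density of 50 sandboxes per VM is a made-up assumption for illustration, not a figure from Modal or any other provider; swap in whatever number fits your setup.

```python
def vms_needed(sandboxes: int, sandboxes_per_vm: int) -> int:
    """Smallest number of VMs that can host `sandboxes` at a given density."""
    if sandboxes_per_vm <= 0:
        raise ValueError("sandboxes_per_vm must be positive")
    # Integer ceiling division: a partially filled VM still counts as a VM.
    return (sandboxes + sandboxes_per_vm - 1) // sandboxes_per_vm


# ASSUMPTION: 50 sandboxes per VM is a hypothetical density, not a provider figure.
ASSUMED_SANDBOXES_PER_VM = 50

for target in (100_000, 1_000_000):
    count = vms_needed(target, ASSUMED_SANDBOXES_PER_VM)
    print(f"{target:,} sandboxes -> {count:,} VMs")
```

Under that assumed density, the 100K case shrinks from one hundred thousand VMs to a couple thousand, which is the whole point of stacking sandboxes on shared machines.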
- Sandboxes have thick walls
Unlike VMs, which are built to communicate with other machinery like databases and APIs, sandboxes need to be mostly self-sufficient. By default all of their networking ports are closed. In many cases they are forbidden from communicating with any outside systems at all (e.g. no API calls). You can change all of these settings, but the defaults reflect the solitary confinement ethos.
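One way to picture "closed by default, open on request" is as a deny-by-default policy object. This is a toy sketch: the class and field names (`SandboxNetworkPolicy`, `open_ports`, `allowed_hosts`) are hypothetical and do not correspond to any provider's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxNetworkPolicy:
    """Hypothetical deny-by-default network policy for a sandbox."""

    # Inbound ports the sandbox exposes. Empty by default: every port is closed.
    open_ports: frozenset[int] = frozenset()
    # Outside hosts the sandbox may call. Empty by default: no outbound traffic.
    allowed_hosts: frozenset[str] = frozenset()

    def allows_inbound(self, port: int) -> bool:
        return port in self.open_ports

    def allows_outbound(self, host: str) -> bool:
        return host in self.allowed_hosts


# The default: solitary confinement.
default_policy = SandboxNetworkPolicy()

# An explicitly relaxed policy: one open port, one reachable host.
relaxed_policy = SandboxNetworkPolicy(
    open_ports=frozenset({8080}),
    allowed_hosts=frozenset({"pypi.org"}),
)
```

The design choice worth noticing is that nothing is reachable unless someone wrote it down; the relaxed policy has to name each exception.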
- Sandboxes are defined and spun up differently than VMs
For most use cases, VMs are things that you create with the intention of running for a while. Let’s say you’re building a new email client, or a competitor to X, or B2B software for dog groomers. For all of these, you will need ongoing backend infrastructure for your app that you will run indefinitely, basically until you go out of business (or switch providers).
Sandboxes, on the other hand, are ephemeral. They’re designed to be spun up, used for a very short period of time, and then demolished. Short here could mean only a few seconds, more typically a few minutes, and almost never longer than a few hours.
And because short is short, they’re built differently than VMs (which are here for a good time and a long time). They need to spin up much, much faster than a VM, which can take minutes. They don’t need as much attached storage, since they’re here for a short time. And other design decisions like these.
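The create, use, demolish lifecycle can be sketched with nothing but the Python standard library. To be loud about the assumption: this is not a real sandbox. A plain subprocess can still reach the network and the host filesystem, and `run_ephemeral` is a hypothetical helper, not any provider's API. It only shows the shape of the lifecycle: a scratch workspace, a hard time limit, and nothing left behind.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_ephemeral(code: str, timeout_s: float = 10.0) -> tuple[str, Path]:
    """Run `code` in a throwaway directory, then delete that directory.

    NOTE: illustrative only. This provides no real isolation.
    """
    # Spin up: a fresh scratch directory that exists only for this run.
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        path = Path(workdir)
        script = path / "main.py"
        script.write_text(code)
        # Use: run the code with a hard time limit. subprocess.run raises
        # subprocess.TimeoutExpired if the limit is exceeded.
        result = subprocess.run(
            [sys.executable, str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    # Demolish: leaving the `with` block has already removed the directory.
    return result.stdout, path


output, old_path = run_ephemeral("print(2 + 2)")
print(output.strip(), old_path.exists())
```

Real sandbox providers do the same three steps with actual isolation and much faster startup, but the contract is similar: whatever the code wrote to its workspace is gone once the run ends.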
Sandbox providers: an exercise in translation
For my final act before releasing you, let’s stress test this post by taking a look at a couple of sandbox providers and how they pitch their product.
First, here is Daytona. I like them because I was at a talk that the CEO gave once, and he was wearing a Platinum Day Date, which let’s be honest is pretty sick.
- Lightning-fast infrastructure for AI development: you can spin up these sandboxes really fast.
- Separated & isolated runtime protection: what happens in the sandbox stays in the sandbox. Untrusted code can’t break out.
- Massive parallelization for concurrent AI workflows: you can spin up a metric fuckton of sandboxes at the same time.
And here is Modal’s pitch, same but different.
- Built for concurrency: you can spin up a metric fuckton of sandboxes at the same time.
- Fast on any image: you can spin up these sandboxes really fast.
- Deep GPU or CPU capacity: this is more of a reflection of Modal’s impressive work on GPU infra, but unlike some other providers they can get you GPUs for your sandboxes.
Like I said, there are many, many sandbox providers now that the market for them has seemingly coalesced.
So next time you hear about sandboxes and you think about the rectangular playdates of yore, and think to yourself:
“Wait a minute, I don’t have a child. I’m not even married. In fact, I’m not even a human being…I’m a horseshoe crab.”
Let this post be your guide.