Vercel Eve: An AI Agent Is Just a Folder of Files
Vercel's open-source Eve makes an AI agent a folder of files: instructions, tools, and skills that ship as a durable service. How it works and when to use it.

Half the commits landing on Vercel's own platform are now written by agents, not people. That was the number Guillermo Rauch led with at Ship in London: agent-triggered commits jumped from under 3% to more than 50% in six months. Which raises an awkward question. If agents write that much of the code, what does an agent actually look like as a project you can read, review, and deploy?
Vercel shipped its answer the same day. It's called Eve, and the whole bet is that an agent should be a folder of files.
What Eve is
Eve launched on June 17 at Vercel Ship 2026 in London. It's open-source under Apache-2.0, TypeScript all the way through, and in public preview now. The framing people keep reaching for is "Next.js for agents," and that's the right instinct. Next.js took a pile of React, a router, a bundler, and a server you used to wire together by hand, and gave it a convention. Eve is trying to do the same thing for the loop you write every time you build an agent.
If you've ever built an LLM agent from scratch, you know that loop: a model, a list of tools, and a while-loop that calls the model, runs whatever tool it asked for, feeds the result back, and repeats until the agent says it's done. The model part is easy now. The annoying part is everything around it. Where does state live. What happens when the process restarts mid-task. How do you let a human approve a risky step. How do you test the thing. Eve's pitch is that those should be conventions, not code you write again on every project.
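To make that loop concrete, here is a minimal sketch of it in plain TypeScript. Nothing in it is Eve's API: the model is a stub function, the types (`ModelReply`, `Tool`, `runAgent`) are invented for illustration, and the loop is capped at a fixed number of steps so a confused model can't spin forever.

```typescript
// A minimal agent loop. Everything here is illustrative: the "model" is a
// stub function, not a real LLM client, and none of these names come from Eve.

type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelReply =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };
type Tool = (args: Record<string, unknown>) => string;
type Model = (history: Message[]) => ModelReply;

function runAgent(
  model: Model,
  tools: Record<string, Tool>,
  userInput: string,
  maxSteps = 10,
): { answer: string; history: Message[] } {
  const history: Message[] = [{ role: "user", content: userInput }];

  for (let step = 0; step < maxSteps; step++) {
    const reply = model(history);

    // The model says it is done: record the answer and stop.
    if (reply.kind === "final") {
      history.push({ role: "assistant", content: reply.text });
      return { answer: reply.text, history };
    }

    // Otherwise run the requested tool and feed the result back in.
    const tool = tools[reply.call.tool];
    const result = tool
      ? tool(reply.call.args)
      : `error: unknown tool ${reply.call.tool}`;
    history.push({ role: "tool", content: result });
  }
  throw new Error(`agent did not finish within ${maxSteps} steps`);
}

// Stub model: asks for the weather once, then answers from the tool result.
const stubModel: Model = (history) => {
  const last = history[history.length - 1];
  if (last && last.role === "tool") {
    return { kind: "final", text: `It is ${last.content} right now.` };
  }
  return {
    kind: "tool_call",
    call: { tool: "get_weather", args: { city: "London" } },
  };
};

const demoTools: Record<string, Tool> = {
  get_weather: (args) => `18C and cloudy in ${String(args.city)}`,
};

const loopResult = runAgent(stubModel, demoTools, "What's the weather in London?");
console.log(loopResult.answer);
```

Notice what the sketch leaves out: the history lives in a local variable, so a restart loses it, and there is no place for a human to step in. Those gaps are the "everything around it" part.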
An agent is a directory of files
Here's the core idea, and it's almost too simple. You make a folder. The folder is the agent.
    agent/
      agent.ts            # model + settings
      instructions.md     # the system prompt: who this agent is
      tools/              # TypeScript functions the agent can call
        get_weather.ts
      skills/             # Markdown domain knowledge, loaded on demand
        refund-policy.md
      subagents/          # agents this one can hand work off to
      channels/           # where it listens: Slack, Discord, GitHub...
      schedules/          # cron jobs for autonomous runs

Eve scans that directory, validates what's inside, and compiles it into a manifest that runs as a durable service on Vercel Functions. Nothing is registered in code. The filesystem is the config. Drop a file in tools/ and the agent can call it. Drop a Markdown file in skills/ and it becomes domain knowledge the agent pulls in when it's relevant.
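If "the filesystem is the config" sounds abstract, the idea can be sketched in a few lines. This is not how Eve is implemented: the `Manifest` shape, `buildManifest()`, and the rule that `instructions.md` is required are all assumptions made up for the example. It takes a list of relative paths rather than touching the disk, so it shows only the convention-to-manifest step.

```typescript
// Hypothetical sketch of convention-based discovery. The Manifest shape and
// buildManifest() are invented for illustration; they are not Eve's real API.

type Manifest = {
  instructions: string | null;
  tools: string[];
  skills: string[];
  subagents: string[];
  channels: string[];
  schedules: string[];
};

const FOLDER_KEYS = ["tools", "skills", "subagents", "channels", "schedules"] as const;
type FolderKey = (typeof FOLDER_KEYS)[number];

function isFolderKey(value: string): value is FolderKey {
  return (FOLDER_KEYS as readonly string[]).includes(value);
}

// Takes file paths relative to the agent folder and groups them by convention.
function buildManifest(paths: string[]): Manifest {
  const manifest: Manifest = {
    instructions: null,
    tools: [],
    skills: [],
    subagents: [],
    channels: [],
    schedules: [],
  };

  for (const path of paths) {
    const parts = path.split("/");
    if (parts.length === 1 && path === "instructions.md") {
      manifest.instructions = path;
      continue;
    }
    const [folder, file] = parts;
    if (parts.length === 2 && folder && file && isFolderKey(folder)) {
      // The entry name is the file name without its extension.
      manifest[folder].push(file.replace(/\.[^.]+$/, ""));
    }
  }

  // Assumed validation rule for this sketch: an agent needs a system prompt.
  if (manifest.instructions === null) {
    throw new Error("agent folder is missing instructions.md");
  }
  return manifest;
}

const manifest = buildManifest([
  "agent.ts",
  "instructions.md",
  "tools/get_weather.ts",
  "skills/refund-policy.md",
]);
console.log(manifest.tools, manifest.skills);
```

The point of the design is that adding a capability is a file operation: a new path in the list changes the manifest, and no registration call has to be edited alongside it.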
The two files you'll touch most are the instructions and a tool. One is plain prose, the other is plain TypeScript.
    You are a support assistant for a weather app.
    Be brief. When someone asks about conditions in a city,
    call the get_weather tool and summarize the result in
    one sentence. If the city is ambiguous, ask which one
    they mean before calling anything.

The get_weather tool is just a function, and tool calling is the mechanism underneath: the model decides it needs weather, Eve runs run(), and the JSON goes back into the conversation. The exact export names are in the docs, but the shape is the point. A tool is a file, not a registration ceremony.
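A rough sketch of what tools/get_weather.ts might look like, with loud caveats: only the name run() comes from the description above. The `description` constant, the input and result types, and the canned weather data are assumptions for illustration, and a real tool file would export these under whatever names Eve's docs specify.

```typescript
// tools/get_weather.ts (illustrative shape only; check Eve's docs for the
// real export names and input schema format). In a real tool file,
// `description` and `run` would be exported.

type WeatherInput = { city: string };
type WeatherResult = { city: string; tempC: number; conditions: string };

// Hypothetical description the model would see when choosing a tool.
const description = "Get current weather conditions for a city.";

// Canned data stands in for a real weather API call.
const FAKE_WEATHER: Record<string, Omit<WeatherResult, "city">> = {
  london: { tempC: 18, conditions: "cloudy" },
  tokyo: { tempC: 27, conditions: "clear" },
};

function run(input: WeatherInput): WeatherResult {
  const city = input.city.trim();
  const found = FAKE_WEATHER[city.toLowerCase()];
  if (!found) {
    // Fail loudly so the caller can report the problem to the model.
    throw new Error(`no weather data for "${city}"`);
  }
  return { city, ...found };
}

console.log(description);
console.log(JSON.stringify(run({ city: "London" })));
```

Because run() is an ordinary function with typed input and output, you can unit test it without a model anywhere in the picture.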
Durable by default
This is the part worth stealing even if you never touch Eve.
Every conversation in Eve is a durable workflow. Each step gets checkpointed, so a session can pause, survive a crash or a redeploy, and pick up exactly where it left off. An agent that's waiting on a human approval or a slow API doesn't sit there burning a function invocation. It parks. State is saved, compute stops, and it wakes back up when the next event arrives.
Why durable execution matters for agents
A normal serverless function has a time limit and dies when it returns. That's fine for a request that finishes in a second. It's useless for an agent that might wait three hours for a human to click approve, or run a chain of twenty tool calls where step 14 hits a flaky API. Checkpointing each step means the long-running, stop-and-start nature of agent work stops fighting the platform it runs on. The work resumes instead of restarting.
If you've built an agent on raw functions, you've felt this gap. You end up bolting on a queue, a state table, and a retry scheme to fake durability. Eve makes that the floor instead of a project.
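Here is the smallest version of checkpoint-and-resume I can sketch, to show what that floor buys you. It is a toy, not Eve's workflow engine: `runWorkflow`, `Step`, and the in-memory `Map` are invented for the example, and a real system would persist checkpoints somewhere that survives the process.

```typescript
// A toy checkpoint-and-resume runner. The in-memory Map stands in for durable
// storage; a real system would persist it outside the process.

type Checkpoints = Map<string, string>;
type Step = { name: string; run: () => string };

function runWorkflow(steps: Step[], checkpoints: Checkpoints): string[] {
  const results: string[] = [];
  for (const step of steps) {
    const saved = checkpoints.get(step.name);
    if (saved !== undefined) {
      // Already completed in an earlier run: reuse the result, skip the work.
      results.push(saved);
      continue;
    }
    const result = step.run(); // may throw, leaving earlier checkpoints intact
    checkpoints.set(step.name, result);
    results.push(result);
  }
  return results;
}

// Demo: the second step fails on its first attempt, as a flaky API might.
const calls = { fetch: 0, flaky: 0 };
const steps: Step[] = [
  {
    name: "fetch",
    run: () => {
      calls.fetch++;
      return "fetched";
    },
  },
  {
    name: "flaky",
    run: () => {
      calls.flaky++;
      if (calls.flaky === 1) throw new Error("upstream timeout");
      return "ok";
    },
  },
];

const store: Checkpoints = new Map();
try {
  runWorkflow(steps, store); // first run crashes at "flaky"
} catch {
  // In a real system the process might die here; the checkpoints survive.
}
const finalResults = runWorkflow(steps, store); // resumes without refetching
console.log(finalResults, calls);
```

The second run picks up at the failed step: "fetch" runs once in total, while "flaky" runs twice. That is the difference between resuming and restarting.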
What you get without writing it
The directory gives you six things by default. Durable execution is the one above. The rest:
- Sandboxed compute for code the agent generates or runs, using Vercel Sandbox in production and a local sandbox while you develop.
- Human-in-the-loop approvals, where the agent parks on a risky action and waits for a person, with no compute ticking the whole time.
- Subagents so one agent can hand a sub-task to another instead of cramming every job into one giant prompt.
You also get OpenTelemetry tracing that exports to the usual places like Datadog or Honeycomb, and a built-in evals system so you can score an agent against test cases instead of eyeballing whether your last prompt tweak made it worse. Tracing and evals are the two things people skip when they roll their own, and they're exactly the two that tell you whether the thing actually works.
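To show what "score an agent against test cases" means in practice, here is a tiny eval harness. It says nothing about how Eve's built-in evals are configured: `runEvals`, `EvalCase`, the stand-in agent, and the 30-day refund wording are all made up for the example.

```typescript
// A tiny eval harness. The agent under test is any function from input to
// output; the cases and checks here are made up for illustration.

type EvalCase = {
  name: string;
  input: string;
  check: (output: string) => boolean;
};
type EvalReport = { passed: number; total: number; failures: string[] };

function runEvals(
  agent: (input: string) => string,
  cases: EvalCase[],
): EvalReport {
  const failures: string[] = [];
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.check(agent(c.input));
    } catch {
      ok = false; // a thrown error counts as a failed case
    }
    if (!ok) failures.push(c.name);
  }
  return {
    passed: cases.length - failures.length,
    total: cases.length,
    failures,
  };
}

// Stand-in agent: deterministic so the scores are reproducible.
const toyAgent = (input: string): string =>
  input.toLowerCase().includes("refund")
    ? "Refunds are available within 30 days."
    : "I can help with weather questions.";

const report = runEvals(toyAgent, [
  {
    name: "mentions refund window",
    input: "Can I get a refund?",
    check: (o) => o.includes("30 days"),
  },
  { name: "stays brief", input: "Hi", check: (o) => o.length < 80 },
  {
    name: "asks which city",
    input: "Weather in Springfield?",
    check: (o) => o.includes("Which"),
  },
]);
console.log(`${report.passed}/${report.total} passed`, report.failures);
```

The third case fails on purpose: the stand-in agent never asks which Springfield you mean. A named failure like that is what you want to see after a prompt tweak, instead of a vague feeling that things got worse.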
Does it hold up? Vercel's own fleet
Vercel didn't ship a demo. It says it runs more than a hundred agents on Eve in production, and it named a few:
| Agent | Job | What Vercel claims |
|---|---|---|
| d0 | Data analyst in Slack | 30,000+ questions answered a month |
| Lead Agent | Autonomous sales rep | ~$5k/year cost against a ~$160k return |
| Athena | RevOps tool | Built by non-engineers in six weeks |
| Vertex | Support | Resolves 92% of tickets on its own |
Two more numbers from the keynote put the scale in context. That jump to over half of commits being agent-written is one. The other is token volume through Vercel's AI Gateway, which the company says went from roughly 2 trillion to 20 trillion a month over the same six months. This is the same shift where coding agents quietly became the default way teams ship, now pointed at Vercel's own internal tools.
These are the vendor's own numbers
A 92% resolution rate and a 32x ROI are Vercel describing Vercel, on a platform Vercel is selling. Read them as direction, not as a benchmark you'll hit. The useful signal isn't the exact percentage. It's that the company built real internal tools on this and is willing to put names and figures on them, which is more than most framework launches do.
The catch
Eve is genuinely nice DX, and it's also a vendor framework that's happiest on its maker's infrastructure. Durable execution, the sandbox, the approvals, the parking model all assume Vercel Functions and Vercel Sandbox underneath. It's open-source and the code is TypeScript you can read, but "open-source" and "easy to run anywhere else" are not the same promise. Plan for that before you build something load-bearing on it.
It's also a public preview, which means the API will move and the docs are ahead of the scars. And the folder convention, as clean as it is, doesn't make the hard parts disappear. Writing good instructions.md, designing tools that fail safely, and building evals that catch real regressions is still the actual work. Eve removes the plumbing. It does not write the agent for you.
What I'd actually do
If you want to understand agents, don't start with Eve. Build the loop yourself first so you know what the folder is hiding. Once you've felt the pain of managing state and retries by hand, Eve's conventions read as relief instead of magic.
If you have a real internal tool in mind, a Slack bot that answers questions over your own data is the obvious first project, and it's close to what d0 does. Try it on Eve, keep the agent logic in plain functions you could lift out later, and watch how much the durable-by-default model saves you. Even if you never adopt the framework, copy the idea: agents are long-running and stop-and-start by nature, so build them on something that checkpoints and resumes instead of something that times out and dies.
The bet underneath all of this is that agents are about to be ordinary software, written and shipped like any other service. Turning one into a folder you can read in a code review is a real step toward making that true.

Written by
Rhythm Bhiwani
Engineer and relentless builder, happiest reverse-engineering hard problems until they click.


