Behind NVIDIA’s Vera Launch: How AI Agents Are Reshaping Data Center CPUs

NVIDIA Vera CPU (Image Credit: NVIDIA)

On Friday, May 15, 2026, NVIDIA’s Ian Buck hand-delivered the first production Vera CPU systems to Anthropic, OpenAI, and SpaceX AI; Oracle Cloud Infrastructure followed on Monday, May 18. It was a literal white-glove launch: a vice president walking server boxes into customer lobbies with a screwdriver in his pocket. The optics were unusual. The signal underneath them is more interesting than the news itself.

Vera is NVIDIA’s first custom CPU, and it is built specifically for the kind of work AI agents do inside a data center. That framing matters. NVIDIA has spent a decade defining itself as the GPU company. Shipping a CPU — and pitching it as a “new multi-billion dollar business,” as Jensen Huang described it at GTC San Jose in March 2026 — is a tell about where the AI workload is moving and what’s bottlenecking it now.

This post is less about the chip itself and more about the shift it points to: how the rise of agentic AI is quietly rewriting what a data center CPU is expected to do.

From training-heavy to agent-heavy: the workload has changed

For most of the last four years, AI data centers were built around one workload: training. Racks of GPUs running massive matrix multiplications for weeks at a time, with the CPU acting as a relatively quiet conductor — scheduling jobs, moving data, keeping the operating system happy. The CPU mattered, but it wasn’t the limiting factor.

That balance is now shifting. According to Deloitte’s 2026 technology predictions, inference workloads are projected to consume roughly two-thirds of AI compute in 2026, up from about one-third in 2023. Industry analyses report that inference now accounts for more than half of AI-optimized infrastructure spending, with the inference-chip market alone passing $50 billion in 2026.

The reason isn’t only that inference is cheaper to scale than training. It’s that what enterprises do with AI has changed. Gartner forecasts that 40% of enterprise applications will embed task-specific AI agents by the end of 2026, up from less than 5% in 2025. Deloitte projects that the share of enterprises using generative AI that also deploy autonomous AI agents will double from 25% in 2025 to 50% by 2027. Agents are no longer experimental — they are starting to handle real workflow automation.

That shift puts a different kind of pressure on the data center. Training is throughput-dominated. Agentic inference is latency-sensitive and orchestration-heavy. The two workloads stress different parts of the rack.

Why CPUs are back on the critical path

When a single AI agent runs, it isn’t just one giant GPU call. It is a sequence: read a prompt, plan, call a tool, retrieve a document, summarize the result, decide what to do next, call another model, write a file, return an answer. Many of those steps are sequential, branching, and short. They live on the CPU.
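To make that sequence concrete, here is a minimal Python sketch of one agent loop. Everything in it is hypothetical: `fake_model` stands in for a GPU-backed model call, `TOOLS` holds a single stub tool, and the `cpu:`/`gpu:` labels in the trace are an illustrative assumption about where each step runs, not a measurement.

```python
from typing import Callable


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a GPU-backed model call; returns a canned decision."""
    if "RESULT:" in prompt:
        return "FINAL: done"
    return "TOOL: search"


# Hypothetical tool registry; a real agent would call APIs, databases, or a sandbox.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda query: f"top hit for '{query}'",
}


def run_agent(task: str, max_turns: int = 5) -> tuple[str, list[str]]:
    """Run a toy agent loop and record which kind of work each step is."""
    trace: list[str] = []
    context = task
    for _ in range(max_turns):
        trace.append("gpu:model_call")        # the only accelerator step in the loop
        reply = fake_model(context)
        trace.append("cpu:parse_reply")       # branching control flow
        if reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip(), trace
        tool_name = reply.removeprefix("TOOL:").strip()
        trace.append(f"cpu:tool:{tool_name}")  # tool call
        result = TOOLS[tool_name](task)
        trace.append("cpu:update_context")    # state management between turns
        context = f"{context}\nRESULT: {result}"
    return "gave up", trace


answer, trace = run_agent("find the Vera core count")
print(answer)
print(trace)
```

In this toy run the trace holds two model calls and four CPU-side steps. The ratio is invented, but the shape matches the paragraph above: short, sequential, branching work wrapped around each model call.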

NVIDIA’s own framing makes this explicit. In the Vera launch post, Ian Buck put it this way: “AI agents don’t run on GPUs alone. Every agentic sandbox, every tool call, every orchestration layer, every long-context retrieval operation — that’s CPU work.” When the CPU stalls on those steps, the GPU sitting next to it goes idle, and the most expensive component in the rack stops earning its keep.
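A rough back-of-the-envelope model shows why the stall matters. The sketch below assumes the simplest possible case: one agent session, with CPU and GPU work strictly serialized, so the GPU is busy only during its own share of each turn. The millisecond figures are made up for illustration and are not NVIDIA's or anyone else's benchmarks.

```python
def gpu_utilization(gpu_ms: float, cpu_ms: float) -> float:
    """Fraction of each agent turn the GPU spends busy, assuming CPU and GPU
    work alternate with no overlap (a deliberately simplified model)."""
    if gpu_ms < 0 or cpu_ms < 0:
        raise ValueError("durations must be non-negative")
    total = gpu_ms + cpu_ms
    return gpu_ms / total if total else 0.0


# Illustrative numbers only: 40 ms of model time per turn, 60 ms of CPU-side work.
baseline = gpu_utilization(gpu_ms=40, cpu_ms=60)
# Same model time, CPU-side work halved to 30 ms.
faster_cpu = gpu_utilization(gpu_ms=40, cpu_ms=30)

print(f"baseline:   {baseline:.2f}")
print(f"faster CPU: {faster_cpu:.2f}")
```

Under these assumptions, halving the CPU-side time lifts GPU utilization from 0.40 to about 0.57 without touching the GPU at all. Real systems overlap many sessions, so the actual numbers will differ, but the direction of the effect is the point.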

This is what NVIDIA means when it calls Vera a response to “a new CPU moment in the AI factory.” The bottleneck isn’t raw GPU throughput anymore. It’s how quickly the CPU can keep dozens of concurrent agent sessions fed: scheduling tool calls, managing long-context memory, shuttling data to and from the accelerator, and handling the control-flow logic that orchestrators such as LangGraph, CrewAI, and other agent frameworks generate by the thousand.
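The concurrency pattern can be sketched with Python's standard `asyncio` library. This is a toy simulation, not how any particular framework schedules work: the session count, the number of GPU slots, and the sleep durations are all assumed values, and `asyncio.sleep` merely stands in for CPU-side orchestration and GPU-side model calls.

```python
import asyncio


async def agent_session(gpu_slots: asyncio.Semaphore, stats: dict[str, int]) -> None:
    """One simulated agent session: three turns of CPU-side work plus a model call."""
    for _ in range(3):
        await asyncio.sleep(0.001)   # stand-in for tool calls and orchestration
        async with gpu_slots:        # wait for a free accelerator slot
            stats["in_flight"] += 1
            stats["peak"] = max(stats["peak"], stats["in_flight"])
            await asyncio.sleep(0.002)   # stand-in for the model call
            stats["in_flight"] -= 1
    stats["done"] += 1


async def run_sessions(n_sessions: int, gpu_limit: int) -> dict[str, int]:
    gpu_slots = asyncio.Semaphore(gpu_limit)
    stats = {"in_flight": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(agent_session(gpu_slots, stats) for _ in range(n_sessions)))
    return stats


def simulate(n_sessions: int = 24, gpu_limit: int = 4) -> dict[str, int]:
    """Run the simulation to completion and return the counters."""
    return asyncio.run(run_sessions(n_sessions, gpu_limit))


stats = simulate()
print(stats)
```

The semaphore caps how many sessions hold a GPU slot at once, while every session's CPU-side steps proceed in between. In a real deployment those CPU-side steps are actual compute, and the slower they run, the longer each session takes to come back and ask for its next slot.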

That set of demands — high concurrency, high memory bandwidth, low latency, tight coupling with the GPU — is what Vera was designed to prioritize.

Is the CPU for AI agents a new kind of CPU?

This is the question worth answering plainly, because the marketing around Vera can read as though AI agents need a fundamentally new species of processor. They don’t.

Vera is still a general-purpose CPU. According to NVIDIA, it uses 88 custom Arm-based “Olympus” cores with 1.2 TB/s of memory bandwidth and roughly 50% faster per-core performance under full load. Independent specs reported by VideoCardz cover the broader Vera Rubin NVL72 platform: 36 Vera CPUs paired with 72 Rubin GPUs in a single rack, up to 1.5 TB of LPDDR5X per CPU, and 260 TB/s of NVLink fabric bandwidth. These are extreme numbers, but they sit on the same architectural lineage as the Arm server CPUs that have powered hyperscale data centers for years.

What Vera changes is the emphasis. A typical Intel Xeon or AMD EPYC is built to handle a broad mix of enterprise workloads — databases, virtualization, web tier, the lot. Vera narrows the target: orchestration, tool-calling, agent sandboxing, long-context state management, and feeding GPUs without stalling. NVIDIA also tightly couples Vera to its Rubin GPU through a second-generation NVLink-C2C coherent-memory interconnect, so the CPU and GPU effectively share memory rather than copying data across a PCIe bus. That coherence is one of the few places where Vera offers something a commodity x86 CPU cannot easily replicate today.

NVIDIA isn’t the only vendor chasing this kind of CPU–GPU integration. AMD has been pursuing a different version of the same idea with the Instinct MI300A, an APU that packs 24 Zen 4 CPU cores and 228 CDNA 3 GPU compute units onto a single package with 128 GB of fully unified HBM3 memory. The architectural choice is different (AMD bets on putting CPU and GPU chiplets in one package around a shared memory pool, NVIDIA on separate CPU and GPU chips coupled by NVLink-C2C), but the underlying problem is the same: how to remove the latency cost of CPU–GPU data movement on AI workloads. Two different approaches, one direction of travel.

So the honest framing is this: AI agents do not require a new category of CPU. They run on Xeon and EPYC every day. But on a workload where the CPU and GPU need to talk constantly and quickly, a CPU that has been co-designed with the GPU has structural advantages — and that is the gap Vera, MI300A, and the next generation of accelerated systems are trying to fill. As we covered in The End of Intel’s Monopoly: How AMD and ARM Are Quietly Redrawing the Data Center Map, the server CPU landscape has been shifting for a while; Vera is the latest entrant in that broader competition. Expect Intel, AMD, and the Arm cloud-server ecosystem (Graviton, Axion, Cobalt) to keep responding with their own variations on the same theme.

What this signals for the hardware market

A few near-term implications are worth keeping in view.

The first is that the gap between AI infrastructure and traditional enterprise infrastructure is widening. Oracle has announced plans to deploy hundreds of thousands of Vera CPUs beginning in 2026, and the other major hyperscalers are expected to follow with their own Vera Rubin deployments through the year. Hyperscalers are now buying CPU and GPU as a co-designed unit, not as separately sourced parts. That sustained, concentrated buying pressure is one of the reasons server CPU supply has been so tight, a dynamic we explored in Why AI Is Causing a Global CPU Shortage.

The second is that refresh cycles in AI data centers are short. The same labs that were running Hopper-class GPUs a couple of years ago, and Blackwell-class systems more recently, are now taking delivery of Vera Rubin. Each generation arrives on roughly an eighteen-month cadence. The hardware that powered the previous generation does not disappear — it gets decommissioned, redeployed, or sold into the secondary market. For operators of mid-size data centers, enterprise IT teams, and resellers, that flow is where supply meets demand.

The third is a question every infrastructure team will eventually face: when does it make sense to refresh, and when does the older generation still earn its keep? The answer depends on the workload. A team training models from scratch sees a clear ROI in moving to the latest GPU. A team running agentic inference at modest scale may get plenty of mileage out of last-generation CPUs and GPUs — provided the rest of the stack (memory, networking, orchestration software) is tuned for the new workload pattern.

The takeaway

Vera is one chip, but it is a useful marker. The AI conversation has spent years focused almost entirely on GPUs. The arrival of a purpose-built data center CPU for agents — delivered by a VP, by hand, to four of the most consequential AI customers in the world — is a clear sign that the workload itself is changing. Inference, orchestration, and tool use are now first-class infrastructure concerns. The CPU is back on the critical path.

For anyone planning hardware purchases, evaluating a fleet, or thinking about what to do with retiring servers, the practical takeaway is the same: the next phase of AI infrastructure will reward operators who think about the full system rather than just chasing accelerator headlines. The chips at the top of the stack are getting more specialized. The hardware below them is moving faster than it used to. If your organization is sitting on a previous generation of Intel or AMD inventory from an earlier refresh, the secondary market is still tight — a sensible moment to sell CPU hardware while demand remains strong.

BuySellRam.com is a BBB A+ rated ITAD provider that buys and sells data center hardware, including GPUs, memory, SSDs, networking equipment and test equipment. If you are planning an infrastructure refresh and want to recover value from the hardware being retired, reach out for a quote.