Powering NVIDIA Multi-Tenant AI Clouds: Why Zadara is the Ideal Software Platform


Welcome to our new blog series diving into how Zadara is uniquely positioned to bring the NVIDIA Software Reference Architecture for Multi-Tenant Inference Clouds to life. With NVIDIA’s blueprint for multi-tenant generative AI infrastructure now public, it’s time to look at how cloud providers can implement this vision in the real world. At Zadara, we believe we are your ideal partner to do just that. In this series, we will explore the components of the reference architecture—from GPU networking to control plane isolation—and show how Zadara makes each one possible.

Let’s begin by looking at the big picture: what NVIDIA’s reference design demands, and why Zadara is already built to meet those needs.

Understanding the Requirements of NVIDIA’s Reference Architecture

NVIDIA’s software reference architecture is a comprehensive framework designed to help cloud service providers deliver scalable, secure, and high-performance AI infrastructure. At its core, the architecture supports:

  • True Multi-Tenancy: Complete isolation between customers across the full stack (compute, storage, networking, and orchestration).
  • AI-Centric Infrastructure: Optimization for AI workloads, beyond just GPU training—including inference, data processing, databases, and orchestration layers.
  • Dynamic Resource Allocation: Ability to provision and scale resources (GPUs, CPUs, storage, networking) per tenant and per workload.
  • Tenant-Controlled Kubernetes Environments: Each customer should operate within their own Kubernetes control plane, ensuring maximum flexibility and control.
  • Support for Edge and Core Deployments: Architecture must support low-latency deployments near the user as well as centralized cloud operations.

Moreover, as AI models grow in complexity, inference itself is becoming more compute-intensive—especially for workloads focused on reasoning, such as decision trees, planning, or code generation. These reasoning models often require larger memory footprints and longer GPU execution times, making dynamic, high-performance resource allocation a necessity—not just for training, but for real-time inference as well.

These requirements are brought together using NVIDIA hardware and software components like Spectrum-X for high-performance networking, BlueField-3 DPUs for offloaded and secure networking, and NVIDIA AI Enterprise software for AI operations.

Why Zadara is the Right Fit

Zadara was built from the ground up as a multi-tenant cloud, aligning natively with NVIDIA’s recommendations. Here’s how Zadara meets and exceeds the reference design expectations:

  1. Native Multi-Tenancy:
    Zadara offers built-in tenant isolation for compute, storage, and networking. Each tenant operates in their own secure slice of the infrastructure, with policy-driven access controls.

  2. Full-Stack Workload Support:
    Modern AI workloads are not limited to GPUs. Zadara supports all the workloads that are the building blocks of AI/ML environments, including databases, vector search engines, and Kubernetes control plane components.

  3. Per-Tenant Kubernetes Environments:
    Zadara enables the deployment of dedicated Kubernetes control planes per tenant. This meets the architectural recommendation for control plane separation and provides unmatched flexibility.

  4. Elastic Resource Allocation:
    With Zadara, compute, storage, and GPU resources can be allocated dynamically to tenants and workloads. This ensures efficient use of infrastructure and responsive scaling.

  5. Global Edge Presence:
  5. Global Edge Presence:
    With 500+ Zadara-powered edge locations in more than 25 countries, operated by 200+ regional partners, Zadara brings AI closer to the user. This supports low-latency inference across a spectrum of workloads—from RAG-based LLMs to more demanding reasoning tasks. It also helps meet data residency requirements, another key reference design principle.
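To make the multi-tenancy and elastic allocation ideas above concrete, here is a minimal, generic Kubernetes sketch of per-tenant resource partitioning: a namespace plus a ResourceQuota capping CPU, memory, and GPU requests for a hypothetical tenant (`tenant-a` is an illustrative name, not a Zadara API). Note that the reference architecture—and Zadara—goes further than this, giving each tenant a dedicated control plane rather than a shared-cluster namespace; this fragment only illustrates the quota mechanics.

```yaml
# Hypothetical tenant namespace; names are illustrative only.
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
# Cap the tenant's aggregate resource requests, including GPUs
# (exposed as an extended resource by the NVIDIA device plugin).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 128Gi
    requests.nvidia.com/gpu: "4"
```

Because quotas like these can be adjusted per tenant at any time, the same mechanism that enforces isolation also enables the responsive, workload-driven scaling the reference design calls for.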


What’s Next in the Series?

In our next posts, we’ll deep-dive into specific NVIDIA technologies and how Zadara enables them:

  • Spectrum-X and GPU Networking: Building the high-performance data plane for AI.
  • BlueField DPUs: Enabling secure and accelerated networking, thin hypervisors, and separation of the control and runtime planes.
  • Kubernetes Control Plane Isolation: How Zadara supports per-tenant K8s orchestration at scale.

Stay tuned as we explore each of these key components. The future of multi-tenant AI cloud is here—and it runs on Zadara.

Simon Grinberg

Simon Grinberg is a technology and product leader focused on cloud infrastructure and virtualization. He took his first steps in the virtualization and cloud space at Qumranet, the company behind KVM, and continued as a product manager for Red Hat Enterprise Virtualization after its acquisition. He later joined Stratoscale, and went on to found Neokarm to pursue a new vision for cloud-native virtualization. Simon joined Zadara through its acquisition of Neokarm and is passionate about building scalable systems that solve real-world problems.
