KubeCon 2025: The Enterprise AI Infrastructure Moment Has Arrived
KubeCon Atlanta wasn't just another cloud native conference this year. Something shifted. The hallway conversations, the keynotes, the vendor pitches — they all pointed in the same direction: enterprises are done sending their data to someone else's infrastructure to run AI. They're bringing it home.
Three days of sessions and a lot of coffee later, the theme that stuck with us most was data sovereignty. Not as a compliance checkbox, but as a strategic imperative.
The numbers are hard to ignore
The CNCF shared some stats that put this in perspective: 52% of cloud native developers are now running AI workloads, with another 18% planning to. The ecosystem has grown to over 230 projects, 275,000 contributors, and 3.35 million code commits. This isn't a niche anymore.
Data sovereignty stopped being optional
The same concerns kept coming up across industries, regardless of the session or the vendor on stage.
Healthcare teams can't send patient data to cloud AI services. HIPAA doesn't allow it, and patients wouldn't accept it even if it did. Financial services need complete audit trails and zero data egress — their models are competitive advantages, not commodity workloads. EU manufacturers are staring at the Cyber Resilience Act deadline (December 2027) and realizing they need compliant infrastructure now, not in Q3 of next year.
One of the more compelling demos came from Deloitte: lightweight AI agents running at the edge using Kubernetes, Ollama, and K3s. These agents process insurance claims locally, where the data already lives. No cloud round-trip, no privacy concerns. When you're looking at $265+ billion in annual claims processing errors, running inference closer to the data starts to make a lot of sense.
The pattern is consistent: run AI where your data lives, not where it's convenient for the cloud provider.
The technology stack finally caught up
What's different about 2025 is that the tools to actually do this are production-ready now. A year ago, "run AI on-prem" meant cobbling together half a dozen immature open source projects and hoping they worked together. That's changed.
DRA is GA and it matters
Dynamic Resource Allocation shipped as generally available in Kubernetes 1.34. This is a big deal for anyone running GPU workloads. The old model — statically allocating entire GPUs to workloads that only use them 30% of the time — was spectacularly wasteful.
AMD showed 50-70% efficiency gains through intelligent memory partitioning using DRA with Kueue and fractional GPU resources. If you're paying for on-prem GPUs, that efficiency improvement translates directly to money.
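To make that concrete, here is a minimal sketch of what consuming a GPU through DRA looks like: a ResourceClaimTemplate that requests a device from a vendor-provided device class, and a pod that references it. The device class name and image are placeholders, and exact field names vary slightly between the beta and GA (`resource.k8s.io/v1`) APIs, so treat this as an illustration rather than a copy-paste manifest.

```yaml
# Sketch: DRA-based GPU request (Kubernetes 1.34+, resource.k8s.io/v1)
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com   # placeholder: provided by your vendor's DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  containers:
  - name: worker
    image: registry.example.com/inference:latest   # placeholder image
    resources:
      claims:
      - name: gpu        # binds this container to the claim below
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
```

The key shift from the old `nvidia.com/gpu: 1` model is that the scheduler now negotiates with the driver about *which* device (or fraction of one) to hand out, which is what enables the partitioning gains described above.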
Multi-cluster is no longer theoretical
Bloomberg and Huawei presented real Karmada deployments managing thousands of Kubernetes clusters: federated resource quotas across heterogeneous capacity, multi-cluster queuing with Volcano, federated HPA for cross-cluster autoscaling, and application-level failover with automated migration. For enterprises with multiple data centers and edge locations, this solves the coordination problem that used to require custom tooling and a prayer.
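The core Karmada primitive behind these setups is the PropagationPolicy, which declares how a workload spreads across member clusters. A minimal sketch, with hypothetical cluster names:

```yaml
# Sketch: propagate a Deployment across two data centers with Karmada
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: model-server-propagation
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: model-server          # the workload to propagate
  placement:
    clusterAffinity:
      clusterNames:             # hypothetical member clusters
      - dc-east
      - dc-west
    replicaScheduling:
      replicaSchedulingType: Divided     # split replicas across clusters
      replicaDivisionPreference: Weighted
```

The control plane then reconciles per-cluster state, which is where the federated quotas and automated failover the talks described hang off.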
AI inference infrastructure got serious
Two projects stood out for model serving.
AIBrix, newly donated to the CNCF by ByteDance, tackles inference challenges that matter at scale: KV cache offloading for memory management, multimodal serving, intelligent request routing, and support across NVIDIA, AMD, and Huawei Ascend hardware.
KServe graduated to CNCF status with v0.17. The new LLMInferenceService CRD, disaggregated serving architecture, and multi-cluster inference gateway with dynamic routing make it a credible production option for teams that don't want to build their own serving layer.
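The LLMInferenceService CRD is new and its schema is still settling, but the established InferenceService resource gives a feel for the declarative style. A hedged sketch, with a placeholder model location:

```yaml
# Sketch: serving a Hugging Face model with KServe's InferenceService
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-chat
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: s3://models/llama-3-8b   # placeholder bucket path
      resources:
        limits:
          nvidia.com/gpu: "1"
```

You declare the model and its resource needs; KServe handles the serving runtime, autoscaling, and routing, which is exactly the layer most teams shouldn't be building themselves.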
Combine these with MinIO for storage and PostgreSQL with pgvector, and you've got a complete, production-grade AI stack built entirely on open source.
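For a local feel of that stack's storage tier, here's a minimal Docker Compose sketch standing up MinIO and Postgres with pgvector. Credentials and ports are demo placeholders; in production these would run in-cluster behind proper secrets management, likely via operators.

```yaml
# docker-compose.yml — local sketch only, not a production deployment
services:
  object-store:
    image: minio/minio
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: admin            # demo credentials only
      MINIO_ROOT_PASSWORD: change-me
    ports: ["9000:9000", "9001:9001"]
    volumes: ["minio-data:/data"]
  vector-db:
    image: pgvector/pgvector:pg16       # Postgres with the pgvector extension
    environment:
      POSTGRES_PASSWORD: change-me
    ports: ["5432:5432"]
    volumes: ["pg-data:/var/lib/postgresql/data"]
volumes:
  minio-data:
  pg-data:
```

MinIO holds model artifacts and training data; pgvector covers embedding storage and similarity search for RAG, all without a single byte leaving your infrastructure.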
Platform engineering is becoming its own discipline
The CNCF released a Platform Maturity Assessment framework, and the findings confirmed what most of us already knew: platform teams are under-resourced, under-appreciated, and struggling with adoption.
The conference kept returning to three principles that seem obvious but are rarely followed:
- API-first self-service. Developers shouldn't need to file tickets to get infrastructure.
- Compliance built in, not bolted on.
- Standardize on patterns, not just tools.
For AI infrastructure specifically, platform teams are now expected to provide model serving, vector databases, GPU management, and RAG pipelines as internal services. That's a significant scope expansion from "manage the Kubernetes clusters." The teams that treat their platform as a product — with users, feedback loops, and roadmaps — are the ones seeing adoption. Everyone else is building shelfware.
The compliance timeline is real
If you're doing business in the EU or serving EU customers, the Cyber Resilience Act demands attention:
- June 11, 2026: Governments and assessment bodies must be ready
- December 11, 2027: Full regulation applies
While services and many device types are excluded, commercial products incorporating open source software are affected. For enterprises deploying AI, this means implementing security.txt for vulnerability disclosure, generating SBOMs for every release, establishing CVE reporting processes, and maintaining audit trails for model training and inference.
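The security.txt piece, at least, is a small lift. Per RFC 9116 it's a plain text file served at `/.well-known/security.txt` with two required fields; everything else is optional. The contact addresses and URLs below are placeholders:

```text
# Served at https://example.com/.well-known/security.txt (RFC 9116)
Contact: mailto:security@example.com
Expires: 2026-12-31T23:59:59Z
Policy: https://example.com/security-policy
Preferred-Languages: en
```

`Contact` and `Expires` are mandatory; the rest signal maturity to researchers and auditors alike.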
Organizations treating compliance as a design principle will have a meaningful head start. Retrofitting compliance onto existing systems is expensive and disruptive — and the deadline isn't moving.
Cloud inference economics are shifting
With inference overtaking training as the dominant AI workload, cloud providers are rewriting their playbooks. Sessions from Google Cloud, AWS, Azure, and Oracle all focused on inference-optimized infrastructure and competitive pricing.
Azure was first to deploy the new Vera Rubin NVL72 systems and has rolled out hundreds of thousands of liquid-cooled Grace Blackwell GPUs globally in under a year — a genuine logistics achievement. Google Cloud and NVIDIA announced a deeper co-engineering partnership, with the AI Hypercomputer front and center as an inference-optimized IaaS platform.
For enterprise buyers, the vendor selection discussion has shifted from "who has the most GPUs" to "who offers the best tokens-per-dollar at production scale," with the governance and observability to match.
What we took away from the week
The technology for enterprise AI infrastructure is production-ready. DRA solves GPU waste. Karmada solves multi-cluster coordination. KServe and AIBrix solve model serving. The compliance timelines are set and not moving.
The organizations that will be in the best position are the ones investing in platform teams now, treating data sovereignty as a first-order concern, and running the GPU ownership economics before defaulting to cloud APIs for everything.
If you left KubeCon feeling comfortable about your organization's AI infrastructure posture, it might be worth a second look.
What's your organization's approach to AI infrastructure ownership? We'd love to hear how others are thinking about this. Get in touch to continue the conversation.