NVIDIA's $4M CNCF Bet, Karpenter Regression, GKE Model Load in 14min
CNCF Graduated: Kyverno, Dragonfly, and NVIDIA's $4M Commitment
- Kyverno and Dragonfly both reached CNCF Graduated status at KubeCon Amsterdam; Fluid and Tekton moved to Incubating.
- NVIDIA joined CNCF as a Platinum member with a $4 million open-source investment and donated its Dynamic Resource Allocation (DRA) driver for GPUs to the Kubernetes community — making GPU scheduling a vendor-neutral, community-maintained capability.
- The CNCF ecosystem now counts 230 projects and 19.9 million contributors, up 30% in six months. Europe (38.8%) has overtaken the US (36.29%) in contributor share for the first time.
Dragonfly: P2P AI Model Distribution Numbers
- Dragonfly added native hf:// and modelscope:// protocol support for Hugging Face and ModelScope, eliminating proxy layers and URL rewriting for AI model distribution on Kubernetes.
- Its P2P piece-based streaming cuts origin traffic by 99.5%: distributing a 130 GB model to 200 GPU nodes drops the total origin pull from 26 TB to ~130 GB. Private repos (token auth), specific revisions, and recursive directory downloads are all supported.
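The 99.5% figure follows directly from the numbers above. A quick back-of-the-envelope check (a sketch using only the sizes quoted in this issue, not Dragonfly's actual accounting):

```python
# Verify Dragonfly's origin-traffic claim from the quoted numbers.
# "Origin" here means the upstream model registry.
model_gb = 130   # model size in GB (from the newsletter)
nodes = 200      # GPU nodes pulling the model

# Without P2P: every node pulls the full model from origin.
naive_origin_gb = model_gb * nodes   # 26,000 GB = 26 TB

# With piece-based P2P streaming: origin serves roughly one full copy;
# the nodes exchange the remaining pieces among themselves.
p2p_origin_gb = model_gb             # ~130 GB

reduction = 1 - p2p_origin_gb / naive_origin_gb
print(f"origin pull: {naive_origin_gb / 1000:.0f} TB -> {p2p_origin_gb} GB "
      f"({reduction:.1%} less origin traffic)")
```

The "roughly one copy" assumption is the idealized P2P case; real-world origin traffic includes some overhead, hence the ~130 GB hedge in the bullet above.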
Gateway API v1.5, AI Conformance Program, and CAPI In-Place Upgrades
- Gateway API v1.5 promoted five features to the Standard channel in a single release, a project record. ListenerSet and TLSRoute are new additions to the spec (building on the v1.4 migration path from the archived ingress-nginx, covered last issue).
- A Kubernetes AI Conformance Program was launched at KubeCon Amsterdam to certify platforms for reliable AI/ML workload execution. An AI Gateway Working Group was also announced, with kgateway (Envoy-based) as the reference implementation, covering token-based rate limiting, payload inspection, semantic routing, and credential management.
- Cluster API is prototyping "Update Extensions" for in-place cluster upgrades, targeting more efficient version bumps without full node replacement cycles.
GKE Cloud Storage FUSE Profiles Now GA
- GKE Cloud Storage FUSE Profiles are GA for GKE ≥ 1.35.1-gke.1616000. Three pre-built StorageClasses (gcsfusecsi-training, gcsfusecsi-serving, gcsfusecsi-checkpointing) auto-tune mount options, cache sizes, and backing medium (RAM vs. Local SSD) based on real-time GPU/TPU type and available node resources.
- The serving profile integrates Rapid Cache for cold-start model loading. Qwen3-235B-A22B (480 GB) load time drops from 39 hours to 14 minutes on TPUs with the profile enabled.
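For a sense of scale, the quoted load times imply the following effective read throughput (a rough calculation from the two figures in this issue; the real bottlenecks differ between the two paths):

```python
# Effective throughput implied by the GKE numbers:
# a 480 GB model loading in 39 h (baseline) vs 14 min (serving profile).
model_gb = 480

baseline_s = 39 * 3600   # 39 hours in seconds
profile_s = 14 * 60      # 14 minutes in seconds

baseline_mbps = model_gb / baseline_s * 1000   # ~3.4 MB/s
profile_mbps = model_gb / profile_s * 1000     # ~571 MB/s

speedup = baseline_s / profile_s               # ~167x
print(f"{baseline_mbps:.1f} MB/s -> {profile_mbps:.0f} MB/s "
      f"(~{speedup:.0f}x faster cold start)")
```

The ~167x speedup is consistent with Rapid Cache serving the model from a warm, node-local tier rather than streaming every byte from the bucket at load time.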
Karpenter v1.11.0 CPU Regression; EKS Auto Mode Architecture
- Karpenter v1.11.0 has a confirmed CPU utilization regression (issue opened April 9). The offending commit is identified; maintainers plan a patch release while evaluating a longer-term fix. Teams on v1.11.0 should monitor node CPU or hold at v1.10.x.
- Amazon EKS Auto Mode, built on Karpenter, delegates node lifecycle, AMI patching, instance type selection, and add-on consistency to AWS via EC2 Managed Instances — a new EC2 primitive where AWS holds delegated operational control of the instance. AWS runs cluster operational software outside the customer cluster, reducing platform team maintenance surface. First introduced at re:Invent 2024, now positioned as the opinionated answer to EKS operational toil.