GPU virtualization is a decade behind CPU
K8s v1.36 Post-Stable: Pod-Level NUMA Resource Managers Alpha
- Pod-Level Resource Managers alpha (KEP-5818) landed in v1.36 with a Kubernetes blog post on May 1 — enables NUMA-topology-aware CPU and memory allocation at pod granularity, pinning all containers in a pod to the same NUMA node as a single scheduling unit. Previously, topology decisions were made per-container, preventing coordinated placement across the pod boundary.
- The primary target is multi-container AI inference pods where co-resident prefill and decode containers (the disaggregated pattern in llm-d/KServe, previously covered) suffered cross-NUMA memory access penalties that the per-container topology model could not eliminate.
- Enable via the PodLevelResourceManagers feature gate; requires TopologyManager policy set to single-numa-node or restricted. Coordinates simultaneously with CPU Manager, Memory Manager, and DRA device allocation. Teams running InPlacePodLevelResourcesVerticalScaling (beta) alongside this should validate the interaction before enabling in production.
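The prerequisites in the last bullet can be checked mechanically before rollout. The following is a minimal sketch, assuming the kubelet configuration has already been parsed into a Python dict (for example from its YAML file). The `featureGates` and `topologyManagerPolicy` field names are standard KubeletConfiguration fields; the gate names come from the item above; the function name `check_kubelet_config` is hypothetical and not part of any Kubernetes tooling.

```python
from typing import Any

# Topology Manager policies that the item above lists as prerequisites.
REQUIRED_POLICIES = {"single-numa-node", "restricted"}


def check_kubelet_config(cfg: dict[str, Any]) -> list[str]:
    """Return findings for a parsed KubeletConfiguration (hypothetical helper)."""
    findings: list[str] = []
    gates = cfg.get("featureGates") or {}

    if not gates.get("PodLevelResourceManagers", False):
        findings.append("PodLevelResourceManagers feature gate is not enabled")

    # The kubelet falls back to the "none" policy when the field is unset.
    policy = cfg.get("topologyManagerPolicy", "none")
    if policy not in REQUIRED_POLICIES:
        findings.append(
            f"topologyManagerPolicy is {policy!r}; "
            f"expected one of {sorted(REQUIRED_POLICIES)}"
        )

    # Only explicitly set gates are visible here; a gate that is on by
    # default in the running kubelet version will not appear in the file.
    if gates.get("PodLevelResourceManagers") and gates.get(
        "InPlacePodLevelResourcesVerticalScaling"
    ):
        findings.append(
            "warning: InPlacePodLevelResourcesVerticalScaling is also enabled; "
            "validate the interaction before production use"
        )
    return findings


example = {
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "kind": "KubeletConfiguration",
    "featureGates": {"PodLevelResourceManagers": True},
    "topologyManagerPolicy": "best-effort",
}

for finding in check_kubelet_config(example):
    print(finding)
```

An empty result means the two stated prerequisites are met in the file; it says nothing about whether the node's hardware topology can actually satisfy a given pod.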
AWS: M8in/R8in Families GA; Cupertino Lab Previews Next-Gen Networking
- M8in, M8ib, R8in, and R8ib instances are GA: memory-optimized families now matching the bandwidth ceiling of C8in (previously covered), with 600 Gbps network on the n variants and 300 Gbps EBS on the b variants. R-series targets high-memory workloads (in-memory databases, analytics); M-series targets general-purpose memory-intensive compute, per the AWS May 4 roundup.
- C8ine and M8ine are new enhanced-network variants delivering 2.5x packet processing performance vs. prior generation, optimized for virtual network appliances, 5G core functions, and stateful packet inspection on Kubernetes.
- AWS's Cupertino research lab disclosed hollow core fiber cutting datacenter latency 30%, next-gen 102.4 Tbps switches expected in production within ~12 months, and an UltraCluster topology reducing GPU-to-GPU hop count to 5 for AI training fabrics.
- Amazon Q Developer blocks new signups May 15, 2026; full EOL is April 30, 2027 — all users must migrate to Kiro. Relevant for teams with Q Developer integrated into CI/CD or IDE toolchains.
HCP Terraform Powered by Infragraph; VCF 9.1 Ships with GPU vMotion
- HCP Terraform powered by Infragraph enters public preview May 8: an IBM Think 2026 announcement adding a real-time multi-cloud knowledge graph layer to HCP Terraform. Infragraph tracks live infrastructure state, relationships, and drift across cloud providers to enable AI-driven impact analysis before apply.
- VCF 9.1 ships today with improved cold-memory-page detection for NVMe tiering, AMD Instinct MI350 GPU support, and zero-downtime GPU vMotion, enabling AI workloads to migrate between accelerators live. New AI observability surfaces per-workload token consumption, active agent inventory, and model usage.
- VCF 9.1 adds a lightweight Kubernetes environment for dev/test — avoiding the need to dedicate full clusters to non-production workloads — plus multi-tenant AI workload isolation on shared infrastructure via a new storage compression layer for AI data pipelines.
- Broadcom reports 2,000+ VCF 9 deployments in the year since GA, representing less than 1% of the ~350,000 pre-acquisition VMware customer base; no license renewals are available for legacy vSphere products.
CNCF: Microcks Incubates; Edera Brings Per-Workload Kernel Isolation to GPUs
- Microcks promoted to CNCF Incubating — API mocking and simulation platform supporting REST, gRPC, AsyncAPI, GraphQL, and Kafka for contract-first API development and service virtualization without live service dependencies; previously in CNCF Sandbox.
- Edera uses a type-1 hypervisor with direct IOMMU control to give each Kubernetes workload its own Linux kernel instance — preventing cross-tenant cache and branch-predictor attacks and isolating proprietary GPU kernel drivers per workload rather than per-node. Boot and runtime overhead is described as near-zero.
- The CNCF AI Sandboxing post frames the shared Linux kernel as Kubernetes's structural multi-tenant AI security gap: GPU virtualization security is "a decade behind CPU virtualization," and advanced LLMs can escape standard containers in certain configurations. Edera's hypervisor layer contains the CVE-2026-31431 (Copy Fail) container escape vector regardless of host kernel patch status.
Defender for Containers GA; Koney Deception Operator; IREN Acquires Mirantis
- Microsoft Defender for Containers runtime anti-malware detection and blocking, and DNS threat detection, are both GA for AKS, EKS, and GKE. Both are enforced at the node-agent level, with no sidecar injection or application code changes.
- Koney 0.2.0 is an open-source Kubernetes operator from Dynatrace Research that automates honeytoken deployment via a DeceptionPolicy CRD. Koney mounts realistic fake credential files at attacker-targeted paths (e.g., /run/secrets/.aws/credentials), monitors access at kernel level via Tetragon eBPF tracing, and forwards structured security events to Dynatrace or any configured sink. Install via Helm: helm install koney oci://ghcr.io/dynatrace-oss/koney/charts/koney --version 0.2.0
- IREN acquired Mirantis for $625M in stock. Mirantis brings the k0rdent AI platform for managing AI infrastructure across bare metal, VMs, and Kubernetes, plus 1,500+ enterprise customers and founding ISV status in NVIDIA's AI Cloud Ready Initiative. IREN (GPU data center operator) gains managed Kubernetes and cloud software depth; Mirantis operates as a standalone subsidiary continuing its existing customer relationships.
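To make the honeytoken idea from the Koney item concrete, here is a minimal sketch of generating decoy content in the AWS shared-credentials file layout, the kind of file that could sit at a path like the one quoted above. This is an illustration of the concept only, not Koney's implementation, and it does not show the DeceptionPolicy schema; the function name `fake_aws_credentials` is hypothetical, and the keys are random strings that grant no access.

```python
import configparser
import io
import secrets
import string


def fake_aws_credentials(profile: str = "default") -> str:
    """Build decoy text in the AWS shared-credentials INI layout (hypothetical helper)."""
    alphabet = string.ascii_uppercase + string.digits
    # Shaped like a long-term access key ID: "AKIA" plus 16 characters.
    access_key = "AKIA" + "".join(secrets.choice(alphabet) for _ in range(16))
    # 30 random bytes encode to a 40-character URL-safe string.
    secret_key = secrets.token_urlsafe(30)

    parser = configparser.ConfigParser(interpolation=None)
    parser[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


decoy_text = fake_aws_credentials()
print(decoy_text)
```

The value of a decoy like this comes from the monitoring side: any read of the file is suspicious by construction, which is why the item pairs the mounted file with kernel-level access tracing.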