K8s resize without restart, $5K/cluster hidden fees
EKS Cluster Governance: 7 IAM Condition Keys for Proactive Policy Enforcement
- Amazon EKS added 7 new IAM condition keys scoped to CreateCluster, UpdateClusterConfig, UpdateClusterVersion, and AssociateEncryptionConfig. This enables SCP and IAM policy enforcement of cluster security posture at creation time, rather than through a post-deployment audit.
- Enforceable constraints include eks:endpointPublicAccess (block public API endpoints), eks:encryptionConfigProviderKeyArns (require specific CMKs for secrets encryption), and eks:kubernetesVersion (allowlist approved versions), plus eks:deletionProtection, eks:controlPlaneScalingTier, and eks:zonalShiftEnabled, per the announcement.
- The keys compose with AWS Organizations SCPs: a single deny SCP on eks:encryptionConfigProviderKeyArns blocks unencrypted cluster creation across all member accounts without per-account IAM configuration (announcement).
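The deny-SCP pattern is easiest to see as a policy document. Below is a minimal Python sketch that emits an SCP combining three of the new keys. The function name build_eks_guardrail_scp and the version allowlist are hypothetical. The choice of condition operators (Bool, Null, StringNotEquals) and the value formats are assumptions. Check them against the Amazon EKS entry in the AWS Service Authorization Reference before attaching anything to an OU.

```python
import json


def build_eks_guardrail_scp(allowed_versions: list[str]) -> dict:
    """Return an SCP-style policy denying non-compliant EKS cluster changes.

    Hypothetical helper: the operator choices and value formats (string
    booleans, versions like "1.34") are assumptions, not confirmed by AWS docs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Deny creating or reconfiguring a cluster with a public API endpoint.
                "Sid": "DenyPublicEndpoint",
                "Effect": "Deny",
                "Action": ["eks:CreateCluster", "eks:UpdateClusterConfig"],
                "Resource": "*",
                "Condition": {"Bool": {"eks:endpointPublicAccess": "true"}},
            },
            {
                # Null = "true" matches requests where no KMS key ARN is supplied,
                # i.e. cluster creation without secrets encryption.
                "Sid": "RequireSecretsEncryption",
                "Effect": "Deny",
                "Action": "eks:CreateCluster",
                "Resource": "*",
                "Condition": {"Null": {"eks:encryptionConfigProviderKeyArns": "true"}},
            },
            {
                # Deny any Kubernetes version outside the approved allowlist.
                "Sid": "AllowlistKubernetesVersions",
                "Effect": "Deny",
                "Action": ["eks:CreateCluster", "eks:UpdateClusterVersion"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"eks:kubernetesVersion": allowed_versions}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical allowlist; substitute your organization's approved versions.
    print(json.dumps(build_eks_guardrail_scp(["1.33", "1.34"]), indent=2))
```

IAM can treat a key that is missing from the request as satisfying a negated operator such as StringNotEquals. If so, the version statement would also deny CreateCluster calls that omit an explicit version. That may be what an allowlist wants, but it is worth testing in a sandbox OU first.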
KServe's LLMInferenceService CRD Abstracts Prefill/Decode Topology
- KServe introduced an LLMInferenceService CRD (~6 months in development) that generates the full disaggregated inference stack from a single spec. That stack comprises an endpoint picker, an inference scheduler, prefill pods, and decode pods, with NIXL (NVIDIA) managing NVLink/RoCEv2 inter-pod communication. vLLM 0.6 is the primary tracking target.
- A WVA (Workload Variant Autoscaler) CRD is in development. It is a single object that centrally manages both KEDA and HPA, with pluggable actuators and scaling signals beyond KV cache utilization, replacing per-workload autoscaler configuration (KubeCon EU roadmap).
- LeaderWorkerSet (LWS) support replaces StatefulSet for grouped multi-node, multi-GPU worker deployments. KServe is also moving autoscaling from Knative to KEDA, since Knative's eventing model doesn't map cleanly to LLM inference load patterns (KubeCon EU roadmap).
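The roadmap doesn't say how WVA combines its signals, so the following is only a sketch of the general idea. Each pluggable signal proposes a replica count using the standard HPA rule, desiredReplicas = ceil(currentReplicas × currentValue / targetValue). The largest proposal wins, as it does when HPA evaluates multiple metrics. The Signal type, the desired_replicas function, and the signal names and targets are all hypothetical. HPA's tolerance band and stabilization windows are omitted.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Signal:
    """A pluggable scaling signal (hypothetical; not WVA's actual API)."""
    name: str
    observe: Callable[[], float]  # returns the current per-replica metric value
    target: float                 # desired per-replica value


def desired_replicas(current: int, signals: list[Signal],
                     min_replicas: int = 1, max_replicas: int = 64) -> int:
    """Apply the HPA rule per signal, take the largest proposal, then clamp."""
    proposals = [math.ceil(current * s.observe() / s.target) for s in signals]
    desired = max(proposals, default=current)  # no signals: keep current size
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    signals = [
        Signal("kv_cache_utilization", lambda: 0.5, target=0.8),       # proposes 2
        Signal("queue_depth_per_replica", lambda: 30.0, target=10.0),  # proposes 6
    ]
    print(desired_replicas(2, signals))  # 6
```

Taking the maximum means a cheap signal such as queue depth can scale decode pods up even while KV cache utilization still looks comfortable. That is the kind of "beyond KV cache" behavior the roadmap describes.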
K8s v1.36 Post-Stable Guides: Pod-Level Vertical Scaling Beta and Velero etcd Tooling
- The Kubernetes blog published an operations guide for pod-level vertical scaling, InPlacePodLevelResourcesVerticalScaling (beta, on by default in v1.36). CPU and memory now resize at the pod level without a restart, with changes applied across all containers simultaneously. This is distinct from container-level in-place resize (KEP-1287), which was introduced as alpha in v1.27.
- A second v1.36 guide covers mutable pod resources for suspended Jobs (beta). Node selectors, tolerations, and resource requests can now be modified in place without deleting and recreating the Job object (briefly noted in Issue #7's preview; this is the full operational breakdown).
- Alongside the Velero CNCF sandbox donation (Issue #1), Broadcom also published etcd diagnosis and recovery tooling at github.com/vmware/etcd-diagnosis and github.com/vmware/etcd-recovery. Per InfoQ's Velero deep dive, the tools provide structured control-plane visibility and recovery automation independent of backup tooling.
- Velero's post-donation community roadmap includes a multi-cluster backup policy control plane, CSI Data Management spec integration for quiescing applications before snapshots, and Sigstore-signed backup artifacts. The Broadcom/Red Hat/Microsoft maintainer group adopted 5-day lazy-consensus voting for governance (InfoQ).
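For the pod-level resize covered in the first guide, here is a minimal Python sketch that builds the patch body and the kubectl invocation. It assumes pod-level requests and limits live under spec.resources (the PodLevelResources field). It also assumes that, as with container-level in-place resize, the patch goes to the pod's resize subresource. The function names, pod name, and values are hypothetical, so confirm the exact field path and subresource behavior against the v1.36 guide.

```python
import json
import shlex


def pod_level_resize_patch(cpu: str, memory: str) -> dict:
    """Merge patch setting pod-level requests and limits (hypothetical helper).

    Assumes the pod-level resource block is spec.resources. Requests are kept
    equal to limits, intended to leave a Guaranteed pod's QoS class unchanged.
    """
    amounts = {"cpu": cpu, "memory": memory}
    return {"spec": {"resources": {"requests": dict(amounts), "limits": dict(amounts)}}}


def kubectl_resize_argv(pod: str, patch: dict, namespace: str = "default") -> list[str]:
    """Build (but do not run) a kubectl call against the pod's resize subresource."""
    return [
        "kubectl", "patch", "pod", pod,
        "--namespace", namespace,
        "--subresource", "resize",
        "--type", "merge",
        "--patch", json.dumps(patch),
    ]


if __name__ == "__main__":
    argv = kubectl_resize_argv("llm-worker-0", pod_level_resize_patch("4", "8Gi"))
    # Print a shell-quoted command; alternatively pass argv to subprocess.run.
    print(shlex.join(argv))
```

Returning an argv list instead of running kubectl keeps the sketch side-effect free and avoids shell-quoting bugs in the JSON patch.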
Rancher Prime 2.13.5: Revert Chart Name to rancher, Two CVEs Patched
- Rancher Prime 2.13.5 reverts the chart name change introduced in v2.13.1. All Helm install and upgrade commands must use helm install rancher rancher-prime/rancher, and automation that uses the v2.13.1 chart name will break on upgrade, per the release notes.
- CVE-2026-25705 (path traversal enabling arbitrary file access in Rancher Extensions) and CVE-2026-41050 (Fleet's Helm deployer bypassing ServiceAccount impersonation, allowing unauthorized access to Kubernetes secrets) are both patched in this release (release notes).
- S3 snapshot retention silently resetting to 5 on RKE2/K3s cluster version upgrades is fixed. Bundled Kubernetes versions: v1.34.7 (default), v1.33.11, and v1.32.13 (release notes).
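The release notes say only that automation pinned to the v2.13.1 chart name will break. A pre-upgrade check can therefore flag any Helm install or upgrade of the rancher release whose chart reference isn't rancher-prime/rancher, without needing to know the old name. Below is a minimal Python sketch. It assumes one command per line after backslash continuations are joined, and it recognizes only a small, non-exhaustive set of value-taking Helm flags. The function names are hypothetical.

```python
import shlex

EXPECTED_CHART = "rancher-prime/rancher"
# Flags that consume the next token as their value (simplified, not exhaustive).
VALUE_FLAGS = {"-n", "--namespace", "--version", "-f", "--values", "--set", "--kube-context"}


def helm_positionals(tokens: list[str]) -> list[str]:
    """Return positional args after `helm <subcommand>`, skipping flags and their values."""
    positionals, skip_next = [], False
    for tok in tokens[2:]:
        if skip_next:
            skip_next = False
        elif tok in VALUE_FLAGS:
            skip_next = True
        elif tok.startswith("-"):
            pass  # boolean flag such as --install, or --flag=value
        else:
            positionals.append(tok)
    return positionals


def find_stale_rancher_charts(script: str) -> list[str]:
    """Return chart refs used to install/upgrade the `rancher` release
    that are not rancher-prime/rancher."""
    stale = []
    for line in script.replace("\\\n", " ").splitlines():  # join continuations
        try:
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # unbalanced quotes: skip rather than guess
        if len(tokens) < 2 or tokens[0] != "helm" or tokens[1] not in ("install", "upgrade"):
            continue
        pos = helm_positionals(tokens)  # expected form: RELEASE CHART
        if len(pos) >= 2 and pos[0] == "rancher" and pos[1] != EXPECTED_CHART:
            stale.append(pos[1])
    return stale
```

Matching on the expected name rather than a specific old one keeps the check correct even if other chart aliases have crept into CI scripts.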
Q1 2026 Cloud Infrastructure: Google Cloud +63%; Kubernetes Support Costs Shift Private Cloud Math
- Google Cloud reached $20B in Q1 2026 revenue, up 63% YoY from $12.2B, with $6.6B in operating income (up 203% YoY). AWS posted $37.6B (up 28% YoY) with $14.2B in operating income. Azure Intelligent Cloud hit $34.7B (up 28% YoY), with Azure-specific services growing 39%, per the CRN earnings face-off.
- The global cloud infrastructure market hit $129B in Q1 2026, up $35B YoY — all three providers cite AI workload demand as the primary growth driver (CRN).
- A ReveCom analysis documents a "lag gap" of 2–7 months between CNCF upstream releases and platform GA — VCF releases within ~2 months; RHOS lags up to 6 months due to OS vertical integration. Support windows: VCF 24 months (no extra cost), RHOS 18 months, EKS/GKE 14 months, AKS 12 months.
- Hyperscaler extended-support fees exceed $5,000 per cluster per year. Cloud Native Now cites this as a key cost factor alongside Gartner's forecast of a 20% workload shift from public hyperscalers to local/private providers by 2026 and of $80.4B in sovereign cloud spending in 2026.
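The $5,000 figure is easy to sanity-check from an hourly rate. Below is a minimal Python sketch. It assumes a flat total price of $0.60 per cluster-hour under extended support, which we understand to be EKS's rate at the time of writing (verify current pricing). It also assumes an 8,760-hour year and a hypothetical 40-cluster fleet. The sketch then recomputes the Google Cloud growth figure from the rounded numbers above.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; ignores leap years


def annual_cluster_cost(hourly_rate_usd: float, clusters: int = 1) -> float:
    """Annual cost of a flat per-cluster-hour fee across a fleet."""
    return hourly_rate_usd * HOURS_PER_YEAR * clusters


def yoy_growth(current: float, prior: float) -> float:
    """Year-over-year growth as a fraction (0.63 means 63%)."""
    return (current - prior) / prior


if __name__ == "__main__":
    # Assumed rate of $0.60 per cluster-hour; check current provider pricing.
    print(f"{annual_cluster_cost(0.60):,.0f}")      # 5,256 (one cluster)
    print(f"{annual_cluster_cost(0.60, 40):,.0f}")  # 210,240 (hypothetical fleet)
    print(f"{yoy_growth(20.0, 12.2):.1%}")          # 63.9% (Google Cloud Q1 revenue)
```

The small gap between the computed 63.9% and the reported 63% most likely comes from the $20B revenue figure being rounded.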
Get Platform and Infra Briefing in your inbox