K8s v1.36 RC, Karpenter fix, OpenTofu 1.12 beta, CoreDNS 300K QPS
Kubernetes v1.36.0-rc.0: DRA Graduates, Gang Scheduling Alpha, Breaking Changes
- Kubernetes v1.36.0-rc.0 tagged April 8, built with Go 1.26.2. Feature freeze is complete; GA is targeted shortly.
- Promoted to GA: `MutatingAdmissionPolicy` (v1), `DRAPrioritizedList`, `DRAAdminAccess`, `UserNamespacesSupport`, `NodeLogQuery`, `ImageVolume`, `KubeletPSI`, `ExternalServiceAccountTokenSigner`, `ProcMountType`, `KubeletPodResourcesGet`.
- Promoted to beta (on by default): `InPlacePodLevelResourcesVerticalScaling`, `DRAPartitionableDevices`, `DRAConsumableCapacity`, `ConstrainedImpersonation`, `StrictIPCIDRValidation`, `NodeDeclaredFeatures`.
- New alpha APIs: `ResourcePoolStatusRequest` for querying DRA device pool availability; `scheduling.k8s.io/v1alpha2` `Workload` and `PodGroup` for gang scheduling with Job controller integration; `PlacementGenerate`/`PlacementScore` extension points for Topology Aware Workload Scheduling (TAS). Full details in the v1.36 CHANGELOG.
- Breaking changes: the kube-controller-manager metric `volume_operation_total_errors` is renamed to `volume_operation_errors_total`; the Portworx in-tree volume plugin is removed (along with the `CSIMigrationPortworx` gate); `Service.spec.externalIPs` now emits deprecation warnings.
- CoreDNS bumped to 1.14.2. Prometheus native histogram support is enabled by default across the apiserver, controller-manager, kube-proxy, kubelet, and scheduler.
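Dashboards and alerts keyed to the old metric name will break on upgrade. A sketch of a Prometheus alerting rule that tolerates both names during a mixed-version rollout (the rule name, threshold, and window are illustrative assumptions, not from the release notes):

```yaml
# Matches whichever metric name the running kube-controller-manager
# version exposes: `or` falls through to the old name when the new
# one is absent, and vice versa.
groups:
  - name: volume-operations
    rules:
      - alert: VolumeOperationErrors
        expr: |
          (
            sum(rate(volume_operation_errors_total[5m]))
              or
            sum(rate(volume_operation_total_errors[5m]))
          ) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "kube-controller-manager volume operations are failing"
```

Once the whole fleet is on v1.36, the old-name branch can be dropped.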
Karpenter v1.11.1 Patches CPU Regression; OCI Provider Hits GA
- Karpenter v1.11.1 released April 6, resolving the CPU utilization regression in v1.11.0 (reported last issue). Root causes: incorrect locking in state cost calculation (#2944) and an offering count tracking bug in the cost controller re-add path (#2946). Teams holding at v1.10.x can now upgrade.
- v1.11.1 also ships `NodePool` node limits (#2526) and cloud provider node registration hooks (#2923).
- The Karpenter Provider for OCI is now GA and open-sourced. Two new CRDs: `NodePool` (instance families, ADs, on-demand/preemptible mix) and `OciNodeClass` (compartment, subnet, VCN CNI, secondary VNIC). Unlike Cluster Autoscaler's fixed node pool model, the OCI provider automatically selects alternative shapes when preferred capacity is unavailable.
- OCI-native integrations: preemptible capacity, capacity reservations, cluster placement groups, and Compute Clusters. Recommended migration path: keep existing managed node pools for system workloads, introduce Karpenter for application workloads via labels/taints. Karpenter and Cluster Autoscaler can coexist during the transition.
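The two CRDs and the labels/taints migration pattern can be sketched roughly as below. Only the CRD kinds and the listed capabilities come from the announcement; the API group/version, field names, and OCIDs are assumptions for illustration:

```yaml
# Hedged sketch: a NodePool scoped to application workloads via a label,
# mixing on-demand and preemptible capacity.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: apps
spec:
  template:
    metadata:
      labels:
        workload-class: apps            # app workloads select this; system
                                        # workloads stay on managed node pools
    spec:
      nodeClassRef:
        kind: OciNodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "preemptible"]
---
apiVersion: karpenter.oci.sh/v1alpha1    # assumed group/version
kind: OciNodeClass
metadata:
  name: default
spec:
  compartmentId: ocid1.compartment.oc1..example   # placeholder OCID
  subnetId: ocid1.subnet.oc1..example             # placeholder OCID
```

Because the provider selects alternative shapes when preferred capacity is unavailable, requirements should be kept loose rather than pinned to a single instance family.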
OpenTofu 1.12.0-beta1: Dynamic Lifecycle Guards and Resource Identity Import
- OpenTofu 1.12.0-beta1 is available for testing (released April 7). Key additions: dynamic `prevent_destroy`, Resource Identity for compound identifiers, improved lock file population, and a new `-json-into=FILE` output flag.
- Dynamic `prevent_destroy`: the lifecycle argument can now reference variables and locals (`prevent_destroy = var.is_production`), eliminating the need for separate module copies per environment.
- Resource Identity: a new `identity` block in `import {}` handles resources with compound identifiers (e.g., AWS SSM Maintenance Window requires both `window_id` and `id`). Providers must implement the new identity schema in the plugin protocol.
- Lock file: `tofu init` now auto-populates `.terraform.lock.hcl` with all platform-agnostic checksums (both `zh:` and `h1:` schemes), eliminating most manual `tofu providers lock` runs.
- `-json-into=FILE`: new flag for `plan`, `apply`, and related commands; it writes machine-readable JSON to a file while preserving human-readable output on stdout/stderr, and supports named pipes for streaming.
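The dynamic `prevent_destroy` change translates directly into configuration. A minimal sketch, where the variable expression comes from the release notes and the resource itself is an illustrative stand-in:

```hcl
variable "is_production" {
  type    = bool
  default = false
}

# Before 1.12, prevent_destroy only accepted a literal true/false,
# forcing separate module copies per environment. It can now be
# driven by a variable, so one module serves all environments.
resource "aws_s3_bucket" "state" {   # illustrative resource
  bucket = "example-state-bucket"

  lifecycle {
    prevent_destroy = var.is_production
  }
}
```

A plan that would destroy the bucket then fails only when `is_production = true` is set for that workspace.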
CoreDNS Multisocket Scaling: 300K QPS Per Instance, Node-Local DNS Hardening
- CoreDNS 1.14.2 shipped two weeks before KubeCon EU, adding a Nomad plugin, DNS-over-QUIC improvements, proxy protocol support for load-balancer client IP resolution, and connection multiplexing in the QUIC plugin. The KubeCon EU maintainer session (Yong Tang and John Belamaric, Google) disclosed that additional CVEs beyond the four already patched are in progress — 1.14.3 expected within roughly one week.
- Multisocket support (introduced in 1.12) resolves CoreDNS's longstanding vertical scaling ceiling: without it, CoreDNS flatlined at ~40K QPS regardless of additional CPU. With `SO_REUSEPORT` across multiple sockets, a single CoreDNS instance now scales to over 300K QPS with near-linear CPU growth. Maintainers now recommend CPU-based HPA over the traditional cluster-proportional autoscaling model; most clusters running dozens of CoreDNS pods can consolidate significantly.
- Node-local DNS cache was highlighted as a high-priority production recommendation: deploying the DaemonSet upgrades UDP DNS to TCP between the node and central CoreDNS (eliminating 5-second timeout failures from conntrack table exhaustion), adds per-node caching, and enables direct routing to corporate DNS for out-of-cluster domains. Google's own incident data showed it eliminating DNS failure spikes in clusters with thousands of nodes.
- For multi-tenant DNS isolation (noisy-neighbor attack vectors), the recommended mitigation is a `MutatingAdmissionPolicy` that redirects a misbehaving namespace's pods to a dedicated CoreDNS instance via pod DNS policy; there is no cluster-level protection against a tenant flooding the shared CoreDNS.
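The maintainers' CPU-based HPA recommendation maps onto a standard `autoscaling/v2` object; replica bounds and the target utilization below are illustrative assumptions, not maintainer-stated values:

```yaml
# CPU-driven autoscaling for CoreDNS, replacing the
# cluster-proportional-autoscaler model.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2            # keep redundancy even at low load
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```

Since multisocket scales a single instance near-linearly with CPU, each pod now absorbs far more QPS before the HPA adds replicas, which is what makes consolidating fleets of dozens of CoreDNS pods viable.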
Keycloak 26.6 for Kubernetes; bootc Brings GitOps to Node OS
- Keycloak 26.6 promotes five features to production: zero-downtime patch releases (set the Operator update strategy to `"Auto"`), Federated Client Authentication (supports Kubernetes Service Accounts as client credentials, eliminating per-client secrets in multi-IdP orgs), JWT Authorization Grant (RFC 7523), YAML-defined realm lifecycle workflows, and the Keycloak Test Framework. New platform integrations: Traefik and Envoy client certificate lookup providers; automatic truststore initialization on Kubernetes/OpenShift.
- Breaking changes in 26.6: JavaScript-based policies require the `Scripts` feature to be explicitly enabled; client URIs must use HTTPS; JWT Authorization Grant issuers must uniquely identify a provider.
- A CNCF ChatLoopBackOff session on April 10 covered `bootc`, which manages the Linux OS itself as an OCI container image: platform teams define node base images in Dockerfiles, roll out changes via image tags, and roll back by pulling a previous image. This turns the node OS lifecycle into a GitOps-compatible workflow using the same container toolchain already in use for application images, directly applicable to teams managing bare-metal or VM node OS at scale on Kubernetes.
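The bootc workflow described above is driven by an ordinary Containerfile. A minimal sketch, where the base image, packages, and config file are illustrative assumptions rather than anything from the session:

```dockerfile
# Node OS defined as an OCI image: built, pushed, and rolled back with
# the same toolchain used for application images.
FROM quay.io/fedora/fedora-bootc:41   # illustrative bootc base image

# Bake node-level tooling into the OS image instead of mutating hosts.
RUN dnf -y install tuned chrony && dnf clean all

# Ship host config as image content; nodes converge by pulling a new
# tag and roll back by pulling the previous one.
COPY 99-sysctl-k8s.conf /etc/sysctl.d/99-sysctl-k8s.conf
```

Promoting a change is then a normal image build and tag push, reviewable in the same pipelines as application images.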