K8s v1.36 RC, Karpenter fix, OpenTofu 1.12 beta, CoreDNS 300K QPS

Kubernetes v1.36.0-rc.0: DRA Graduates, Gang Scheduling Alpha, Breaking Changes

  • Kubernetes v1.36.0-rc.0 tagged April 8, built with Go 1.26.2. Feature freeze is complete; GA targeted shortly.
  • Promoted to GA: MutatingAdmissionPolicy (v1), DRAPrioritizedList, DRAAdminAccess, UserNamespacesSupport, NodeLogQuery, ImageVolume, KubeletPSI, ExternalServiceAccountTokenSigner, ProcMountType, KubeletPodResourcesGet.
  • Promoted to beta (on by default): InPlacePodLevelResourcesVerticalScaling, DRAPartitionableDevices, DRAConsumableCapacity, ConstrainedImpersonation, StrictIPCIDRValidation, NodeDeclaredFeatures.
  • New alpha APIs: ResourcePoolStatusRequest for DRA device pool availability querying; scheduling.k8s.io/v1alpha2 Workload and PodGroup for gang scheduling with Job controller integration; PlacementGenerate/PlacementScore extension points for Topology Aware Workload Scheduling (TAS). Full details in the v1.36 CHANGELOG.
  • Breaking changes: metric volume_operation_total_errors renamed to volume_operation_errors_total in kube-controller-manager; Portworx in-tree volume plugin removed (CSIMigrationPortworx gate removed); Service.spec.externalIPs now emits deprecation warnings.
  • CoreDNS bumped to 1.14.2. Prometheus native histogram support enabled by default across apiserver, controller-manager, kube-proxy, kubelet, and scheduler.
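For teams planning to adopt the newly GA MutatingAdmissionPolicy, a policy plus binding might look like the sketch below. This is based on the alpha-era API shape (CEL ApplyConfiguration mutations); the exact v1 schema and the label values here are assumptions, so check the v1.36 API reference before use.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicy
metadata:
  name: add-team-label
spec:
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
  failurePolicy: Ignore
  reinvocationPolicy: IfNeeded
  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        # CEL expression merged into the incoming object
        expression: >
          Object{
            metadata: Object.metadata{
              labels: {"team": "platform"}
            }
          }
---
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingAdmissionPolicyBinding
metadata:
  name: add-team-label-binding
spec:
  policyName: add-team-label
  matchResources:
    namespaceSelector:
      matchLabels:
        env: dev    # illustrative selector; scope bindings narrowly at first
```

Unlike mutating webhooks, these policies run in-process in the API server, so a narrow binding and failurePolicy: Ignore are sensible defaults while validating behavior.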

Karpenter v1.11.1 Patches CPU Regression; OCI Provider Hits GA

  • Karpenter v1.11.1 released April 6, resolving the CPU utilization regression in v1.11.0 (reported last issue). Root causes: incorrect locking in state cost calculation (#2944) and offering count tracking bug in the cost controller re-add path (#2946). Teams holding at v1.10.x can now upgrade.
  • v1.11.1 also ships NodePool node limits (#2526) and cloud provider node registration hooks (#2923).
  • The Karpenter Provider for OCI is now GA and open-sourced. Two new CRDs: NodePool (instance families, ADs, on-demand/preemptible mix) and OciNodeClass (compartment, subnet, VCN CNI, secondary VNIC). Unlike Cluster Autoscaler's fixed node pool model, the OCI provider automatically selects alternative shapes when preferred capacity is unavailable.
  • OCI-native integrations: preemptible capacity, capacity reservations, cluster placement groups, and Compute Clusters. Recommended migration path: keep existing managed node pools for system workloads, introduce Karpenter for application workloads via labels/taints. Karpenter and Cluster Autoscaler can coexist during transition.
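The label/taint migration path described above might be wired up roughly as follows. The NodePool schema is standard Karpenter; the OCI provider's API group, OciNodeClass version, and field names are assumptions based on the announcement, and all OCIDs are placeholders.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: apps
spec:
  template:
    metadata:
      labels:
        provisioner: karpenter            # app workloads select this via nodeSelector
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "preemptible"]
      nodeClassRef:
        group: karpenter.oci.oracle.com   # assumed group for the OCI provider
        kind: OciNodeClass
        name: default
  limits:
    cpu: "200"                            # NodePool-level limits per #2526
---
apiVersion: karpenter.oci.oracle.com/v1alpha1   # assumed version
kind: OciNodeClass
metadata:
  name: default
spec:
  # fields per the release notes; exact names may differ
  compartmentId: ocid1.compartment.oc1..exampleplaceholder
  subnetId: ocid1.subnet.oc1..exampleplaceholder
```

Existing managed node pools keep running system workloads untouched; only pods selecting the provisioner: karpenter label land on Karpenter-provisioned capacity, which keeps the Cluster Autoscaler coexistence window low-risk.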

OpenTofu 1.12.0-beta1: Dynamic Lifecycle Guards and Resource Identity Import

  • OpenTofu 1.12.0-beta1 is available for testing (released April 7). Key additions: dynamic prevent_destroy, Resource Identity for compound identifiers, improved lock file population, and a new -json-into=FILE output flag.
  • Dynamic prevent_destroy: the lifecycle argument can now reference variables and locals — prevent_destroy = var.is_production — eliminating the need for separate module copies per environment.
  • Resource Identity: a new identity block in import {} handles resources with compound identifiers (e.g., AWS SSM Maintenance Window requires both window_id and id). Providers must implement the new identity schema in the plugin protocol.
  • Lock file: tofu init now auto-populates .terraform.lock.hcl with all platform-agnostic checksums (both zh: and h1: schemes), eliminating most manual tofu providers lock runs.
  • -json-into=FILE: new flag for plan, apply, and related commands — writes machine-readable JSON to a file while preserving human-readable output on stdout/stderr; supports named pipes for streaming.
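Together, the dynamic prevent_destroy and Resource Identity changes might look like this in practice. A hedged sketch: the variable name, bucket, SSM resource address, and identifier values are all illustrative, and the identity block's exact attribute names should be confirmed against the provider's identity schema.

```hcl
variable "is_production" {
  type    = bool
  default = false
}

resource "aws_s3_bucket" "state" {
  bucket = "example-state"

  lifecycle {
    # 1.12: may reference variables/locals instead of a literal bool,
    # so one module serves both production and ephemeral environments
    prevent_destroy = var.is_production
  }
}

# Resource Identity import for a resource with a compound identifier
import {
  to = aws_ssm_maintenance_window_target.scans
  identity = {
    window_id = "mw-0123456789abcdef0"   # placeholder values
    id        = "00000000-0000-0000-0000-000000000000"
  }
}
```

With 1.12, a command like tofu plan -json-into=plan.json would then keep human-readable output on the terminal while streaming the machine-readable JSON to plan.json for CI tooling.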

CoreDNS Multisocket Scaling: 300K QPS Per Instance, Node-Local DNS Hardening

  • CoreDNS 1.14.2 shipped two weeks before KubeCon EU, adding a Nomad plugin, DNS-over-QUIC improvements, proxy protocol support for load-balancer client IP resolution, and connection multiplexing in the QUIC plugin. The KubeCon EU maintainer session (Yong Tang and John Belamaric, Google) disclosed that additional CVEs beyond the four already patched are in progress — 1.14.3 expected within roughly one week.
  • Multisocket support (introduced in 1.12) resolves CoreDNS's longstanding vertical scaling ceiling: without it, CoreDNS flatlined at ~40K QPS regardless of additional CPU. With SO_REUSEPORT across multiple sockets, a single CoreDNS instance now scales to over 300K QPS with near-linear CPU growth. Maintainers now recommend CPU-based HPA over the traditional cluster-proportional autoscaling model — most clusters running dozens of CoreDNS pods can consolidate significantly.
  • Node-local DNS cache was highlighted as a high-priority production recommendation: deploying the DaemonSet upgrades UDP DNS to TCP between node and central CoreDNS (eliminating 5-second timeout failures from conntrack table exhaustion), adds per-node caching, and enables direct routing to corporate DNS for out-of-cluster domains. Google's own incident data showed it eliminating DNS failure spikes in clusters with thousands of nodes.
  • For multi-tenant DNS isolation (noisy-neighbor attack vectors), the recommended mitigation is using a MutatingAdmissionPolicy to redirect misbehaving namespace pods to a dedicated CoreDNS instance via pod DNS policy — there is no cluster-level protection against a tenant flooding the shared CoreDNS.
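The maintainers' CPU-based HPA recommendation for multisocket-enabled CoreDNS can be expressed with the standard autoscaling/v2 API. The replica counts and utilization target below are illustrative, not a maintainer-endorsed tuning:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 2            # keep at least two replicas for availability
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out before sockets saturate
```

Since each instance can now absorb far more QPS, clusters replacing cluster-proportional-autoscaler with this pattern typically land at a much smaller replica count with vertical headroom per pod.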

Keycloak 26.6 for Kubernetes; bootc Brings GitOps to Node OS

  • Keycloak 26.6 promotes five features to production: zero-downtime patch releases (set Operator update strategy to "Auto"), Federated Client Authentication (supports Kubernetes Service Accounts as client credentials, eliminating per-client secrets in multi-IDP orgs), JWT Authorization Grant (RFC 7523), YAML-defined realm lifecycle workflows, and the Keycloak Test Framework. New platform integrations: Traefik and Envoy client certificate lookup providers; automatic truststore initialization on Kubernetes/OpenShift.
  • Breaking changes in 26.6: JavaScript-based policies require the Scripts feature explicitly enabled; client URIs must use HTTPS; JWT Authorization Grant issuers must uniquely identify a provider.
  • A CNCF ChatLoopBackOff session on April 10 covered bootc, which manages the Linux OS itself as an OCI container image — platform teams define node base images in Dockerfiles, roll out changes via image tags, and roll back by pulling a previous image. This turns the node OS lifecycle into a GitOps-compatible workflow using the same container toolchain already in use for application images, directly applicable to teams managing bare-metal or VM node OS at scale on Kubernetes.
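The bootc workflow described in the session amounts to a Containerfile like the sketch below. The base image reference and layered packages are assumptions for illustration; any bootc-compatible base works the same way.

```dockerfile
# Containerfile for a bootc-managed node OS image
# (base image tag is an assumption; pin whatever bootc base your distro ships)
FROM quay.io/fedora/fedora-bootc:41

# Layer node-level tooling exactly as an application image would
RUN dnf -y install tuned chrony && dnf clean all

# Cluster-specific config tracked in git alongside the Containerfile
COPY chrony.conf /etc/chrony.conf
```

Rollout is then an image tag bump through the existing registry and CI pipeline, and rollback is repointing nodes at the previous tag — the GitOps loop the session highlighted.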
