Kubernetes v1.36 Unleashed: 7 Essential DRA Updates You Need to Know

Introduction

Dynamic Resource Allocation (DRA) has transformed how Kubernetes handles specialized hardware like GPUs, networking devices, and other accelerators. With the v1.36 release, the DRA framework takes a significant leap forward, introducing several graduated features and new capabilities that enhance scheduling flexibility, hardware management, and ecosystem breadth. Whether you're running large-scale AI workloads or fine-tuning resource utilization, these updates offer practical tools to simplify operations and improve efficiency. Below, we break down the seven most important DRA changes in Kubernetes v1.36 — each designed to make your cluster smarter, more resilient, and easier to manage.

1. Prioritized List (Stable) — Graceful Fallback for Device Requests

Hardware heterogeneity is the norm in modern clusters, and the Prioritized List feature, now stable, addresses this head-on. Instead of tying a workload to a specific device model, you can define a fallback preference list — for instance, “prefer an H100 GPU, but if none are available, accept an A100.” The scheduler walks the alternatives in order and allocates the first one it can satisfy from the devices currently available. This flexibility dramatically improves scheduling success rates and cluster utilization, especially in multi-accelerator environments. It also simplifies troubleshooting by reducing hard failures due to device shortages. Cluster operators can now confidently present diverse hardware to workloads without compromising reliability.
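
To make this concrete, here is a minimal ResourceClaim sketch using the firstAvailable list of alternatives. The gpu.example.com driver name and its model attribute are hypothetical, and the API group/version shown is what recent releases serve; verify both against your cluster.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: gpu-with-fallback
spec:
  devices:
    requests:
    - name: gpu
      # Alternatives are evaluated in order; the first one that can be
      # satisfied from the available devices wins.
      firstAvailable:
      - name: h100
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            expression: device.attributes["gpu.example.com"].model == "H100"
      - name: a100
        deviceClassName: gpu.example.com
        selectors:
        - cel:
            expression: device.attributes["gpu.example.com"].model == "A100"
```

A pod references the claim through spec.resourceClaims as usual, and the allocation result records which alternative was chosen, which helps when debugging why a workload landed on fallback hardware.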

2. Extended Resource Support (Beta) — Gradual DRA Migration

Adopting new resource management models can be disruptive. The Extended Resource Support feature, now beta, bridges the gap between traditional extended resources and the DRA ResourceClaim API. With this, pods can still request resources via the classic resources.requests field, seamlessly converting them into DRA claims behind the scenes. This allows cluster operators to begin using DRA without forcing immediate changes to application manifests. Developers can adopt the ResourceClaim API at their own pace, ensuring a smooth migration path. This feature is especially valuable for organizations transitioning to DRA across hundreds of microservices, reducing operational friction and enabling incremental adoption.
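
The moving parts are a DeviceClass that advertises an extended resource name and an unchanged pod manifest. A minimal sketch, assuming the extendedResourceName field from the extended-resource proposal and a hypothetical gpu.example.com driver; confirm the exact field name against the API version your cluster serves.

```yaml
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
  - cel:
      expression: device.driver == "gpu.example.com"
  # Pods that request this extended resource name are transparently
  # backed by DRA devices from this class.
  extendedResourceName: gpu.example.com/gpu
---
apiVersion: v1
kind: Pod
metadata:
  name: legacy-gpu-pod
spec:
  containers:
  - name: app
    image: registry.example.com/cuda-app:latest  # placeholder image
    resources:
      # Classic extended-resource syntax; no ResourceClaim appears
      # anywhere in the pod manifest.
      requests:
        gpu.example.com/gpu: "1"
      limits:
        gpu.example.com/gpu: "1"
```

Because extended resources require limits to equal requests, both fields are set; the application manifest itself needs no DRA-specific changes.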

3. Partitionable Devices (Beta) — Slice Hardware for Better Utilization

High-end accelerators often exceed the needs of a single workload. The Partitionable Devices feature, graduating to beta, introduces native support for carving physical hardware into smaller logical instances — similar to NVIDIA’s Multi-Instance GPU (MIG) technology. Administrators can define partitioning policies that match typical resource profiles, and the scheduler dynamically assigns slices to pods based on exact requests. This maximizes hardware utilization and reduces waste, especially in multi-tenant clusters. It also simplifies security, as each pod only sees its allocated partition. This feature is a game-changer for cost optimization in GPU-heavy environments, allowing you to pack more workloads onto fewer devices safely.
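
Partitioning is expressed on the driver side through ResourceSlices: a shared counter set models the physical device's capacity, and each advertised logical device draws down those counters. The sketch below is illustrative only, with field names following the partitionable-devices design; in practice a vendor driver publishes these objects for you.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node-a-gpu-0
spec:
  driver: gpu.example.com  # hypothetical driver
  nodeName: node-a
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  # The physical GPU's memory, shared by all partitions below.
  sharedCounters:
  - name: gpu-0
    counters:
      memory:
        value: 80Gi
  devices:
  - name: gpu-0-partition-a
    consumesCounters:
    - counterSet: gpu-0
      counters:
        memory:
          value: 40Gi
  - name: gpu-0-partition-b
    consumesCounters:
    - counterSet: gpu-0
      counters:
        memory:
          value: 40Gi
```

The scheduler tracks counter consumption, so once both 40Gi partitions are allocated, nothing more can be carved from gpu-0 until one of them is released.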

4. Device Taints (Beta) — Dedicated Hardware Management

Just as nodes can be tainted to control pod placement, Device Taints (now beta) extend this concept to individual DRA devices. You can mark faulty or reserved hardware with custom taints — for example, “do not use for standard workloads” or “high-priority experiment only.” Pods without matching tolerations will be rejected by the scheduler when trying to claim such devices. This enables precise access control, workload isolation, and proactive fault management. Use cases include quarantining defective units, reserving premium hardware for critical jobs, or isolating test environments. Combined with tolerations, it gives administrators fine-grained governance over expensive hardware assets.
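
In practice, an administrator taints a device and a privileged claim opts back in with a toleration. A minimal sketch, assuming the alpha DeviceTaintRule API from the device-taints proposal and a hypothetical gpu.example.com driver; the group/version and field placement may differ in your cluster.

```yaml
# Quarantine one suspect device so ordinary claims avoid it.
apiVersion: resource.k8s.io/v1alpha3  # alpha API; may differ by release
kind: DeviceTaintRule
metadata:
  name: quarantine-gpu-3
spec:
  deviceSelector:
    driver: gpu.example.com
    pool: node-a
    device: gpu-3
  taint:
    key: example.com/unhealthy
    value: ecc-errors
    effect: NoSchedule
---
# A diagnostics claim that explicitly tolerates the taint can still
# be allocated the quarantined device.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: diagnostics-gpu
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.example.com
        tolerations:
        - key: example.com/unhealthy
          operator: Exists
          effect: NoSchedule
```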

5. Device Binding Conditions (Beta) — Smarter Scheduling Decisions

Improving scheduling reliability is a core goal for DRA, and Device Binding Conditions (now beta) deliver exactly that. Previously, the scheduler could allocate a device that later failed to bind because of constraints that only surface at attach time. With this feature, you can define conditions that must be satisfied before a device is considered “bound” to a pod — such as driver readiness or network connectivity checks. If the conditions aren’t met, the scheduler backtracks and reconsiders other devices. This reduces scheduling failures and retries, improving cluster stability. It also supports complex multi-device configurations where interdependencies must be validated ahead of time. For administrators managing large fleets, this reduces operational overhead and ensures workloads land on truly available hardware.
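
Binding conditions are declared by the driver on the devices it publishes. The snippet below is sketched from the design proposal rather than copied from a shipping driver: the fabric.example.com driver and both condition names are hypothetical.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node-a-fabric-gpus
spec:
  driver: fabric.example.com  # hypothetical fabric-attached-GPU driver
  nodeName: node-a
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  devices:
  - name: fabric-gpu-0
    # The scheduler delays binding until the driver reports this
    # condition as True for the allocated device.
    bindingConditions:
    - example.com/DeviceAttached
    # If this condition turns True instead, the allocation is rolled
    # back and the scheduler reconsiders other devices.
    bindingFailureConditions:
    - example.com/AttachFailed
```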

6. Expanding DRA Driver Ecosystem — More Hardware, More Flexibility

The power of DRA is only as good as its drivers. In v1.36, the driver ecosystem continues to expand beyond traditional compute accelerators like GPUs. New drivers now support networking hardware (e.g., SmartNICs, DPUs), storage accelerators, and even custom FPGA devices. This broadens DRA from a GPU-centric tool to a universal resource allocation framework. The community has also improved the driver development toolkit, making it easier for hardware vendors to create compliant drivers. With this diversity, platform engineers can standardize resource management across all specialized hardware types, reducing complexity and improving operational consistency. The result is a more hardware-agnostic Kubernetes infrastructure that adapts to evolving hardware roadmaps.

7. ResourceClaims in PodGroups — Coordinated Allocation for Workload Sets

Many distributed workloads — such as Ray clusters or MPI jobs — require resources to be allocated to multiple pods as a group. The new ResourceClaims in PodGroups feature (introduced in v1.36) allows you to define a set of ResourceClaims that must be allocated together to a group of pods. The scheduler ensures all claims are satisfied before launching any pod in the group, avoiding partial allocation deadlocks. This is particularly useful for tightly coupled workloads where all instances need identical hardware or need to communicate across dedicated interconnects. It simplifies orchestration and improves reliability for multi-pod workflows. This feature marks a critical step toward supporting complex batch and AI training jobs natively within Kubernetes.
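
The article does not spell out the API shape, so the snippet below is a purely hypothetical sketch of the idea (all-or-nothing allocation for a set of pods) rather than the actual v1.36 schema; every kind, group/version, and field name here is invented for illustration.

```yaml
# Hypothetical illustration only: not a real Kubernetes API.
apiVersion: resource.k8s.io/v1alpha1  # invented version
kind: ResourceClaimGroup              # invented kind
metadata:
  name: ray-worker-gpus
spec:
  # All claims in the group must be allocatable before any member
  # pod of the referenced group is scheduled.
  podGroupRef: ray-cluster-workers    # invented field
  claimTemplate:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
          count: 1
  replicas: 8  # one claim per worker, allocated together or not at all
```

Whatever the final schema looks like, the operational win is the same: no more half-scheduled Ray or MPI jobs holding GPUs while waiting for peers that can never be placed.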

Conclusion

Kubernetes v1.36 marks a maturation point for Dynamic Resource Allocation. From stable fallback lists to beta-grade device partitioning and group-level claim coordination, these features empower platform teams to manage specialized hardware with precision and confidence. The expanded driver ecosystem further ensures DRA becomes the go-to standard for resource abstraction. Whether you are piloting DRA or scaling it across thousands of nodes, these updates provide practical, real-world improvements that reduce complexity and boost efficiency. Dive into the official release notes to start experimenting with these capabilities today — your infrastructure will thank you.
