Kubernetes v1.36: 6 Dynamic Resource Allocation Upgrades You Need to Know

Dynamic Resource Allocation (DRA) has reshaped how Kubernetes handles specialized hardware. With the v1.36 release, DRA reaches a new level of maturity, introducing graduated features and usability improvements that empower platform engineers. From stable fallback preferences to beta support for partitioned devices, these updates make managing GPUs, FPGAs, and other accelerators simpler and more reliable. Whether you're migrating from legacy extended resources or optimizing fleet-wide utilization, the following six enhancements deserve your attention. Let's explore each one and see how they can streamline your infrastructure.

1. Prioritized List: Stable Fallback Preferences

Hardware heterogeneity is a reality in most clusters. With the prioritized list feature now stable, you can define a ranked order of device preferences when requesting resources. Instead of hardcoding a request for a specific model and risking scheduling failures, you specify fallback options, for example, “Give me an H100, but if none are available, fall back to an A100.” The scheduler evaluates the alternatives in the order you provide and allocates the first one it can satisfy, which improves flexibility and cluster utilization. This is especially valuable in multi-tenant environments where different workloads require different accelerator capabilities: graceful degradation reduces wasted capacity and avoids manual intervention. For finer-grained slicing, see partitionable devices (item 3).
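The fallback order described above can be sketched in a few lines of Python. This is a toy model, not the scheduler's implementation: the device class names and the inventory are hypothetical, and the "firstAvailable" field name reflects my understanding of the resource.k8s.io API, so check it against the API reference for your cluster version.

```python
from typing import Optional

# Claim fragment shaped like a ResourceClaim with a prioritized list.
# Device class names are hypothetical; field names are an assumption
# to verify against the API reference for your cluster version.
claim = {
    "apiVersion": "resource.k8s.io/v1",
    "kind": "ResourceClaim",
    "metadata": {"name": "training-gpu"},
    "spec": {"devices": {"requests": [{
        "name": "gpu",
        "firstAvailable": [
            {"name": "h100", "deviceClassName": "h100.example.com"},
            {"name": "a100", "deviceClassName": "a100.example.com"},
        ],
    }]}},
}

# Hypothetical snapshot of free devices per device class.
free_devices = {"h100.example.com": 0, "a100.example.com": 2}


def pick_first_available(request: dict, inventory: dict[str, int]) -> Optional[str]:
    """Return the name of the first alternative whose class has a free device."""
    for sub in request["firstAvailable"]:
        if inventory.get(sub["deviceClassName"], 0) > 0:
            return sub["name"]
    return None  # no alternative can be satisfied right now


gpu_request = claim["spec"]["devices"]["requests"][0]
chosen = pick_first_available(gpu_request, free_devices)
print(chosen)
```

With no free H100 in the inventory, the sketch falls through to the A100 alternative; with an empty inventory it returns None, which corresponds to the claim staying unallocated.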

2. Extended Resource Support: Beta Bridge to Legacy Systems

As DRA becomes the standard, bridging the gap with older resource models is essential. The extended resource support feature (now in beta) lets Pods keep requesting devices through traditional extended resources while the cluster gradually adopts the DRA ResourceClaim API. Cluster operators can therefore migrate to DRA without forcing immediate changes on application developers. For example, existing workloads can continue using nvidia.com/gpu extended resources while new workloads start using DRA's richer semantics. This gradual transition reduces disruption and speeds up adoption. Combined with other beta features such as device taints, it gives you a practical toolkit for hardware management.
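To make the migration idea concrete, here is a small Python sketch that finds the extended resource requests in a legacy Pod spec and looks up which DRA device class would serve them. The lookup table and the class name gpu.example.com are hypothetical stand-ins for whatever mapping your cluster defines; this is not how the feature is implemented internally.

```python
# Built-in resource names that are not extended resources.
CORE_RESOURCES = {"cpu", "memory", "ephemeral-storage"}


def extended_resources(pod: dict) -> dict[str, int]:
    """Sum extended resource limits across all containers of a Pod."""
    totals: dict[str, int] = {}
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        for name, quantity in limits.items():
            if name in CORE_RESOURCES or name.startswith("hugepages-"):
                continue  # skip built-in resources
            totals[name] = totals.get(name, 0) + int(quantity)
    return totals


# A legacy Pod that still uses the extended resource style.
legacy_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "legacy-trainer"},
    "spec": {"containers": [{
        "name": "trainer",
        "image": "registry.example.com/trainer:latest",  # hypothetical image
        "resources": {"limits": {"cpu": "2", "nvidia.com/gpu": 1}},
    }]},
}

# Hypothetical mapping from extended resource name to a DRA device class.
resource_to_device_class = {"nvidia.com/gpu": "gpu.example.com"}

plan = {
    resource_to_device_class[name]: count
    for name, count in extended_resources(legacy_pod).items()
    if name in resource_to_device_class
}
print(plan)
```

The point of the sketch is that the Pod manifest itself does not change; only the backing mechanism on the cluster side does.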

3. Partitionable Devices: Beta Support for Logical Slicing

Hardware accelerators are powerful, but often a single workload doesn’t need an entire device. The partitionable devices feature (now beta) provides native DRA support for carving physical hardware into smaller logical instances, similar to Multi-Instance GPU (MIG) technology. Administrators can define how to split a device (e.g., into slices of 2 GB memory) and let the scheduler allocate appropriate shares to multiple Pods, as long as the slices together fit within the physical device. This raises utilization and lowers per-tenant costs. The feature also works alongside prioritized lists: you can request a partition from a preferred device model and list fallback options. For shared clusters where efficiency matters, this is a significant improvement.
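The capacity accounting behind partitioning can be illustrated with a toy Python model. The 8 GB device size and the Pod names are hypothetical, and the real feature tracks capacity through the DRA API rather than an in-memory object; the sketch only shows the invariant that partitions may never exceed the physical device.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDevice:
    """Toy model of one accelerator whose memory is shared by partitions."""
    name: str
    memory_gb: int
    allocated: dict[str, int] = field(default_factory=dict)  # pod -> GB

    def free_gb(self) -> int:
        return self.memory_gb - sum(self.allocated.values())

    def allocate(self, pod: str, size_gb: int) -> bool:
        """Give `pod` a partition if enough memory remains; report success."""
        if pod in self.allocated or size_gb > self.free_gb():
            return False
        self.allocated[pod] = size_gb
        return True


# Hypothetical 8 GB device carved into 2 GB slices.
gpu = PhysicalDevice(name="gpu-0", memory_gb=8)
results = [gpu.allocate(f"pod-{i}", 2) for i in range(5)]
print(results, gpu.free_gb())
```

Four 2 GB slices fit on the 8 GB device; the fifth request is rejected because no capacity remains.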

4. Device Taints: Beta Controls for Hardware Health

Just as nodes can be tainted, you can now apply device taints (beta) directly to specific DRA devices. This gives cluster administrators finer control over hardware. For instance, you can taint faulty devices to keep them from being allocated to standard claims, or reserve specific hardware for dedicated teams or experiments. Only Pods whose claims carry matching tolerations may use tainted devices. The feature works hand-in-hand with device binding conditions to improve scheduling reliability and operational control. By isolating problematic hardware early, you reduce runtime failures and simplify maintenance.
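The matching rule between taints and tolerations can be sketched as follows. This is a simplified model patterned on node taint semantics (key, value, effect, and the Equal/Exists operators); the taint key example.com/unhealthy and the device names are hypothetical, and the real matching logic may handle more cases.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An effect on the toleration restricts it to taints with that effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True  # any value is accepted
    return toleration.get("value") == taint.get("value")


def can_allocate(device: dict, tolerations: list[dict]) -> bool:
    """A device is usable only if every one of its taints is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in device.get("taints", [])
    )


healthy = {"name": "gpu-0", "taints": []}
faulty = {
    "name": "gpu-1",
    "taints": [{"key": "example.com/unhealthy", "value": "ecc-errors",
                "effect": "NoSchedule"}],
}

# Tolerations a diagnostics workload might attach to its claim (hypothetical).
diagnostics = [{"key": "example.com/unhealthy", "operator": "Exists",
                "effect": "NoSchedule"}]

print(can_allocate(healthy, []), can_allocate(faulty, []),
      can_allocate(faulty, diagnostics))
```

An ordinary claim with no tolerations can use the healthy device but not the tainted one, while the diagnostics claim can reach both.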

5. Device Binding Conditions: Beta for Better Scheduling

Scheduling reliability gets a boost with device binding conditions (beta). This feature allows DRA to express conditions that must be satisfied before a device is bound to a claim, similar to node affinity but at the device level. For example, you can require that a network interface card is attached to a specific virtual LAN before allocation. Waiting for such conditions avoids handing a workload a device that is not yet usable, so allocated resources work as soon as the Pod starts. Together with the gradual rollout of other features like extended resource support, device binding conditions help create a more deterministic scheduling pipeline and minimize surprises at runtime.
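A minimal Python sketch of the decision this implies: bind when every required condition holds, give up when a failure condition is reported, and otherwise keep waiting. The condition names are hypothetical, and the bindingConditions and bindingFailureConditions field names are my assumption about how a device advertises them, so verify them against the API reference before relying on them.

```python
def binding_decision(device: dict, conditions: dict[str, bool]) -> str:
    """Return "bind", "wait", or "fail" for a device given observed conditions."""
    failure_names = device.get("bindingFailureConditions", [])
    if any(conditions.get(name, False) for name in failure_names):
        return "fail"  # preparation failed; the allocation should be retried
    required = device.get("bindingConditions", [])
    if all(conditions.get(name, False) for name in required):
        return "bind"  # every required condition is true
    return "wait"      # at least one required condition is still pending


# Hypothetical NIC that must be attached to a VLAN before use.
nic = {
    "name": "nic-0",
    "bindingConditions": ["example.com/vlan-attached"],
    "bindingFailureConditions": ["example.com/attach-failed"],
}

pending = binding_decision(nic, {})
ready = binding_decision(nic, {"example.com/vlan-attached": True})
failed = binding_decision(nic, {"example.com/attach-failed": True})
print(pending, ready, failed)
```

Checking failure conditions first means a device that reports both a failure and a success is treated as failed, which is the conservative choice for a sketch like this.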

6. Expanding Driver Ecosystem: Beyond Accelerators

DRA’s driver ecosystem continues to grow beyond specialized compute accelerators. The v1.36 release sees new drivers supporting networking hardware, FPGAs, and other resource types. This reflects a move toward a truly hardware-agnostic infrastructure model. For cluster operators, this means fewer vendor-specific plugins and more consistent management APIs. As more hardware vendors adopt DRA, you can expect reduced operational complexity and improved portability across clouds and on-premises environments. The combination of prioritized lists and partitionable devices makes it easier to mix and match different hardware types without rewriting scheduling logic.

Kubernetes v1.36 marks a milestone for Dynamic Resource Allocation, delivering both stability and innovation. With five feature graduations and a growing driver ecosystem, DRA is becoming the go-to mechanism for managing specialized hardware. Whether you are optimizing GPU fleets, migrating from extended resources, or exploring device partitioning, these upgrades provide the control and flexibility your clusters need. Start planning your migration today—your workloads will thank you.
