10 Lessons from the Kernel-TCMalloc Clash Over Restartable Sequences

Last updated: 2026-05-02 06:08:55 · Programming

The principle known as Hyrum's Law states that, given enough users, any observable behavior of a system will eventually be depended on by someone. A recent incident in the Linux kernel community is a textbook example. The kernel's restartable sequences (rseq) interface, which lets user-space code perform fast per-CPU operations, was optimized for performance in kernel 6.19. The changes respected the documented API, but they broke Google's TCMalloc memory allocator, because TCMalloc implicitly relied on undocumented internal details. The kernel's strict no-regressions policy then obliged the developers to accommodate TCMalloc's non-standard usage. This article unpacks the episode through 10 key insights, from the original rseq design to the broader lessons for system software development.

1. Hyrum's Law in Action

Hyrum's Law is a software engineering adage: with a sufficient number of users of an API, every observable behavior of the system, intended or not, will be depended on by somebody, regardless of what the documentation promises. In the rseq case, the kernel's documented interface specified how threads could register and use per-CPU critical sections. Yet TCMalloc came to rely on an undocumented aspect: the exact placement of the rseq area within a thread's memory structure. This dependency stayed invisible until the kernel optimization altered that placement. The resulting breakage is a classic Hyrum's Law outcome: a hidden contract that developers had depended on without realizing it.
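Hyrum's Law is easy to reproduce outside the kernel. The following Python sketch is a hypothetical illustration, not code from either project. `RegistryV1`, `RegistryV2`, and `first_registered` are invented names. Both registries honor the same documented contract, which is to return the registered names in no particular order. A caller that leaned on v1's incidental insertion order nevertheless gets a different answer from v2.

```python
class RegistryV1:
    """Hypothetical v1. Documented contract: list_names() returns the
    registered names in no particular order."""

    def __init__(self):
        self._names = []

    def register(self, name):
        if name not in self._names:
            self._names.append(name)

    def list_names(self):
        return list(self._names)       # happens to preserve insertion order


class RegistryV2:
    """Hypothetical v2: same documented contract, reimplemented with a set."""

    def __init__(self):
        self._names = set()

    def register(self, name):
        self._names.add(name)

    def list_names(self):
        return sorted(self._names)     # still "no particular order" per docs


def first_registered(registry):
    # Hidden assumption: the first element is the earliest registration.
    return registry.list_names()[0]


for cls in (RegistryV1, RegistryV2):
    reg = cls()
    for name in ("zeta", "alpha"):
        reg.register(name)
    print(cls.__name__, first_registered(reg))
# RegistryV1 zeta
# RegistryV2 alpha
```

Nothing in the documented contract changed, so no interface check would flag the switch. Only the caller's hidden assumption breaks.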

2. What Are Restartable Sequences?

Restartable sequences give user-space code a lightweight way to run short sequences atomically with respect to the current CPU, without system calls or locks. A thread marks a small block of code as a restartable sequence. If the thread is preempted, migrated to another CPU, or interrupted by a signal while inside that block, the kernel aborts the sequence by redirecting execution to an abort handler, which typically retries it from the beginning. This enables high-performance per-CPU operations, such as updating per-CPU counters or per-CPU free lists. Each thread registers a per-thread structure, struct rseq. The kernel uses it to publish the current CPU number, and user space uses it to point the kernel at the descriptor of the currently active critical section. The interface was merged in Linux 4.18 and has been used for years by glibc, which registers an rseq area for every thread starting with version 2.35, and by allocators like TCMalloc.
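The abort-and-retry structure is the heart of the mechanism. Real rseq critical sections are written in assembly, and the kernel detects interruption by checking whether the instruction pointer lies inside the registered section. The Python below is therefore only a single-threaded toy model under several assumptions. `SimulatedCPUState` is a hypothetical stand-in for the kernel. Interruptions are injected at one chosen point by a seeded random generator. `percpu_increment` plays the role of a critical section that reads the CPU number, loads a value, and commits with one final store.

```python
import random

NUM_CPUS = 4


class SimulatedCPUState:
    """Toy stand-in for the kernel side (hypothetical, not a real API).

    Tracks which CPU the thread runs on and whether it was interrupted
    (preempted, migrated, or signaled) since the critical section began.
    """

    def __init__(self, preempt_probability, seed=0):
        self.cpu_id = 0
        self.interrupted = False
        self._rng = random.Random(seed)
        self._p = preempt_probability

    def maybe_interrupt(self):
        # Inject an interruption, possibly with migration to another CPU.
        if self._rng.random() < self._p:
            self.interrupted = True
            self.cpu_id = self._rng.randrange(NUM_CPUS)


def percpu_increment(state, counters):
    """Increment the current CPU's counter; return the number of attempts.

    Shape of an rseq critical section: read the CPU number, load, compute,
    then commit with a single final store. An interruption before the
    commit abandons the attempt, as an abort handler that retries would.
    """
    attempts = 0
    while True:
        attempts += 1
        state.interrupted = False      # enter the critical section
        cpu = state.cpu_id             # analogous to reading rseq->cpu_id
        value = counters[cpu] + 1      # load and compute, nothing visible yet
        state.maybe_interrupt()        # this model checks at one point only
        if state.interrupted:
            continue                   # abort: no store happened, so retry
        counters[cpu] = value          # commit with one store
        return attempts


state = SimulatedCPUState(preempt_probability=0.3, seed=42)
counters = [0] * NUM_CPUS
attempts = sum(percpu_increment(state, counters) for _ in range(1000))
print(sum(counters), attempts >= 1000)  # 1000 True
```

Every call commits exactly once, so the counters always sum to the number of calls. Interruptions cost only extra attempts, which is the trade rseq makes in place of a lock.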

3. The Kernel's 6.19 Optimization

In the 6.19 release cycle, kernel developers identified performance bottlenecks in the rseq implementation. One key improvement moved the thread's rseq area from the task descriptor into a dedicated per-CPU memory region, which reduced cache misses and improved scalability on multi-socket systems. Every documented contract remained identical: the system call interface, the fields of the shared data structure, and the required kernel configuration. The developers therefore believed the change was fully backward compatible, since no documented aspect of the rseq interface had been modified.
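What the rseq ABI does promise user space is the layout of `struct rseq`, which is defined in the kernel's UAPI header `<linux/rseq.h>`. The sketch below mirrors that layout with Python's `ctypes`, assuming a recent kernel (6.3 or later) in which the structure includes the `node_id` and `mm_cid` fields. `check_abi` and `DOCUMENTED_OFFSETS` are illustrative names, not part of any real API. The sketch checks only field offsets, because the header's 32-byte alignment attribute has no direct `ctypes` equivalent.

```python
import ctypes


class Rseq(ctypes.Structure):
    """Mirror of struct rseq from <linux/rseq.h> (assumed layout, kernel >= 6.3).

    The kernel declares the struct with 32-byte alignment; ctypes cannot
    express that attribute, so this sketch compares field offsets only.
    """

    _fields_ = [
        ("cpu_id_start", ctypes.c_uint32),
        ("cpu_id", ctypes.c_uint32),
        ("rseq_cs", ctypes.c_uint64),  # address of the active struct rseq_cs
        ("flags", ctypes.c_uint32),
        ("node_id", ctypes.c_uint32),
        ("mm_cid", ctypes.c_uint32),
    ]


# Illustrative table of the documented offsets, in bytes.
DOCUMENTED_OFFSETS = {
    "cpu_id_start": 0,
    "cpu_id": 4,
    "rseq_cs": 8,
    "flags": 16,
    "node_id": 20,
    "mm_cid": 24,
}


def check_abi(struct_type, expected):
    """Return the names of fields whose offset differs from the table."""
    return [name for name, offset in expected.items()
            if getattr(struct_type, name).offset != offset]


print(check_abi(Rseq, DOCUMENTED_OFFSETS))  # []
```

A check like this guards the documented contract. By construction, though, it cannot see a dependency on anything outside that contract, and that is where the breakage described here occurred.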

4. TCMalloc's Hidden Dependency

Google's TCMalloc memory allocator uses rseq to implement fast per-CPU caches without locking. To optimize access, TCMalloc relied on assumptions about how and where the kernel maintained its rseq state. Those details belong to the kernel's internal implementation, not to the documented user-space ABI, and kernel structures such as struct task_struct are not visible to user space at all. When the kernel relocated the rseq area for 6.19, those assumptions stopped holding and TCMalloc read from the wrong memory location. The result was corrupted allocation data, crashes, and degraded performance. TCMalloc's developers had unwittingly created a dependency on kernel implementation details, stepping outside the documented API boundaries.
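This failure mode can be modeled in a few lines. The Python sketch below is an analogy, not TCMalloc's code, and `KernelV1`, `KernelV2`, and both client functions are hypothetical. The provider's documented accessor `current_cpu()` behaves identically in both versions. However, v2 moves the value to a new internal home and leaves the old one unmaintained, so a client that reads the old internal location gets a stale answer instead of an error.

```python
class KernelV1:
    """Hypothetical provider; the documented API is current_cpu()."""

    def __init__(self):
        self._task_state = {"cpu": 0}   # internal: updated on every migration

    def migrate(self, cpu):
        self._task_state["cpu"] = cpu

    def current_cpu(self):
        return self._task_state["cpu"]


class KernelV2:
    """Same documented API; the value moved to a new internal location."""

    def __init__(self):
        self._task_state = {"cpu": 0}   # old location, no longer updated
        self._percpu_slot = [0]         # new, optimized location

    def migrate(self, cpu):
        self._percpu_slot[0] = cpu

    def current_cpu(self):
        return self._percpu_slot[0]


def documented_client(kernel):
    return kernel.current_cpu()


def shortcut_client(kernel):
    # Bypasses the accessor and reads internal state directly.
    return kernel._task_state["cpu"]


for cls in (KernelV1, KernelV2):
    k = cls()
    k.migrate(2)
    print(cls.__name__, documented_client(k), shortcut_client(k))
# KernelV1 2 2
# KernelV2 2 0
```

The stale read raises no exception. The code simply keeps running on wrong data, which is the kind of symptom that is hard to trace back to its cause.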

5. The No‑Regressions Rule

The Linux kernel community enforces a strict no-regressions policy. A change that breaks existing user-space programs, including widely used libraries, must be reverted or otherwise mitigated, even when the affected program relied on behavior the kernel never documented. This rule is critical for maintaining trust and stability in the kernel ecosystem. When TCMalloc failed under 6.19, the kernel developers faced a dilemma: the rseq optimization benefited many users, but the TCMalloc breakage counted as a regression under that policy. They had to find a solution that preserved as much of the optimization as possible while letting TCMalloc keep working.
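In testing terms, the no-regressions rule judges a change by what existing clients observe, not by what the documentation promised. A hypothetical Python harness makes that concrete. `find_regressions` runs each client against the old and new implementations and reports any client whose result changed, whether or not it stayed within the documented API. `OldImpl`, `NewImpl`, and `internal_offset` are invented for the example.

```python
from typing import Callable, Dict, List


def _observe(client, impl):
    """Capture a client's result, or the type of exception it raised."""
    try:
        return ("ok", client(impl))
    except Exception as exc:
        return ("error", type(exc).__name__)


def find_regressions(old_impl, new_impl,
                     clients: Dict[str, Callable]) -> List[str]:
    """Names of clients whose observed behavior changed between versions.

    The harness does not ask whether a client used documented behavior:
    if it worked before and behaves differently now, it is reported.
    """
    return [name for name, client in clients.items()
            if _observe(client, old_impl) != _observe(client, new_impl)]


class OldImpl:
    internal_offset = 16          # hypothetical undocumented detail

    def cpu(self):                # documented call
        return 1


class NewImpl:
    internal_offset = 32          # changed by an optimization

    def cpu(self):
        return 1


clients = {
    "documented-only": lambda impl: impl.cpu(),
    # Assumes the data sits at the old, undocumented offset.
    "layout-dependent": lambda impl: impl.internal_offset,
}
print(find_regressions(OldImpl(), NewImpl(), clients))  # ['layout-dependent']
```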

6. The Search for a Compromise

Initial attempts to resolve the conflict focused on adding a new kernel API that would expose the rseq area's location in a stable manner. However, a new API would help only after TCMalloc had been changed to use it and would do nothing for binaries already deployed. It also risked committing the kernel to yet another long-term contract. Other suggestions included patching TCMalloc to use only the documented interfaces, but the kernel team generally avoids forcing user-space changes. Eventually, a compromise emerged: the kernel reverted the specific rseq area relocation while keeping other optimizations intact. This restored TCMalloc's functionality without sacrificing all the performance gains, though it left the root cause, the hidden dependency, unaddressed.
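The compromise described above reverted the relocation itself. A different and purely hypothetical way to reach a similar compromise is a compatibility path. The new layout stays in place, but the old, undocumented location is also kept up to date, so legacy readers keep working at the cost of an extra write per update. The Python sketch below shows that general pattern. `CompatKernel` and its `legacy_compat` flag are invented names, and nothing here claims this is how rseq was fixed.

```python
class CompatKernel:
    """Hypothetical provider that keeps an optimization plus a compat path."""

    def __init__(self, legacy_compat=True):
        self._percpu_slot = [0]          # new, optimized location
        self._task_state = {"cpu": 0}    # old location some readers depend on
        self.legacy_compat = legacy_compat

    def migrate(self, cpu):
        self._percpu_slot[0] = cpu
        if self.legacy_compat:
            # Extra write that preserves the undocumented behavior.
            self._task_state["cpu"] = cpu

    def current_cpu(self):               # documented accessor
        return self._percpu_slot[0]


def legacy_reader(kernel):
    # A client that depends on the old internal location.
    return kernel._task_state["cpu"]


for compat in (True, False):
    k = CompatKernel(legacy_compat=compat)
    k.migrate(5)
    print(compat, k.current_cpu(), legacy_reader(k))
# True 5 5
# False 5 0
```

The catch is visible in the code. The compatibility write becomes a permanent obligation, so a shim like this leaves the hidden dependency in place just as a revert does.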

7. How the Community Responded

The incident sparked heated discussions on the Linux kernel mailing list. Some developers argued that TCMalloc's violation of the documented API should not be the kernel's problem; others pointed out that Hyrum's Law makes such violations inevitable and that the kernel must be defensive. Many highlighted the need for better documentation and for more explicit warnings about internal structures. The consensus leaned toward accepting the compromise as a pragmatic solution, while calling for a clearer boundary between kernel internals and user‑visible interfaces to prevent similar issues in the future.

8. Implications for Library Developers

For developers of performance‑critical libraries like TCMalloc, this episode is a cautionary tale. Relying on undocumented kernel internals may provide short‑term speed gains, but it creates fragility. The key lesson is: always use the published, stable APIs, even if they are slightly less efficient. When a library does require an optimization that seems to depend on internal details, the proper approach is to propose and standardize a new official interface. Ignoring this discipline can cause cascading breakage across the ecosystem when the kernel evolves.
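On the library side, this discipline can be as simple as feature detection. A library uses a faster official interface when the provider advertises one and otherwise falls back to the long-standing documented call, never to internals. In this hypothetical Python sketch, `current_cpu_fast` stands for a newly standardized interface of the kind the paragraph recommends proposing. It is not a real kernel or library function.

```python
class LegacyProvider:
    """Hypothetical provider exposing only the long-standing documented call."""

    def current_cpu(self):
        return 0


class ModernProvider(LegacyProvider):
    """Hypothetical provider that also offers a newer official fast path."""

    def current_cpu_fast(self):
        return 0


def current_cpu(provider):
    """Return (cpu, path_used), preferring an advertised official fast path."""
    fast = getattr(provider, "current_cpu_fast", None)
    if callable(fast):
        return fast(), "fast"
    # Fallback: slower, but still a documented interface, never internals.
    return provider.current_cpu(), "documented"


print(current_cpu(LegacyProvider()))  # (0, 'documented')
print(current_cpu(ModernProvider()))  # (0, 'fast')
```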

9. The Broader Lesson for System Software

Hyrum's Law applies beyond kernels and allocators. Any system with observable behavior—whether a database, web framework, or cloud service—will have its unwritten rules hardened into de facto contracts. The rseq‑TCMalloc clash demonstrates that even well‑intentioned optimizations can stumble on these hidden assumptions. System designers should therefore clearly delineate which parts of a codebase are internal and subject to change, and they should provide stable abstractions for features that might otherwise be abused. Regularly auditing dependencies can also reveal hidden assumptions before they cause failures.
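Auditing for hidden assumptions can be partly automated. In Python, a single leading underscore conventionally marks a name as internal. The sketch below uses the standard `ast` module to list attribute accesses to such names on objects other than `self`. The function name and the underscore heuristic are assumptions of this example. A real audit of kernel-facing code would instead look for reliance on undocumented fields or offsets.

```python
import ast


def find_internal_accesses(source):
    """Return sorted (line, attribute) pairs for accesses to _private names.

    Dunder names such as __class__ are skipped, and so are accesses through
    `self`, since a class may freely use its own internals.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        is_dunder = name.startswith("__") and name.endswith("__")
        if not name.startswith("_") or is_dunder:
            continue
        if isinstance(node.value, ast.Name) and node.value.id == "self":
            continue
        hits.append((node.lineno, name))
    return sorted(hits)


sample = '''
cpu = kernel.current_cpu()
slot = kernel._task_state["cpu"]
kind = obj.__class__.__name__
'''
print(find_internal_accesses(sample))  # [(3, '_task_state')]
```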

10. Looking Forward: A More Robust Path

Moving forward, the kernel community is exploring ways to make the rseq interface more resilient to Hyrum's Law. Proposals include adding an official query mechanism for the rseq area location, as well as extending the documented API to cover previously undefined behaviors. At the same time, TCMalloc's maintainers are refactoring their code to rely solely on the stable interface, even if it means a minor performance trade‑off. This dual effort—improving the kernel's API documentation and encouraging userspace to comply—offers a blueprint for managing Hyrum's Law in any complex software system.

In conclusion, the restartable‑sequences episode illustrates the perennial tension between innovation and compatibility in software engineering. Hyrum's Law ensures that the smallest observable detail can become a dependency. For the kernel, accommodating TCMalloc's undocumented reliance was a practical necessity, but it also highlighted the need for clearer contracts. As the ecosystem evolves, both kernel developers and library authors must work together to prevent such clashes—by documenting interfaces better, resisting the temptation to exploit internals, and maintaining a culture of stable APIs. Only then can we reconcile the relentless drive for performance with the essential requirement of stability.