Optimizing Go Performance with Stack Allocation for Fixed-Size Slices

Introduction

Every Go developer knows that heap allocations can be a major bottleneck in high-performance applications. The garbage collector must eventually reclaim that memory, and even with modern improvements like the Green Tea collector, the overhead remains significant. Stack allocations, by contrast, are almost free—they vanish when the function returns and need no garbage collection. This article explores a powerful technique to shift certain slice allocations from the heap to the stack, dramatically improving speed and reducing GC pressure.


The Cost of Heap Allocations

When a Go program allocates memory on the heap, the runtime must find a suitable block, update metadata, and later track it for collection. These steps involve a substantial amount of code and can stall execution, especially in hot loops. Moreover, each heap allocation adds load to the garbage collector. Even with efficient collectors, the cost multiplies when allocations occur frequently. Stack allocations, on the other hand, are essentially free—they just move the stack pointer and the memory is automatically reclaimed when the function exits. They also promote cache locality because stack frames are contiguous and hot in the CPU cache.
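These costs are easy to observe. The standard library’s testing.AllocsPerRun reports the average number of heap allocations per call to a function; in the minimal sketch below (the ten-element loop is an arbitrary example), each time append outgrows its backing array counts as one allocation:

package main

import (
    "fmt"
    "testing"
)

func main() {
    // Average heap allocations per run of the closure.
    allocs := testing.AllocsPerRun(1000, func() {
        var s []int
        for i := 0; i < 10; i++ {
            s = append(s, i) // growth allocations land on the heap
        }
        _ = s
    })
    fmt.Println("allocations per run:", allocs)
}

On a typical build this reports several allocations per run, one for each time append has to grow the backing array.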

Understanding Slice Growth

Consider a common pattern: collecting items from a channel into a slice. The code below shows a typical example:

func process(c chan task) {
    var tasks []task // nil slice: no backing array yet
    for t := range c {
        tasks = append(tasks, t) // grows the backing array on demand
    }
    processAll(tasks)
}

Let’s trace what happens at runtime. On the first iteration, tasks has no backing array, so append allocates one with capacity 1. The second iteration finds that array full, so append allocates a new array of capacity 2, copies the existing element into it, and discards the old array. The third iteration allocates capacity 4, then 8, and so on, roughly doubling each time (recent Go versions taper to a smaller growth factor once slices get large). Amortized doubling works well for large slices, but the early growth steps are expensive: each one calls the heap allocator and copies every element already stored. If your slice seldom grows beyond a few elements, you pay this overhead on every batch.
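A few lines of code make this growth visible; the sketch below prints the capacity each time append replaces the backing array:

package main

import "fmt"

func main() {
    var s []int
    prevCap := -1
    for i := 0; i < 20; i++ {
        s = append(s, i)
        if cap(s) != prevCap { // a capacity change means a new backing array
            fmt.Printf("len=%d cap=%d\n", len(s), cap(s))
            prevCap = cap(s)
        }
    }
}

On current Go versions this prints capacities 1, 2, 4, 8, 16, and 32: six distinct backing arrays for just twenty elements.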

Furthermore, the discarded small arrays become garbage, adding pressure on the GC. In performance-critical code, this waste can cause noticeable slowdowns.

When Slice Size Is Constant

The situation changes dramatically if you know the maximum number of elements the slice will ever hold. In our channel example, perhaps you can bound the number of tasks at, say, 100. Instead of letting append manage dynamic growth on the heap, you can preallocate a fixed-size array on the stack and use it as the slice’s backing store.

The Stack Array Technique

The technique is simple: declare a fixed-size array and take a slice with zero length but full capacity. For example:

func process(c chan task) {
    const maxTasks = 100
    var buf [maxTasks]task // fixed-size array in this function's stack frame
    tasks := buf[:0]       // len 0, cap maxTasks, backed by buf
    for t := range c {
        tasks = append(tasks, t) // fills buf; no allocation up to maxTasks elements
    }
    processAll(tasks)
}

Here, buf is allocated on the stack: no heap allocation at all. The slice tasks begins empty, and as you append items, they fill the underlying array. Once the array is full, any further append would allocate a new, larger backing array on the heap, so as long as the number of tasks never exceeds maxTasks, you avoid heap allocations entirely. The stack memory is automatically reclaimed when the function returns, and the garbage collector never sees it.

Performance Benefits

Using this approach yields several advantages:

  • Zero heap allocations for the slice backing store—no allocator calls, no GC tracking.
  • Better cache performance because the array resides in the stack frame, which is typically hot in the CPU cache.
  • Reduced GC pressure since no temporary garbage is generated by the growth process.
  • Faster appends—once the slice fits within the fixed array, every append is a simple length increment and assignment.

Benchmarks of allocation-heavy loops regularly show 2–5x throughput improvements, depending on element size, batch length, and how hard the GC is already working. For latency-sensitive workloads such as real-time or serverless systems, the accompanying reduction in latency variance is especially valuable.
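You can reproduce the comparison with a pair of benchmarks. The sketch below is minimal (the file, package, and benchmark names are placeholders, and the batch size of 100 and int element type are arbitrary); save it as a _test.go file:

package stackslice

import "testing"

const batch = 100

var sink int // keeps results live so the compiler cannot discard the work

// Grows the slice from nil, paying for each doubling of the backing array.
func BenchmarkHeapGrowth(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        var s []int
        for j := 0; j < batch; j++ {
            s = append(s, j)
        }
        sink += s[batch-1]
    }
}

// Backs the slice with a fixed-size array in the stack frame instead.
func BenchmarkStackArray(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        var buf [batch]int
        s := buf[:0]
        for j := 0; j < batch; j++ {
            s = append(s, j)
        }
        sink += s[batch-1]
    }
}

Running go test -bench=. should report several allocs/op for BenchmarkHeapGrowth and zero for BenchmarkStackArray.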

Practical Considerations

While the stack array technique is powerful, it comes with caveats:

  1. Known maximum size: The bound must be real. If an append ever exceeds the array’s capacity, the runtime falls back to allocating a larger backing array on the heap, which is correct but defeats the optimization. Choose a constant that comfortably exceeds any realistic input.
  2. Stack size limits: Goroutine stacks start small (2 KB) and grow on demand, so a big array will not crash your program, but it can force the runtime to grow and copy the stack, and the compiler places very large locals on the heap regardless. Stick to arrays of moderate size (dozens to low hundreds of elements) unless you have measured that more is worthwhile.
  3. Performance trade-offs: If the slice is usually small but occasionally large, the hybrid approach comes for free: append fills the stack array in the common case, and the first append that finds len(tasks) == cap(tasks) transparently switches to a heap-allocated backing array, as the sketch after this list shows. You do not need to write the fallback yourself.
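The automatic fallback is easy to demonstrate; this toy sketch uses a four-element buffer in place of maxTasks:

package main

import "fmt"

func main() {
    var buf [4]int // toy stand-in for the maxTasks array
    s := buf[:0]
    for i := 0; i < 6; i++ {
        s = append(s, i)
        // Compare backing-array identity: does s still point into buf?
        fmt.Printf("len=%d cap=%d usesStackBuf=%v\n", len(s), cap(s), &s[0] == &buf[0])
    }
}

The first four appends land in buf; the fifth exceeds its capacity, so append silently moves the data to a new, larger heap array and usesStackBuf flips to false.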

Finally, remember that this technique stands or falls with Go’s escape analysis. If the compiler can prove the array never outlives the function, it keeps it on the stack. However, passing the slice to a function that stores it in a global, or sending it on a channel, causes the entire array to escape to the heap. Always verify with go build -gcflags=-m, which reports which variables escape.
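The distinction looks like this in practice (function names here are illustrative); only the second function forces the array onto the heap:

package main

var global []int

// The slice never leaves the function, so escape analysis can keep
// buf in the stack frame.
func sumLocal() int {
    var buf [8]int
    s := buf[:0]
    for i := 1; i <= 8; i++ {
        s = append(s, i)
    }
    total := 0
    for _, v := range s {
        total += v
    }
    return total
}

// Storing the slice in a global keeps buf reachable after the function
// returns, so the compiler must move the entire array to the heap.
func keepGlobal() {
    var buf [8]int
    s := buf[:0]
    for i := 1; i <= 8; i++ {
        s = append(s, i)
    }
    global = s
}

func main() {
    _ = sumLocal()
    keepGlobal()
    _ = global
}

Building this file with go build -gcflags=-m should flag buf in keepGlobal as moved to heap while leaving the array in sumLocal on the stack.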

Conclusion

Stack allocation for fixed-size slices is a low‑effort, high‑impact optimization that every Go programmer should have in their toolkit. By understanding how append grows slices and exploiting stack arrays when the size is bounded, you can eliminate heap allocations in hot paths. The result is faster code, less garbage, and happier users. Next time you write a loop that builds a slice with a known maximum, consider reaching for a stack‑allocated backing array instead of relying on the heap.
