Performance Impact of Pre-allocating Slice Memory in Golang

When I review code, I often check whether slice initializations allocate the expected amount of memory up front, i.e. I ask that patterns like var init []int64 be changed to init := make([]int64, 0, length) wherever possible. But I had no quantitative sense of how much this change actually improves performance; the requirement was purely dogmatic. This blog covers the theoretical basis for why preallocating memory improves performance, quantitative measurements, and tools that automate detection and discovery.

Slice Memory Allocation Theory Basics

The code for Go slice growth lives in growslice in runtime/slice.go. The general idea: when the old capacity is below 256 elements, each growth doubles the capacity; above 256, each growth multiplies the capacity by roughly 1.25, using a formula that transitions smoothly between the two regimes. A new backing array of the computed capacity is allocated, the data is copied from the old array to the new one, and the new slice is returned.

The code for slice expansion is as follows:

newcap := oldCap
doublecap := newcap + newcap
if newLen > doublecap {
    newcap = newLen
} else {
    const threshold = 256
    if oldCap < threshold {
        newcap = doublecap
    } else {
        // Check 0 < newcap to detect overflow
        // and prevent an infinite loop.
        for 0 < newcap && newcap < newLen {
            // Transition from growing 2x for small slices
            // to growing 1.25x for large slices. This formula
            // gives a smooth-ish transition between the two.
            newcap += (newcap + 3*threshold) / 4
        }
        // Set newcap to the requested cap when
        // the newcap calculation overflowed.
        if newcap <= 0 {
            newcap = newLen
        }
    }
}
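The growth policy is easy to observe from outside the runtime. The sketch below (a standalone program I wrote for illustration, not from the original post) appends elements one at a time and records every capacity change; note that the runtime also rounds allocations up to a size class, so the observed steps are only approximately 2x and 1.25x.

```go
package main

import "fmt"

// capSteps appends n elements one at a time and returns every distinct
// capacity the slice passes through along the way.
func capSteps(n int) []int {
	var s []int64
	var steps []int
	for i := 0; i < n; i++ {
		s = append(s, 0)
		if len(steps) == 0 || cap(s) != steps[len(steps)-1] {
			steps = append(steps, cap(s))
		}
	}
	return steps
}

func main() {
	// Below 256 elements, capacities roughly double per step; above 256
	// they grow by roughly 1.25x, plus size-class rounding.
	fmt.Println(capSteps(2000))
}
```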

So in theory, if a slice is preallocated and never needs to grow dynamically, performance can improve in several ways:

  • Memory is allocated once instead of repeatedly.
  • Data does not need to be copied repeatedly.
  • Old backing arrays do not need to be garbage collected.
  • Capacity is exactly what was requested, with none wasted by the growth formula.
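The last point can be checked directly. Here is a minimal sketch (my own addition, not from the original post) comparing the final capacity of a dynamically grown slice against a preallocated one of the same length:

```go
package main

import "fmt"

func main() {
	const n = 1025

	// Grown dynamically: the final capacity overshoots n, because the
	// runtime grows in steps and rounds allocations up to size classes.
	var dynamic []int64
	for i := 0; i < n; i++ {
		dynamic = append(dynamic, 0)
	}

	// Preallocated: the capacity is exactly what was requested.
	pre := make([]int64, 0, n)
	for i := 0; i < n; i++ {
		pre = append(pre, 0)
	}

	fmt.Printf("dynamic:      len=%d cap=%d (%d unused slots)\n",
		len(dynamic), cap(dynamic), cap(dynamic)-len(dynamic))
	fmt.Printf("preallocated: len=%d cap=%d\n", len(pre), cap(pre))
}
```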

Theoretically, then, preallocating slice capacity should outperform dynamic growth, but the size of the improvement needs to be measured quantitatively.

Quantitative measurements

We take the benchmark code from prealloc and modify it slightly to measure the performance impact of preallocating versus dynamically growing slices of different sizes.

The test code is as follows. By modifying length, we can observe the performance data in different cases:

package prealloc_test

import "testing"

var length = 1000

func BenchmarkNoPreallocate(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // Don't preallocate our initial slice
        var init []int64
        for j := 0; j < length; j++ {
            init = append(init, 0)
        }
    }
}

func BenchmarkPreallocate(b *testing.B) {
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        // Preallocate our initial slice
        init := make([]int64, 0, length)
        for j := 0; j < length; j++ {
            init = append(init, 0)
        }
    }
}

The first function tests the performance of dynamic allocation and the second function tests the performance of preallocation. The test can be executed with the following command:

go test -bench=. -benchmem prealloc_test.go

The result in the case of length = 1 is as follows:

BenchmarkNoPreallocate-12       40228154                27.36 ns/op            8 B/op          1 allocs/op
BenchmarkPreallocate-12         55662463                19.97 ns/op            8 B/op          1 allocs/op

With length = 1, there should in theory be no difference between the two: each performs exactly one memory allocation. Yet as measured, preallocation takes about 70% of the time of dynamic allocation, a roughly 1.4x advantage even though the allocation counts are identical. The gap is possibly because the append path still has to go through the runtime's growth logic, whereas make allocates the needed capacity up front.

The result in the case of length = 10 is as follows:

BenchmarkNoPreallocate-12        5402014               228.3 ns/op           248 B/op          5 allocs/op
BenchmarkPreallocate-12         21908133                50.46 ns/op           80 B/op          1 allocs/op

With a length of 10, preallocation still performs only one memory allocation, while dynamic allocation performs 5, and preallocation is about 4.5x faster. Even when the slice is small, preallocation brings a significant improvement.

Here are the results with length of 129, 1025 and 10000:

# length = 129
BenchmarkNoPreallocate-12         743293              1393 ns/op            4088 B/op          9 allocs/op
BenchmarkPreallocate-12          3124831               386.1 ns/op          1152 B/op          1 allocs/op

# length = 1025
BenchmarkNoPreallocate-12         169700              6571 ns/op           25208 B/op         12 allocs/op
BenchmarkPreallocate-12           468880              2495 ns/op            9472 B/op          1 allocs/op

# length = 10000
BenchmarkNoPreallocate-12          14430             86427 ns/op          357625 B/op         19 allocs/op
BenchmarkPreallocate-12            56220             20693 ns/op           81920 B/op          1 allocs/op

At larger lengths, preallocation still performs only one memory allocation, but the gain does not keep widening: preallocation stays about 2 to 4 times faster than dynamic growth. Presumably other costs dominate in the process, or the Go runtime copies large blocks efficiently, so the gap does not grow further.
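As a side note, when the final length is known exactly, a third variant is possible: allocate the full length up front and write by index, skipping append's bookkeeping entirely. The sketch below is my own addition; it uses the standard library's testing.Benchmark helper so it can run as a plain program rather than under go test.

```go
package main

import (
	"fmt"
	"testing"
)

var length = 1000

func main() {
	// Allocate the full length up front and fill by index; there is no
	// append call and no capacity check in the inner loop.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			init := make([]int64, length)
			for j := 0; j < length; j++ {
				init[j] = int64(j)
			}
		}
	})
	fmt.Println(res)
}
```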

When I replaced the element type with a more complex struct, I expected the copying to carry a bigger overhead, but the gap between preallocation and dynamic growth actually narrowed. Evidently there are more internal optimizations at work, and the results do not match my intuition.
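For reference, this is the kind of measurement I mean, as a self-contained sketch. The record struct is a hypothetical element type invented for illustration (it is not from the original benchmark), and testing.Benchmark lets the comparison run as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// record is a hypothetical multi-field struct used only for this measurement.
type record struct {
	ID   int64
	Name string
	Tags [4]int64
}

var length = 1000

func main() {
	// Grow the slice dynamically, copying structs on every expansion.
	noPre := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			var s []record
			for j := 0; j < length; j++ {
				s = append(s, record{ID: int64(j)})
			}
		}
	})
	// Preallocate the capacity so no expansion or copying is needed.
	pre := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			s := make([]record, 0, length)
			for j := 0; j < length; j++ {
				s = append(s, record{ID: int64(j)})
			}
		}
	})
	fmt.Println("no preallocation:", noPre)
	fmt.Println("preallocation:   ", pre)
}
```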

Prealloc

Although preallocated memory brings real performance improvements, cases are easy to miss in larger projects that rely entirely on manual review. This is where lint tools come in to automate the scanning; prealloc is one such tool: it scans for slices that could be preallocated but aren't, and it can be integrated into golangci-lint.
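Assuming the tool referred to is github.com/alexkohler/prealloc (the commonly used implementation), it can be run standalone or through golangci-lint; the invocations below are a sketch of typical usage, not taken from the original post.

```shell
# Install and run the standalone prealloc tool across the module.
go install github.com/alexkohler/prealloc@latest
prealloc ./...

# Or enable the same check through golangci-lint.
golangci-lint run -E prealloc
```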