The golang for-loop gotcha
This bit me again the other day so what better way to exorcize the resulting bug in production code than to write a small article about it?
So, let’s raise hands: who among us has not written code that looks somewhat like this trivial example:
package main
import (
"fmt"
"time"
)
func main() {
for i := 0; i<5; i++ {
go func() {
fmt.Print(i)
}()
}
// let's wait a little bit for the goroutines to execute before we exit
time.Sleep(100 * time.Millisecond)
}
(very few hands raised in the audience) :grin:
Now, one reasonable assumption would be that this program will print the numbers from 1 to 5.
This assumption would be wrong though; the program consistently prints
the string 55555
(if the result is different on your computer, which
shouldn’t be, then try increasing the sleep duration at the end).
Why is that? The reason is that all goroutines close over the same
variable (i
in this case) so, when they are executed, the variable
will have assumed its last value after the completion of the iteration
loop (which is 5
).
We can observe the difference in behaviour when we introduce a small delay at the end of each iteration like so:
...
// data race
for i := 0; i<5; i++ {
go func() {
fmt.Print(i)
}()
time.Sleep(50 * time.Millisecond)
}
...
The output in this case would be 01234
because each goroutine would
most probably be executed before the value of the shared loop variable
i
was incremented by one.
In essence, this data race (which can lead to wicked and hard-to-find
bugs) is caused by the unsafe sharing of mutable state (in this
case i
) among the competing concurrent goroutines and is well
described on the interwebs (this
article
for example has an excellent explanation). The official Golang wiki
even lists this situation as one of the two most common golang
mistakes that one
can make (hint: the other mistake also has to do with for
loop
variables).
So, what is the correct way of implementing the use case of
asynchronously processing a series of values in a for
loop using
goroutines? The answer is simply to stop sharing the mutable state,
i.e. copy the state variable (in this case i
) within each iteration
and pass it to each goroutine like so:
...
// no data race
for i := 0; i<5; i++ {
go func(n int) {
fmt.Print(n)
}(i)
}
...
In this implementation, we are passing i
by value as an argument
(n
) into the goroutine; n
is now a goroutine-local copy of i
and
further changes to i
are not reflected on n
.
Alternatively, we could close over an iteration-local copy of i
as follows:
...
// no data race
for i := 0; i<5; i++ {
n := i
go func() {
fmt.Print(n)
}()
}
...
It is perhaps worth pointing out that Golang’s for
loop variable
gotcha is not some idiosyncratic language runtime misbehaviour that is
specific to the for
loop but, rather, a classic example of failing
to synchronize concurrent access to a shared resource either by
locking it or copying it (our preferred solution). Here’s an example
of the same problem without a for
loop:
package main
import (
"fmt"
"time"
)
func main() {
n := 10
// data race
go func() {
time.Sleep(100 * time.Millisecond)
fmt.Print(n)
}()
n = 11
time.Sleep(200 * time.Millisecond)
}
This will print 11
instead of 10
.
So, when closing over an outer variable in something like a gouroutine, a good tip would be to always think: who can access the outer variable and if/how access needs to be synchronized. Passing a copy of the state is always a safe choice!