Stupid RCU Tricks: Memory Ordering and Grace Periods

Here is the scenario again:

CPU 0                 CPU 1                 CPU 2     CPU 3
rcu_read_lock();      r3 = B;               A = 1;    B = 1;
r1 = A;               synchronize_rcu();
r2 = B;               r4 = A;
rcu_read_unlock();

What can we say about the values of r1, r2, r3, and r4? In particular, once execution is complete, can this counter-intuitive result occur?

r1 == 1 && r2 == 0 && r3 == 1 && r4 == 0

In other words, can CPUs 0 and 1 see CPU 2's and CPU 3's stores as having happened in opposite orders?
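For those who like to let tooling check their reasoning, here is a sketch of this scenario as a litmus test in the style of the Linux-kernel memory model (LKMM). The test name is made up, and lowercase a and b stand in for A and B; the exists clause encodes the counter-intuitive outcome above.

C RCU-gp-ordering

(*
 * Sketch of the four-CPU scenario above.  The name is illustrative,
 * and lowercase a and b stand in for A and B.
 *)

{}

P0(int *a, int *b)
{
        int r1;
        int r2;

        rcu_read_lock();
        r1 = READ_ONCE(*a);
        r2 = READ_ONCE(*b);
        rcu_read_unlock();
}

P1(int *a, int *b)
{
        int r3;
        int r4;

        r3 = READ_ONCE(*b);
        synchronize_rcu();
        r4 = READ_ONCE(*a);
}

P2(int *a)
{
        WRITE_ONCE(*a, 1);
}

P3(int *b)
{
        WRITE_ONCE(*b, 1);
}

exists (0:r1=1 /\ 0:r2=0 /\ 1:r3=1 /\ 1:r4=0)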

Given that both the compiler and the CPU are within their rights (at least as far as the Linux kernel is concerned) to reorder CPU 0's loads, it is quite tempting to say “yes.” But we should instead take this question one step at a time, using the fundamental property of RCU.
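To make that reordering concern concrete, here is a kernel-style sketch of CPU 0's critical section (cpu0_reader() is a made-up name). READ_ONCE() prevents the compiler from reordering or fusing the two marked loads, but because the loads are independent, a weakly ordered CPU is still free to execute them in either order.

#include <linux/rcupdate.h>     /* rcu_read_lock() and friends */

static int A;
static int B;

/* Made-up helper representing CPU 0's read-side critical section. */
void cpu0_reader(int *r1, int *r2)
{
        rcu_read_lock();
        *r1 = READ_ONCE(A);     /* The compiler must not reorder these two   */
        *r2 = READ_ONCE(B);     /* marked loads, but the CPU still may.      */
        rcu_read_unlock();
}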

Let's start by assuming that r3 == 1 && r4 == 0. This is simple enough: CPU 3's store to B happened before CPU 1's load from B (and thus before the grace period), and CPU 2's store to A happened after CPU 1's load from A (and thus after the grace period). Then r1 == 1 implies that CPU 0's load from A sees CPU 2's store. Because that store happened after the grace period, CPU 0's load from A must also have happened after the grace period.

However, the fundamental property of RCU says that if any part of an RCU read-side critical section comes after the end of a given grace period, then none of that same critical section can come before the beginning of that same grace period. This implies that CPU 0's load from B cannot precede the grace period. Because we know that CPU 3's store to B happened before the grace period, CPU 0's load from B must see the result of that store, so that r2 will end up equal to 1, not 0.

Therefore, the above counter-intuitive result is prohibited, even though both the CPU and the compiler would be within their rights (at least as far as the Linux kernel is concerned) to reorder CPU 0's loads.
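As a cross-check, the litmus test sketched above can be run through the herd7 tool against the Linux-kernel memory model in the kernel source tree's tools/memory-model directory, for example with something like "herd7 -conf linux-kernel.cfg rcu-gp.litmus" (substituting whatever file name the sketch was saved under). The output should report the exists clause as never satisfied, in agreement with the reasoning above.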