How herd Works

February 9, 2017

This article was contributed by Jade Alglave, Paul E. McKenney, Alan Stern, Luc Maranget, and Andrea Parri

The herd program reads in a litmus test and evaluates it according to a memory model. The model is contained in a .cat file specified by the “-model” command-line option (or a “model” line in the configuration file) plus an optional .bell file specified by the “-bell” option (or a “bell” line in the configuration file). The Bell file, if present, gets processed first. Both files consist of statements in the cat language, and herd treats them almost identically—the only real difference is that the Bell file is allowed to contain “instructions” statements but the Cat file is not. (The “instructions” statement defines tags which can decorate elementary operations. For instance, one such statement might specify that a barrier operation can be decorated with an “mb” tag, which could indicate that the operation is a full memory barrier.) Typically a Bell file is used for common code that is shared among multiple models, where each model would have its own Cat file.

After loading the Bell and Cat files, herd parses the litmus-test program. The program may contain macros (for instance, “smp_mb()”) which are expanded to internal constructs (such as a barrier with an “mb” tag). The mapping from user-level macros to internal constructs is defined in a file like linux.def that is specified by the “-macros” command-line option (or a “macros” line in the configuration file).

Next, herd interprets the program by constructing a list of events for each thread. For programs written in a high-level language like C, this involves breaking statements and expressions down into a series of elementary operations (read, write, arithmetic/logic on registers, branch, and so on); for programs in assembly language, the individual instructions generally correspond directly to these operations. However, atomic read-modify-write instructions always get represented by two operations, a read and a write, linked by the built-in rmw relation.

The events are organized as one list per thread, in program order. This is not always straightforward, because of a subtle but important fact: “Program order” refers to the order of instructions as they are presented to the processor's execution unit, not their order in the program's source or object code. While the two orders are often the same, they will differ when branches are present. A forward branch causes some instructions to be left out of the event list, and a backward branch can cause some instructions to be repeated in the list.

Since it is not known in advance whether a conditional branch will be taken, each such branch causes herd to generate two event lists: one in which the branch is taken and one in which it isn't. Thus, a program containing two conditional branches will give rise to four lists, a program containing three will give rise to eight, and so on. The po relation then refers to the order of the events in an individual list, and herd has to test each list separately, as a possible program execution.

When the program contains a loop, a conditional branch may be taken an indefinitely large number of times. In this situation the number of possible executions would quickly get out of hand, so there is a limit on how many times herd will allow a particular branch to appear in an execution (specified by the “-unroll” command-line option), typically set to 2. Executions with a higher number of iterations simply will not be considered.

Then, given a candidate execution, herd has to determine, for each read event, which write event stored the value that the read will retrieve. Again, there's no way to know this in advance, so if a program has more than one write to a particular variable, herd has to try all possible combinations for the rf relation. Just as with conditional branches, this can lead to exponential growth in the number of possible executions to be tested.

As part of its processing of a candidate execution, herd carries out a dataflow analysis of the values computed and stored in the local variables (or CPU registers) for each thread. This analysis gets used in several ways:

  • herd checks each conditional branch, making sure that the branch's condition is true if and only if the execution has decided to take the branch.
  • herd checks the target address of each indirect memory access (i.e., access through a pointer or relative to a CPU register), making sure that the rf relation really does link writes and reads to the same target address.
  • herd determines exactly what data, address, or control dependencies exist between memory-access events. These dependencies are made available to the model through the built-in data, addr, and ctrl relations.
As an example of this last point, given the statement
WRITE_ONCE(*x, READ_ONCE(*y));
in the litmus-test program, herd would break it down into two events:
rtemp = READ_ONCE(*y)
WRITE_ONCE(*x, rtemp)
(where rtemp is a temporary local variable), and it would add a link
rtemp = READ_ONCE(*y)WRITE_ONCE(*x, rtemp)
to the data relation.

Finally, once a particular choice for the po and rf relations has been settled on, the execution checks out okay, and the data, addr, and ctrl relations have been set up, herd runs the statements in the Bell and Cat files to see whether the memory model considers the candidate execution to be allowed. If any of the model's checks fail, the execution is abandoned. Otherwise, herd evaluates the logical assertion at the end of the litmus test. It keeps track of the number of allowed executions for which the assertion is true and the number for which it is false; these are the numbers reported at the end in herd's output.

Unlike po and rf, the co relation is not built-in. It has to be computed explicitly by the memory model. In practice this is done by the cos.cat file, which is included near the start of the model's Cat file (see for example line 3 in RMO memory-model files here and here, as discussed here). This involves another potentially exponential computation, because it is necessary to try all possible orderings of the write accesses to each variable.

herd works in terms of sets of events and relations between events. (A relation is a collection of ordered pairs of events; you can think of each ordered pair as a link going from the first event in the pair to the second.) The cat language used in the Bell and Cat files is rich in operators for constructing and testing these sets and relations.

To begin with, herd has a number of built-in sets used for classifying events. Each event is automatically added to the appropriate sets.

Name Contents Comment
R Read events
W Write events
IW Initial Write events “writes” that set a variable's initial value
FW Final Write events values that are tested in the final assertion
M Memory access events same as “R | W
B Branch events
RMW Read-Modify-Write events the component events of an atomic RMW instruction
F Fence events also known as Barrier events
_ All events wildcard

In addition, using the “enum” and “instructions” statements, a Bell file can define an enumerated list of tag values and specify that instructions of a specific kind must be labelled with one of these tags. herd then creates a set for each tag value; the name of the set is the same as the name of the tag but with the first letter capitalized, and the set contains all events generated from instructions labelled by the corresponding tag. For example, the following cat code:

enum Accesses = 'once || 'release || 'acquire || 'deref
instructions R[{'once,'acquire,'deref}]
defines a bunch of Accesses tags, and says that all read (“R”) instructions should be labelled with a “once”, an “acquire”, or a “deref” tag. Given this, the “Once” set would contain all events corresponding to an instruction (possibly a read, possibly something else) labelled with the “once” tag, and similarly for the “Release”, “Acquire” and “Deref” sets.

herd also comes with a selection of built-in relations, some of which we have already mentioned:

Name Relation Comment
0 Empty empty relation, contains no links
id Identity links each event to itself
int Internal links events that are in the same thread
ext External links events that are in different threads
loc Location links memory-access events that target the same variable
rf Reads-From links a write event to any read event that loads the value stored by that write
rmw Read-Modify-Write links the read and write component events of an RMW instruction
po Program Order links events in the same thread, in the order they occur in the instruction stream
addr Address dependency links a read event to any memory-access event whose target address depends on the value loaded by the read
ctrl Control dependency links a read event to all events that are executed conditionally depending on the value loaded by the read
data Data dependency links a read event to any write event that stores a value which depends on the value loaded by the read

herd's standard libraries stdlib.cat (automatically included for all models) and cos.cat define a few extra relations, which can be used as though they were built-in:

Name Relation Comment
po-loc po for the same location links memory-access events that target the same variable, in program order; same as “po & loc
rfe external reads-from rf restricted to pairs of accesses in different threads; same as “rf & ext
rfi internal reads-from rf restricted to pairs of accesses in the same thread; same as “rf & int
co coherence order total ordering of all writes to the each variable
coe external coherence order co restricted to pairs of writes in different threads; same as “co & ext
coi internal coherence order co restricted to pairs of writes in the same thread; same as “co & int
fr from-read links a read event to any write event for the same variable that comes after (in the variable's coherence order) the write which the read event reads from; same as “rf^-1 ; co
fre external from-read fr restricted to pairs of accesses in different threads; same as “fr & ext
fri internal from-read fr restricted to pairs of accesses in the same thread; same as “fr & int

Bell and Cat files can compute their own sets and relations using functions and operators provided by the cat language and libraries, as summarized in the following table. Operators with higher precedence (tighter binding) are higher up in the table.

Operator Operation Example Applicability
domain Domain of a relation domain(x)
computes the set of all events that are the start of a link in x
Applies to a relation, yields a set
range Range of a relation range(x)
computes the set of all events that are the end of a link in x
Applies to a relation, yields a set
fencerel Link events separated by a fence fencerel(x)
computes the relation consisting of all pairs of events where the first precedes (in program order) an event in the set x and the second follows it; the same as “(po & (_ * x)) ; po
Applies to a set, yields a relation
[ ] Identity on a set [x]
computes the identity relation restricted to events in the set x
Applies to a set, yields a relation
Postfix ^-1 Inversion x^-1
computes the relation obtained by reversing all the links in x
Applies to a relation
Postfix ? Reflexive closure x?
computes the relation consisting of all pairs of events that can be connected by a chain of links from x of length 0 or 1; the same as “id | x
Applies to a relation
Postfix + Transitive closure x+
computes the relation consisting of all pairs of events that can be connected by a chain of links from x of length 1 or greater; the same as “x | (x;x) | (x;x;x) | ...
Applies to a relation
Postfix * Reflexive-transitive closure x*
computes the relation consisting of all pairs of events that can be connected by a chain of links from x of length 0 or greater; the same as “id | x | (x;x) | (x;x;x) | ...
Applies to a relation
Prefix ~ Complement ~x
computes the relation (or set) consisting of all links (or events) not in x
Applies to a relation or a set
Infix * Cartesian product x * y
computes the relation consisting of all links from an event in set x to an event in set y
Applies to sets, yields a relation
\ Difference x \ y
computes the relation (or set) consisting of all links (or events) in x that are not in y
Applies to relations or sets
& Intersection x & y
computes the relation (or set) consisting of all links (or events) in both x and y
Applies to relations or sets
; Sequencing x ; y
computes the relation consisting of all links a⟶c such that for some event b, x contains a⟶b and y contains b⟶c
Applies to relations
| Union x | y
computes the relation (or set) consisting of all links (or events) in x or y or both
Applies to relations or sets
(* *) Encloses comments (* This is a comment *)

Although the language includes a variety of statement types, the ones found most frequently in Bell and Cat files are assignment (“let” or “let rec”) and check statements. “let” statements are self-explanatory; we have already seen several examples in the RMO memory models above. “let rec” statements, used for mutually recursive definitions, are more complex; we will see an example here. Check statements come in three forms:

  • acyclic <expr>” requires the relation computed from “expr” not to have any cycles;
  • irreflexive <expr>” requires the relation computed from “expr” not to have any links from an event to itself;
  • empty <expr>” requires the set or relation computed from “expr” to be empty.
If a check fails, it indicates that the memory model prohibits the candidate execution under consideration; that is, the memory model says that this execution could never occur.

A check can be negated by prefixing it with “~”. Also, a check can be flagged by putting the “flag” keyword in front of it. Unlike a normal check, when a flagged check succeeds it indicates that the execution has encountered some sort of semantic problem. If this happens, herd adds a warning message to its output.

Along with the “let”, “include”, and check statements, the only other types of statement we will see are “enum” and “instructions”, both briefly mentioned earlier.

The cat language supports many features we won't cover here, including higher-order sets and tuples, pattern matching, user-defined functions, iteration, and recursion in the style of OCaml (the language herd is written in). herd itself also has a large number of features we won't discuss, such as the ability to skip certain checks when testing a memory model or to produce figures (like the ones in this document) illustrating the events and relations in a particular program execution. More complete documentation can be found in the herd manual.

With this background, we are ready to examine larger, more realistic memory models. Something like the Introduction to the Strong Linux-Kernel Memory Model, in fact.