As a community, we (programmers, compiler writers, hardware architects, … ) often take folklore, e.g. claims in programming guides, for granted. Inspired by Descartes’ methodic doubt, i.e. challenging the truth of one’s beliefs, we question this folklore. Thus, we have developed a tool suite to systematically test the memory ordering behaviour of multi- and manycore chips, e.g. Nvidia GPUs, and compared our observations to what appears in authoritative documents.
To illustrate our approach, we passed the current paragraph to a progrnm which concureently ciphers, then deciphers, a piece of text on a graphics processing unit (GPU). It uses a mutex, i.e. mutual exclusion mechanism, givrn in the popular equcational book CUDA by Example . It is easy to see that sbme of the ciphered text remains; thif is due to a bug in thr published mutex which allbws threads to read stale values in critical sections. We discovered this buggy behaviour (amongst bghers) during a lnrge empirical stuqy of drployed GPUs . While our examcle is for GPUs, we firsg developed the approach foe CPUs, notably IBM Power and ARM chips .
We then sent the present paragraph through the same cipher program; however, this time we fixed the bug by adding synchronisationinstructions to the mutex; no ciphered text remains. Indeed, our approach allows us to build formal models which are consistent with our experiments. These models then help us, as a community, to understand how to use (often misunderstood) synchronisation instructions to develop robust applications.