Back to Basics - What makes parallel programming hard?
Dual and even quad-processor computers may be increasingly common, but software that is designed to run in parallel is another story entirely. In this column, Darin Ohashi takes us back to basics to explain why designing a program to run in parallel is a whole different ball game.
In my previous column, I tried to convince you that going parallel is necessary for high performance applications. In this week's column, I will show you what makes parallel programming different and why that makes it harder.
We think of a function as a series of discrete steps. Let's say a function f is composed of steps f1, f2, f3... fn, and another function g is composed of steps g1, g2, g3... gn.
A single-threaded application can run f and g in only two ways:
f(); g(); or g(); f();
That would lead to two possible orders for the steps: f1, f2, f3... fn, g1, g2, g3... gn or g1, g2, g3... gn, f1, f2, f3... fn. The order is determined by the order in which the application calls f() and g(). And that, in turn, is determined by the programmer who wrote the application.
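To make this concrete, here is a small sketch (not from the column) of the single-threaded case: each "step" is recorded as it runs, and the order is fixed entirely by which call the programmer writes first.

```python
# Single-threaded: the programmer chooses the order by choosing which
# function to call first.  Every run produces exactly the same sequence.
steps = []  # records the order in which steps actually execute

def f():
    for i in range(1, 4):        # steps f1, f2, f3
        steps.append(f"f{i}")

def g():
    for i in range(1, 4):        # steps g1, g2, g3
        steps.append(f"g{i}")

f()
g()
print(steps)  # all of f's steps, then all of g's, every time
```

Calling g() before f() instead would deterministically produce the other ordering.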
Now let's consider what happens if f and g are called in parallel - that is to say, at the same time. Although f2 will still execute before f3, there is no control over the order in which f2 will be executed relative to g2. Thus, the order could be g1, g2, f1, g3, f2, f3 .... or f1, g1, f2, f3, g2, g3... or any other valid order. Each time the application calls these functions in parallel, the steps could be executed in a different order. Given enough executions every possible order could eventually happen.
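Just how many valid orders are there? Assuming f and g each have n steps, a valid order is any interleaving that preserves each function's internal order, which works out to the binomial coefficient C(2n, n). A quick sketch of the count (the function name is mine, not the column's):

```python
from math import comb

def interleavings(n: int) -> int:
    # Choosing which n of the 2n execution slots belong to f's steps
    # (g's steps fill the rest, in order) gives C(2n, n) valid orders.
    return comb(2 * n, n)

print(interleavings(3))   # 3 steps each  -> 20 possible orders
print(interleavings(10))  # 10 steps each -> 184756
```

The count grows exponentially with n, which is why "correct for all possible orderings" is such a demanding requirement.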
Suppose that, for some particular i and j, step fi must execute before step gj for the code to be correct. Then executing f and g in parallel will occasionally fail, even if fi is the very first step of f and gj is the very last step of g, because every possible ordering will eventually occur. Therefore, to write correct parallel code, the programmer must guarantee that the code is correct for all possible orderings. Bugs caused by the order in which threads execute are often referred to as race conditions: you can imagine two or more threads racing toward a critical point, with correctness depending on who wins the race.
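One standard way to restore such a guarantee is explicit synchronization. This sketch (my example, using Python's threading.Event) forces the required step of f to run before the dependent step of g, no matter how the scheduler orders the threads:

```python
import threading

order = []
f_done = threading.Event()  # signals that the required step of f has run

def f():
    order.append("f1")      # the step that must happen first
    f_done.set()            # tell g it is now safe to proceed

def g():
    f_done.wait()           # block until f1 has executed
    order.append("g1")      # the step that must happen second

t1 = threading.Thread(target=g)  # deliberately start g first:
t2 = threading.Thread(target=f)  # the Event still forces f1 before g1
t1.start(); t2.start()
t1.join(); t2.join()
print(order)  # always ['f1', 'g1'], regardless of scheduling
```

The cost, of course, is that g may sit idle waiting for f, which is exactly the kind of trade-off parallel programmers must weigh.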
Consider a function f that accepts a single integer n as an argument and increments a global variable g, n times in a loop. If we run f twice in series, passing 1000 each time, the final value of g is 2000. What happens if we run f in two threads in parallel, passing 1000 to each call? We might expect to get 2000, as we did single-threaded. In practice, we probably won't; instead, we'll get some number between 1000 and 2000, because some of the increments will be lost. This happens when the two threads attempt to update g at the same time: only one of the updates is stored, and the other is lost. Further, if we execute this example multiple times, we will see different answers on different runs. The threads are scheduled differently on each execution, and this causes a different number of conflicts.
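Here is a sketch of that experiment. Note that the racy version may or may not lose updates on any particular run (in CPython, for instance, the interpreter's own locking makes losses intermittent), so only the lock-protected version has a deterministic result. The function names are mine:

```python
import threading

g = 0
lock = threading.Lock()

def unsafe_f(n: int) -> None:
    global g
    for _ in range(n):
        tmp = g        # read...
        g = tmp + 1    # ...then write: another thread may update g in between

def safe_f(n: int) -> None:
    global g
    for _ in range(n):
        with lock:     # the read-modify-write is now atomic
            g += 1

def run_twice(fn, n=1000):
    """Run fn(n) in two parallel threads and return the final value of g."""
    global g
    g = 0
    threads = [threading.Thread(target=fn, args=(n,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    return g

print(run_twice(unsafe_f))  # may be less than 2000: increments can be lost
print(run_twice(safe_f))    # always 2000
```

The lock guarantees that only one thread performs the read-modify-write at a time, which collapses all the problematic orderings into safe ones.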
There are orders where the total will be 2000, and in fact, every value from 1000 to 2000 could occur. So, even for this simple example, there are 1001 different possible outcomes, and far more possible orderings.
The order in which single threaded code executes is determined by the programmer. This makes the execution of a single threaded application relatively predictable. When run in parallel, the order in which the code executes is not predictable. This makes it much harder to guarantee that code will execute correctly.