Sunday, July 27, 2008

Take $1000 out of my pocket for Thinking Parallel

Over the past few years, I've sat through a few dozen keynote talks on multi-core. In almost all cases, there's some comment along the lines of computer scientists don't know how to think parallel, and if they'd just smarten up, parallel computing will take off. This usually comes from people who have a management background, and sometimes from electrical engineers.

Like it or not, there are some things that can't be made faster with parallel machines (as far as I, or a number of people I respect, know). So here's an opportunity to take $1000 out of my pocket, just for showing me how to think parallel. A very common task in software applications is finding the shortest path between a pair of vertices in a graph; the best known algorithm for this is Dijkstra's, which has a Big-O complexity of O(E + V log V). Dijkstra uses a heap, which is a serial bottleneck; there's no way that's obvious to me to use multiple processors.

The Bellman-Ford algorithm also computes shortest path, and is very very easy to make massively parallel. So why not use the parallel version, you ask? Bellman-Ford is O(EV), which has higher computational complexity. For even modest sized graphs, Dijkstra will be faster in terms of wall-clock time, require less total power, and does all of this with just a single processor. There are very few types of graphs where Bellman-Ford might be better; you'd have to work hard to find one where a serial Dijkstra would not blow the doors off a parallel Bellman-Ford. If your objective is to get the answer as fast as possible with the minimum power, serial Dijkstra is the obvious choice.

How to take $1000 out of my pocket.

So here's the offer. Show me how to think parallel for this problem. Demonstrate a factor of two speedup for shortest path, and use as many parallel processors as you want. The run-time comparison should be between comparable processors (for example, one core of a multi-core AMD/Intel/Sun processor versus all available cores, not a 1987 IBM PC vs a new Xeon). The speedup should come on a real graph, and be compared to a solid version of the serial Dijkstra code. In particular, let's use the 9th DIMACS Implementation Challenge for Shortest Path. We can use the full USA distance map (24 million nodes, 58 million arcs), and find the vertex with the greatest shortest path distance from vertex 0. No cheating by precomputing the result, though - this has to work for other graphs. You have to actually use the other processors, not just tune the serial code. I'll throw in another $1000 if you can show that your cleverness scales linearly up to 16 processors.

Why am I focused on this?

Shortest path is used all over the place; if you need it, it's part of the serial fraction that shows up in Amdahl's Law. There seems to be nothing that can be done about it; if you need to know the shortest way to get from point A to point B, you're stuck. Getting performance scaling to thousands of cores isn't just about having a clever architecture, or a fancy language, or thinking parallel. For massive scaling, you need to eliminate almost all serial bottlenecks, and Dijkstra is one of the ones sitting in the way. With a non-zero serial fraction, the law kicks in, and it's not some sort of failing on the part of the programmer, the hardware, the language.

To pull in the bigger picture; shortest path is not the only problem where it's hard to get speedup. There are plenty of others, and if they're essential parts of an application, there's just no way to scale to large numbers of processors. All the wishing in the world won't change this. This is important to understand: the sooner we stop looking for some sort of magic bullet architecture/language/paradigm, the sooner we can start looking for real solutions.


Anonymous said...

your blog is really interesting one. But I am sure we can a discuss on something:

Amir said...

While it is likely, it has not yet been proven that the class of polylogarithmic time algorithms parallelized on polynomial prcoessors is not equal to the class P. This parallelizable class is called "Nick's Class" (NC) and NC != P is an open problem much like proving NP != P.

So Amdahl's law is indeed Amdahl's conjecture until you prove that there are such things as necessarily serial code. Unfortunately, offering up $1000 and not getting a winner does not qualify as a proof that NC != P.

It seems like it should be simpler to prove that NC = P and NP = P cannot both be true...

Aater Suleman said...

Shortest path can be parallelized. The speedup will not be impressive but it will be high enough to win your $1000. How can I claim it?

On a different note, nice blog. Great work. I agree with you that there are codes that cannot be paralllelized regardless of the language and framework. See my post on the same topic here: .