Over the past few years, I've sat through a few dozen keynote talks on multicore. In almost all cases, there's some comment along the lines of "computer scientists don't know how to think parallel, and if they'd just smarten up, parallel computing would take off." This usually comes from people with a management background, and sometimes from electrical engineers.
Like it or not, there are some things that can't be made faster with parallel machines (as far as I, or a number of people I respect, know). So here's an opportunity to take $1000 out of my pocket, just for showing me how to think parallel. A very common task in software applications is finding the shortest path between a pair of vertices in a graph; the best known algorithm for this is Dijkstra's, which has a big-O complexity of O(E + V log V). Dijkstra's algorithm relies on a priority queue (typically a heap), which is a serial bottleneck; there's no way that's obvious to me to use multiple processors.
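For concreteness, here's a minimal sketch of the serial algorithm; the `dijkstra` helper and its adjacency-list format are my own illustration, not code from the challenge. The point is the pop from the priority queue: each one depends on the results of the previous relaxations, so the extractions are inherently ordered.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths.
    adj maps vertex -> list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]  # the priority queue is the serial bottleneck
    while heap:
        d, u = heapq.heappop(heap)       # must pop the global minimum, in order
        if d > dist.get(u, float("inf")):
            continue                     # stale entry; already settled cheaper
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```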
The Bellman-Ford algorithm also computes shortest paths, and is very easy to make massively parallel. So why not use the parallel version, you ask? Bellman-Ford is O(EV), which is a higher computational complexity. For even modest-sized graphs, Dijkstra will be faster in terms of wall-clock time, require less total power, and do all of this with just a single processor. There are very few types of graphs where Bellman-Ford might be better; you'd have to work hard to find one where a serial Dijkstra would not blow the doors off a parallel Bellman-Ford. If your objective is to get the answer as fast as possible with the minimum power, serial Dijkstra is the obvious choice.
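A sketch of why Bellman-Ford parallelizes so easily (again my own illustration, not challenge code): the rounds must run in sequence, but within a round every edge relaxation reads the old distance array independently, so the inner loop can be split across as many processors as you like. The catch is exactly the O(EV) total work mentioned above.

```python
def bellman_ford(edges, n, source):
    """Single-source shortest paths on n vertices.
    edges is a list of (u, v, w) triples."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0
    for _ in range(n - 1):          # up to V-1 rounds, strictly in order
        new = dist[:]               # Jacobi-style update: every edge reads old dist,
        for u, v, w in edges:       # so this loop is embarrassingly parallel
            if dist[u] + w < new[v]:
                new[v] = dist[u] + w
        if new == dist:
            break                   # no edge relaxed: converged early
        dist = new
    return dist
```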
How to take $1000 out of my pocket.
So here's the offer. Show me how to think parallel for this problem. Demonstrate a factor of two speedup for shortest path, and use as many parallel processors as you want. The runtime comparison should be between comparable processors (for example, one core of a multicore AMD/Intel/Sun processor versus all available cores, not a 1987 IBM PC versus a new Xeon). The speedup should come on a real graph, and be compared to a solid version of the serial Dijkstra code. In particular, let's use the 9th DIMACS Implementation Challenge for Shortest Paths. We can use the full USA distance map (24 million nodes, 58 million arcs), and find the vertex with the greatest shortest-path distance from vertex 0. No cheating by precomputing the result, though; this has to work for other graphs. You have to actually use the other processors, not just tune the serial code. I'll throw in another $1000 if you can show that your cleverness scales linearly up to 16 processors.
Why am I focused on this?
Shortest path is used all over the place; if you need it, it's part of the serial fraction that shows up in Amdahl's Law. There seems to be nothing that can be done about it; if you need to know the shortest way to get from point A to point B, you're stuck. Getting performance scaling to thousands of cores isn't just about having a clever architecture, or a fancy language, or thinking parallel. For massive scaling, you need to eliminate almost all serial bottlenecks, and Dijkstra is one of the ones sitting in the way. With a nonzero serial fraction, Amdahl's Law kicks in, and it's not some sort of failing on the part of the programmer, the hardware, or the language.
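To put numbers on it: Amdahl's Law says the speedup on N processors with serial fraction s is 1 / (s + (1 - s)/N). A tiny sketch (the function name is mine) shows how brutally this caps scaling:

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Amdahl's Law: overall speedup on n_procs processors when
    serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)
```

Even a 5% serial fraction limits you to under 10x on 16 processors, and under 20x no matter how many cores you throw at the problem.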
To pull in the bigger picture: shortest path is not the only problem where it's hard to get speedup. There are plenty of others, and if they're essential parts of an application, there's just no way to scale to large numbers of processors. All the wishing in the world won't change this. This is important to understand: the sooner we stop looking for some sort of magic-bullet architecture/language/paradigm, the sooner we can start looking for real solutions.
3 comments:
Your blog is really interesting. But I am sure we can discuss something:
http://avinashbhojwani.wordpress.com/2008/11/27/programmersvsarchitects/
While it is likely, it has not yet been proven that the class of problems solvable in polylogarithmic time on polynomially many processors is not equal to the class P. This parallelizable class is called "Nick's Class" (NC), and NC != P is an open problem, much like proving NP != P.
So Amdahl's Law is indeed Amdahl's conjecture until you prove that there is such a thing as necessarily serial code. Unfortunately, offering up $1000 and not getting a winner does not qualify as a proof that NC != P.
It seems like it should be simpler to prove that NC = P and NP = P cannot both be true...
Shortest path can be parallelized. The speedup will not be impressive, but it will be high enough to win your $1000. How can I claim it?
On a different note, nice blog. Great work. I agree with you that there are codes that cannot be parallelized regardless of the language and framework. See my post on the same topic here: http://www.futurechips.org/softwareforhardwareguys/parallelprogrammingframeworkssolvepartproblem.html .