Over the past few years, I've sat through a few dozen keynote talks on multi-core. In almost all cases, there's some comment along the lines of "computer scientists don't know how to think parallel, and if they'd just smarten up, parallel computing would take off." This usually comes from people with a management background, and sometimes from electrical engineers.
Like it or not, there are some things that can't be made faster with parallel machines (as far as I, or a number of people I respect, know). So here's an opportunity to take $1000 out of my pocket, just for showing me how to think parallel. A very common task in software applications is finding the shortest path between a pair of vertices in a graph; the best known algorithm for this is Dijkstra's, with a complexity of O(E + V log V). Dijkstra uses a heap, which is a serial bottleneck; there's no way obvious to me to use multiple processors.
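To make the bottleneck concrete, here's a minimal sketch of Dijkstra's algorithm with a binary heap (the adjacency-list format is my own assumption, not tied to any particular library):

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths. `adj` maps each vertex to a list of
    (neighbor, weight) pairs -- a hypothetical adjacency-list format."""
    dist = {source: 0}
    heap = [(0, source)]            # the heap is the serial bottleneck
    while heap:
        d, u = heapq.heappop(heap)  # must extract the global minimum, one at a time
        if d > dist.get(u, float("inf")):
            continue                # stale entry; a shorter path was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

The trouble is that pop-min/relax/push is an inherently ordered loop: each iteration depends on the heap state left by the previous one, so the cores can't simply divide the work.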
The Bellman-Ford algorithm also computes shortest paths, and is very easy to make massively parallel. So why not use the parallel version, you ask? Bellman-Ford is O(EV), a higher computational complexity. For even modest-sized graphs, Dijkstra will be faster in wall-clock time, require less total power, and do all of this with just a single processor. There are very few types of graphs where Bellman-Ford might be better; you'd have to work hard to find one where a serial Dijkstra would not blow the doors off a parallel Bellman-Ford. If your objective is to get the answer as fast as possible with the minimum power, serial Dijkstra is the obvious choice.
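For contrast, here's a sketch of serial Bellman-Ford (edge-list format is again my assumption). The point is structural: every relaxation in the inner loop is independent, so the edges can be split across processors trivially — but the algorithm may need up to V−1 rounds over all E edges, which is where the O(EV) work comes from:

```python
def bellman_ford(num_vertices, edges, source):
    """`edges` is a list of (u, v, w) triples. Each pass relaxes every edge
    independently -- trivially parallel -- but up to V-1 passes may be needed."""
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:       # these relaxations could run concurrently
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:             # early exit once distances settle
            break
    return dist
```

So parallel Bellman-Ford buys concurrency by paying a large factor in total work, and on most graphs that trade loses to serial Dijkstra outright.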
How to take $1000 out of my pocket.
So here's the offer. Show me how to think parallel for this problem. Demonstrate a factor of two speedup for shortest path, and use as many parallel processors as you want. The run-time comparison should be between comparable processors (for example, one core of a multi-core AMD/Intel/Sun processor versus all available cores, not a 1987 IBM PC vs a new Xeon). The speedup should come on a real graph, and be compared to a solid version of the serial Dijkstra code. In particular, let's use the 9th DIMACS Implementation Challenge for Shortest Path. We can use the full USA distance map (24 million nodes, 58 million arcs), and find the vertex with the greatest shortest path distance from vertex 0. No cheating by precomputing the result, though - this has to work for other graphs. You have to actually use the other processors, not just tune the serial code. I'll throw in another $1000 if you can show that your cleverness scales linearly up to 16 processors.
Why am I focused on this?
Shortest path is used all over the place; if you need it, it's part of the serial fraction that shows up in Amdahl's Law. There seems to be nothing that can be done about it; if you need to know the shortest way to get from point A to point B, you're stuck. Getting performance scaling to thousands of cores isn't just about having a clever architecture, a fancy language, or thinking parallel. For massive scaling, you need to eliminate almost all serial bottlenecks, and Dijkstra is one of the ones sitting in the way. With a non-zero serial fraction, Amdahl's Law kicks in, and that's not some sort of failing on the part of the programmer, the hardware, or the language.
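To see how hard Amdahl's Law bites, here's the standard formula as a one-liner, with illustrative numbers of my own choosing:

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Best-case speedup on n processors when `serial_fraction` of the
    work cannot be parallelized: 1 / (s + (1 - s)/n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# Even a 5% serial fraction caps a 1000-core machine below 20x:
# amdahl_speedup(0.05, 1000) is about 19.6
```

With zero serial fraction the speedup is exactly n; with any fixed non-zero fraction s, it saturates at 1/s no matter how many cores you throw at it.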
To pull in the bigger picture: shortest path is not the only problem where it's hard to get speedup. There are plenty of others, and if they're essential parts of an application, there's just no way to scale to large numbers of processors. All the wishing in the world won't change this. This is important to understand: the sooner we stop looking for some sort of magic-bullet architecture/language/paradigm, the sooner we can start looking for real solutions.