From the post:
Yet, we have seen many programs get really good performance with Cluster OpenMP. So, how can you know whether your code is one of the high-performance ones or the REALLY SLOW ones?
One could ask the same question about any kind of software, with respect to multiple processors and any architecture. How can we know if our code will run well on multi-core? If there are memory accesses to the same locations, there's always going to be serialization -- locks, cache coherency issues, and so on. This is not something that is going to go away with a new compiler, a new programming paradigm, or some sort of clever network interconnect. It's a fundamental aspect of the problem being solved. For a given program, is it going to scale well to multiple cores, or will it be one of the REALLY SLOW ones?
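To make the serialization point concrete, here is a minimal sketch (my own toy, not anything from Cluster OpenMP) that sums a list two ways. The names `CountingLock`, `sum_shared`, and `sum_private` are hypothetical helpers invented for this illustration. One version has every worker update a single shared total; the other lets each worker accumulate privately and touch the shared total once. The number of lock acquisitions is a rough stand-in for how often the workers are forced to take turns.

```python
import threading


class CountingLock:
    """A lock that records how many times it was acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # safe: only updated while holding the lock
        return self

    def __exit__(self, *exc):
        self._lock.release()
        return False


def _run(work, data, workers):
    """Split data into one strided chunk per worker and run them as threads."""
    chunks = [data[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sum_shared(data, workers):
    """Every element is added straight into one shared location."""
    lock = CountingLock()
    total = [0]

    def work(chunk):
        for x in chunk:
            with lock:  # one serialization point per element
                total[0] += x

    _run(work, data, workers)
    return total[0], lock.acquisitions


def sum_private(data, workers):
    """Each worker sums privately, then touches the shared location once."""
    lock = CountingLock()
    total = [0]

    def work(chunk):
        partial = sum(chunk)  # no shared memory involved here
        with lock:  # one serialization point per worker
            total[0] += partial

    _run(work, data, workers)
    return total[0], lock.acquisitions
```

With 1000 elements and 4 workers, both functions return the same sum, but the first takes the lock 1000 times and the second only 4. In standard CPython the interpreter lock hides any real speedup, so the counts rather than timings are the point. Note that the second version is only possible because addition lets the work be split up; when the problem itself demands shared updates, no rewrite removes them.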
Hoeflinger notes that applications like ray tracing work well on distributed machines. Ray tracing is the poster child for the parallel community; one might note that parallel POVRAY benchmarks have been around since the dawn of time. If an application has the obvious parallelism of ray tracing, and there's value in speeding it up, chances are it's already parallel.
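For anyone who hasn't seen why ray tracing is such an easy case, here is a toy sketch. The `shade` function is a hypothetical stand-in for tracing one ray (it just tests whether a pixel falls inside a disc), and the image size is arbitrary. What matters is that each pixel depends only on its own coordinates, so rows can be handed to workers in any order with no shared writes at all.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 16, 8  # arbitrary toy image size


def shade(x, y):
    """Hypothetical per-pixel work: 1 if the pixel's ray hits a disc, else 0."""
    # Map the pixel center to coordinates in [-1, 1].
    u = 2 * (x + 0.5) / WIDTH - 1
    v = 2 * (y + 0.5) / HEIGHT - 1
    return 1 if u * u + v * v <= 0.5 else 0


def render_row(y):
    """Compute one row; reads nothing written by any other row."""
    return [shade(x, y) for x in range(WIDTH)]


def render_serial():
    return [render_row(y) for y in range(HEIGHT)]


def render_parallel(workers=4):
    """Farm rows out to a pool; map() returns results in row order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_row, range(HEIGHT)))
```

The parallel and serial renders produce identical images, and nothing in `render_parallel` needs a lock. That is the "obvious parallelism" in question: the hard part is finding other programs that decompose this cleanly.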
What kind of program? It's a simple question, and a large part of the reason I'm ranting here. If we're not going to increase serial clock rates (and doing so seems to be impractical), we need to know what sort of programs work well with multiple cores. Ray tracing? Check. High-volume web servers? Check. Something that will draw in lots of customers and serve a broad audience? Hmmmm.
My research group is working on this. We've got some ideas, some things we're trying. And we're doing our best to not fall into the same rabbit holes that so many others seem to enjoy.