My M.Sc. thesis is about predicting performance of parallel algorithms implemented in C++. I look at basic communication, matrix multiplication, and longest common subsequence computation.
Krusche, P. & Tiskin, A., 2006. Efficient Longest Common Subsequence Computation Using Bulk-Synchronous Parallelism. In Proceedings of ICCSA 2006, LNCS 3984. pp. 165–174.
Krusche, P., 2005. Experimental Evalution of BSP Programming Libraries. Parallel Processing Letters, 18(1), pp.7–21.
The model of bulk-synchronous parallel computation (BSP) helps to implement portable general purpose algorithms while maintaining predictable performance on different parallel computers. In the last few years, frameworks for implementing BSP algorithms were proposed, each having different optimizations and programming models. This work gives an overview of approaches to implementing BSP algorithms in C/C++ or Fortran and of methods for predicting their performance. Experiments were run on three parallel machines, using optimized special purpose communications libraries for BSP algorithms. In the first set of experiments, communication and computational performance of all three parallel computers was measured separately to obtain the machine dependent parameters that describe them in the BSP model. The second and third set of experiments were concerned with measuring the performance for two common types of algorithms: memory efficient matrix multiplication and longest common subsequence computation. Based on the experimental results, we compare the performance of the matrix multiplication implementation to an optimized standard library and study performance predictability. Simple extensions to the standard BSP model for performance prediction are shown, their accuracy is evaluated and effects that cause prediction errors are discussed. The results indicate that the performance of BSP algorithm implementations can be highly dependent on the communication library that is used and hence compare their performance using different optimized communication libraries on different systems. When using the best suited library on each system, BSP implementations can achieve predictable performance and efficiency competitive with optimized standard libraries.