Experimental Evaluation of BSP Algorithms

My M.Sc. Thesis

Short Summary

My M.Sc. thesis is about predicting performance of parallel algorithms implemented in C++. I look at basic communication, matrix multiplication, and longest common subsequence computation.

Full Text

Here is a version with hyperlinks for online reading. There is also a copy of the print version which I submitted to the library.

Publications from the Thesis

Parallel Computation of Longest Common Subsequences

Krusche, P. & Tiskin, A., 2006. Efficient Longest Common Subsequence Computation Using Bulk-Synchronous Parallelism. In Proceedings of ICCSA 2006, LNCS 3984. pp. 165–174.

Slides for the talk given at the PDC’06 workshop at ICCSA 2006 in Glasgow

Slides for the talk given at WPCCS’06

Parallel Matrix Multiplication and Communication Performance

Krusche, P., 2005. Experimental Evalution of BSP Programming Libraries. Parallel Processing Letters, 18(1), pp.7–21.

Slides from the HLPP 2005 workshop at Warwick

Full Abstract

The model of bulk-synchronous parallel computation (BSP) helps to implement portable general purpose algorithms while maintaining predictable performance on different parallel computers. In the last few years, frameworks for implementing BSP algorithms were proposed, each having different optimizations and programming models. This work gives an overview of approaches to implementing BSP algorithms in C/C++ or Fortran and of methods for predicting their performance. Experiments were run on three parallel machines, using optimized special purpose communications libraries for BSP algorithms. In the first set of experiments, communication and computational performance of all three parallel computers was measured separately to obtain the machine dependent parameters that describe them in the BSP model. The second and third set of experiments were concerned with measuring the performance for two common types of algorithms: memory efficient matrix multiplication and longest common subsequence computation. Based on the experimental results, we compare the performance of the matrix multiplication implementation to an optimized standard library and study performance predictability. Simple extensions to the standard BSP model for performance prediction are shown, their accuracy is evaluated and effects that cause prediction errors are discussed. The results indicate that the performance of BSP algorithm implementations can be highly dependent on the communication library that is used and hence compare their performance using different optimized communication libraries on different systems. When using the best suited library on each system, BSP implementations can achieve predictable performance and efficiency competitive with optimized standard libraries.