HP-SEE Computing Challenge
We appreciate your programming skills, but at the same time we offer you a challenge! Are you able to write the fastest matrix-matrix multiplication code?
Consider square matrices of variable size N × N and the matrix-matrix multiplication code given here. Try to modify this code to improve its performance. The modifications can be implemented in C and/or assembly language, with OpenMP parallelization. The total number of floating point operations in the matrix-matrix multiplication is (2N − 1)N^2, since each of the N^2 elements of the result requires N multiplications and N − 1 additions.
In addition to the main(int argc, char **argv) function, the initial version of the code makes use of three functions:
- gettime(void) returns system time and is used for timing the code;
- matfill(long N, double *mat, double val) initializes all elements of an N×N matrix mat to the value val; in this way you can easily check that a new algorithm produces the correct result, but the algorithm must also work for randomly generated matrices A and B;
- matmul(long N, double *a, double *b, double *c) performs matrix-matrix multiplication of matrices a and b and stores the result in a matrix c. All matrices have the same size N × N.
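The three helpers described above could look roughly like the following. This is only a minimal sketch and not the code distributed with the challenge: it assumes that gettime returns wall-clock seconds as a double (via POSIX gettimeofday), that the void return types are as shown, and that matrices are stored contiguously in row-major order, none of which is stated in the text.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds.
   Assumption: POSIX gettimeofday is available and a double is returned. */
double gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Set every element of the N x N matrix mat to val. */
void matfill(long N, double *mat, double val)
{
    long i;
    for (i = 0; i < N * N; i++)
        mat[i] = val;
}

/* Straightforward triple loop: c = a * b.
   Assumption: row-major storage, element (i, j) at index i * N + j. */
void matmul(long N, double *a, double *b, double *c)
{
    long i, j, k;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            double sum = 0.0;
            for (k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}
```

With constant matrices the check is simple: if a is filled with x and b with y, every element of c must equal N·x·y. A modified matmul should also be compared against this baseline on non-constant input.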
Modifications of the provided code should be made only in the function matmul(long N, double *a, double *b, double *c). The format of the input and output data should remain the same: the code should accept only one argument from the command line (the linear matrix size N), and when executed, it should produce a single line of output containing the matrix size, the elapsed time (in seconds), and the estimated number of floating point operations per second (flops). For example:
$ ./mmm-v1 1000
1000 7.144675e+00 2.799288e+08
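The output line can be produced along the following lines. This is a sketch under assumptions: the function names flops_estimate and print_result are hypothetical, and the operation count is taken as 2N^3, because the sample line above is consistent with 2N^3 divided by the elapsed time (2·10^9 / 7.144675 ≈ 2.799288·10^8) rather than with the exact count (2N − 1)N^2.

```c
#include <stdio.h>

/* Estimated floating point operations per second.
   Assumption: the operation count is approximated by 2 * N^3,
   which matches the sample output shown in the text. */
double flops_estimate(long N, double seconds)
{
    double n = (double)N;
    return 2.0 * n * n * n / seconds;
}

/* Print the single result line: matrix size, elapsed seconds, flops. */
void print_result(long N, double seconds)
{
    printf("%ld %e %e\n", N, seconds, flops_estimate(N, seconds));
}
```

In a driver, seconds would be the difference between two gettime() calls taken immediately before and after matmul.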
The performance of your code will be measured on a computer with two quad-core Intel Xeon E5345 @ 2.33 GHz CPUs (4 MB of L3 cache) and 32 GB of RAM. The code will be compiled with the Intel compiler (version 13.0.0) without optimization (-O0) and with the flag for OpenMP parallelization (-openmp). No external libraries may be called, and any reordering of matrix elements (such as transposition) must be done inside the timed function matmul. The submitted source code must run without any errors or memory leaks, and the accuracy of the multiplication result must correspond to 64-bit (double-precision) arithmetic. Scoring will be done for the sizes N = 400, 800, 1200, 1600, 2400, 3200, 4800, and 9600. The same code has to be used for all matrix sizes, so hard-coding of particular values of N is not allowed.
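As an illustration of these rules, here is one possible starting point for a modified matmul. It is a sketch, not any contestant's submission. It assumes row-major storage, transposes b into a scratch buffer inside the timed function so that the inner loop reads both operands contiguously, parallelizes the outer loops with OpenMP, and frees the buffer before returning to avoid a memory leak. It does not depend on any particular value of N.

```c
#include <stdlib.h>

/* c = a * b for N x N matrices.
   Assumption: row-major storage, element (i, j) at index i * N + j. */
void matmul(long N, double *a, double *b, double *c)
{
    long i, j, k;
    /* Scratch space for the transpose of b, allocated inside the timed function. */
    double *bt = malloc((size_t)N * (size_t)N * sizeof *bt);

    if (bt == NULL) {
        /* Fallback without extra memory: plain triple loop. */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                double sum = 0.0;
                for (k = 0; k < N; k++)
                    sum += a[i * N + k] * b[k * N + j];
                c[i * N + j] = sum;
            }
        return;
    }

    /* Transpose: bt(j, i) = b(i, j). */
    #pragma omp parallel for private(j)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            bt[j * N + i] = b[i * N + j];

    /* Each row of c is computed independently, so rows can be shared among threads. */
    #pragma omp parallel for private(j, k)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            double sum = 0.0;
            for (k = 0; k < N; k++)
                sum += a[i * N + k] * bt[j * N + k];
            c[i * N + j] = sum;
        }

    free(bt);
}
```

Without an OpenMP flag the pragmas are ignored and the code runs serially with the same result. Further gains would typically come from cache blocking and manual loop unrolling, which matter more than usual when the compiler is run with -O0.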
Results of the HP-SEE Computing Challenge
The HP-SEE Computing Challenge ran from 26 September to 15 November 2012. In total, 49 codes were submitted by 14 contestants from 5 different countries. Contestants could submit several versions of their code, but only the highest-performing code from each contestant was taken into account. The performance of each code was measured manually by the challenge committee. The code with the highest overall performance was submitted by Alexandros Sokratis Papadakis (University of Patras, Greece). The figure below compares the performance of the contestants’ codes measured for matrix sizes N = 400, 800, 1200, 1600, 2400, 3200, 4800, and 9600.
INIT - performance of the initial version of the code
Eligible contestants are undergraduate/graduate students from the following countries: Greece, Bulgaria, Romania, Hungary, Serbia, FYR of Macedonia, Albania, Bosnia & Herzegovina, Montenegro, Moldova, Armenia, Georgia, Azerbaijan.
The HP-SEE Computing Challenge is open until 31 October 2012. The deadline has been extended to 15 November 2012.
The contestant with the best code will be awarded coverage of costs of up to 900 euros for his/her participation in an HPC training event organised by either HP-SEE or PRACE. For information about upcoming training events, please visit the HP-SEE Training Agenda page or the PRACE Training portal. The winner will have to provide proof of his/her student status in order to be eligible to receive the prize.