**We appreciate your programming skills, and now we offer you a challenge! Can you write the fastest matrix-matrix multiplication code?**

Consider square matrices of variable size **N × N** and the matrix-matrix multiplication code given here. Try to modify this code to improve its performance. The modifications can be implemented in C and/or assembly language, with OpenMP parallelization. The total number of floating-point operations in the matrix-matrix multiplication is **(2N − 1)N²**: each of the N² elements of the result requires N multiplications and N − 1 additions.
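The initial code itself is not reproduced on this page. As a rough point of reference only, a plain triple-loop **matmul** could look like the sketch below, assuming row-major storage (element (i, j) stored at index i × N + j). **flop_count** is a hypothetical helper, not part of the original code, that evaluates the operation count above.

```c
/* Naive i-j-k matrix multiplication, assuming row-major storage:
 * element (i, j) of an N x N matrix is stored at index i*N + j.
 * Illustrative baseline only, not the contest's actual initial code. */
void matmul(long N, double *a, double *b, double *c)
{
    long i, j, k;
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            double sum = 0.0;
            for (k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];  /* strided access to b */
            c[i * N + j] = sum;
        }
    }
}

/* Hypothetical helper: (2N - 1) * N^2 operations, i.e. N multiplications
 * and N - 1 additions for each of the N^2 elements of c. (The loop above
 * performs N additions per element because sum starts at 0.0.) */
double flop_count(long N)
{
    return (2.0 * N - 1.0) * (double)N * (double)N;
}
```

The strided access to **b** in the innermost loop is what usually makes this version slow for large N.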

In addition to the **main(int argc, char **argv)** function, the initial version of the code uses three functions:

- **gettime(void)** returns the system time and is used for timing the code;
- **matfill(long N, double *mat, double val)** initializes all elements of an **N × N** matrix **mat** to the value **val**; this makes it easy to check that a new algorithm produces the correct result, but the algorithm must also work for randomly generated matrices A and B;
- **matmul(long N, double *a, double *b, double *c)** multiplies matrices **a** and **b** and stores the result in matrix **c**. All matrices have the same size **N × N**.
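The helper functions are only described above, not listed. A minimal sketch of what they might look like follows, assuming that **gettime** returns wall-clock seconds as a **double** via POSIX **gettimeofday** (the original implementation may differ). **check_constant** is a hypothetical helper that exploits constant fill values: if every element of A equals va and every element of B equals vb, then every element of C must equal N · va · vb. Such a check complements, but does not replace, testing on random matrices.

```c
#include <stddef.h>
#include <sys/time.h>

/* Assumed implementation: wall-clock time in seconds via POSIX gettimeofday. */
double gettime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1e-6 * (double)tv.tv_usec;
}

/* Set all N*N elements of mat to val. */
void matfill(long N, double *mat, double val)
{
    long i;
    for (i = 0; i < N * N; i++)
        mat[i] = val;
}

/* Hypothetical check: if a was filled with va and b with vb, every element
 * of c = a*b must equal N*va*vb (within a small relative tolerance).
 * Returns 1 if all elements match, 0 otherwise. */
int check_constant(long N, const double *c, double va, double vb)
{
    double expected = (double)N * va * vb;
    double tol = 1e-12 * (expected < 0 ? -expected : expected);
    long i;
    for (i = 0; i < N * N; i++) {
        double diff = c[i] - expected;
        if (diff < 0)
            diff = -diff;
        if (diff > tol)
            return 0;
    }
    return 1;
}
```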

Modifications of the provided code should be made only in the function **matmul(long N, double *a, double *b, double *c)**. The format of the input and output data must remain the same: the code should accept exactly one command-line argument (the linear matrix size **N**), and when executed, it should print a single line containing the matrix size, the elapsed time (in seconds), and the estimated number of floating-point operations per second (flops). For example:

```
$ ./mmm-v1 1000
1000 7.144675e+00 2.799288e+08
```
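A hedged sketch of producing the required output line is shown below; **flops_rate** and **report** are hypothetical helper names, not part of the original code. Note that the sample value 2.799288e+08 equals 2N³/t for N = 1000 and t = 7.144675 s, whereas the (2N − 1)N² count stated above gives about 2.797888e+08, so the initial code may use 2N³ in its estimate. The sketch follows the stated formula.

```c
#include <stdio.h>

/* Hypothetical helper: flops rate based on the (2N - 1) * N^2 count. */
double flops_rate(long N, double elapsed)
{
    return (2.0 * N - 1.0) * (double)N * (double)N / elapsed;
}

/* Hypothetical helper: print the single required output line
 * "N elapsed_seconds flops" in %e notation. */
void report(long N, double elapsed)
{
    printf("%ld %e %e\n", N, elapsed, flops_rate(N, elapsed));
}
```

With the sample timing, report(1000, 7.144675) prints 1000 7.144675e+00 2.797888e+08.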

The performance of your code will be measured on a computer with two quad-core Intel Xeon E5345 CPUs @ 2.33 GHz (4 MB of L2 cache shared by each pair of cores) and 32 GB of RAM. The code will be compiled with the Intel compiler (version 13.0.0) without optimization (**-O0**) and with the OpenMP flag (**-openmp**). No external libraries may be called, and any reordering of matrix elements (such as transposition) must be done inside the timed function **matmul**. The submitted source code must run without errors or memory leaks, and the multiplication result must have 64-bit (double-precision) accuracy. Scoring will be done for the sizes **N = 400**, **800**, **1200**, **1600**, **2400**, **3200**, **4800**, and **9600**. The same code has to be used for all matrix sizes, so hard-coding particular values of **N** is not allowed.
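One common way to stay within these rules is sketched below, assuming row-major storage. **matmul** allocates a temporary buffer, stores the transpose of **b** in it so that the inner loop reads both operands contiguously, splits the rows of **c** across threads with OpenMP, and frees the buffer before returning, so the transposition is timed and no memory leaks. Because the code is compiled with **-O0**, row pointers are hoisted out of the inner loop by hand. This is an illustrative starting point, not a tuned entry; cache blocking and loop unrolling are not shown.

```c
#include <stdlib.h>

/* Sketch: transpose b into a temporary buffer inside matmul (so its cost
 * is timed), then compute each c[i][j] as a dot product of two contiguous
 * rows. Rows are distributed over OpenMP threads; if OpenMP is not
 * enabled, the pragmas are ignored and the loops run serially. */
void matmul(long N, double *a, double *b, double *c)
{
    long i, j, k;
    double *bt = malloc((size_t)N * (size_t)N * sizeof *bt);

    if (bt == NULL) {
        /* Fallback if the buffer cannot be allocated: plain i-j-k loop. */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++) {
                double sum = 0.0;
                for (k = 0; k < N; k++)
                    sum += a[i * N + k] * b[k * N + j];
                c[i * N + j] = sum;
            }
        return;
    }

    /* bt[j][i] = b[i][j]; each thread writes distinct elements. */
    #pragma omp parallel for private(j)
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            bt[j * N + i] = b[i * N + j];

    #pragma omp parallel for private(j, k)
    for (i = 0; i < N; i++) {
        const double *ai = a + i * N;          /* row i of a */
        for (j = 0; j < N; j++) {
            const double *bj = bt + j * N;     /* column j of b */
            double sum = 0.0;
            for (k = 0; k < N; k++)
                sum += ai[k] * bj[k];
            c[i * N + j] = sum;
        }
    }

    free(bt);  /* release the buffer so no memory leaks */
}
```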

INIT: performance of the initial version of the code.

Eligible contestants are undergraduate and graduate students from the following countries: Greece, Bulgaria, Romania, Hungary, Serbia, FYR of Macedonia, Albania, Bosnia & Herzegovina, Montenegro, Moldova, Armenia, Georgia, Azerbaijan.

Your version of the code must be submitted by e-mail to the challenge organizers. In the same e-mail, please provide your full name, the name of your institution, and your country. Each contestant may submit several versions of the code, but only the version with the best performance will be taken into account. The performance of the code will be measured manually by the challenge committee and published on this page together with the challenge results. In addition, each contestant will receive the challenge results by e-mail.

The HP-SEE computing challenge was originally open until 31 October 2012; the deadline has been extended to 15 November 2012.

The contestant with the best code will be awarded coverage of costs of up to 900 euros for participating in an HPC training event organised by either HP-SEE or PRACE. For information about upcoming training events, please visit the HP-SEE Training Agenda page or the PRACE Training portal. To be eligible to receive the prize, the winner will have to provide proof of student status.