Optimization/Porting

Many of the codes used within eSTICC have been optimized or ported to new HPC platforms with support from CSC.

Optimization of NorESM table look-up routines

Tests with idealized cases were run on two platforms, a Cray XC30 and an HP ProLiant cluster, and with three compiler suites (Cray, Intel and GNU). At best, run times about 50% faster were achieved for the 5D case. As a next step, real-world tests will be used to profile and optimize similar look-up tables implemented in NorESM; this is work in progress. This work directly supports activities in WP4 (CSC, MetNor).
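
As an illustration of the general technique, not the actual NorESM routines (which are Fortran), the following C sketch shows a multi-dimensional table flattened into one contiguous array with precomputed strides; the type name, extents, and layout are hypothetical. A single multiply-add index chain replaces nested indirection and keeps the hot path cache-friendly.

    #include <stddef.h>

    /* Hypothetical flattened 5D look-up table; the name lut5d and the
       layout are illustrative, not the NorESM data structures. */
    typedef struct {
        size_t n[5];       /* extent of each dimension            */
        size_t stride[5];  /* precomputed strides, stride[4] == 1 */
        double *data;      /* n[0]*...*n[4] contiguous values     */
    } lut5d;

    /* Row-major strides: stride[d] = product of extents n[d+1..4]. */
    static void lut5d_init_strides(lut5d *t)
    {
        t->stride[4] = 1;
        for (int d = 3; d >= 0; --d)
            t->stride[d] = t->stride[d + 1] * t->n[d + 1];
    }

    /* One multiply-add chain instead of four levels of pointer
       chasing; the index computation stays in registers. */
    static inline double lut5d_get(const lut5d *t, const size_t i[5])
    {
        return t->data[i[0] * t->stride[0] + i[1] * t->stride[1] +
                       i[2] * t->stride[2] + i[3] * t->stride[3] + i[4]];
    }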

Optimization of FLEXINVERT

eSTICC helped improve the serial performance of FLEXINVERT, an atmospheric Bayesian inversion framework. For a small test case, the computing time was reduced to less than one third of the original. This was achieved by profiling the code, rearranging data structures, and optimizing loops. This work directly supports activities in WP2 (CSC, NILU).
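
The following C sketch illustrates the kind of loop and data rearrangement involved; it is a hypothetical example, not FLEXINVERT source. Making the innermost loop run over the fastest-varying index turns strided memory access into a contiguous sweep, and hoisting a loop-invariant load out of the inner loop removes redundant work.

    #include <stddef.h>

    /* Scale each row i of an m-by-n row-major matrix a by weight w[i]. */
    void scale_rows(double *a, const double *w, size_t m, size_t n)
    {
        /* Before: for (j...) for (i...) a[i*n + j] *= w[i];
           strided access, roughly one cache miss per element. After: */
        for (size_t i = 0; i < m; ++i) {
            const double wi = w[i];          /* hoist invariant load */
            for (size_t j = 0; j < n; ++j)
                a[i * n + j] *= wi;          /* contiguous sweep     */
        }
    }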

Optimization of Elmer(/Ice) on Intel Xeon and Xeon Phi platforms

Through CSC's role as a partner in an Intel Parallel Computing Center (IPCC), we could take advantage of early access to the new Xeon Phi (Knights Landing, KNL) hardware as well as direct support from Intel in porting the community ice-sheet code to this platform. First benchmark tests show performance on KNL similar to that of a standard compute node equipped with Xeon CPUs. The Elmer code base as a whole is also currently being optimized at the MPI level within the IPCC, which will feed back into the performance of the ice dynamics module, Elmer/Ice. Further activity will concentrate on a threaded and SIMD approach to bulk assembly, achieved by adding the missing components of the bilinear forms needed in FEM to the code base. The IPCC has been prolonged until almost the end of the eSTICC lifetime, and this cooperation is work in progress (CSC, Intel Corp.).
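
A minimal C/OpenMP sketch of the threaded and SIMD bulk-assembly idea follows; it is an illustration under stated assumptions, not Elmer code, and shows only a constant-coefficient contribution rather than a full quadrature loop over a bilinear form. Batching elements and placing the element index innermost gives the compiler unit-stride loops to vectorize, while independent batches are shared among threads.

    #include <stddef.h>

    #define NDOF  8   /* local degrees of freedom per element (hypothetical) */
    #define BATCH 64  /* elements assembled together (hypothetical)          */

    /* Batch-local stiffness contributions with the element index
       innermost, so the compiler can vectorize the e-loop. */
    static void assemble_batch(const double coeff[BATCH],
                               const double ref[NDOF][NDOF],
                               double ke[NDOF][NDOF][BATCH])
    {
        for (int i = 0; i < NDOF; ++i)
            for (int j = 0; j < NDOF; ++j) {
                #pragma omp simd
                for (int e = 0; e < BATCH; ++e)
                    ke[i][j][e] = coeff[e] * ref[i][j];
            }
    }

    /* Batches are independent, so they distribute over threads. */
    void assemble_all(size_t nbatch,
                      const double (*coeff)[BATCH],
                      const double ref[NDOF][NDOF],
                      double (*ke)[NDOF][NDOF][BATCH])
    {
        #pragma omp parallel for
        for (size_t b = 0; b < nbatch; ++b)
            assemble_batch(coeff[b], ref, ke[b]);
    }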

Porting of GPGPU ocean model

CSC and FMI received direct support from NVIDIA (J. Appleyard) concerning the GPGPU (OpenACC) version of NEMO. Testing this version, ported to CSC's Bull cluster (equipped with K40 accelerator cards), yielded much useful information on the principles of porting an OpenACC code (using the PGI compiler, which enables direct card-to-card communication). Nevertheless, the fact that this setup achieved a speed-up of only a factor of two over the pure CPU version excluded the platform from serious consideration for production runs (CSC, FMI, NVIDIA).
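
The central principle of such a port can be sketched in a few lines of C with OpenACC directives (hypothetical code, not NEMO source): an enclosing data region keeps the fields resident on the accelerator across time steps, so only kernels, not full arrays, cross the bus each step. In a multi-GPU MPI setup, the card-to-card communication mentioned above would additionally let halo exchanges pass directly between devices.

    #include <stddef.h>

    /* Jacobi-style 5-point stencil on an nx-by-ny field, advanced for
       nsteps time steps entirely on the accelerator. */
    void step_field(double *u, double *tmp,
                    size_t nx, size_t ny, int nsteps)
    {
        #pragma acc data copy(u[0:nx*ny]) create(tmp[0:nx*ny])
        for (int t = 0; t < nsteps; ++t) {
            /* Stencil update on the interior points. */
            #pragma acc parallel loop collapse(2) present(u, tmp)
            for (size_t i = 1; i < nx - 1; ++i)
                for (size_t j = 1; j < ny - 1; ++j)
                    tmp[i*ny + j] = 0.25 * (u[(i-1)*ny + j] + u[(i+1)*ny + j]
                                          + u[i*ny + j-1] + u[i*ny + j+1]);

            /* Copy the interior back; boundaries stay fixed. */
            #pragma acc parallel loop collapse(2) present(u, tmp)
            for (size_t i = 1; i < nx - 1; ++i)
                for (size_t j = 1; j < ny - 1; ++j)
                    u[i*ny + j] = tmp[i*ny + j];
        }
    }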

Continuation and finalization of SVALI projects

eSTICC contributed to making work from the NCoE SVALI sustainable by continuing and finalizing the following projects: