Friday, February 20, 2015

Using the R815 at HellasGRID

Information about the R815
This is a DELL R815 server with 4 AMD Opteron 6378 processors and 128 GB of LV RDIMM memory.
The cpuid.exe tool (shipped with the AMD ACML libraries) reports:

Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 6 model 2
Model Name: AMD Opteron(tm) Processor 6378
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip supports AVX
Chip supports FMA3
Chip supports FMA4
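The cpuid.exe listing is plain text, so it can be checked from a script before picking a library build. Below is a minimal Python sketch, assuming the output format shown above. The function names are hypothetical, and the rule that the ACML "fma4" libraries need both AVX and FMA4 is an assumption made for illustration.

```python
# Sketch: turn the "Chip supports X" lines printed by cpuid.exe into a set of
# feature names. ASSUMPTION: the ACML "fma4" build needs AVX and FMA4.

CPUID_OUTPUT = """\
Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 6 model 2
Model Name: AMD Opteron(tm) Processor 6378
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip supports AVX
Chip supports FMA3
Chip supports FMA4
"""

def supported_features(text: str) -> set[str]:
    """Collect every feature named on a 'Chip supports <FEATURE>' line."""
    prefix = "Chip supports "
    return {
        line[len(prefix):].strip()
        for line in text.splitlines()
        if line.startswith(prefix)
    }

def can_use_fma4_build(features: set[str]) -> bool:
    """True if both AVX and FMA4 are reported (assumed requirement)."""
    return {"AVX", "FMA4"} <= features

features = supported_features(CPUID_OUTPUT)
print(sorted(features))
print(can_use_fma4_build(features))
```

For the Opteron 6378 listing above this prints the six reported features and True.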

============================================================
MESSy tests on the R815 server

MESSy configured as ECHAM5
The MPI commands come from mpich3-x86_64. In the script we use:
module load  mpi/mpich3-x86_64
I use the Intel compiler:
module load intel/14.0 

netcdf/standard BLAS/LAPACK (-mieee-fp -O2 -fpp -heap-arrays)
(one day)
  Wallclock        :      89.30 s
  CPU-time (user)  :      72.08 s
  CPU-time (system):       1.57 s
  Ratio            :      82.47 %
(10 days)
  Wallclock        :    1006.33 s
  CPU-time (user)  :     835.18 s
  CPU-time (system):      14.96 s
  Ratio            :      84.48 %
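Every timing block in this post is consistent with Ratio = (user + system) / wallclock. For example, (835.18 + 14.96) / 1006.33 = 84.48 %. The short Python sketch below parses one block in the format above and recomputes that ratio. The regular expression and helper names are hypothetical and only cover the layout shown here.

```python
# Sketch: parse one timing block as printed above and recompute the Ratio
# as (user + system) CPU time divided by wallclock time.
import re

BLOCK = """\
  Wallclock        :    1006.33 s
  CPU-time (user)  :     835.18 s
  CPU-time (system):      14.96 s
  Ratio            :      84.48 %
"""

# label, colon, number, then a unit of "s" or "%"
LINE_RE = re.compile(r"^\s*(.+?)\s*:\s*([0-9.]+)\s*[s%]\s*$")

def parse_timing(text: str) -> dict[str, float]:
    """Map each label (e.g. 'Wallclock') to its numeric value."""
    values = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values

def cpu_ratio(t: dict[str, float]) -> float:
    """(user + system) CPU time as a percentage of wallclock time."""
    busy = t["CPU-time (user)"] + t["CPU-time (system)"]
    return 100.0 * busy / t["Wallclock"]

timing = parse_timing(BLOCK)
print(f"{cpu_ratio(timing):.2f} %")
```

A ratio well below 100 % means the process spent part of the wallclock time waiting rather than computing. In these runs that waiting is likely I/O or MPI communication, although the logs do not say which.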

Pnetcdf/standard BLAS/LAPACK (-mieee-fp -O2 -fpp -heap-arrays)
In this test I use parallel-netcdf-1.6.0.

(one day) 
  Wallclock        :      93.03 s
  CPU-time (user)  :      71.84 s
  CPU-time (system):       1.49 s
  Ratio            :      78.82 %
(10 days) 
  Wallclock        :    1006.33 s
  CPU-time (user)  :     842.73 s
  CPU-time (system):      15.55 s
  Ratio            :      85.29 %


Pnetcdf/ACML (-mieee-fp -O2 -fpp -heap-arrays)
In this test I use the AMD BLAS/LAPACK (ACML):
export LD_LIBRARY_PATH=...<BASE_DIR>/acml-5-3-1-intel/ifort64_fma4_mp
(one day) 
  Wallclock        :      83.37 s
  CPU-time (user)  :      69.18 s
  CPU-time (system):       1.52 s
  Ratio            :      84.80 %
(10 days)
  Wallclock        :     960.80 s
  CPU-time (user)  :     777.49 s
  CPU-time (system):      12.94 s
  Ratio            :      82.27 %

Pnetcdf/ACML (-mavx -fp-model source -O2 -fpp )
(echam5_v1.exe)
(one day)
  Wallclock        :      80.87 s
  CPU-time (user)  :      66.37 s
  CPU-time (system):       1.48 s
  Ratio            :      83.90 %
(10 days)
  Wallclock        :     823.92 s
  CPU-time (user)  :     677.38 s
  CPU-time (system):      14.33 s
  Ratio            :      83.95 %
 
Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(echam5_v2.exe)
(one day)
  Wallclock        :      78.56 s
  CPU-time (user)  :      52.68 s
  CPU-time (system):       1.61 s
  Ratio            :      69.11 %
(10 days)
  Wallclock        :     637.83 s
  CPU-time (user)  :     496.92 s
  CPU-time (system):      13.70 s
  Ratio            :      80.06 %
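The 10-day wallclock times above can be summarized as a speedup over the netcdf + standard BLAS/LAPACK build. This is a minimal Python sketch using only the numbers listed. The short labels are hypothetical. Several things change at once between builds (library, -O level, floating-point model), so the speedup of each build cannot be attributed to a single flag.

```python
# Sketch: speedup of each ECHAM5 build over the netcdf + standard
# BLAS/LAPACK baseline, using the 10-day wallclock times listed above.
TEN_DAY_WALLCLOCK_S = {
    "netcdf, std BLAS/LAPACK, -O2": 1006.33,
    "Pnetcdf, std BLAS/LAPACK, -O2": 1006.33,
    "Pnetcdf, ACML, -mieee-fp -O2": 960.80,
    "Pnetcdf, ACML, -mavx -O2 (v1)": 823.92,
    "Pnetcdf, ACML, -msse4.2 -O3 -no-prec-div (v2)": 637.83,
}
BASELINE = "netcdf, std BLAS/LAPACK, -O2"

def speedups(times: dict[str, float], baseline: str) -> dict[str, float]:
    """Baseline wallclock divided by each build's wallclock (>1 is faster)."""
    base = times[baseline]
    return {name: base / t for name, t in times.items()}

for name, s in speedups(TEN_DAY_WALLCLOCK_S, BASELINE).items():
    print(f"{name:48s} {s:5.2f}x")
```

With these numbers the echam5_v2.exe build comes out about 1.58x faster than the baseline over 10 simulated days.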

MESSy configured as EMAC (chemistry: (((Tr && (G || Het) && \!I) || St) && \!Hg) )

Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(echam5_v3.exe)
1 day (2x16)
  Wallclock        :     566.20 s
  CPU-time (user)  :     551.46 s
  CPU-time (system):       2.07 s
  Ratio            :      97.76 %
1 day (3x16)
  Wallclock        :     518.25 s
  CPU-time (user)  :     504.30 s
  CPU-time (system):       2.75 s
  Ratio            :      97.84 %
1 day (4x16)
  Wallclock        :     423.08 s
  CPU-time (user)  :     409.29 s
  CPU-time (system):       1.86 s
  Ratio            :      97.18 %
1 day (8x8)
  Wallclock        :     423.18 s
  CPU-time (user)  :     409.49 s
  CPU-time (system):       2.12 s
  Ratio            :      97.27 %
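The EMAC runs can be compared through strong-scaling speedup and parallel efficiency relative to the 2x16 run. The Python sketch below assumes that a decomposition "AxB" means A*B MPI tasks, so 4x16 uses all 64 cores of the R815. The post does not state this, and the helper names are hypothetical.

```python
# Sketch: strong-scaling speedup and parallel efficiency for the EMAC runs.
# ASSUMPTION: a decomposition "AxB" means A*B MPI tasks (e.g. 4x16 = 64).
RUNS = {  # decomposition -> 1-day wallclock in seconds
    "2x16": 566.20,
    "3x16": 518.25,
    "4x16": 423.08,
    "8x8": 423.18,
}

def tasks(decomposition: str) -> int:
    """Number of MPI tasks implied by an 'AxB' label (assumed A*B)."""
    a, b = (int(x) for x in decomposition.split("x"))
    return a * b

def efficiency(runs: dict[str, float], reference: str) -> dict[str, tuple[float, float]]:
    """Return (speedup, efficiency) of each run relative to the reference run."""
    t_ref, p_ref = runs[reference], tasks(reference)
    result = {}
    for name, t in runs.items():
        speedup = t_ref / t
        result[name] = (speedup, speedup * p_ref / tasks(name))
    return result

for name, (s, e) in efficiency(RUNS, "2x16").items():
    print(f"{name:5s} tasks={tasks(name):3d} speedup={s:4.2f} efficiency={e:6.1%}")
```

Under this assumption, doubling from 32 to 64 tasks gives about 67 % efficiency. The 4x16 and 8x8 layouts take practically the same time (423.08 s versus 423.18 s).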


Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(chemistry: G && St )
(one day)
  Wallclock        :     150.13 s
  CPU-time (user)  :     136.04 s
  CPU-time (system):       1.54 s
  Ratio            :      91.64 %


F90FLAGS= -msse4.2 -fp-model source -O3 -no-prec-div -fpp
(chemistry: G && St  ) configured for ml15002
1 day (4x16)
  Wallclock        :     180.07 s
  CPU-time (user)  :     159.60 s
  CPU-time (system):       2.22 s
  Ratio            :      89.87 %

1 day (3x16) 
  Wallclock        :     199.79 s
  CPU-time (user)  :     185.49 s
  CPU-time (system):       1.84 s
  Ratio            :      93.76 %

1 day (2x16)
  Wallclock        :     212.95 s
  CPU-time (user)  :     199.25 s
  CPU-time (system):       1.56 s
  Ratio            :      94.30 %

1 day (1x16)
  Wallclock        :     334.53 s
  CPU-time (user)  :     321.85 s
  CPU-time (system):       1.74 s
  Ratio            :      96.73 %
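The G && St series on ml15002 (1x16 up to 4x16) can be summarized with the Karp-Flatt metric, which estimates the experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p) from a measured speedup S on p processors. The Python sketch below takes the 1x16 run as the baseline, so p is the processor ratio N in "Nx16" and e is relative to a 16-task run. It also assumes "Nx16" means N*16 MPI tasks. If e stays roughly constant as p grows, the limit is a serial part of the code. If e grows with p, parallel overhead is the more likely cause.

```python
# Sketch: Karp-Flatt metric for the "G && St" runs on ml15002,
# using the 1x16 run as the baseline.
# ASSUMPTION: "Nx16" means N*16 MPI tasks, so p = N relative to 1x16.
WALLCLOCK_S = {1: 334.53, 2: 212.95, 3: 199.79, 4: 180.07}  # N -> seconds

def karp_flatt(speedup: float, p: float) -> float:
    """e = (1/S - 1/p) / (1 - 1/p); defined for p > 1."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

base = WALLCLOCK_S[1]
for n, t in sorted(WALLCLOCK_S.items()):
    if n == 1:
        continue
    s = base / t
    print(f"{n}x16: speedup={s:.2f} serial fraction e={karp_flatt(s, n):.3f}")
```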



============================================================
MESSy tests on an Intel cluster (Thunder)

FFLAGS=   -mieee-fp -O3 -fpp -heap-arrays

(one day) 4x16
  Wallclock        :     284.52 s
  CPU-time (user)  :     263.37 s
  CPU-time (system):       6.45 s
  Ratio            :      94.83 %

FFLAGS=   -mieee-fp -O3 -fpp -march=corei7-avx -heap-arrays
echam5_v1.exe
(one day) 4x16
  Wallclock        :     266.20 s
  CPU-time (user)  :     244.48 s
  CPU-time (system):       7.17 s
  Ratio            :      94.53 %

FFLAGS=   -mieee-fp -O3 -fpp -march=corei7-avx -heap-arrays
linked against the Intel MKL library
echam5_v2.exe
(one day) 4x16
  Wallclock        :     270.85 s
  CPU-time (user)  :     248.06 s
  CPU-time (system):       6.84 s
  Ratio            :      94.11 %

FFLAGS=   -O3 -fpp -march=corei7-avx -heap-arrays -fp-model source
echam5_v3.exe
(one day) 4x16
  Wallclock        :     233.55 s
  CPU-time (user)  :     212.90 s
  CPU-time (system):       7.09 s
  Ratio            :      94.19 %
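For allocation accounting, the Thunder 1-day times can be converted to core-hours per simulated day. The Python sketch below also reports how much wallclock each flag set saves relative to the first build. It assumes "4x16" means 64 MPI tasks, each occupying one core. The short labels and helper names are hypothetical.

```python
# Sketch: core-hours per simulated day for the Thunder runs, plus the
# wallclock reduction relative to the first (baseline) flag set.
# ASSUMPTION: "4x16" means 64 MPI tasks, one per core.
THUNDER_RUNS = {
    "-mieee-fp -O3 (baseline)": 284.52,
    "+ -march=corei7-avx (v1)": 266.20,
    "+ -march=corei7-avx, MKL (v2)": 270.85,
    "-O3 -march=corei7-avx -fp-model source (v3)": 233.55,
}
CORES = 4 * 16

def core_hours(wallclock_s: float, cores: int) -> float:
    """Wallclock seconds times cores, expressed in core-hours."""
    return wallclock_s * cores / 3600.0

base = THUNDER_RUNS["-mieee-fp -O3 (baseline)"]
for name, wall in THUNDER_RUNS.items():
    saving = 1.0 - wall / base
    print(f"{name:45s} {core_hours(wall, CORES):5.2f} core-h/day  {saving:6.1%} less wallclock")
```

With these numbers, echam5_v3.exe needs about 4.15 core-hours per simulated day, roughly 18 % less wallclock than the baseline flags.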