Information about the R815
This is a DELL R815 server with 4 AMD Opteron 6378 processors and 128 GB of LV RDIMM memory.
The cpuid.exe tool (supplied with the AMD ACML libraries) reports the following:
Chip manufacturer: AuthenticAMD
AuthenticAMD family 15 extended family 6 model 2
Model Name: AMD Opteron(tm) Processor 6378
Chip supports SSE
Chip supports SSE2
Chip supports SSE3
Chip supports AVX
Chip supports FMA3
Chip supports FMA4
============================================================
MESSy tests on the R815 server
MESSy configured as ECHAM5
The MPI commands are run with mpich3-x86_64. The script uses:
module load mpi/mpich3-x86_64
The Intel compiler is used:
module load intel/14.0
netcdf / standard BLAS/LAPACK (-mieee-fp -O2 -fpp -heap-arrays)
(one day)
Wallclock : 89.30 s
CPU-time (user) : 72.08 s
CPU-time (system): 1.57 s
Ratio : 82.47 %
(10 days)
Wallclock : 1006.33 s
CPU-time (user) : 835.18 s
CPU-time (system): 14.96 s
Ratio : 84.48 %
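The Ratio line in each timing block appears to be the total CPU time (user plus system) as a percentage of the wallclock time. This reading is inferred from the numbers, not stated by the timing output itself. A minimal Python sketch (the function name cpu_ratio is hypothetical) reproduces the two values above:

```python
def cpu_ratio(wallclock_s, user_s, system_s):
    """Return (user + system) CPU time as a percentage of wallclock time."""
    return 100.0 * (user_s + system_s) / wallclock_s

# Timings copied from the netcdf / standard BLAS/LAPACK runs above.
one_day = cpu_ratio(89.30, 72.08, 1.57)
ten_days = cpu_ratio(1006.33, 835.18, 14.96)

print(f"one day : {one_day:.2f} %")   # 82.47 %
print(f"10 days : {ten_days:.2f} %")  # 84.48 %
```

A ratio well below 100 % means the process spent part of the wallclock time not executing on the CPU, for example waiting on I/O or communication.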
Pnetcdf / standard BLAS/LAPACK (-mieee-fp -O2 -fpp -heap-arrays)
This test uses parallel-netcdf-1.6.0.
(one day)
Wallclock : 93.03 s
CPU-time (user) : 71.84 s
CPU-time (system): 1.49 s
Ratio : 78.82 %
(10 days)
Wallclock : 1006.33 s
CPU-time (user) : 842.73 s
CPU-time (system): 15.55 s
Ratio : 85.29 %
Pnetcdf/ACML (-mieee-fp -O2 -fpp -heap-arrays)
This test uses the AMD BLAS/LAPACK (ACML):
export LD_LIBRARY_PATH=...<BASE_DIR>/acml-5-3-1-intel/ifort64_fma4_mp
(one day)
Wallclock : 83.37 s
CPU-time (user) : 69.18 s
CPU-time (system): 1.52 s
Ratio : 84.80 %
(10 days)
Wallclock : 960.80 s
CPU-time (user) : 777.49 s
CPU-time (system): 12.94 s
Ratio : 82.27 %
Pnetcdf/ACML (-mavx -fp-model source -O2 -fpp )
(echam5_v1.exe)
(one day)
Wallclock : 80.87 s
CPU-time (user) : 66.37 s
CPU-time (system): 1.48 s
Ratio : 83.90 %
(10 days)
Wallclock : 823.92 s
CPU-time (user) : 677.38 s
CPU-time (system): 14.33 s
Ratio : 83.95 %
Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(echam5_v2.exe)
(one day)
Wallclock : 78.56 s
CPU-time (user) : 52.68 s
CPU-time (system): 1.61 s
Ratio : 69.11 %
(10 days)
Wallclock : 637.83 s
CPU-time (user) : 496.92 s
CPU-time (system): 13.70 s
Ratio : 80.06 %
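To compare the ECHAM5 builds above, the 10-day wallclock times can be turned into speedups relative to the first run. The following is a small sketch, not part of the original test scripts. The labels are shorthand for the configurations listed above, and each time comes from a single run.

```python
# 10-day wallclock times (s) for the ECHAM5 runs listed above.
runs = [
    ("netcdf, standard BLAS/LAPACK, -O2", 1006.33),
    ("Pnetcdf, standard BLAS/LAPACK, -O2", 1006.33),
    ("Pnetcdf, ACML, -O2", 960.80),
    ("Pnetcdf, ACML, -mavx -O2", 823.92),
    ("Pnetcdf, ACML, -msse4.2 -O3 -no-prec-div", 637.83),
]

# Speedup = baseline wallclock / wallclock of the run being compared.
baseline = runs[0][1]
speedups = {label: baseline / wall for label, wall in runs}

for label, factor in speedups.items():
    print(f"{label:42s} {factor:.2f}x")
```

On these single measurements the last build is about 1.58 times faster than the first over 10 days. With only one run per configuration, run-to-run variation is not known.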
MESSy configured as EMAC (chemistry: (((Tr && (G || Het) && !I) || St) && !Hg))
Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(echam5_v3.exe)
1 day (2x16)
Wallclock : 566.20 s
CPU-time (user) : 551.46 s
CPU-time (system): 2.07 s
Ratio : 97.76 %
1 day (3x16)
Wallclock : 518.25 s
CPU-time (user) : 504.30 s
CPU-time (system): 2.75 s
Ratio : 97.84 %
1 day (4x16)
Wallclock : 423.08 s
CPU-time (user) : 409.29 s
CPU-time (system): 1.86 s
Ratio : 97.18 %
1 day (8x8)
Wallclock : 423.18 s
CPU-time (user) : 409.49 s
CPU-time (system): 2.12 s
Ratio : 97.27 %
Pnetcdf/ACML ( -msse4.2 -fp-model source -O3 -no-prec-div -fpp)
(chemistry: G && St )
(one day)
Wallclock : 150.13 s
CPU-time (user) : 136.04 s
CPU-time (system): 1.54 s
Ratio : 91.64 %
F90FLAGS= -msse4.2 -fp-model source -O3 -no-prec-div -fpp
(chemistry: G && St ) configured for ml15002
1 day (4x16)
Wallclock : 180.07 s
CPU-time (user) : 159.60 s
CPU-time (system): 2.22 s
Ratio : 89.87 %
1 day (3x16)
Wallclock : 199.79 s
CPU-time (user) : 185.49 s
CPU-time (system): 1.84 s
Ratio : 93.76 %
1 day (2x16)
Wallclock : 212.95 s
CPU-time (user) : 199.25 s
CPU-time (system): 1.56 s
Ratio : 94.30 %
1 day (1x16)
Wallclock : 334.53 s
CPU-time (user) : 321.85 s
CPU-time (system): 1.74 s
Ratio : 96.73 %
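The four runs above can be summarised as speedup and parallel efficiency relative to the 1x16 run. This sketch assumes that a label "AxB" means A * B MPI tasks, which the notes do not state explicitly. The helper name tasks is hypothetical.

```python
# 1-day wallclock times (s) for the "G && St" chemistry runs above.
# Assumption: a label "AxB" means A * B MPI tasks.
runs = {"1x16": 334.53, "2x16": 212.95, "3x16": 199.79, "4x16": 180.07}

def tasks(label):
    """Number of MPI tasks implied by an 'AxB' label (assumed A * B)."""
    a, b = label.split("x")
    return int(a) * int(b)

base_label = "1x16"
base_tasks = tasks(base_label)
base_wall = runs[base_label]

scaling = {}
for label, wall in runs.items():
    speedup = base_wall / wall                         # relative to 1x16
    efficiency = speedup / (tasks(label) / base_tasks)  # 1.0 = ideal scaling
    scaling[label] = (speedup, efficiency)
    print(f"{label}: {tasks(label):3d} tasks, "
          f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Under that assumption, going from 16 to 64 tasks cuts the wallclock time by a factor of about 1.86, a parallel efficiency of roughly 46 %.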
============================================================
MESSy tests on an Intel cluster (Thunder)
FFLAGS= -mieee-fp -O3 -fpp -heap-arrays
(one day) 4x16
Wallclock : 284.52 s
CPU-time (user) : 263.37 s
CPU-time (system): 6.45 s
Ratio : 94.83 %
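For comparing many runs it can help to read the timing blocks programmatically. The sketch below parses one block into a dictionary. It assumes the "label : value unit" layout shown in these notes, and the names LINE and parse_timing are hypothetical.

```python
import re

# Timing block copied verbatim from the Thunder baseline run above.
block = """\
Wallclock : 284.52 s
CPU-time (user) : 263.37 s
CPU-time (system): 6.45 s
Ratio : 94.83 %
"""

# Each line is "<label> : <number> <unit>"; the label may contain parentheses.
LINE = re.compile(r"^(?P<label>.+?)\s*:\s*(?P<value>[0-9.]+)\s*(?P<unit>\S+)\s*$")

def parse_timing(text):
    """Map each label in a timing block to its numeric value."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # skip lines that are not "label : value unit"
            result[match.group("label")] = float(match.group("value"))
    return result

timing = parse_timing(block)
print(timing)
```

The units are matched but not stored, since every block in these notes uses seconds for the times and percent for the ratio.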
FFLAGS= -mieee-fp -O3 -fpp -march=corei7-avx -heap-arrays
echam5_v1.exe
(one day) 4x16
Wallclock : 266.20 s
CPU-time (user) : 244.48 s
CPU-time (system): 7.17 s
Ratio : 94.53 %
FFLAGS= -mieee-fp -O3 -fpp -march=corei7-avx -heap-arrays
link to Intel MKL library
echam5_v2.exe
(one day) 4x16
Wallclock : 270.85 s
CPU-time (user) : 248.06 s
CPU-time (system): 6.84 s
Ratio : 94.11 %
FFLAGS= -O3 -fpp -march=corei7-avx -heap-arrays -fp-model source
echam5_v3.exe
(one day) 4x16
Wallclock : 233.55 s
CPU-time (user) : 212.90 s
CPU-time (system): 7.09 s
Ratio : 94.19 %