A plane-wave DFT + CPMD calculation from one of our users does not scale beyond 16 cores (sorry, I cannot disclose his input and results here).

I reproduced this behavior on an Altix UV 1000 server (an SMP machine; each blade is equipped with two 8-core Intel CPUs), where N is the number of cores:

          NWPW DFT          |           CPMD
 N     CPU        WALL      |       CPU         WALL
 1    633.6s     634.8s     |     1335.2s     1340.0s
 2    355.4s     359.8s     |      790.8s      802.5s
 4    192.7s     194.3s     |      466.6s      469.4s
 8    116.9s     118.0s     |      339.1s      339.9s
16    159.8s     163.9s     |      494.8s      496.1s
32    326.0s     352.1s     |      830.7s      894.6s
64    332.0s     341.5s     |      962.4s      982.3s
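
To make the trend easier to read, here is a minimal sketch that turns the wall times above into speedup and parallel efficiency relative to the 1-core run. The wall times are copied from the table; the names `wall_dft`, `wall_cpmd`, `scaling` and `best_core_count` are my own and are not part of NWChem.

```python
# Wall-clock times in seconds, copied from the table above (key = core count N).
wall_dft = {1: 634.8, 2: 359.8, 4: 194.3, 8: 118.0, 16: 163.9, 32: 352.1, 64: 341.5}
wall_cpmd = {1: 1340.0, 2: 802.5, 4: 469.4, 8: 339.9, 16: 496.1, 32: 894.6, 64: 982.3}


def scaling(wall):
    """Return {N: (speedup, efficiency)} relative to the 1-core wall time."""
    t1 = wall[1]
    return {n: (t1 / t, t1 / (t * n)) for n, t in sorted(wall.items())}


def best_core_count(wall):
    """Return the core count with the shortest wall time."""
    return min(wall, key=wall.get)


for label, wall in (("NWPW DFT", wall_dft), ("CPMD", wall_cpmd)):
    print(label)
    for n, (speedup, efficiency) in scaling(wall).items():
        print(f"  N={n:2d}  speedup={speedup:5.2f}  efficiency={efficiency:6.1%}")
    print(f"  fastest run: N={best_core_count(wall)}")
```

On these numbers both modules are fastest at 8 cores, with a speedup of roughly 5.4 for DFT and 3.9 for CPMD, and the wall time increases again from 16 cores onward.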

The code was compiled with ifort 11.1.038, gcc 4.3.4, and SGI MPT 2.04; no external math library was used in this test build. We have previously reproduced the excellent scalability of the C240 buckyball DFT calculation (atomic basis set) on this system.

Can anyone comment on the scalability of the NWPW and CPMD modules in NWChem?

I would also like to see whether I can reproduce, on our Altix UV 1000 machine, the reported performance for the UO_2^{+2} (H_2O)_{122}, Zn_2 (H_2O)_{64}, and 80-atom hematite cell calculations. Could anyone share the input files with me?

