armci malloc:malloc 1 failed

Gets Around
Threads 18
Posts 67

I am using this input to calculate the MR-CCSD energy of a cubane transition state.

title "cubane MK-CCSD(2,2)/cc-pVDZ calculation"
scratch_dir /mnt/scratch
memory stack 100 mb heap 100 mb global 11000 mb
 symmetry C2v
 H         -1.32680        2.05167       -0.03231
 C         -0.78899        1.09343        0.03178
 H          1.32680        2.05167       -0.03231
 C          0.78899        1.09343        0.03178
 H         -1.48781       -0.00000        2.07216
 C         -1.20592        0.00000        1.01567
 H          1.48781       -0.00000        2.07216
 C          1.20592       -0.00000        1.01567
 H         -1.44937        0.00000       -1.98042
 C         -0.77946        0.00000       -1.10963
 H          1.44937        0.00000       -1.98042
 C          0.77946        0.00000       -1.10963
 H         -1.32680       -2.05167       -0.03231
 C         -0.78899       -1.09343        0.03178
 H          1.32680       -2.05167       -0.03231
 C          0.78899       -1.09343        0.03178
basis spherical
 H library cc-pVDZ
 C library cc-pVDZ
 2emet 1
 freeze atomic
 root 1
 nref 2
task tce energy

I am using an Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz with 8 GB of memory and 120 GB of swap.
SHMMAX set to 16GB (echo 16384000000 > /proc/sys/kernel/shmmax)
I rely on OpenBLAS for parallelization, so I run the calculation as a single process:
mpirun -np 1 nwchem N8.nw > N8.nwo

But the run fails during the computation of the 2-e integrals:

MRCC tiling completed in             0.0            0.0
tce_ao1e_fock2e       36.28000       36.36079
F:     1 in bytes =                87040
tce_mo1e        0.03200        0.06773
eone,etwo,enrep,energy  -1126.356944754460    460.249162408168    358.838446347621   -307.269335998670
mrcc_uhf_energy        8.78800        8.78590
tce_ao1e_fock2e       35.68800       35.74738
F:     2 in bytes =                87040
tce_mo1e        0.02400        0.02567
eone,etwo,enrep,energy  -1125.936417042576    459.900381432998    358.838446347621   -307.197589261957
mrcc_uhf_energy        9.31200        9.33206
2-e(intermediate) /mnt/scratch/cubane. in bytes=         8159223808
Ref.   1 Half 2-e         915.42        1115.33
V 2-e /mnt/scratch/cubane. in bytes=         1437934592
0:armci_malloc:malloc 1 failed: 1437934600
(rank:0 hostname:kbob-G41MT-S2 pid:23581):ARMCI DASSERT fail. ../../ga-5-2/armci/src/memory/memory.c:PARMCI_Malloc():880 cond:0
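
To put the sizes in this log into perspective, here is a minimal sketch that converts the reported byte counts to MiB. The helper name to_mib is hypothetical, and the log does not show whether both arrays are resident at the same time, so the last line is only an upper bound for these two arrays.

```shell
# Hypothetical helper: integer conversion from bytes to MiB (1 MiB = 1048576 bytes).
to_mib() {
    echo $(( $1 / 1048576 ))
}

intermediate=8159223808   # "2-e(intermediate) ... in bytes" from the log
v2e=1437934592            # "V 2-e ... in bytes" from the log
failed=1437934600         # size reported by armci_malloc when it failed

echo "2-e intermediate: $(to_mib "$intermediate") MiB"
echo "V 2-e:            $(to_mib "$v2e") MiB"
echo "failed request:   $(to_mib "$failed") MiB"
echo "both arrays:      $(to_mib $(( intermediate + v2e ))) MiB"
```

The failed request is 8 bytes larger than the V 2-e size printed on the line before it, so it is the V 2-e array itself (about 1.3 GiB) that could not be allocated.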

I tried setting the environment variable ARMCI_DEFAULT_SHMMAX to different values (4096, 16000, 16384), but nothing changed.
With the value 16000, no additional errors were reported.

I also tried not using the GA IO scheme.
In that case the calculation of the 2-e integrals completed successfully, but the MRCC iterations themselves fail without GA initialization.

I am using the development snapshot of May 03, 2014 (Nwchem-6.3.revision25564-src.2014-05-03) with a patch.

Can anyone help me figure out what the problem might be and suggest a workaround?
Edited On 8:36:39 PM PDT - Sat, May 31st 2014 by Vladimir

Forum Vet
Threads 7
Posts 1147
What value of ARMCI_NETWORK have you used for your installation? Have you left it undefined?
Thanks, Edo

Gets Around
Threads 18
Posts 67
My build file:

#sudo apt-get install python2.7-dev zlib1g-dev libssl-dev gfortran
#Edit src/config/makefile.h and add "-lz -lssl" to the end of line 2094 (needed by python)
export TCGRSH=/usr/bin/ssh
export NWCHEM_TOP=`pwd`
export NWCHEM_MODULES="all python"
export PYTHONHOME=/usr
export BLASOPT="-L/usr/lib/openblas-base -lopenblas"
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openblas-base
#sudo apt-get install libopenmpi-dev openmpi-bin
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBMPI="-lmpi -lopen-rte -lopen-pal -ldl -lmpi_f77 -lpthread"
export LIBRARY_PATH=$LIBRARY_PATH:/usr/lib/openmpi/lib
export FC=gfortran
cd $NWCHEM_TOP/src
make clean
make nwchem_config
make > make.log 2>&1
cd ../contrib

I found a note about ARMCI_NETWORK on this forum.
In my case I rely on OpenBLAS parallelization and run mpirun -np 1 nwchem N8.nw.
Do I need to compile NWChem with MPI, or can I compile without it?
What value should I set for ARMCI_NETWORK if I use only one node?
Edited On 1:10:23 AM PDT - Tue, Jun 3rd 2014 by Vladimir

Forum Vet
Threads 7
Posts 1147
I suggest you first try ARMCI_NETWORK=MPI-TS (the default value).
I have just tried your input, and it works with the following memory line and nproc=4:

memory stack 400 mb heap 100 mb global 3100 mb
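
For comparison with the original input, here is a rough sketch of the total memory each setup requests. It assumes the memory directive applies per process; the helper total_mb is hypothetical.

```shell
# Hypothetical helper: total requested memory in MB for a given memory line.
# Assumption: stack, heap and global are each allocated per process.
# Arguments: stack_mb heap_mb global_mb nproc
total_mb() {
    echo $(( ($1 + $2 + $3) * $4 ))
}

echo "original input, nproc=1:  $(total_mb 100 100 11000 1) MB"
echo "suggested input, nproc=4: $(total_mb 400 100 3100 4) MB"
```

Under that assumption both setups request more than the 8 GB of physical memory on the machine, so the difference lies in how the request is split across processes rather than in a smaller total.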

The ARMCI_DEFAULT_SHMMAX setting becomes irrelevant with ARMCI_NETWORK=MPI-TS.
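
A sketch of how the build script posted earlier in the thread could select this value explicitly. ARMCI_NETWORK is read at build time, so the tools need rebuilding after it changes; the steps inside the hypothetical rebuild_tools function are an assumption about this source tree and are not taken from the thread.

```shell
# Select the ARMCI transport explicitly instead of relying on the default.
export ARMCI_NETWORK=MPI-TS

# Hypothetical helper, defined but not called here. Assumes NWCHEM_TOP is set
# as in the build script, and that rebuilding src/tools followed by relinking
# is enough to pick up the new ARMCI_NETWORK value.
rebuild_tools() {
    cd "$NWCHEM_TOP/src/tools" && make clean && make FC=gfortran &&
    cd "$NWCHEM_TOP/src" && make FC=gfortran link
}
```

Calling rebuild_tools after sourcing the rest of the build environment would then produce a binary built against the chosen transport.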

Gets Around
Threads 18
Posts 67
Thank you very much, Edo.
Your magical spell worked perfectly.
But I still do not understand why the calculation does not work with a single process.

Gets Around
Threads 18
Posts 67
I found another way to fix the issue.
!msg/hpctools/-bYstidUAYA/LqZ38W1f1ukJ

Just change DEFAULT_MAX_NALLOC:
-#define DEFAULT_MAX_NALLOC   (4*1024*1024*16)
+#define DEFAULT_MAX_NALLOC   (8*1024*1024*16)
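
For reference, a small sketch that evaluates the two macro expressions. The thread does not state the unit this limit is counted in, so only the raw values are compared; the variable names are hypothetical.

```shell
# Evaluate the old and new DEFAULT_MAX_NALLOC expressions from the diff.
old_limit=$(( 4*1024*1024*16 ))
new_limit=$(( 8*1024*1024*16 ))

echo "old DEFAULT_MAX_NALLOC: $old_limit"
echo "new DEFAULT_MAX_NALLOC: $new_limit"
echo "ratio new/old:          $(( new_limit / old_limit ))"
```

The change simply doubles the allocation limit.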

Incidentally, nproc=1 with openblas parallelization is 7 times faster than nproc=4.
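
One way to keep the two kinds of parallelism from competing for cores is to derive the OpenBLAS thread count from the number of MPI processes. This is a sketch, not a measured recommendation: threads_per_rank is a hypothetical helper, and the core count of 4 is an assumption for the i5-4670. OPENBLAS_NUM_THREADS is the environment variable OpenBLAS reads for its thread count.

```shell
# Hypothetical helper: BLAS threads per MPI process, chosen so that
# nproc * threads does not exceed the number of cores (minimum 1).
# Arguments: cores nproc
threads_per_rank() {
    t=$(( $1 / $2 ))
    if [ "$t" -lt 1 ]; then
        t=1
    fi
    echo "$t"
}

cores=4   # assumption: physical cores on the i5-4670
nproc=1   # single MPI process, as in "mpirun -np 1"

OPENBLAS_NUM_THREADS=$(threads_per_rank "$cores" "$nproc")
export OPENBLAS_NUM_THREADS
echo "mpirun -np $nproc with OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"
```

With nproc=4 the same helper yields one thread per process, which avoids running 16 threads on 4 cores.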

Forum Vet
Threads 7
Posts 1147
Thank you very much for your feedback.
To reproduce your single-process result, I need a few more details about your setup:
1) memory line used in NWChem input file
2) value of ARMCI_NETWORK used

Cheers, Edo

Gets Around
Threads 18
Posts 67
1. Memory line as in the first post:
memory stack 400 mb heap 100 mb global 11000 mb

2. ARMCI_NETWORK not set (default).

