How to use ARMCI NETWORK for NWChem 6.3 on SGI ICE X

I compiled NWChem 6.3 with the following settings:

setenv NWCHEM_TARGET LINUX64
setenv USE_MPI y
#setenv ARMCI_NETWORK VAPI
#setenv ARMCI_NETWORK MPI
#setenv ARMCI_NETWORK MPI2
setenv ARMCI_NETWORK OPENIB
setenv IB_HOME /usr
setenv IB_INCLUDE $IB_HOME/include
setenv IB_LIB $IB_HOME/lib64
setenv IB_LIB_NAME "-libverbs -libumad -lpthread"
setenv MA_USE_ARMCI_MEM 1
setenv MPI_LOC /opt/sgi/mpt/mpt-2.08
#setenv MPI_LOC /app/intel/impi/4.0.3.008/intel64/
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv MPI_LIB $MPI_LOC/lib
setenv MPI_INCLUDE $MPI_LOC/include
setenv LIBMPI -lmpi
setenv NWCHEM_MODULES all
setenv DISABLE_F77 1
setenv MKL_LIB /app/intel/mkl/lib/intel64
setenv MKL_INC /app/intel/mkl/include
setenv INTEL_LIB /app/intel/lib/intel64/
setenv LASOPT "-L$MKL_LIB -I$MKL_INC -L$INTEL_LIB -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm"
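
As a sanity check on the settings above, the directories they point to can be verified before building. This is only a minimal sketch in POSIX sh (not csh). The function name check_paths is hypothetical and nothing in it is specific to NWChem or ARMCI. It only reports whether each path passed to it exists.

```sh
#!/bin/sh
# Sketch: report which of the given paths exist.
# check_paths is a hypothetical helper, not part of NWChem or MPT.
check_paths() {
    missing=0
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "ok: $p"
        else
            echo "missing: $p"
            missing=1
        fi
    done
    # Return non-zero if at least one path was not found.
    return $missing
}
```

Variables set with setenv are exported, so an sh script started from the same csh session can see them. It could then call, for example, check_paths "$IB_INCLUDE" "$IB_LIB" "$MPI_INCLUDE" "$MPI_LIB" "$MKL_LIB".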

However, the resulting binary works within a single node but fails when it runs across nodes, no matter which ARMCI_NETWORK value is used. It gives the following errors:

From the .out file:
argument  1 = kiet_scf.nw
-10016:Segmentation Violation error, status=: 11
(rank:-10016 hostname:r28i1n16 pid:2263722):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
16:Child process terminated prematurely, status=: 256
(rank:16 hostname:r28i1n16 pid:2263705):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigChldHandler():178 cond:0
-10000:Segmentation Violation error, status=: 11
(rank:-10000 hostname:r27i0n17 pid:4037891):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigSegvHandler():310 cond:0
0:Child process terminated prematurely, status=: 256
(rank:0 hostname:r27i0n17 pid:4037874):ARMCI DASSERT fail. ../../ga-5-2/armci/src/common/signaltrap.c:SigChldHandler():178 cond:0

From the error file:

ARMCI master: wait for child process (server) failed:: No child processes
MPT: Global rank 16 is aborting with error code 256.
    Process ID: 2263705, Host: r28i1n16, Program: /work1/app/nwchem/nwchem-6.3.revision2b/bin/LINUX64/nwchem

Please advise!