Strange behavior running parallel

From NWChem

Hello, I'm experiencing a strange issue, and I'm not sure whether it's related to MPI, the SGE scheduler, or NWChem itself.
When running with 1, 2, 4 or 8 procs on a single node, it runs fine. But when I run with 6 or 12 procs, it fails with the error message below. For certain input files, I get the same errors when running with a particular number of procs. Can someone explain this and point me in a direction to troubleshoot it, please?
symmetry adapt = T

Here is a snippet from the output:

Forming initial guess at       1.1s

Error in pstein5. eval  is different on processors 0 and 1 
Error in pstein5. me = 0 exiting via pgexit.
Error in pstein5. eval is different on processors 1 and 0
Error in pstein5. me = 1 exiting via pgexit.
Last System Error Message from Task 1:: Inappropriate ioctl for device
Last System Error Message from Task 0:: Inappropriate ioctl for device
 ME =                      0  Exiting via 
0:0: peigs error: mxpend:: 0
(rank:0 hostname:node13 pid:13469):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0
 ME =                      1  Exiting via 
1:1: peigs error: mxpend:: 0
(rank:1 hostname:node13 pid:13470):ARMCI DASSERT fail. ../../ga-5-1/armci/src/common/armci.c:ARMCI_Error():208 cond:0


MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 4 DUP FROM 0
with errorcode 0.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.


forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libmkl_sequential 00002AD009ED6150 Unknown Unknown Unknown
Last System Error Message from Task 2:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
nwchem 0000000002FF271E Unknown Unknown Unknown
nwchem 0000000002FF11B6 Unknown Unknown Unknown
nwchem 0000000002F939B2 Unknown Unknown Unknown
nwchem 0000000002F4135B Unknown Unknown Unknown
nwchem 0000000002F46E53 Unknown Unknown Unknown
nwchem 0000000002EB5F3F Unknown Unknown Unknown
nwchem 0000000002E9108F Unknown Unknown Unknown
libc.so.6 000000309A432920 Unknown Unknown Unknown
libmpi.so.1 00002B3007E26A99 Unknown Unknown Unknown
libmpi.so.1 00002B3007D594C2 Unknown Unknown Unknown
mca_coll_tuned.so 00002B300DB4F8EE Unknown Unknown Unknown
mca_coll_tuned.so 00002B300DB58618 Unknown Unknown Unknown
libmpi.so.1 00002B3007D680FD Unknown Unknown Unknown
nwchem 0000000002E125A0 Unknown Unknown Unknown
nwchem 0000000002E72CB2 Unknown Unknown Unknown
nwchem 0000000002E4471B Unknown Unknown Unknown
nwchem 00000000009AA79E Unknown Unknown Unknown
nwchem 00000000009C7347 Unknown Unknown Unknown
nwchem 00000000009ACA49 Unknown Unknown Unknown
nwchem 00000000005B944A Unknown Unknown Unknown
nwchem 0000000000501C57 Unknown Unknown Unknown
nwchem 000000000050118B Unknown Unknown Unknown
nwchem 000000000064BE1F Unknown Unknown Unknown
nwchem 00000000005049C1 Unknown Unknown Unknown
nwchem 00000000004F17A2 Unknown Unknown Unknown
nwchem 00000000004E639B Unknown Unknown Unknown
nwchem 00000000004E5E7C Unknown Unknown Unknown
libc.so.6 000000309A41ECDD Unknown Unknown Unknown
nwchem 00000000004E5D79 Unknown Unknown Unknown
Last System Error Message from Task 3:: Inappropriate ioctl for device
forrtl: error (78): process killed (SIGTERM)
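
To narrow down which process counts trigger the failure, the saved output of each run could be scanned for the pstein5 message shown above. The following is only a minimal Python sketch, assuming each run's output is available as text; the helper names `pstein5_ranks` and `summarize` are hypothetical and not part of NWChem or its tooling.

```python
import re

# Matches lines of the form seen in the output above, e.g.
# "Error in pstein5. me = 0 exiting via pgexit."
PSTEIN5_RE = re.compile(r"Error in pstein5\.\s+me\s*=\s*(\d+)\s+exiting via pgexit")


def pstein5_ranks(log_text):
    """Return the sorted ranks that reported the pstein5 error in log_text."""
    return sorted({int(m.group(1)) for m in PSTEIN5_RE.finditer(log_text)})


def summarize(logs):
    """Map each process count to the ranks that failed.

    logs is a dict {number_of_procs: output_text} (a hypothetical layout,
    built by hand from whatever output files the runs produced).
    """
    return {nprocs: pstein5_ranks(text) for nprocs, text in sorted(logs.items())}


if __name__ == "__main__":
    # Two lines copied from the output above, plus a made-up clean run.
    failing = (
        "Error in pstein5. me = 0 exiting via pgexit.\n"
        "Error in pstein5. me = 1 exiting via pgexit.\n"
    )
    clean = "Forming initial guess at       1.1s\n"
    for nprocs, ranks in summarize({4: clean, 6: failing}).items():
        status = "pstein5 error on ranks %s" % ranks if ranks else "no pstein5 error"
        print(nprocs, "procs:", status)
```

An empty list for a given process count means no pstein5 message was found in that run's output; it does not by itself show that the run completed correctly.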

