
Armci error 260 cond:0

Hello, for the last few weeks I have been trying to analyse an NWChem crash.
The input for the calculation is taken from the Benchmarks on this site and is called C240 Buckminster Fullerene.
It is being run on 32 nodes, each with 2 Xeon CPUs with hyperthreading enabled, so each compute
node has 4 computational units. The network interconnect is plain Gigabit Ethernet.

The first crashes were with a home-built binary compiled with O3 optimisation. I then rebuilt it with
O2 optimisation, and it stops at exactly the same spot; both binaries stop after a computation
of almost equal duration. Both builds used Intel MKL, so the next step is to remove MKL and
see what happens. The program is built with mpich2 and the ifort compiler.

It seems that ARMCI is somehow incorrectly configured or does not know how to communicate.
The significant error seems to be
ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0

I still have not dug into the code to find out what that means.

Here is an excerpt from the nwchem log.

dft energy failed                                                                       0
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
  current input line : 
   278: task dft energy
 ------------------------------------------------------------------------
 ------------------------------------------------------------------------
 This type of error is most commonly 
 associatated with calculations not reaching convergence criteria
 ------------------------------------------------------------------------
 For more information see the NWChem manual at 
 http://www.emsl.pnl.gov/docs/nwchem/nwchem.html
 For further details see manual section:
0:0:dft energy failed:: 0
(rank:0 hostname:j314.jotunn.rhi.hi.is pid:13071):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
Last System Error Message from Task 0:: Inappropriate ioctl for device
application called MPI_Abort(MPI_COMM_WORLD, 0) - process 0


I am working on testing some alternatives: eliminating MKL, eliminating BLAS altogether, and trying ATLAS and LAPACK.
Should I use the Intel C compiler instead of GNU CC?
Best regards, Anna Jonna.

