6.6 cosmo segmentation violation

From NWChem


Clicked A Few Times
Threads 6
Posts 25
I just upgraded to 6.6 (Ubuntu package 6.6+r27746-2), and things seemed ok until I tried a COSMO run. My input file:

start job

memory 2500 mb
scratch_dir /scratch

geometry
 N-hi                 -1.84595152    -1.71169857    -0.80468442
 H-hi                 -2.23135939    -2.13842499    -1.63435909
 H-hi                 -2.13056247    -2.19277487     0.03616278
 C-lo                 -1.99497458     1.82126368    -1.89647314
 C-lo                 -1.97147556     0.42852243    -1.94312899
 C-lo                 -1.93620542    -0.32250179    -0.75546229
 C-lo                 -1.93262749     0.35533944     0.47496152
 C-lo                 -1.95743521     1.74856253     0.51065998
 C-lo                 -1.98501983     2.49393120    -0.67116188
 H-lo                 -2.00395507     3.58461879    -0.63815448
 H-lo                 -2.02184088     2.38682430    -2.83047244
 H-lo                 -1.97716709    -0.09286832    -2.90333700
 H-lo                 -1.90489042    -0.22309453     1.40155848
 H-lo                 -1.95262993     2.25741650     1.47728966
end

basis
  N-hi  library def2-tzvp
  H-hi  library def2-tzvp
  C-lo  library def2-svp
  H-lo  library def2-svp
end

dft
  xc m06-2x
  grid fine
  iterations 50
end

cosmo
  dielec 5.6968
end

driver
  maxiter 100
  xyz strct
end

task dft optimize

The job ends in a segmentation violation error at the end of the COSMO gas phase calculation:

   convergence    iter        energy       DeltaE   RMS-Dens  Diis-err    time
 ---------------- ----- ----------------- --------- --------- ---------  ------
     COSMO gas phase
 d= 0,ls=0.0,diis     1   -287.2831281650 -5.58D+02  5.47D-03  5.30D-01   104.3
 Grid integrated density:      50.000013592439
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     2   -287.3369826790 -5.39D-02  1.71D-03  7.43D-02   183.5
 Grid integrated density:      50.000013799568
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     3   -287.3411124499 -4.13D-03  8.21D-04  3.53D-02   262.6
 Grid integrated density:      50.000013805174
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     4   -287.3445806482 -3.47D-03  3.29D-04  1.26D-03   351.3
 Grid integrated density:      50.000013816685
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     5   -287.3447274923 -1.47D-04  1.34D-04  2.06D-04   435.1
 Grid integrated density:      50.000013812989
 Requested integration accuracy:   0.10E-06
  Resetting Diis
 d= 0,ls=0.0,diis     6   -287.3447545517 -2.71D-05  2.41D-05  1.15D-05   538.5
 Grid integrated density:      50.000013813024
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     7   -287.3447558069 -1.26D-06  8.43D-06  9.35D-07   630.1
 Grid integrated density:      50.000013815859
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     8   -287.3447558024  4.49D-09  4.03D-06  1.16D-06   716.6
0:Segmentation Violation error, status=: 11
(rank:0 hostname:BeastOfBurden pid:20235):ARMCI DASSERT fail. ../../ga-5-4/armci/src/common/signaltrap.c:SigSegvHandler():315 cond:0
Last System Error Message from Task 0:: Numerical result out of range
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0 
with errorcode 11.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
Last System Error Message from Task 1:: Numerical result out of range
[BeastOfBurden:20233] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[BeastOfBurden:20233] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

When I don't use COSMO, the job succeeds. I tried changing the memory settings, but that didn't help.
As I understand it, this package already has the cosmo_meminit-patch applied (if that matters).

Forum Vet
Threads 7
Posts 1355
Unfortunately, I have not been able to reproduce your failure with my 6.6 builds.
Could you try to run it by adding the "direct" keyword in the dft field to see if it makes any difference? How many processors are you using?
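
For reference, a minimal sketch of that change, assuming the xc, grid and iterations lines of the input above sit in the dft block. The "direct" keyword asks the DFT module to recompute the two-electron integrals on the fly instead of storing them:

```
dft
  xc m06-2x
  grid fine
  iterations 50
  direct
end
```

The rest of the input (geometry, basis, COSMO and driver settings) would stay unchanged.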

Clicked A Few Times
Threads 1
Posts 7
cosmo runs

Out of curiosity I checked your input too. It also runs fine on my workstation with build 6.6.

Your failure report indicates some issues with the ARMCI settings of your compilation.


Gets Around
Threads 33
Posts 138
Dear Drs. Ivo, Edoapra and Manfred

Using NWChem 6.6 on Mac OS X El Capitan 10.11.3 with mpich 3.1.4_1 installed, a parallel run of the optimization on three cores, with the original and unaltered input, converged at the 9th step and gave the following:

Optimization converged

  Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
  ---- ---------------- -------- -------- -------- -------- -------- --------
@    9    -287.35442308 -1.5D-07  0.00006  0.00001  0.00038  0.00179   1042.2
                                    ok       ok       ok       ok


Best regards!
Edited On 9:23:34 PM PST - Fri, Feb 19th 2016 by Xiongyan21

Forum Vet
Threads 7
Posts 1355
debian unstable OK
I have tried the input shown here in a debian sid/unstable installation with the 6.6+r27746-2 package and the code did not stop with the Segv reported.

Clicked A Few Times
Threads 6
Posts 25
In that case it must be my installation.

I tried running with "direct", but it gives the same result.

I also tried reinstalling the package; that didn't help either.

My machine is a workstation with two 12-core processors with hyperthreading. With NWChem 6.5 and openmpi 1.6 it was most efficient to run with 44 processes. NWChem 6.6 needs openmpi 1.10, and it looks like the behavior changed a bit. I'm not quite sure yet what the optimal way is now. I tried from 2 to 44 processes, but always got the above behavior.

Clicked A Few Times
Threads 6
Posts 25
I solved the problem by updating the libgcc1 and libgfortran3 packages and their dependencies to the latest version.

Now everything seems to work ok.

Thanks for narrowing it down to a local problem.
