6.6 cosmo segmentation violation


Clicked A Few Times (Threads 6, Posts 25)
I just upgraded to 6.6 (Ubuntu package 6.6+r27746-2), and things seemed ok until I tried a COSMO run. My input file:


start job

memory 2500 mb
scratch_dir /scratch

geometry
 N-hi                 -1.84595152    -1.71169857    -0.80468442
 H-hi                 -2.23135939    -2.13842499    -1.63435909
 H-hi                 -2.13056247    -2.19277487     0.03616278
 C-lo                 -1.99497458     1.82126368    -1.89647314
 C-lo                 -1.97147556     0.42852243    -1.94312899
 C-lo                 -1.93620542    -0.32250179    -0.75546229
 C-lo                 -1.93262749     0.35533944     0.47496152
 C-lo                 -1.95743521     1.74856253     0.51065998
 C-lo                 -1.98501983     2.49393120    -0.67116188
 H-lo                 -2.00395507     3.58461879    -0.63815448
 H-lo                 -2.02184088     2.38682430    -2.83047244
 H-lo                 -1.97716709    -0.09286832    -2.90333700
 H-lo                 -1.90489042    -0.22309453     1.40155848
 H-lo                 -1.95262993     2.25741650     1.47728966
end

basis
  N-hi  library def2-tzvp
  H-hi  library def2-tzvp
  C-lo  library def2-svp
  H-lo  library def2-svp
end

dft
  xc m06-2x
  grid fine
  iterations 50
end

cosmo
  dielec 5.6968
end

driver
  maxiter 100
  xyz strct
end

task dft optimize



The job ends in a segmentation violation error at the end of the COSMO gas phase calculation:



   convergence    iter        energy       DeltaE   RMS-Dens  Diis-err    time
 ---------------- ----- ----------------- --------- --------- ---------  ------
     COSMO gas phase
 d= 0,ls=0.0,diis     1   -287.2831281650 -5.58D+02  5.47D-03  5.30D-01   104.3
 Grid integrated density:      50.000013592439
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     2   -287.3369826790 -5.39D-02  1.71D-03  7.43D-02   183.5
 Grid integrated density:      50.000013799568
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     3   -287.3411124499 -4.13D-03  8.21D-04  3.53D-02   262.6
 Grid integrated density:      50.000013805174
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     4   -287.3445806482 -3.47D-03  3.29D-04  1.26D-03   351.3
 Grid integrated density:      50.000013816685
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     5   -287.3447274923 -1.47D-04  1.34D-04  2.06D-04   435.1
 Grid integrated density:      50.000013812989
 Requested integration accuracy:   0.10E-06
  Resetting Diis
 d= 0,ls=0.0,diis     6   -287.3447545517 -2.71D-05  2.41D-05  1.15D-05   538.5
 Grid integrated density:      50.000013813024
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     7   -287.3447558069 -1.26D-06  8.43D-06  9.35D-07   630.1
 Grid integrated density:      50.000013815859
 Requested integration accuracy:   0.10E-06
 d= 0,ls=0.0,diis     8   -287.3447558024  4.49D-09  4.03D-06  1.16D-06   716.6
0:Segmentation Violation error, status=: 11
(rank:0 hostname:BeastOfBurden pid:20235):ARMCI DASSERT fail. ../../ga-5-4/armci/src/common/signaltrap.c:SigSegvHandler():315 cond:0
Last System Error Message from Task 0:: Numerical result out of range
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMMUNICATOR 4 DUP FROM 0 
with errorcode 11.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Last System Error Message from Task 1:: Numerical result out of range
[BeastOfBurden:20233] 1 more process has sent help message help-mpi-api.txt / mpi-abort
[BeastOfBurden:20233] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages




When I don't use COSMO, the job succeeds. I tried changing the memory settings, but that didn't help.
As I understand it, this package already has the cosmo_meminit patch applied (if that matters).
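"Changing the memory settings" here meant variations along these lines (the split values below are illustrative, not the exact ones I tried):

# either one total, partitioned automatically:
memory total 2500 mb

# or an explicit heap/stack/global split:
memory heap 200 mb stack 1000 mb global 1300 mb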

Forum Vet (Threads 7, Posts 1355)
Unfortunately, I have not been able to reproduce your failure with my 6.6 builds.
Could you try running it with the "direct" keyword added to the dft block, to see if it makes any difference? How many processors are you using?
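That is, something like this (a sketch of your dft block with the keyword added; "direct" makes the code recompute the integrals on the fly instead of caching them):

dft
  direct
  xc m06-2x
  grid fine
  iterations 50
end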

Clicked A Few Times (Threads 1, Posts 7)
cosmo runs
Ivo,

Out of curiosity I checked your input too. It also runs fine on my workstation with build 6.6.

Your failure report indicates some issues with the ARMCI settings of your compilation.

Regards
Manfred

Gets Around (Threads 33, Posts 138)
Dear Drs. Ivo, Edoapra and Manfred

Employing NWChem 6.6 on Mac OS X El Capitan 10.11.3 with MPICH 3.1.4_1 installed, a parallel run of the optimization on three cores with the original, unaltered input converged at the 9th step, and the following was obtained:

   ...
----------------------
Optimization converged
----------------------


 Step       Energy      Delta E   Gmax     Grms     Xrms     Xmax   Walltime
 ---- ---------------- -------- -------- -------- -------- -------- --------
 @  9    -287.35442308 -1.5D-07  0.00006  0.00001  0.00038  0.00179   1042.2
                                      ok       ok       ok       ok

   ...

Best regards!

Forum Vet (Threads 7, Posts 1355)
debian unstable OK
I have tried the input shown here on a Debian sid/unstable installation with the 6.6+r27746-2 package, and the code did not stop with the reported segmentation violation.

Clicked A Few Times (Threads 6, Posts 25)
In that case it must be my installation.

I tried running with "direct", but it gives the same result.

I also tried reinstalling the package; that didn't help either.


My machine is a workstation with two 12-core processors with hyperthreading. With NWChem 6.5 and Open MPI 1.6 it was most efficient to run with 44 processes. NWChem 6.6 needs Open MPI 1.10, and it looks like the behavior has changed a bit, so I'm not quite sure yet what the optimal setup is now. I tried from 2 to 44 processes, but I always get the behavior above.
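For completeness, I launch the runs in the usual mpirun fashion, e.g. (where nwchem and job.nw stand in for the actual binary and input file paths):

mpirun -np 44 nwchem job.nw > job.out 2>&1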

Clicked A Few Times (Threads 6, Posts 25)
I solved the problem by updating the libgcc1 and libgfortran3 packages and their dependencies to the latest version.
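On Ubuntu that boils down to something like this (assuming the stock package repositories):

sudo apt-get update
sudo apt-get install --only-upgrade libgcc1 libgfortran3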

Now everything seems to work ok.

Thanks for narrowing it down to a local problem.

