Support address bounceback

From NWChem

Viewed 2862 times, With a total of 11 Posts

Clicked A Few Times
Threads 1
Posts 9
Hi folks,

I'm seeing several failures in the QA tests from a vanilla build of Nwchem-6.5 I created recently. The QA / HOWTO doc advises emailing details to nwchem-support@emsl.pnl.gov, but my email bounced back. Is the address no longer valid?

If this forum is a good place to post details, can someone advise in which section I should post doqmtest failure details?

Thanks,
Mike.

Forum Vet
Threads 8
Posts 1388
Mike
Thanks for reporting this problem.
We will remove mention of the non-working nwchem-support@emsl.pnl.gov email address.
This is a good place to report QA test failures, or either of these other forum sections would work, too:
http://www.nwchem-sw.org/index.php/Special:AWCforum/sf/id5/Running_NWChem.html
http://www.nwchem-sw.org/index.php/Special:AWCforum/sf/id4/Compiling_NWChem.html

Clicked A Few Times
Threads 1
Posts 9
I've been trying to start a new thread, but pressing Submit gives the error message:

The specified URL cannot be found

Clicked A Few Times
Threads 1
Posts 9
Quote:Mpacey Oct 20th 2:21 am
I've been trying to start a new thread, but pressing Submit gives the error message:

The specified URL cannot be found


As I seem to be able to post here, I thought I'd cut and paste my error report into another reply - but I got the same error message. Is there some limit on long posts?

Forum Vet
Threads 8
Posts 1388
Quote:Mpacey Oct 20th 1:24 am
Quote:Mpacey Oct 20th 2:21 am
I've been trying to start a new thread, but pressing Submit gives the error message:

The specified URL cannot be found


As I seem to be able to post here, I thought I'd cut and paste my error report into another reply - but I got the same error message. Is there some limit on long posts?


There should not be one.

Anyhow, please post your problems here.

Clicked A Few Times
Threads 1
Posts 9
Quote:Edoapra Oct 20th 12:03 pm
Quote:Mpacey Oct 20th 1:24 am
Quote:Mpacey Oct 20th 2:21 am
I've been trying to start a new thread, but pressing Submit gives the error message:

The specified URL cannot be found


As I seem to be able to post here, I thought I'd cut and paste my error report into another reply - but I got the same error message. Is there some limit on long posts?


There should not be one.

Anyhow, please post your problems here.


I'll try to split my post in half:

I’ve built a very vanilla MPI version of Nwchem-6.5 from source on our local cluster, and I ran the doqmtests.mpi script in the QA directory to check the numerical accuracy. I stopped the run during tce_hyperpolar_ccsd_small after 12+ hours of running, but I’m already seeing several failures in earlier tests (details in the next post). My build process is this:

module add openmpi/1.8.1-gcc

export NWCHEM_TOP=/usr/shared_apps/packages/src/Nwchem-6.5
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/shared_apps/packages/openmpi-1.8.1-gcc/bin/mpicc
export MPI_LIB=/usr/shared_apps/packages/openmpi-1.8.1-gcc/lib
export MPI_INCLUDE=/usr/shared_apps/packages/openmpi-1.8.1-gcc/include
export LIBMPI="-lmpi_usempi -lmpi_mpifh -lmpi -lpthread"

cd $NWCHEM_TOP/src
make nwchem_config
make

The gcc version is 4.4.7, and OpenMPI 1.8.1 was built with the same compiler. The build system is a 12-core Westmere server running Scientific Linux 6.5. The tests were run with 16 cores on a 16-core Ivy Bridge system with the same OS.

I’ve included a summary of failures at the bottom, with details manually extracted from the testoutputdir (which prompts the question: have I missed an automated tool to help me here?). If I understand correctly, the test script diffs $testname.ok.out.nwparse (the gold standard?) against $testname.out.nwparse (the job output), where the nwparse filename component indicates that the file has been passed through the nwparse.pl script to extract the relevant output lines. Is that right?
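A minimal sketch of how the manual extraction described above could be automated. It assumes each test leaves its $testname.ok.out.nwparse and $testname.out.nwparse files side by side in a single directory; the function name summarise_failures and its directory argument are hypothetical, not part of the NWChem QA scripts.

```shell
# summarise_failures DIR
# Sketch only: lists tests whose parsed output differs from the parsed
# reference. Assumes NAME.ok.out.nwparse and NAME.out.nwparse sit in DIR.
summarise_failures() {
    dir=$1
    for ref in "$dir"/*.ok.out.nwparse; do
        [ -e "$ref" ] || continue              # glob matched nothing
        name=$(basename "$ref" .ok.out.nwparse)
        out="$dir/$name.out.nwparse"
        if [ ! -e "$out" ]; then
            echo "MISSING $name"               # reference exists, no job output
        elif ! cmp -s "$ref" "$out"; then
            echo "FAILED $name"                # files differ byte for byte
        fi
    done
}
```

Calling it as summarise_failures /path/to/testoutputs prints one line per test that differs or has no parsed output; running diff on the two files for a named test then shows the details. Note that cmp is stricter than a whitespace-insensitive diff, so it may flag a few more tests than the QA script does.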

Most of the errors are down in the 4th sig fig, meaning that the relative error is low, but not being a chemist (I’m a sysadmin with a comp sci background) I’m not sure how significant such differences are, nor if they’re likely to propagate to larger errors in larger models. (And in one case, the answer is wrong in the first sig fig). I'd like to understand the implications of the test failures and possibly fix them before making this application generally available to my users.
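To put a number on "down in the 4th sig fig", a small helper can print the relative difference between the two values in a diff pair. This is only a sketch: the name rel_diff is hypothetical, and treating the second argument as the reference value is an arbitrary choice.

```shell
# rel_diff A B
# Prints |A - B| / |B| in scientific notation. B is treated as the
# reference value here, which is an assumption made for illustration.
rel_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b; if (d < 0) d = -d      # absolute difference
        r = b;     if (r < 0) r = -r      # magnitude of the reference
        if (r == 0) { print "undefined"; exit }
        printf "%.1e\n", d / r
    }'
}

rel_diff 4265.6221 4265.6222    # autosyn pair: prints 2.3e-08
rel_diff -22.81402 -22.76985    # pspw pair:    prints 1.9e-03
rel_diff -67.01054 -76.01054    # oh2 pair:     prints 1.2e-01
```

On these pairs the autosyn difference is a last-printed-digit change, while the pspw and oh2 pairs differ by several orders of magnitude more.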

I also have a follow-on question: once I do get the numerics right, I’m looking to create an optimised version (e.g. using Intel’s MKL, and optimising for a more modern architecture than the build process’ default of Nocona). I note that the build process defaults to using the GNU compiler flag -ffast-math, which permits optimisations that are not IEEE 754 compliant. Are the 'gold standard' outputs produced using non-IEEE 754 compliant optimisation flags? My concern is that if I’m comparing an IEEE 754 compliant optimised build to a ‘gold standard’ output known not to be IEEE 754 compliant, I’m likely to see more test failures even if the numeric results are technically more accurate. I note from the FAQ that you’re understandably hesitant to assist in individual optimised builds, but I’m wondering if you have any general advice?

Regards,
Mike.

Clicked A Few Times
Threads 1
Posts 9
Test: autosyn

Output diff:

45c45
< Effective nuclear repulsion energy (a.u.) 4265.6221
---
> Effective nuclear repulsion energy (a.u.) 4265.6222

=


Test: bsse_dft_trimer

Output diff:

27c27
< P.Frequency 162 230 354 485 616 717
---
> P.Frequency 162 229 354 485 616 717

=


Test: cosmo_h3co

83,84c83,84
< h 0.4929 -1.8393 1.9074
< h -1.8393 0.4929 1.9074
---
> h 0.4928 -1.8393 1.9074
> h -1.8393 0.4928 1.9074

=


Test: h2o_diag_to_cg_ub3lyp

Output diff:

2c2
< Total DFT energy = -75.79890
---
> Total DFT energy = -75.79889

=


Test: oh2

Output diff:
2c2
< Total SCF energy = -67.01054
---
> Total SCF energy = -76.01054

=


Test: dft_cr2

Output diff:

25c25
< Effective nuclear repulsion energy (a.u.) 180.3590
---
> Effective nuclear repulsion energy (a.u.) 180.3591
34c34
< Effective nuclear repulsion energy (a.u.) 180.3312
---
> Effective nuclear repulsion energy (a.u.) 180.3313
42,43c42,43
< Effective nuclear repulsion energy (a.u.) 180.3312
< Effective nuclear repulsion energy (a.u.) 180.3312
---
> Effective nuclear repulsion energy (a.u.) 180.3313
> Effective nuclear repulsion energy (a.u.) 180.3313

=


Test: dft_x

Output diff:

21c21
< Effective nuclear repulsion energy (a.u.) 40.1201
---
> Effective nuclear repulsion energy (a.u.) 40.1199
45c45
< H 1.9714 1.9436 0.0000
---
> H 1.9714 1.9437 0.0000
84c84
< H -1.5528 -2.0650 0.0000
---
> H -1.5527 -2.0650 0.0000
103c103
< H -1.5528 -2.0650 0.0000
---
> H -1.5527 -2.0650 0.0000
108c108
< C -0.0019 -0.0037 0.0000
---
> C -0.0020 -0.0037 0.0000

=


Test: dielsalder

Output diff:

458,459c458,459
< C -0.0007 -0.0011 -0.0003
< C -0.0007 -0.0011 0.0003
---
> C -0.0006 -0.0011 -0.0003
> C -0.0006 -0.0011 0.0003

=


Test: dft_ozone

Output diff:

9,10c9,10
< O 0.0000 -0.0715 -0.0413
< O 0.0000 0.0715 -0.0413
---
> O 0.0000 -0.0715 -0.0414
> O 0.0000 0.0715 -0.0414

=


Test: sadsmall

Output diff:

76c76
< o 0.2090 -1.4478 0.0000
---
> o 0.2091 -1.4478 0.0000

=


Test: pspw
Output diff:

3c3
< Total PSPW energy : -22.81402
---
> Total PSPW energy : -22.76985
6c6
< Total PSPW energy : -21.61074
---
> Total PSPW energy : -21.58623

=


Test: pspw_md

Output diff:

8c8
< Total PSPW energy : -14.32395
---
> Total PSPW energy : -14.32394
11c11
< Total PSPW energy : -14.12140
---
> Total PSPW energy : -14.12141

=


Test: paw

Output diff:

3,4c3,4
< Total PAW energy : -75.79997
< Total PAW energy : -75.79997
---
> Total PAW energy : -75.80490
> Total PAW energy : -75.80490
6,7c6,7
< Total PAW energy : -75.79997
< Total PAW energy : -75.79997
---
> Total PAW energy : -75.80488
> Total PAW energy : -75.80490

=

Test: dft_xdm1

Output diff:

28c28
< O -0.0688 -2.9682 0.0000
---
> O -0.0687 -2.9682 0.0000

=

Pairwise diff of nwchem tests that failed:

Test: tce_cr_eom_t_ch_rohf

Output diff:

8,9d7
< CR-EOMCCSD(T) total energy / hartree = -38.2675765
< CR-EOMCCSD(T) total energy / hartree = -38.2255642
11a10,11
> CR-EOMCCSD(T) total energy / hartree = -38.2675765
> CR-EOMCCSD(T) total energy / hartree = -38.2255642

Forum Vet
Threads 8
Posts 1388
Mike
Thank you very much for the detailed report.
After a quick look at your results, only the pspw and paw seem to exhibit serious issues.
I will try to reproduce your problem. I have a few questions about your build.
Did you set the BLASOPT environment variable at all?
What output do you get from the following commands?

gcc -v
rpm -q -i gcc
ldd $NWCHEM_TOP/bin/LINUX64/nwchem

Clicked A Few Times
Threads 1
Posts 9
% gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)

% rpm -q -i gcc
Name  : gcc Relocations: (not relocatable)
Version  : 4.4.7 Vendor: Scientific Linux
Release  : 3.el6 Build Date: Thu 21 Feb 2013 05:35:31 PM GMT
Install Date: Thu 04 Jul 2013 01:19:48 PM BST Build Host: sl6.fnal.gov
Group  : Development/Languages Source RPM: gcc-4.4.7-3.el6.src.rpm
Size  : 19405002 License: GPLv3+ and GPLv3+ with exceptions and GPLv2+ with exceptions
Signature  : DSA/SHA1, Fri 22 Feb 2013 03:47:32 PM GMT, Key ID b0b4183f192a7d7d
Packager  : Scientific Linux
URL  : http://gcc.gnu.org
Summary  : Various compilers (C, C++, Objective-C, Java, ...)
Description :
The gcc package contains the GNU Compiler Collection version 4.4.
You'll need this package in order to compile C code.

% ldd /usr/shared_apps/packages/src/Nwchem-6.5/bin/LINUX64/nwchem
linux-vdso.so.1 => (0x00007fff4478a000)
libmpi_usempi.so.1 => /usr/shared_apps/packages/openmpi-1.8.1-gcc/lib/libmpi_usempi.so.1 (0x00007f9ce08ac000)
libmpi_mpifh.so.2 => /usr/shared_apps/packages/openmpi-1.8.1-gcc/lib/libmpi_mpifh.so.2 (0x00007f9ce065b000)
libmpi.so.1 => /usr/shared_apps/packages/openmpi-1.8.1-gcc/lib/libmpi.so.1 (0x00007f9ce0387000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000324aa00000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f9ce0085000)
libm.so.6 => /lib64/libm.so.6 (0x000000324b200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000324c600000)
libc.so.6 => /lib64/libc.so.6 (0x000000324a200000)
libopen-rte.so.7 => /usr/shared_apps/packages/openmpi-1.8.1-gcc/lib/libopen-rte.so.7 (0x00007f9cdfe0c000)
libopen-pal.so.6 => /usr/shared_apps/packages/openmpi-1.8.1-gcc/lib/libopen-pal.so.6 (0x00007f9cdfb39000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000324a600000)
librt.so.1 => /lib64/librt.so.1 (0x000000324b600000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x000000324ca00000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003250200000)
/lib64/ld-linux-x86-64.so.2 (0x0000003249e00000)

Clicked A Few Times
Threads 1
Posts 9
Quote:Edoapra Oct 21st 9:32 am
Mike
Thank you very much for the detailed report.
After a quick look at your results, only the pspw and paw seem to exhibit serious issues.
I will try to reproduce your problem. I have a few questions about your build.
Did you set the BLASOPT env. at all?
What is the output you get of the following commands

gcc -v
rpm -q -i gcc
ldd $NWCHEM_TOP/bin/LINUX64/nwchem


And to confirm - BLASOPT wasn't set. Having the value unset means that Nwchem falls back to its own copies of the netlib reference versions of BLAS and LAPACK?
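One way to cross-check this from the ldd listing is to filter it for BLAS or LAPACK shared libraries. The sketch below is an assumption-laden helper: the name external_blas is hypothetical, and the pattern covers only a guess at common library names (libblas, libopenblas, liblapack, libatlas, libmkl). A statically linked library would not appear in ldd output at all, so an empty result is suggestive rather than conclusive.

```shell
# external_blas
# Reads `ldd` output on stdin and prints any line that names a
# BLAS/LAPACK-style shared library; the name patterns are a guess at
# common libraries, not an exhaustive list.
external_blas() {
    grep -i -E 'lib(open)?blas|liblapack|libatlas|libmkl' \
        || echo "no external BLAS/LAPACK library found"
}
```

Usage would be along the lines of ldd $NWCHEM_TOP/bin/LINUX64/nwchem | external_blas. None of the libraries in the listing posted above match these patterns, which is consistent with BLASOPT having been unset at build time.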

Clicked A Few Times
Threads 1
Posts 9
Any update on this? It will be useful to know what to advise my users.

