Running Parallel NWChem on Linux Workstation

From NWChem


Clicked A Few Times
Threads 4
Posts 17
Hi,

I'm trying to run the application on my Linux workstation using the command given in the documentation:

  mpirun -np 2 $NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem input.nw

However, it failed and gave the error shown below. Any advice on why that is?

[cs-nsl17:06616] [[21723,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/util/nidmap.c at line 398
[cs-nsl17:06616] [[21723,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
[cs-nsl17:06616] [[21723,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173


It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_base_build_nidmap failed
--> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS




It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

 orte_ess_set_name failed
--> Returned value Data unpack would read past end of buffer (-26) instead of ORTE_SUCCESS


      • The MPI_Init() function was called before MPI_INIT was invoked.
      • This is disallowed by the MPI standard.
      • Your MPI job will now abort.
[cs-nsl17:6616] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
[cs-nsl17:06616] [[21723,1],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file ../../../orte/runtime/orte_init.c at line 132


It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

 ompi_mpi_init: orte_init failed
--> Returned "Data unpack would read past end of buffer" (-26) instead of "Success" (0)




mpirun has exited due to process rank 0 with PID 6616 on
node cs-nsl17 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

Edited On 9:00:50 PM PST - Thu, Nov 8th 2012 by Dhaminah

  • Bert (Forum Admin, Forum Mod, NWChem Developer)
Forum Vet
Threads 4
Posts 597
Can you provide your compile environment, i.e., all environment variables set, how you actually compiled NWChem, etc.? It looks like there are some issues with your compilation.

Bert


Quote:Dhaminah Nov 9th 3:58 am
Hi,

I'm trying to run the application on my Linux workstation using the command given in the documentation:

  mpirun -np 2 $NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem input.nw

However, it failed and gave the errors quoted in the first post above. [error log trimmed]


Clicked A Few Times
Threads 4
Posts 17
These are the environment variables I've set:

export NWCHEM_TOP=/root/nwchem/nwchem-6.1.1-src/
export NWCHEM_TARGET=LINUX64
export NWCHEM_MODULES=all
export USE_MPI=y
export USE_MPIF=y
export USE_MPIF4=y
export MPI_LOC=/usr/lib/openmpi/
export MPI_LIB=/usr/lib/openmpi/lib
export MPI_INCLUDE=/usr/lib/openmpi/include
export LIBMPI="-lmpi_f90 -lmpi_f77 -lmpi -lpthread"

then I did the following:

% make nwchem_config
% make FC=gfortran >& make.log

I didn't see any errors in the make.log file.

That's it...
Edited On 4:14:17 PM PST - Fri, Nov 9th 2012 by Dhaminah
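A quick sanity check on the environment setup above can catch path problems before the long build. This is a minimal sketch (my suggestion, not from the thread), assuming the Open MPI paths from the post; adjust MPI_LIB for your own system:

```shell
# Confirm the MPI libraries named in LIBMPI actually exist under MPI_LIB
# before building NWChem against them.
MPI_LIB=${MPI_LIB:-/usr/lib/openmpi/lib}
for lib in mpi_f90 mpi_f77 mpi; do
    if ls "$MPI_LIB"/lib$lib.* >/dev/null 2>&1; then
        echo "found lib$lib in $MPI_LIB"
    else
        echo "MISSING lib$lib in $MPI_LIB"
    fi
done
```

If any library is reported missing, the link step (or a later mpirun) will fail, so fixing MPI_LOC/MPI_LIB first saves a rebuild.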

  • Bert (Forum Admin, Forum Mod, NWChem Developer)
Forum Vet
Threads 4
Posts 597
It looks like an issue with Open MPI or mpirun. Are you running on one node or multiple nodes?

If you do a "which mpirun", does it point to the Open MPI installation against which NWChem was linked?

Bert

Quote:Dhaminah Nov 9th 9:43 pm
These are the environment variables I've set:

[...]

Clicked A Few Times
Threads 4
Posts 17
I'm running on one node, my local machine (Intel Xeon - quad core).

When I do which mpirun, it points to /usr/local/bin/mpirun
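Bert's question about "which mpirun" can be checked directly: the mpirun on PATH and the MPI libraries the nwchem binary resolves at run time should come from the same Open MPI installation. A minimal sketch, assuming the paths mentioned in this thread (adjust them for your own build):

```shell
# Path is the one from this thread; change it to match your build.
NWCHEM_BINARY="${NWCHEM_TOP:-/root/nwchem/nwchem-6.1.1-src}/bin/${NWCHEM_TARGET:-LINUX64}/nwchem"

# 1. Which mpirun is first on PATH?
command -v mpirun || echo "mpirun not found on PATH"

# 2. Which MPI shared libraries does the nwchem binary resolve at run time?
if [ -x "$NWCHEM_BINARY" ]; then
    ldd "$NWCHEM_BINARY" | grep -i mpi
else
    echo "nwchem binary not found at $NWCHEM_BINARY"
fi
```

If the mpirun path (here /usr/local/bin/mpirun) and the library paths printed by ldd (here under /usr/lib/openmpi/lib) belong to two different Open MPI installs, the "Data unpack would read past end of buffer" errors above are a typical symptom.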

Clicked A Few Times
Threads 4
Posts 17
Recompiling Open MPI from source resolved the problem. I was able to run the application with the sample input file (nwchem.nw), which is a very small problem. I would like a larger problem so that the application runs longer (say, more than 20 minutes)...

Thank you so much.

Clicked A Few Times
Threads 4
Posts 17
Can someone please help me by providing an input file for a larger problem?

Much appreciated...

  • Bert (Forum Admin, Forum Mod, NWChem Developer)
Forum Vet
Threads 4
Posts 597
There are many cases in QA/tests. Most of them are small, but you can make them run longer by using a larger basis set.

Bert


Quote:Dhaminah Nov 13th 11:34 pm
Can someone help me please by providing an input file for a larger problem?

Much appreciated...
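Bert's suggestion can be turned into a concrete input. The sketch below writes a small test case whose cost is driven up by a larger basis set; the geometry, functional, and basis choice are illustrative assumptions, not taken from the QA suite:

```shell
# Write an NWChem input for water with a fairly large (aug-cc-pVTZ) basis.
cat > h2o_large.nw <<'EOF'
echo
title "water / B3LYP / aug-cc-pVTZ"
geometry units angstrom
  O   0.000   0.000   0.000
  H   0.757   0.586   0.000
  H  -0.757   0.586   0.000
end
basis
  * library aug-cc-pvtz
end
dft
  xc b3lyp
end
task dft energy
EOF
```

Run it the same way as before, e.g. mpirun -np 2 $NWCHEM_TOP/bin/$NWCHEM_TARGET/nwchem h2o_large.nw; swapping in an even larger basis or a bigger molecule stretches the run time further.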

Clicked A Few Times
Threads 4
Posts 17
Thank you very much....




