"rtdb open old failed" on BG/Q when old rtdb file exists



Clicked A Few Times
Threads 5
Posts 19

I have run into an issue recently when running NWChem 6.6 on the Blue Gene/Q platform. When I try to restart a calculation that previously timed out, I receive the following error:
 rtdb_seq_open: ./second.db does not exist, cannot open old
 start: rtdb_open old failed                    0

I can reproduce this error any time I use the restart keyword, even if the rtdb already exists. Here is a minimal working example that reproduces this error:

 start first

 memory total 800 mb

 geometry
   H	0.0000	0.0000	-0.5000
   H	0.0000	0.0000	 0.5000
 end

 basis
   H library "6-311G**"
 end

 dft
   xc b3lyp
   grid fine
 end

 title "First calculation"
 task dft energy

This task completes successfully and produces the rtdb file first.db. I then copy this database, along with the .movecs file, to the same directory as second/nwchem.nw (renaming the files to second.db and second.movecs):
 restart second

 memory total 800 mb

 task dft energy

This job fails with the above error, even though second.db exists in the working directory.

Clicked A Few Times
Threads 5
Posts 19
I have found the cause of this problem, as well as a simple solution. It would seem that rtdb_seq_open() fails in "old" mode because it checks for file existence using the access() syscall. Apparently the BG/Q access() implementation (at least on our BG/Q; I do not know whether this is universal) always fails for the R_OK | W_OK flags unless the file permission bits are set to 777 (or possibly 666; it occurred to me just now that I did not test this case). In any case, these are not the default permission bits for new files, so rtdb files produced by a calculation cannot be reused without first changing their permissions.

To get around this, I have modified src/rtdb/rtdb_seq.c, changing the code near line 313 in rtdb_seq_open() from:
   int exists = access(filename, R_OK | W_OK) == 0;

to:
 #if defined(__bgq__)
   int exists = access(filename, F_OK) == 0;
 #else
   int exists = access(filename, R_OK | W_OK) == 0;
 #endif

This fixes the problem on BG/Q while leaving behavior on other platforms unchanged.
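
For testing outside NWChem, the same check can be isolated in a small function. This is a sketch under assumptions: rtdb_file_exists() is a hypothetical helper name, not a function in src/rtdb/rtdb_seq.c, and it relies on the compiler predefining __bgq__ on BG/Q, as the #if in the patch does.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of NWChem): returns 1 if an existing
 * database file should be treated as present, 0 otherwise.
 */
int rtdb_file_exists(const char *filename)
{
    assert(filename != NULL);
#if defined(__bgq__)
    /* BG/Q: test only for existence, since the R_OK | W_OK check
       was observed to fail there for default file permissions. */
    return access(filename, F_OK) == 0;
#else
    /* Other platforms: require read and write permission, as before. */
    return access(filename, R_OK | W_OK) == 0;
#endif
}
```

One consequence of checking only F_OK is that a file which exists but genuinely cannot be read or written would pass this check, so a real permission problem would presumably only surface later, when the file is actually opened.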

Forum Vet
Threads 7
Posts 1355
Thanks for the patch.
I have just committed this change to the repository.

Cheers, Edo
