# Benchmarks

### From NWChem


File:gpu_speedup_uracil.png|<small>''Speedup of GPU over CPU of the (T) part of the Reg-CCSD(T) approach as a function of the tile size for the uracil molecule. The calculations were performed using the Barracuda cluster at EMSL.''</small>

File:ccsd_eomccsd_new.png|<small>''Comparison of the CCSD/EOMCCSD iteration times for BacterioChlorophyll (BChl, MgO6N4C36H38) for various tile sizes. Calculations were performed for the 3-21G basis set (503 basis functions, C1 symmetry, 240 correlated electrons, 1020 cores).''</small>

File:bchl_6_311G_ccsd.png|<small>''Time per CCSD iteration for BChl in the 6-311G basis set (733 basis functions, C1 symmetry, 240 correlated electrons, 1020 cores) as a function of tile size.''</small>

File:eomccsd_scaling_ic.png|<small>''Scalability of the CCSD/EOMCCSD codes for BChl in the 6-311G basis set (733 basis functions; tilesize=40, C1 symmetry, 240 correlated electrons).''</small>


## Revision as of 14:30, 20 September 2010

# Benchmarks performed with NWChem

This page contains a suite of benchmarks performed with NWChem. The benchmarks cover a range of computational chemistry methods on several high-performance computing platforms. The list will grow as new data becomes available. If you have benchmark information for your computing system that you would like to add, please contact one of the developers.

# Hybrid density functional calculation on the C_{240} Buckyball

Performance of the Gaussian basis set DFT module in NWChem. This benchmark is a PBE0 calculation (in direct mode) on the C_{240} system with the 6-31G* basis set (3600 basis functions). The calculations were performed on the Chinook supercomputer located at PNNL. Timings are per step for the various components. The input file is available.

# Parallel performance of *Ab initio* Molecular Dynamics using plane waves

# Parallel performance of the CR-EOMCCSD(T) method (triples part)

An example of the scalability of the triples part of the CR-EOMCCSD(T) approach for the Green Fluorescent Protein Chromophore (GFPC), described with the cc-pVTZ basis set (648 basis functions), as obtained from NWChem. Timings were determined from calculations on the Franklin Cray-XT4 computer system at NERSC. See the input file for details.

# Timings of CCSD/EOMCCSD for the oligoporphyrin dimer

CCSD/EOMCCSD timings for the oligoporphyrin dimer (942 basis functions, 270 correlated electrons, D2h symmetry). Excited-state calculations were performed for a state of b1g symmetry. In all test calculations the convergence threshold was relaxed, and 1024 cores were used. See the input file for details.

    --------------------------------------------------------
    Iter          Residuum       Correlation     Cpu    Wall
    --------------------------------------------------------
       1   0.7187071521175  -7.9406033677717   640.9   807.7
    ......
    MICROCYCLE DIIS UPDATE:                   10     5
      11   0.0009737920958  -7.9953441809574   691.1   822.2
    --------------------------------------------------------
    Iterations converged
    CCSD correlation energy / hartree =        -7.995344180957357
    CCSD total energy / hartree       =     -2418.570838364838890

    EOM-CCSD right-hand side iterations
    --------------------------------------------------------------
         Residuum       Omega / hartree  Omega / eV    Cpu    Wall
    --------------------------------------------------------------
    ......
    Iteration   2 using    6 trial vectors
       0.1584284659595   0.0882389635508    2.40111   865.3  1041.2
    Iteration   3 using    7 trial vectors
       0.0575982107592   0.0810948687618    2.20670   918.0  1042.2
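The two Omega columns in the EOM-CCSD output above report the same excitation energy in different units. A minimal sketch of that conversion follows, assuming the CODATA 2018 value for one hartree in electronvolts; the constant NWChem uses internally is not stated on this page and may differ in the trailing digits. The name `hartree_to_ev` is illustrative and is not an NWChem routine.

```python
# CODATA 2018 value of one hartree in electronvolts (an assumption made for
# this sketch; NWChem's internal constant may differ in the last digits).
HARTREE_TO_EV = 27.211386245988


def hartree_to_ev(energy_hartree):
    """Convert an energy from hartree to electronvolts."""
    return energy_hartree * HARTREE_TO_EV


if __name__ == "__main__":
    # Omega values (hartree) copied from iterations 2 and 3 of the output above.
    for omega in (0.0882389635508, 0.0810948687618):
        print(f"{omega:.13f} hartree = {hartree_to_ev(omega):.4f} eV")
```

The converted values agree with the Omega / eV column of the output to within rounding of the last printed digit.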

# Current developments for high accuracy: GPGPU and alternative task schedulers

Various development efforts are currently underway for high accuracy methods that will be available in future releases of NWChem. The examples below show the first results for the performance of the triples part of Reg-CCSD(T) on GPGPUs (left two examples) and for the use of alternative task schedulers in the iterative CCSD and EOMCCSD codes.

Other tests:

Luciferin (aug-cc-pVDZ basis set; RHF reference; frozen core) - time per CCSD iteration

| Tile size | Cores | Time (sec.) |
|---|---|---|
| 30 | 256 | 644 |
| 30 | 512 | 378 |
| 30 | 664 | 314 |
| 30 | 1020 | 278 |
| 30 | 1300 | 237 |
| 40 | 128 | 998 |
| 40 | 256 | 575 |

Sucrose (6-311G** basis set; RHF reference; frozen core) - time per CCSD iteration

| Tile size | Cores | Time (sec.) |
|---|---|---|
| 40 | 256 | 1486 |
| 40 | 512 | 910 |
| 40 | 1024 | 608 |
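The CCSD timings listed above for luciferin (tilesize = 30) and sucrose (tilesize = 40) can be turned into speedup and parallel efficiency figures. The sketch below is one way to do that, assuming the smallest core count in each series is taken as the baseline; the function name `scaling_table` and the choice of baseline are illustrative and do not come from the benchmark page.

```python
# Timings copied from the lists above: cores -> seconds per CCSD iteration.
LUCIFERIN_TILE30 = {256: 644.0, 512: 378.0, 664: 314.0, 1020: 278.0, 1300: 237.0}
SUCROSE_TILE40 = {256: 1486.0, 512: 910.0, 1024: 608.0}


def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]   # measured speedup over the baseline
        ideal = cores / base_cores             # speedup under perfect scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    series = (("luciferin, tilesize 30", LUCIFERIN_TILE30),
              ("sucrose, tilesize 40", SUCROSE_TILE40))
    for name, timings in series:
        print(name)
        for cores, speedup, efficiency in scaling_table(timings):
            print(f"  {cores:5d} cores  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

Efficiency here is the measured speedup divided by the ideal speedup, so a value of 100% would mean the iteration time halves every time the core count doubles.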

Cytosine-OH (POL1; UHF reference; frozen core) - time per EOMCCSD iteration

| Tile size | Cores | Time (sec.) |
|---|---|---|
| 30 | 256 | 44.5 |
| 40 | 128 | 55.6 |