From NWChem
			Viewed 446 times, With a total of 3 Posts
												
			
                  
        
            
            		
		                
		                    
9:00:10 PM PDT - Sat, Sep 30th 2017
Hello NWChem users,

I have the following input:
 
title "cubane CCSD/cc-pVDZ hessian"
memory stack 500 mb heap 500 mb global 22000 mb
geometry
 symmetry C2v
 H                     2.05734129    -1.32981120     0.02496974
 C                     1.09520984    -0.79607173    -0.03872782
 H                     2.05734129     1.32981120     0.02496974
 C                     1.09520984     0.79607173    -0.03872782
 H                     0.00000000    -1.59258557    -2.05035614
 C                     0.00000000    -1.20794408    -1.02589691
 H                     0.00000000     1.59258557    -2.05035614
 C                     0.00000000     1.20794408    -1.02589691
 H                     0.00000000    -1.45028854     1.97989592
 C                     0.00000000    -0.78106969     1.10677261
 H                     0.00000000     1.45028854     1.97989592
 C                     0.00000000     0.78106969     1.10677261
 H                    -2.05734129    -1.32981120     0.02496974
 C                    -1.09520984    -0.79607173    -0.03872782
 H                    -2.05734129     1.32981120     0.02496974
 C                    -1.09520984     0.79607173    -0.03872782
end
basis spherical
 H library cc-pVDZ
 C library cc-pVDZ
end
scf
 direct
end
tce
 mkccsd
 2emet 1
 freeze atomic
end
mrccdata
 root 1
 nref 2
 22222222222222222222222222220
 22222222222222222222222222202
end
task tce freq
  
 
I ran it on my personal 4-core computer and got the following timings:
 
Symmetry of references
Ref.   1 sym:a   
Ref.   2 sym:a   
MR MkCCSD, version 1.0
Heff
=============================================
    0    1 -664.87322151    0.09099694
    0    2    0.09099694 -664.79730579
Eigenvalues (real and imaginary)
=============================================
 -664.933860013217    0.00000000
 -664.736667291187    0.00000000
Left eigenvectors
=============================================
    1   -0.83216055   -0.55453478
    2    0.55453478   -0.83216055
Right eigenvectors
=============================================
VR    1   -0.83216055   -0.55453478
VR    2    0.55453478   -0.83216055
Target root:    1
MkCC iter. #   1      -664.9338600132171      -307.3328942731508      -664.9338600132171
ddot R:  0.049763349351  1.692918082150
Iter cpu           236.1          315.6   1
 
 
I then ran it on a Kogence 128-core instance
https://kogence.com/app/jobs/files/list/-632%5ETransition_state_of_Cubane_and_azo-Cubane_t....
and got:
Symmetry of references
Ref.   1 sym:a   
Ref.   2 sym:a   
MR MkCCSD, version 1.0
Heff
=============================================
    0    1 -664.87322151    0.09099694
    0    2    0.09099694 -664.79730579
Eigenvalues (real and imaginary)
=============================================
 -664.933860013248    0.00000000
 -664.736667291216    0.00000000
Left eigenvectors
=============================================
    1   -0.83216055   -0.55453478
    2    0.55453478   -0.83216055
Right eigenvectors
=============================================
VR    1   -0.83216055   -0.55453478
VR    2    0.55453478   -0.83216055
Target root:    1
MkCC iter. #   1      -664.9338600132476      -307.3328942731813      -664.9338600132476
ydot R:  0.043047295224  1.692940929883
Iter cpu             3.7           64.3   1
 
 
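The 128-core run is not 32 times faster than the 4-core run; it is only about 5 times faster (315.6 vs. 64.3 in the second timing column of the "Iter cpu" lines).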
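For reference, the ratio quoted here can be reproduced from the two "Iter cpu" lines. This is a minimal sketch that assumes the second number on each of those lines is the wall-clock time of the iteration (the log does not label its columns); the function and variable names are illustrative only and are not part of NWChem.

```python
def speedup_and_efficiency(t_base, cores_base, t_new, cores_new):
    """Return (observed speedup, ideal speedup, efficiency relative to the baseline run)."""
    observed = t_base / t_new          # how much faster the larger run actually was
    ideal = cores_new / cores_base     # what perfect linear scaling would give
    return observed, ideal, observed / ideal

# Second column of the "Iter cpu" lines (assumed here to be wall time).
observed, ideal, efficiency = speedup_and_efficiency(315.6, 4, 64.3, 128)
print(f"observed speedup: {observed:.2f}x, ideal: {ideal:.0f}x, "
      f"efficiency: {efficiency:.1%}")
```

Under that assumption the observed speedup is about 4.9x against an ideal of 32x, which is roughly 15% parallel efficiency relative to the 4-core run.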
 
Why does this happen?

Best, Vladimir.
Edited On 9:19:36 PM PDT - Sat, Sep 30th 2017 by Vladimir
 
		                 
            		
		                
		                    
4:00:55 AM PDT - Thu, Oct 5th 2017

Nothing scales perfectly. Your calculation isn't big enough to expect reasonable scaling to 128 cores.
		                     
		                 
            		
		                
		                    
8:13:51 AM PDT - Mon, Oct 9th 2017
Re: MK-CCSD on Kogence isn't much faster as expected

Vladimir,
 
I suggest you try running the same problem on 8, 16, and 32 threads on Kogence and see how the computation time scales. My guess is that you will see performance improve linearly at first and sub-linearly later.

As Sean said, most computational algorithms do not scale linearly. Scaling is also problem dependent.
You also have to understand what is taking the time. If you run a very small problem on a 128-core computer, you should not expect much benefit: the parallel overhead becomes the dominant cost, and the problem is too small to compensate for it.
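Before running the 8-, 16-, and 32-thread jobs, a rough expectation can be sketched from the two timings already posted in this thread. The snippet below assumes a simple Amdahl-style model, T(p) = a + b/p, where a is a fixed cost that does not shrink with more cores and b is perfectly parallel work. It also assumes the second column of the "Iter cpu" lines is wall time. Both are assumptions, not properties of NWChem, and a two-point fit cannot capture overhead that grows with core count, so the numbers are only a baseline to compare measured runs against. The function names are illustrative.

```python
def fit_fixed_plus_parallel(p1, t1, p2, t2):
    """Solve T(p) = a + b / p exactly through two (cores, time) points."""
    b = (t1 - t2) / (1.0 / p1 - 1.0 / p2)
    a = t1 - b / p1
    return a, b

def predict(a, b, p):
    """Model time on p cores under the assumed T(p) = a + b / p."""
    return a + b / p

# The two timings posted above: 4 cores and 128 cores.
a, b = fit_fixed_plus_parallel(4, 315.6, 128, 64.3)
print(f"fixed part a = {a:.1f}, parallel part b = {b:.1f}")
for p in (4, 8, 16, 32, 64, 128):
    t = predict(a, b, p)
    print(f"{p:4d} cores: model time {t:7.1f}, "
          f"speedup vs 4 cores {315.6 / t:5.2f}x")
```

Under these assumptions the fixed part comes out at about 56 of the 64.3 measured on 128 cores, which is one way to see why adding cores beyond a few dozen would help little for a job of this size. Measured times at 8, 16, and 32 threads that fall well above the model would point to overhead growing with core count rather than a fixed serial cost.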
            		
		                
		                    
8:20:54 AM PDT - Tue, Oct 10th 2017

Quote: JenniferCarter, Oct 9th 11:13 pm
Vladimir,
 
I suggest you try running the same problem on 8, 16, and 32 threads on Kogence and see how the computation time scales. My guess is that you will see performance improve linearly at first and sub-linearly later.

As Sean said, most computational algorithms do not scale linearly. Scaling is also problem dependent.
You also have to understand what is taking the time. If you run a very small problem on a 128-core computer, you should not expect much benefit: the parallel overhead becomes the dominant cost, and the problem is too small to compensate for it.
 
I apologize for not formulating my question accurately.
I would like to draw attention to the fact that it is only the multireference CC iterations that do not scale linearly.

What settings should I choose for the MRCC method to achieve the highest performance on 64-128 cores?

http://www.nwchem-sw.org/index.php/Release66:TCE#State-Specific_Multireference_Coupled_Clu...

P.S. to Jennifer:
NWChem has many calculation methods and many parameters for tuning their performance. The default settings may not be suitable.
Could you describe in more detail the topology of your computer, in particular the network over which data is exchanged between processors, and the build options NWChem was compiled with for that network?

Best, Vladimir.
Edited On 8:25:57 AM PDT - Tue, Oct 10th 2017 by Vladimir
 
		                 