forum - Re: [abinit-forum] response-function jobs crash prematurely

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: "P. Ganesh" <pganesh@ciw.edu>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] response-function jobs crash prematurely
  • Date: Sat, 28 Feb 2009 12:58:26 -0500

Dear Takeshi,

Thanks for the suggestions. 

I previously ran similar calculations for the same system on another cluster, a quad-core machine with ~1 GB per processor, and had no problems. Even so, in the present case the calculation runs if I submit it with nodes=3:ppn=6, but it fails with nodes=6:ppn=6, nodes=18:ppn=6, or nodes=27:ppn=4. This seems to imply that the memory per processor is sufficient for the calculation to proceed, so I don't understand the standard error message reporting insufficient virtual memory. Also, the jobs always crash right after the first perturbation has finished (see the log excerpt below).
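For reference, the total process counts implied by the four submissions quoted above can be tallied with a short sketch. It assumes one MPI rank per requested processor slot (nodes times ppn), which the message does not state explicitly; the function name `total_ranks` is purely illustrative.

```python
# Resource requests quoted in the message above.
requests = ["nodes=3:ppn=6", "nodes=6:ppn=6", "nodes=18:ppn=6", "nodes=27:ppn=4"]

def total_ranks(request):
    """Return nodes * ppn for a 'nodes=N:ppn=P' request string.

    Assumption: one MPI rank is started per requested processor slot.
    """
    fields = dict(part.split("=") for part in request.split(":"))
    return int(fields["nodes"]) * int(fields["ppn"])

for req in requests:
    print(f"{req:>15} -> {total_ranks(req)} ranks")
```

Under that assumption the working submission uses 18 ranks, while the failing ones use 36, 108, and 108. The failures therefore go with the larger rank counts, and ppn=6 appears in both the working and the failing cases.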

I looked up the previous mailing list thread "frustrating RF calculations", but a way around my problem was not evident from it, except that the scaling of the array with k-points was supposedly going to be fixed in v5.6.5. However, I get the same error (as described above) with v5.6.5.



Thanks,
Ganesh


Last few lines of the 'log' file:

----iterations are completed or convergence reached----


 Thirteen components of 2nd-order total energy (hartree) are
 1,2,3: 0th-order hamiltonian combined with 1st-order wavefunctions
     kin0=   1.46751026E+04 eigvalue=   9.16178409E+02  local=   6.04883542E+03
 4,5,6: 1st-order hamiltonian combined with 1st and 0th-order wfs
 loc psp =  -1.18192722E+03  Hartree=   1.02037734E+03     xc=  -2.65606951E+02
 note that "loc psp" includes a xc core correction that could be resolved
 7,8,9: eventually, occupation + non-local contributions
    edocc=   0.00000000E+00     enl0=   1.22011047E+04   enl1=  -6.81403715E+04
 1-9 gives the relaxation energy (to be shifted if some occ is /=2.0)
   erelax=  -3.47263072E+04
 10,11,12 Non-relaxation  contributions : frozen-wavefunctions and Ewald
 fr.local=   2.58350344E+02 fr.nonlo=   3.44237800E+04  Ewald=   2.13861335E+02
 prtene3 : non-relax=    3.489599E+04
 13,14 Frozen wf xc core corrections (1) and (2)
 frxc 1  =   0.00000000E+00  frxc 2 =   0.00000000E+00
 Resulting in :
 2DEtotal=    0.1696844672E+03 Ha. Also 2DEtotal=    0.461734917110E+04 eV
    (2DErelax=   -3.4726307230E+04 Ha. 2DEnonrelax=    3.4895991698E+04 Ha)
    (  non-var. 2DEtotal :    2.3484233656E+02 Ha)

rank 92 in job 1  abe0224_35176   caused collective abort of all ranks
  exit status of rank 92: killed by signal 9
rank 88 in job 1  abe0224_35176   caused collective abort of all ranks
  exit status of rank 88: killed by signal 9
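
The abort messages above can be summarized with a small sketch that extracts which ranks were killed and by which signal. The regular expression is written only against the two message lines shown here and is an assumption about their general format; `killed_ranks` is an illustrative name. On Linux, signal 9 is SIGKILL, which cannot be caught by the process, so the application itself writes no further output.

```python
import re

# Tail of the log, copied from the excerpt above.
log_tail = """\
rank 92 in job 1  abe0224_35176   caused collective abort of all ranks
  exit status of rank 92: killed by signal 9
rank 88 in job 1  abe0224_35176   caused collective abort of all ranks
  exit status of rank 88: killed by signal 9
"""

# Assumed format, based only on the lines shown in this message.
pattern = re.compile(r"exit status of rank (\d+): killed by signal (\d+)")

def killed_ranks(text):
    """Return a list of (rank, signal) pairs found in the log text."""
    return [(int(rank), int(sig)) for rank, sig in pattern.findall(text)]

pairs = killed_ranks(log_tail)
print(pairs)
# Ranks are assumed to be numbered from 0, so this is a lower bound.
print("job had at least", max(rank for rank, _ in pairs) + 1, "ranks")
```

If ranks are numbered from 0, a killed rank 92 means the job had at least 93 ranks, so this excerpt would have to come from one of the larger submissions rather than from the nodes=3:ppn=6 or nodes=6:ppn=6 runs.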




NISHIMATSU Takeshi wrote:
--- I wrote:
1GB of memory per processor

You need more memory >= 4GB, I guess.

Or search in this ML with keyword of "frustrating RF calculation".

-- Takeshi

-- 
It is the very mind itself
That leads the mind astray;
Of the mind,
It is essential to lose it,
But, do not be mindless.
(The Unfettered Mind)


