
forum - [abinit-forum] segmentation fault in response function calculation (vtorho3.F90)

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: Vincent Chevrier <vincent.chevrier@dahn.phys.dal.ca>
  • To: forum@abinit.org
  • Subject: [abinit-forum] segmentation fault in response function calculation (vtorho3.F90)
  • Date: Tue, 24 Mar 2009 11:49:58 -0300 (ADT)
  • Importance: Normal

Hi all,

I'm trying to run a response function calculation with ABINIT 5.6.3, but I
am having problems. This is on a cluster for which I do not have
administrator privileges, and I did not compile the binaries myself.

I used to be able to run response function calculations without problems
with 5.4.4 (until it time-bombed). I am wondering if anyone has any
suggestions for the system administrator, since neither he nor I have much
experience compiling ABINIT. Open MPI is used for parallelization.

Any help is greatly appreciated.

Thanks,
Vincent Chevrier

Here is the problem:

If I run the trf2_1.in file from Tutorial #2 on response functions, it
goes through the WFK generation (DATASET 1) without problems, then dies in
DATASET 2 (RF). The job is submitted through a grid engine with abinip as
the executable, but it runs on only one node. The last lines of the log
file are:

----------------- log file tail --------------------------
-P-0000 leave_test : synchronization done...
newkpt: loop on k-points done in parallel
pareigocc : MPI_ALLREDUCE


iter 2DEtotal(Ha) deltaE(Ha) residm vres2

getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 12 12 12
ecut(hartree)= 3.000 => boxcut(ratio)= 2.05142
scfcv3, nstep= 25
[cl-0-29:15595] *** Process received signal ***
[cl-0-29:15595] Signal: Segmentation fault (11)
[cl-0-29:15595] Signal code: Address not mapped (1)
[cl-0-29:15595] Failing at address: 0x954
[cl-0-29:15595] *** End of error message ***
mpirun noticed that job rank 0 with PID 15595 on node cl029.dal.acenet.ca
exited on signal 11 (Segmentation fault).
----------------- end of log file tail ---------------------

If I do a backtrace with gdb on the dumped core.15595 file, I get the
following:


#0 0x00000000006463a3 in vtorho3_ (atindx=(), atindx1=(), cg=(), cgq=(),
cg1=(), cplex=1, cprj=(), cprjq=(), cpus=0, dbl_nnsclo=0, gh1_rbz=(),
densymop_rf=Invalid F77 type code 3 in symbol table.
) at vtorho3.F90:634
#1 0x0000000000639700 in scfcv3_ (atindx=(), atindx1=(), blkflg=(),
cg=(), cgq=(), cg1=(), cplex=1, cprj=(), cprjq=(), cpus=0, dimpaw1=0,
gh1_rbz=(), densymop_rf=Invalid F77 type code 3 in symbol table.
)
at scfcv3.F90:702
#2 0x0000000000559b17 in loper3_ (amass=(), atindx=(), atindx1=(),
blkflg=(), codvsn='5.6.3 ', cpui=0, cpus=0, dimcprj=(), doccde=(),
ddkfil=(0, 0, 0), dtfil=Invalid F77 type code 3 in symbol table.
)
at loper3.F90:1213
#3 0x000000000051d483 in respfn_ (codvsn='5.6.3 ', cpui=0, dtfil=Invalid
F77 type code 3 in symbol table.
) at respfn.F90:1273
#4 0x0000000000441db1 in driver_ (codvsn='5.6.3 ', cpui=0, dtfil=Invalid
F77 type code 3 in symbol table.
) at driver.F90:841
#5 0x00000000004398f9 in MAIN_ ()
#6 0x0000000000436d00 in main ()
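
Because the mailer wraps each gdb frame across several lines, the call chain in the backtrace above is hard to read at a glance. Below is a minimal sketch in Python that condenses such a backtrace to one line per frame. The helper name summarize_backtrace and its regular expressions are hypothetical and written only for this illustration; they are not part of gdb or ABINIT, and they assume frames look like the ones quoted in this post. The sample text is an excerpt of frames #0, #1 and #5 copied from the backtrace.

```python
import re

# Hypothetical helper for illustration only; not part of gdb or ABINIT.
# A frame starts with "#<n>", an optional "0x... in", then the function name.
FRAME_START = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")
# The source location appears as "at <file>:<line>", possibly on a later line.
LOCATION = re.compile(r"\bat\s+([\w.\-/]+):(\d+)")


def summarize_backtrace(text):
    """Return one dict per frame: frame number, function, file, line."""
    frames = []
    for raw in text.splitlines():
        line = raw.strip()
        start = FRAME_START.match(line)
        if start:
            frames.append({
                "frame": int(start.group(1)),
                "function": start.group(2),
                "file": None,
                "line": None,
            })
        # Attach the first location found to the most recent frame.
        if frames and frames[-1]["file"] is None:
            loc = LOCATION.search(line)
            if loc:
                frames[-1]["file"] = loc.group(1)
                frames[-1]["line"] = int(loc.group(2))
    return frames


# Excerpt of the backtrace quoted in this post (frames #0, #1 and #5).
SAMPLE = """\
#0 0x00000000006463a3 in vtorho3_ (atindx=(), atindx1=(), cg=(), cgq=(),
cg1=(), cplex=1, cprj=(), cprjq=(), cpus=0, dbl_nnsclo=0, gh1_rbz=(),
densymop_rf=Invalid F77 type code 3 in symbol table.
) at vtorho3.F90:634
#1 0x0000000000639700 in scfcv3_ (atindx=(), atindx1=(), blkflg=(),
cg=(), cgq=(), cg1=(), cplex=1, cprj=(), cprjq=(), cpus=0, dimpaw1=0,
gh1_rbz=(), densymop_rf=Invalid F77 type code 3 in symbol table.
)
at scfcv3.F90:702
#5 0x00000000004398f9 in MAIN_ ()
"""

frames = summarize_backtrace(SAMPLE)
for f in frames:
    where = f"{f['file']}:{f['line']}" if f["file"] else "(no source info)"
    print(f"#{f['frame']} {f['function']} {where}")
```

Applied to the full backtrace, this should show the crash in vtorho3_ at vtorho3.F90:634, reached from driver_ through respfn_, loper3_ and scfcv3_. It only tidies the output; it does not say why the address 0x954 was invalid.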




