
forum - Re: [abinit-forum] 5.8.4 Parallel job crash

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )




  • From: BOTTIN Francois <francois.bottin@cea.fr>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] 5.8.4 Parallel job crash
  • Date: Thu, 24 Sep 2009 10:04:08 +0200
  • Organization: CEA-DAM

Dear Paul,

I suspect the problem comes from iprcel 45.
Your job appears to crash during the susceptibility matrix calculation.
Although this part is allowed with paral_kgb=1, at present it is very time-consuming and resource-demanding,
so it is only permitted for testing purposes.

If that is not the cause, please send your complete output and log files.
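
One way to test the iprcel hypothesis is to rerun the same job with the default preconditioner. The fragment below is only a sketch: it copies the SCF lines from the input quoted further down and changes iprcel alone, on the assumption that iprcel 0 (the ABINIT default, which uses a model dielectric function based on diemac rather than a computed susceptibility matrix) is acceptable for this system.

```
# SCF block from the original input, with only iprcel changed
diemix   1
diemac   12
iscf     17
npulayit 20
iprcel   0        # assumption: default model preconditioner, no susceptibility matrix is built
nstep    200
tolvrs   1.0d-15
```

If the segmentation fault disappears with this change, that points to the susceptibility matrix step; if it persists, the cause is probably elsewhere.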

Regards,
Francois

Paul Fons wrote:
I have been experimenting with the parallelization keywords for abinit and have run into a situation where abinit crashes with a segmentation fault (a brief snippet of the log is below). I have also included the input file for reference. I must admit I don't have a very good grasp of the parallelization options at this point, but whatever I choose, I would not expect to end up with a segmentation fault. Any suggestions?
The code was compiled with ifort (specific options listed below) on an eight core Mac Pro.


Thanks

Paul Fons
############Log File Tail with error ####################

-P-0000 leave_test : synchronization done...
mkrho: loop on k-points and spins done in parallel
Total charge density [el/Bohr^3]
, Maximum= 8.8868E-02 at reduced coord. 0.9083 0.7778 0.9667
, Minimum= 1.2180E-03 at reduced coord. 0.5250 0.0000 0.5250
Total charge density [el/Bohr^3]
, Maximum= 8.8868E-02 at reduced coord. 0.9083 0.7778 0.9667
, Minimum= 1.2180E-03 at reduced coord. 0.5250 0.0000 0.5250
-P-0000 leave_test : synchronization done...
suscep_stat : loop on k-points and spins done in parallel
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
shape (susmat) = 2 1785 1 1785 1
--------------------------------------------------------------------------
mpiexec noticed that process rank 5 with PID 68966 on node neutrino.a04.aist.go.jp exited on signal 11 (Segmentation fault).

##############Compilation Options ####################

=== Build Information ===
Version : 5.8.4
Build target : i386_darwin9.8.0_intel11.0
Build date : 20090919

=== Compiler Suite ===
C compiler : gnu
CFLAGS : -O2 -m64
C++ compiler : gnu4.0
CXXFLAGS : -O2 -m64
Fortran compiler : intel11.0
FCFLAGS : -O3 -g
FC_LDFLAGS :

=== Optimizations ===
Debug level : symbols
Optimization level : standard
Architecture : _

=== MPI ===
Parallel build : yes
Parallel I/O : yes
MPI CPPFLAGS : -DMPI=1 -DMPI2=1 -DMPI_IO=1

=== Linear algebra ===
Library type : external
Use ScaLAPACK : no

=== Plug-ins ===
BigDFT : yes
ETSF I/O : yes
LibXC : yes
FoX : no
NetCDF : yes
Wannier90 : yes
XMLF90 : no

=== Experimental features ===
Bindings : no
Error handlers : no
Exports : no
GW double-precision : no
Macroave build : yes


####################Input File#####################################


The input file is attached below for reference

# Lattice parameter relaxation (including re-optimization of
# internal coordinates)

dilatmx 1.2 # Maximum scaling allowed for lattice parameters
ionmov 3 # Use BFGS algorithm
ntime 120 # Maximum number of optimization steps
optcell 2 # Fully optimize unit cell geometry, keeping symmetry
tolmxf 1.0e-6 # Convergence limit for forces as above
strfact 100 # Test convergence of stresses (Hartree/bohr^3) by
# multiplying by this factor and applying force
# convergence test
prtgeo 6
ecutsm 0.5
#=================================================

# parallelization variables
paral_kgb 1
wfoptalg 14
nloalg 4
fftalg 401
iprcch 4
intxc 0
istwfk 1
fft_opt_lob 2
npfft 2
npband 16


#=================================================
# Calculation
ixc 7 # LDA or LSD, Teter Pade parametrization
ecut 20
#nband 200
occopt 3
tsmear 0.001 Ha
diemix 1
diemac 12
iscf 17
npulayit 20
iprcel 45 #SCF preconditioning, compute the RPA dielectric matrix at the first step, and recompute each step
nstep 200
tolvrs 1.0d-15
#=================================================


#=================================================
# KptGrid
kptopt 1
ngkpt 2 2 2
#=================================================



#=================================================
# Structure
acell 2.3472990156E+01 2.1416202665E+01 2.2829820755E+01
rprim 9.9999040546E-01 -4.3278800423E-03 -6.7708842004E-04
-8.4365915123E-03 9.9995267729E-01 -4.8442866378E-03
-9.2839564304E-04 -8.2630793413E-03 9.9996542920E-01

natom 56
ntypat 3
znucl 32 51 52
typat 8*1 16*2 32*3
xcart
7.64903050727687E+00 3.70786933047274E+00 2.03854248709848E+01
1.53777783328731E+01 1.75129722033248E+01 8.91144538019716E+00
1.61043791880839E+01 8.42811621834851E+00 8.30391356166042E+00
1.92648347440380E+01 4.48585851017361E+00 1.41075418556874E+01
7.65528402890655E+00 1.32373536776582E+01 2.13621839955297E+01
3.53536790751728E+00 1.77037554287189E+01 7.83953650844295E+00
2.01669253437016E+01 1.43678663967696E+01 1.99921563932553E+01
3.41610243671658E+00 3.27583538347910E+00 2.18451293492937E+00
1.77658032309385E+01 1.44381527352672E+01 1.21808728166705E+01
3.99726773805650E+00 1.55301797871776E+01 1.44367783628060E+00
3.24381188162565E-02 9.90763595116681E+00 2.31616068517336E+01
1.15634952674770E+01 1.63469867258440E+01 1.67858050920974E+01
1.17234135518916E+01 5.15206944025922E+00 5.57997270756973E+00
3.38545937719249E-01 5.27152903896988E+00 1.64909066477600E+01
-1.25956396013325E+00 1.51941337814835E+01 5.66856539691514E+00
1.67427745587491E+01 1.01677363481710E+01 1.70356580625314E+01
1.74889112537567E+01 2.14415960427150E+01 5.08012987634538E+00
5.20436950708091E+00 2.04510938227703E+01 1.69328729383300E+01
5.83248173079816E+00 1.00668269543405E+01 5.82511338366536E+00
1.69058930916342E+01 5.44158534726656E+00 -2.16875982251231E-01
1.14478213641201E+01 2.14067680887845E+01 -7.37133630516936E-02
-4.56781545787971E-03 2.15082711873171E+01 1.11914229522603E+01
6.89325977321322E+00 5.19085086051822E+00 1.20732280739244E+01
1.18260878315009E+01 1.10106693193841E+01 1.09554286623317E+01
5.55293700386207E+00 -9.08379756199719E-01 3.66465273998066E-03
1.79985411043133E+01 6.24218498329430E-01 1.12402686810674E+01
1.71942001152461E+01 1.13101891917151E+01 -4.84440266489979E-01
1.26878781059539E+01 5.13251738639552E+00 1.09978256028475E+01
-6.50193761804745E-01 1.57802036826258E+01 1.11596387346547E+01
-1.02604457424129E+00 4.96634865210211E+00 -6.73097999699995E-01
2.33537680056396E+01 9.88707912220360E+00 6.10826699096729E+00
2.29852463452618E+01 2.12114518251038E+01 1.66599708565945E+01
1.22232885036521E+01 2.08727445643273E+01 5.53194801348795E+00
1.14667297727525E+01 1.08883768941973E+01 1.68133908463315E+01
1.20152100923160E+01 1.61669109268310E+01 -5.25089342964899E-01
6.50738025520855E+00 4.48682210053020E+00 6.33107290863955E+00
6.09706234778241E+00 1.51808280126207E+01 1.68378955464732E+01
1.69665142853349E+01 1.36189954044245E+01 5.85004443906740E+00
1.54155719512525E+01 5.02533339195595E+00 1.72012760815797E+01
1.97226735071461E+01 8.87990788178025E+00 1.17767449660632E+01
1.68905182361589E+01 2.12690677624622E+01 2.23479489549080E+01
5.75725578561531E+00 2.07078547580210E+01 1.13763969715566E+01
5.21245919927846E+00 8.44225866564750E+00 2.31958914257793E+01
1.08818580526721E+01 1.63624727256228E+01 1.08918294189764E+01
1.13124635080077E+01 6.30393394077432E+00 2.26878820078345E+01
2.49257212897598E+01 5.50572088462659E+00 1.11122967668993E+01
2.18464733892219E+01 1.75141649461871E+01 2.31127995799147E+01
-9.97470518267632E-02 1.15891619518773E+01 1.76507937284874E+01
9.50044364651022E-01 -1.64226282922227E-01 5.40194333962429E+00
1.07022490316033E+01 7.88369968438184E-01 1.72497450152962E+01
1.12766649184463E+01 1.07058794844567E+01 5.32218315850765E+00
1.74924886915205E+01 1.70921011246861E+01 1.67711014044835E+01
1.88674326277119E+01 5.26568571353990E+00 5.20322800151578E+00
5.95117393151507E+00 7.09327889287457E+00 1.71421788222188E+01
7.54305999769611E+00 1.58557245081277E+01 5.40369066115511E+00
6.52780647971927E+00 1.05300127764927E+01 1.13754713069850E+01





--
##############################################################
Francois Bottin tel: 01 69 26 41 73
CEA/DIF fax: 01 69 26 70 77
BP 12 Bruyeres-le-Chatel email: Francois.Bottin@cea.fr
##############################################################



