forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
- From: "Eric J. Walter" <ejwalt@wm.edu>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] steady increase in memory for band parallel job
- Date: Tue, 01 Dec 2009 09:15:16 -0500
Dear Francois,
I have put a copy of my input, output, log (called OUTFILE) and psps at the following link:
http://piezo1.physics.wm.edu/~ewalter/Abinit_memoryleak/
NOTE: the input file is for a different system than the one I attached previously (the atom species are
different). Please disregard the previous one.
I have tried compiling with g95. The serial code works fine, but it never showed the memory problem
anyway. The parallel version exits right after reading the psp files. I am continuing to look into this problem.
I have also tried compiling with PathScale v3.0; the result is the same as with Intel Fortran.
So far, my testing suggests that the largest problem is in the lobpcgwf / lobpcgccwf routine. In my
test, I run 5 iterations and write the "free-(buffers+cache)" result of "free -m" to the output file
(I have added call system('free -m') to various parts of the 5.8.4 code). The graph is posted at the same
URL as above (called memgraph). You can clearly see 5 repeating patterns, one for each iteration. In the
serial case using wfoptalg 4 (green), the memory usage does not increase from iteration to iteration.
For the parallel case with the lobpcgwf routine (black), the memory keeps increasing during each iteration.
However, when the lobpcgwf routine is commented out (red), the memory increases less quickly.
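The measurement described above can be reproduced outside the Fortran code. What follows is a minimal Python sketch, not the instrumentation actually used in the test. It assumes the older procps layout of "free -m" with separate "buffers" and "cached" columns; the sample text and the per-iteration peak values are invented for illustration, and the function names are hypothetical.

```python
# Illustrative "free -m" output in the older procps layout.
# The numbers are made up; they are NOT taken from the run discussed here.
SAMPLE_FREE_M = """\
             total       used       free     shared    buffers     cached
Mem:          7985       5120       2865          0        310       2410
-/+ buffers/cache:       2400       5585
Swap:         2047          0       2047
"""


def used_minus_cache_mb(free_output):
    """Return used memory in MB with buffers and page cache subtracted."""
    lines = free_output.splitlines()
    header = lines[0].split()  # column names: total, used, free, ...
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = dict(zip(header, (int(v) for v in line.split()[1:])))
            return values["used"] - values["buffers"] - values["cached"]
    raise ValueError("no 'Mem:' row found in free output")


def grows_every_iteration(peaks_mb):
    """True if each iteration's peak is strictly higher than the previous one."""
    return all(later > earlier for earlier, later in zip(peaks_mb, peaks_mb[1:]))


if __name__ == "__main__":
    print(used_minus_cache_mb(SAMPLE_FREE_M))  # 2400
    # Hypothetical per-iteration peaks (MB) for a leaking and a stable run.
    print(grows_every_iteration([400, 650, 910, 1180, 1430]))  # True
    print(grows_every_iteration([400, 400, 400, 400, 400]))    # False
```

Subtracting buffers and cache matters because the kernel's page cache grows with file I/O and would otherwise look like application memory.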
I will continue to try and track down the problem here.
Regards,
Eric
BOTTIN Francois wrote:
Hi,
Is it possible for you to compile abinit-5.8.4p with g95 in order to track memory leaks?
Perhaps your code will stop at a well-defined line, with an array "already allocated"!
It would be very nice to get rid of this trouble.
If not, could you send your 3 pseudopotentials, output and log (of proc 0), if these files are not too large?
Regards,
Francois Bottin
Eric J. Walter wrote:
Hi,
It appears as though my jobs are suffering from the same problem outlined in this post from July of this year:
https://listes-2.sipr.ucl.ac.be/abinit.org/arc/forum/2009-07/msg00034.html
When I run the attached input file for ~14 hrs, the nodes I am using run out of memory.
I am using RHEL AS4 on dual-core, dual-processor Opteron 2200 nodes with 2 GB memory per core (= 8 GB total).
I am compiling with Intel Fortran 10.1 and OpenMPI-1.2.5. This job's output file claims that the calculation
should require ~200 MB.
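For a sense of scale, the figures above imply a rough growth rate. The Python sketch below is back-of-the-envelope arithmetic only. It assumes one MPI process per core, that each process starts near the ~200 MB estimate, and that it fails once it reaches its 2 GB per-core share after ~14 hours; none of these assumptions is confirmed by the output file, and the function name is hypothetical.

```python
def implied_growth_mb_per_hour(limit_mb, baseline_mb, hours):
    """Average growth needed to go from baseline_mb to limit_mb in the given time."""
    return (limit_mb - baseline_mb) / hours


if __name__ == "__main__":
    # Approximate figures from the post: 2 GB per core, ~200 MB expected, ~14 h.
    rate = implied_growth_mb_per_hour(limit_mb=2048, baseline_mb=200, hours=14)
    print(f"{rate:.0f} MB per hour per process")  # 132 MB per hour per process
```

Under these assumptions each process would gain on the order of 130 MB per hour, several times the memory the calculation is supposed to need over the whole run.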
I have found that this increase only occurs when using band/FFT parallelism (k-point parallel and serial runs don't
show this steady increase). Besides version 5.8.4p, I have also tested versions 5.7.4 and 5.6.4; all three versions
show this behavior. Version 5.4.4 does not have this problem, but is also much slower, for me at least.
I have tried changing versions of OpenMPI (from 1.2.5 to 1.3.3); this has no effect.
Has any progress been made on either finding the leak or another cause of this problem?
Thanks in advance for any help you can give.
Eric J. Walter
Department of Physics
College of William and Mary
---------------------------------------------------
Here is my input file:
---------------------------------------------------
ionmov 8
noseinert 5.0d5
mditemp 300
dtion 50
toldfe 1d-4 eV
ntime 20
nsym 1
chkprim 0
kptopt 1
ngkpt 1 1 1
shiftk 0 0 0
occopt 3
tsmear 0.001
ecut 45
nstep 20
natom 80
ntypat 3
znucl 82 22 8
prtden 0
prtwf 0
wfoptalg 4
npfft 1 npband 26
npkpt 1
nloalg 4
fftalg 401
iprcch 4
intxc 0
fft_opt_lob 2
paral_kgb 1
typat 48*3 16*2 16*1
acell 3*20.599653
angdeg 3*56.39553912
xcart
-2.22069320900486E+00 -1.67615626136839E+00 1.02845225672214E+01
2.83216713500271E+00 -9.64895067984759E-01 1.06130211295748E+01
-3.07826006953531E-01 3.14017017198906E+00 1.02884003097039E+01
2.38571355304415E+00 1.94315354580616E+00 1.47667846684432E+01
-2.74032183791911E+00 1.44987913001174E+00 1.47058588105213E+01
4.24301353957222E-01 -3.02467161327066E+00 1.47270477618761E+01
4.37208548403877E-01 3.15701320773295E+00 1.63174232817734E+00
5.62754717842062E+00 3.65799392921832E+00 1.72482290066637E+00
2.56805130575382E+00 8.08007715450321E+00 1.75538816925331E+00
5.29946087386834E+00 6.62719979049667E+00 6.13774796347384E+00
-1.33476176608987E-01 6.02962682764747E+00 5.99150680371859E+00
3.25740331179827E+00 1.86629101076117E+00 5.92292965668993E+00
4.73721314332368E-01 -6.73544486518827E+00 1.53749001511875E+00
5.79909660316150E+00 -5.87516895113753E+00 1.52606696257784E+00
2.72661190097539E+00 -1.82547980521533E+00 1.72879478480763E+00
5.29308456737276E+00 -2.93742682967659E+00 5.98120816946857E+00
-1.02277107167427E-01 -3.47901581523419E+00 6.00815649851232E+00
3.22915540047783E+00 -7.91315859978646E+00 6.15697463378583E+00
3.20853423992771E+00 -1.70404358692119E+00 -6.97779629913824E+00
8.39651387970289E+00 -8.70615018894613E-01 -7.04102732844698E+00
5.39613269691849E+00 3.09966157186435E+00 -6.96223499113757E+00
8.17325435029689E+00 2.08765713206062E+00 -2.55251338829625E+00
2.82623828634786E+00 1.23322960474568E+00 -2.78889127839044E+00
6.11970263677967E+00 -3.17948832421889E+00 -2.34609526217649E+00
-7.91433373058299E+00 -1.81160718343090E+00 1.63075874446493E+00
-2.76694176578899E+00 -1.16579400400263E+00 1.74956240701256E+00
-5.81555263606813E+00 3.20412849389643E+00 1.72359061387011E+00
-2.79939696716810E+00 2.06735348854857E+00 6.15861322508217E+00
-8.31960266338488E+00 1.29355309750287E+00 5.93088060057318E+00
-5.10575950217045E+00 -2.98997425594191E+00 6.13900333992898E+00
-5.14327896212322E+00 3.29299293105111E+00 -7.09301147609970E+00
9.75650004473785E-02 3.78997253887449E+00 -6.91330351424288E+00
-2.92976440197528E+00 7.95211008443348E+00 -6.82517026645523E+00
-3.06355829923221E-01 6.76833507431424E+00 -2.77362691622797E+00
-5.53331641987666E+00 6.24968986891999E+00 -2.54578391594814E+00
-2.43999241882386E+00 1.80795237233396E+00 -2.67292251413726E+00
-5.14068521198933E+00 -6.68713377702076E+00 -6.86237306597303E+00
-6.12789514851627E-02 -5.98968660555720E+00 -6.97883625732929E+00
-2.99553503150223E+00 -1.83920851864552E+00 -6.83013099810669E+00
-1.80008738706434E-01 -2.89562125276255E+00 -2.65747984992241E+00
-5.56015371335734E+00 -3.55350951569161E+00 -2.53677784336301E+00
-2.23260539192524E+00 -7.88875944438475E+00 -2.59919279877736E+00
-2.31544564461024E+00 -1.60168511875929E+00 -1.57988254333016E+01
2.98206360781800E+00 -1.09336396682876E+00 -1.56367782886258E+01
-3.02653507337988E-01 3.20195569372773E+00 -1.56155029581190E+01
2.62978455935385E+00 1.90892236294971E+00 -1.12646465158811E+01
-2.74702844352294E+00 1.34758125055524E+00 -1.11329897640166E+01
3.92389502215297E-01 -3.14047927186374E+00 -1.13789217412000E+01
1.43789519936701E-01 8.95683744161872E-02 5.51308679103783E-02
3.43343139804145E-02 8.44171505400222E-02 1.31069788732053E+01
2.87456960507807E+00 4.84780682916232E+00 -8.54098683394433E+00
2.83533350827871E+00 4.91662126969937E+00 4.43850072950254E+00
2.95141488341089E+00 -4.80542390687403E+00 -8.59714703849626E+00
2.82501485426290E+00 -4.90110860896433E+00 4.40234973478692E+00
5.65634082404525E+00 -1.24851891953083E-02 -1.71524703303103E+01
5.66706037951402E+00 9.18065709228990E-02 -4.30774458733638E+00
-5.41740478793818E+00 1.03696823860055E-01 -8.46603127880005E+00
-5.52981835134213E+00 3.74816566677455E-02 4.36263803023674E+00
-2.73374622355905E+00 4.90669254100288E+00 -1.71056087884796E+01
-2.76475317059456E+00 4.97808833622533E+00 -4.26429469831174E+00
-2.71829980916229E+00 -4.81400423619281E+00 -1.71329000123123E+01
-2.78480871843135E+00 -4.79896324962904E+00 -4.16357601481035E+00
1.27116237724845E-01 5.57355728005296E-02 -2.58199597795567E+01
3.69147318259344E-02 1.01239700458947E-01 -1.28425126153755E+01
1.09315814613628E-01 2.02744841122104E-02 7.51595267544533E+00
9.28429983979248E-02 4.27836872194515E-02 2.02554355831334E+01
2.71495957648699E+00 4.91077718525781E+00 -8.53254132115559E-01
2.86538509412732E+00 4.97170584874535E+00 1.18102413232714E+01
3.09009565965701E+00 -4.82067158705315E+00 -1.15312475386682E+00
2.97965222241338E+00 -4.75118372415529E+00 1.17120712079335E+01
5.77396145656038E+00 1.07323885408925E-02 -9.98939448156541E+00
5.72978432984313E+00 -6.68544670080847E-03 3.19837381572586E+00
-5.80054029700240E+00 -3.59016258256638E-02 -1.07926803392243E+00
-5.41163712815481E+00 1.29254553589581E-01 1.16343816906280E+01
-2.62158506045681E+00 5.16922521499392E+00 -1.00544849478146E+01
-2.72015016982663E+00 4.98880486733811E+00 3.02504536215413E+00
-2.79050293194776E+00 -4.88384984087100E+00 -1.01240590641792E+01
-2.61941881955994E+00 -4.73421992471235E+00 3.07027251730700E+00
7.60779786630738E-02 2.19994741542859E-01 -1.87223237882607E+01
6.88549292157438E-02 -1.91082171805773E-02 -5.66940459435534E+00
vel
1.65929233257248E-04 -7.22899336586353E-06 2.26257574401143E-04
-4.41498838683481E-04 2.56696469185694E-05 3.31830998480140E-04
1.73881380446662E-04 -3.59274234563293E-04 2.29273421593672E-04
-7.02798094302199E-05 -2.60387333161425E-05 -2.65924872337890E-04
4.42536785165872E-04 -4.89827163710057E-04 -4.01073432982287E-04
1.95643632158351E-05 1.11653991217122E-03 -9.34094601194410E-04
-7.86319162726620E-05 1.98530438200753E-04 2.04688581300107E-04
3.00397575951297E-04 2.29731480818108E-04 -7.09537178564776E-05
-3.61460082461821E-04 2.48090453139150E-04 -5.49686708040239E-04
-8.68482177851845E-05 -1.08275834028475E-04 2.95658165452176E-05
-1.07412509904415E-04 -8.78012429709354E-05 1.54371802639068E-04
-2.09843740843053E-05 -3.01743488213807E-04 6.64260233761203E-05
-6.12496928754301E-05 1.36432817751052E-04 -9.40878503004121E-05
6.14104908278753E-04 1.68165113713754E-04 -5.12708955417303E-04
2.15443297114865E-04 -4.04241581459637E-04 2.39357619874606E-04
7.81430062976476E-05 1.59513106225212E-04 2.13405809058389E-04
-2.36012088016480E-04 1.38402390396325E-04 2.23418565305995E-04
-9.33414880283709E-05 1.20065355174723E-04 -5.40061134400772E-05
-2.80686746960019E-05 -3.62478762225168E-04 -2.99299750890718E-04
-4.40755174907394E-05 -1.32663871913867E-04 1.18770307640526E-04
-2.53558806288978E-04 4.01259820381485E-04 -2.81728409555392E-04
1.57361848603766E-04 3.41172196643058E-05 1.33022101611947E-04
4.01019710009567E-04 7.22573433787659E-07 -2.69262344059131E-04
2.99733136121017E-04 -6.68678893104353E-04 3.86496510503676E-04
3.95734899013453E-05 -2.80879543778806E-04 -2.77473937236590E-04
-2.93485746700284E-04 -1.00409592926771E-05 1.76552268367262E-04
-4.99995043152451E-04 5.34640365244488E-04 -4.16733250625040E-04
4.62540147703766E-04 2.69462418459427E-04 3.55370145754719E-04
2.11074019336284E-04 -5.09254836134716E-05 -6.07095017577757E-05
5.30486297370657E-05 -3.89516038918225E-04 4.59061055880998E-04
3.85969341185967E-05 -1.33647845375007E-04 1.45606759036967E-04
2.43990312107843E-04 1.62650517626908E-04 -1.51033371102077E-04
2.67081228198250E-04 -3.20863354500405E-04 5.19374366175999E-04
-3.10316561967053E-04 -1.85209360845414E-04 -4.03377994784274E-04
-3.01054183714974E-04 -1.79705768853539E-04 1.86267271696545E-05
2.66589291951177E-04 9.91038990779357E-04 -3.96599362099506E-04
4.08276960903040E-05 1.53432487219794E-04 1.22871735039336E-04
2.35590328814509E-04 2.76550673861414E-04 -1.90931581662904E-04
2.04967171400528E-04 -3.84948347563266E-04 1.08474133114518E-04
-6.97154912755278E-04 -4.71858129707306E-04 -3.50888447092869E-04
-1.24684436826259E-04 1.21627138377997E-04 1.68581444194268E-04
-7.70851773281092E-05 4.63714165425422E-05 1.81085183134115E-04
-3.17589866254135E-04 -5.50009675262097E-04 -4.30671530615520E-04
-1.57911296613577E-04 -4.54426250235638E-05 -4.92425558216806E-06
9.87305191973671E-06 1.80207048070693E-05 -1.79532949110393E-04
6.83671915911343E-05 -9.70105029125616E-05 1.82258963862073E-05
4.57857583174456E-04 -8.32122447198841E-05 -1.67508689699827E-04
1.94599797149855E-04 -2.23628588270493E-04 1.07780974532419E-04
-2.82971791158802E-05 4.68214184227891E-05 -8.36700118689787E-05
-4.24265368843893E-05 -3.21245875729528E-05 9.22433773212692E-05
2.51671799765403E-05 -1.77766618119046E-05 1.72494196852232E-05
6.11315117142581E-06 8.68163744077307E-05 5.67296042042437E-06
-7.70845081775687E-05 3.08561858749785E-05 1.52938314279569E-05
4.20820628111984E-05 3.70042677593407E-05 9.96812582225421E-05
1.40844649691300E-04 -7.97217755672014E-05 -1.31083237666120E-05
-5.81978615724723E-05 1.17490844352786E-04 4.40930378539056E-05
6.36078347459770E-05 6.69615793199388E-05 -8.39129585283473E-05
-6.07614472082574E-05 -4.37626480779298E-05 -4.29461393218481E-06
-6.36394987636312E-05 -3.12613972792605E-05 7.79309141070753E-05
2.39708852289351E-05 -9.39157455192732E-05 3.01944658201935E-05
-1.03662791966090E-05 3.87149926133824E-05 1.05050319203302E-04
1.10894432823969E-04 3.45992275343292E-05 -3.01976038789652E-05
-1.72253916094043E-05 3.96742670252505E-05 -5.25674305834915E-05
-8.48228304979956E-05 5.64908929641756E-05 4.47531393684651E-05
-1.47506234752193E-04 1.22556764449835E-04 -3.01480394930127E-04
-2.95445248698945E-04 -2.42094492146111E-04 2.66277938523650E-04
-7.77631153135254E-05 -1.09252568425864E-04 1.69153096659710E-04
2.60223872399644E-04 -7.70901517625762E-05 1.36621011724429E-04
4.79856202268419E-05 -3.58150632460491E-04 -1.54493852536122E-04
-7.14052845122851E-05 -2.69155466792118E-04 1.79749509700142E-04
1.45112055674323E-05 -1.80926466413756E-04 6.10956577883199E-05
9.60743159647168E-05 -1.00159903624474E-04 7.70024146584693E-05
-7.39105065049050E-05 -2.15075671717976E-04 2.20969494128146E-04
2.04020346009061E-04 1.56655746598131E-04 -1.77043062668635E-04
-4.03670312774535E-04 -5.52855555754880E-05 -1.18348120988931E-04
3.71601522478895E-04 4.88286475929969E-05 1.85192663041991E-04
4.16812192512686E-05 2.23401099965076E-04 1.46463434516355E-04
-3.25279267883869E-04 7.02921173153103E-05 1.05326094998430E-04
-6.05071951258716E-05 4.15488061404616E-04 1.16175008659989E-04
-1.93579262145139E-04 5.50540862463690E-05 4.37697584266472E-04
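
The input above relies on ABINIT's "count*value" shorthand (typat 48*3 16*2 16*1, acell 3*20.599653) and on an explicit process decomposition (npkpt, npband, npfft). The Python sketch below expands the shorthand and cross-checks it against natom. It is an illustrative helper, not part of ABINIT, and its function name is hypothetical. Treating the product npkpt x npband x npfft as the number of MPI processes the job is launched with is an assumption about how paral_kgb 1 is used here.

```python
from collections import Counter


def expand_repeats(spec):
    """Expand 'count*value' tokens, e.g. '2*7 5' -> ['7', '7', '5']."""
    expanded = []
    for token in spec.split():
        if "*" in token:
            count, value = token.split("*", 1)
            expanded.extend([value] * int(count))
        else:
            expanded.append(token)
    return expanded


# Values copied from the input file above.
natom = 80
znucl = [82, 22, 8]
typat = [int(t) for t in expand_repeats("48*3 16*2 16*1")]
acell = [float(a) for a in expand_repeats("3*20.599653")]
npkpt, npband, npfft = 1, 26, 1

# typat must name one species index per atom.
assert len(typat) == natom

# Atoms per atomic number (typat indices are 1-based into znucl).
atoms_per_znucl = Counter(znucl[t - 1] for t in typat)

# Assumed relation: one MPI process per (k-point, band, FFT) slot.
mpi_processes = npkpt * npband * npfft

if __name__ == "__main__":
    print(dict(atoms_per_znucl))  # {8: 48, 22: 16, 82: 16}
    print(acell)                  # [20.599653, 20.599653, 20.599653]
    print(mpi_processes)          # 26
```

The xcart and vel blocks each contain 80 rows, which matches the expanded typat length.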