
forum - Re: [abinit-forum] v.5.3.x is slower than v.5.2.x ?!

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: Masayoshi Mikami <mmikami@rc.m-kagaku.co.jp>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] v.5.3.x is slower than v.5.2.x ?!
  • Date: Fri, 11 May 2007 20:48:30 +0900

Dear Marc,

Thank you very much for taking the time to look into this!
Below I reply to your questions, together with a benchmark that
we can all share... (N.B. it is not a light job!)

To your four questions 1)-4):

1) None of my jobs in this series contained the keyword "userid".

2) I renamed the src directory of v.5.2.4, copied the src over,
ran "configure/make", and then ran the same job.
The job was slower than the one with the original v.5.2.4!?

Here is the diff between the original v.5.2.4 and
"the v.5.2.3 src with the v.5.2.4 configuration":
(snip)
< - Total cpu time (s,m,h): 12559.0 209.32 3.489
< - Total wall clock time (s,m,h): 12559.0 209.32 3.489
---
> - Total cpu time (s,m,h): 18226.4 303.77 5.063
> - Total wall clock time (s,m,h): 18226.4 303.77 5.063
1168,1175c1168,1175
< - projbd 5564.227 44.3 5564.303 44.3 36206
< - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
< - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
< - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
< - fourwf(den) 273.986 2.2 274.021 2.2 3264
< - nonlop(forces) 213.336 1.7 213.283 1.7 3264
< - getghc-other 65.512 0.5 65.414 0.5 -1
< - 59 others 191.804 1.5 191.829 1.5
---
> - projbd 8657.421 47.5 8657.593 47.5 36206
> - fourwf(pot) 3640.456 20.0 3640.632 20.0 22079
> - nonlop(apply) 3340.025 18.3 3340.070 18.3 22079
> - vtowfk(ssdiag) 501.184 2.7 501.183 2.7 -1
> - fourwf(den) 343.699 1.9 343.714 1.9 3264
> - nonlop(forces) 309.497 1.7 309.501 1.7 3264
> - getghc-other 102.163 0.6 101.822 0.6 -1
> - 59 others 263.562 1.4 263.533 1.4
1177c1177
< - subtotal 11845.706 94.3 11845.727 94.3
---
> - subtotal 17158.009 94.1 17158.048 94.1

I wonder why the CPU times differ, but it might be related to
the machine load reported by "uptime". The Itanium2 box has four CPUs.
The first job (the original v.5.2.4) ran with an "uptime" load of 2.0,
and the second job ran with an "uptime" load of 4.0 (on average).

The first report (comparing v.5.2.x - v.5.3.4) was produced
while I had one other job running (so the "uptime" load was 2.0).

In any case, the source itself does not seem to be the problem...
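
To make the load comparison above easier to repeat, here is a minimal Python sketch that normalizes a load average by the number of CPUs. The function names are hypothetical, and `os.getloadavg()` is only available on Unix-like systems; the two example values are the loads quoted above for the four-CPU box.

```python
import os


def load_per_cpu(load_avg: float, ncpus: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if ncpus <= 0:
        raise ValueError("ncpus must be positive")
    return load_avg / ncpus


def current_load_per_cpu() -> float:
    """Read the 1-minute load average of this machine (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_cpu(one_minute, os.cpu_count() or 1)


# Loads quoted in this message for the four-CPU Itanium2 box.
first_run = load_per_cpu(2.0, 4)   # original v.5.2.4 run
second_run = load_per_cpu(4.0, 4)  # rebuilt run
print(first_run, second_run)
```

Logging `current_load_per_cpu()` just before each benchmark run would show whether two timings were taken under comparable conditions.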

3) I ran the sequential jobs on the Itanium2 node (with 32GB memory),
and the number of simultaneous jobs was NEVER more than four.
The total memory of the jobs should have been less than 5 GB.
I do not think the environment could cause an effect of the order of 10!

4) Reproducibility of the difference between v.5.2.3 and v.5.2.4:
Yes. I attach a benchmark. The number of atoms, the number of bands,
and ecut of this benchmark are close to those of my own job (undisclosed),
so the memory sizes should be similar.
The prepared job is on brookite (TiO2). I used HGH-type PPs
(22ti.12.hgh & 8o.6.hgh, available on the ABINIT WEB site)
to have an nband of the same order.

The diff between v.5.2.3 and "the v.5.2.3/src with v.5.2.4 conf":
2c2
< .Version 5.2.3 of ABINIT
---
> .Version 5.2.4 of ABINIT
(snip)
< Total energy(eV)= -1.96217293671512E+04 ; Band energy (Ha)= -7.6019768659E+01
---
> Total energy(eV)= -1.96217293671514E+04 ; Band energy (Ha)= -7.6019768659E+01
876,877c876,877
< - Total cpu time (s,m,h): 19018.8 316.98 5.283
< - Total wall clock time (s,m,h): 19018.8 316.98 5.283
---
> - Total cpu time (s,m,h): 58214.5 970.24 16.171
> - Total wall clock time (s,m,h): 58214.5 970.24 16.171
884,890c884,890
< - projbd 7178.381 37.7 7178.442 37.7 50214
< - fourwf(pot) 6030.373 31.7 6030.553 31.7 30255
< - nonlop(apply) 2537.716 13.3 2537.849 13.3 30255
< - vtowfk(ssdiag) 602.527 3.2 602.529 3.2 -1
< - fourwf(den) 522.162 2.7 522.133 2.7 4224
< - nonlop(forces) 395.062 2.1 395.029 2.1 4224
< - 60 others 313.039 1.6 312.911 1.6
---
> - projbd 35407.559 60.8 35407.806 60.8 50214
> - nonlop(apply) 8066.123 13.9 8066.327 13.9 30255
> - fourwf(pot) 7364.795 12.7 7364.928 12.7 30255
> - vtowfk(ssdiag) 1978.062 3.4 1978.069 3.4 -1
> - nonlop(forces) 730.142 1.3 730.136 1.3 4224
> - fourwf(den) 667.165 1.1 667.168 1.1 4224
> - 60 others 527.324 0.9 526.847 0.9
892c892
< - subtotal 17579.261 92.4 17579.446 92.4
---
> - subtotal 54741.171 94.0 54741.281 94.0
(snip)
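
To see at a glance which routines account for the slowdown in the diff above, the timing rows can be parsed and compared. The following is only a sketch: the regular expression and the function name are my own assumptions about the row layout (name, cpu, cpu%, wall, wall%), not part of ABINIT.

```python
import re

# Optional diff marker, a dash, the routine name, then four decimal numbers:
# cpu time, cpu %, wall time, wall %.
ROW = re.compile(
    r"^[<>]?\s*-\s+(.+?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
    r"\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)"
)


def wall_times(lines):
    """Map routine name -> wall time (s) for every timing row found."""
    times = {}
    for line in lines:
        match = ROW.match(line)
        if match:
            times[match.group(1)] = float(match.group(4))
    return times


before = wall_times([
    "< - projbd 7178.381 37.7 7178.442 37.7 50214",
    "< - fourwf(pot) 6030.373 31.7 6030.553 31.7 30255",
    "< - nonlop(apply) 2537.716 13.3 2537.849 13.3 30255",
    "< - 60 others 313.039 1.6 312.911 1.6",
])
after = wall_times([
    "> - projbd 35407.559 60.8 35407.806 60.8 50214",
    "> - nonlop(apply) 8066.123 13.9 8066.327 13.9 30255",
    "> - fourwf(pot) 7364.795 12.7 7364.928 12.7 30255",
    "> - 60 others 527.324 0.9 526.847 0.9",
])

# Slowdown factor per routine, largest first.
ratios = {name: after[name] / before[name] for name in before if name in after}
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name:15s} x{ratio:.2f}")
```

With the rows above, projbd slows down by roughly a factor of five, while fourwf(pot) changes much less.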

I set "ntime 1", but it can be commented out to save CPU time.
(Other parameters might be tuned to reduce the CPU time, e.g.
setting "nkpt 1"; the present case uses "nkpt 2" and needs
about 600MB of memory.)

I am quite puzzled ... Still I wish you "Bon weekend",
Masayoshi

Attachment: test.in
Description: Binary data

Attachment: test.files
Description: Binary data


On 2007/05/09, at 21:41, Marc Torrent wrote:

Dear Masayoshi,

First of all, a remark concerning the memory needed for useylm=1:
the additional amount of memory is justified by the fact that non-local form factors (ffnl Abinit variable) are now discretized by (l,m,n) quantum numbers instead of (l,n).
You must also add the memory needed by the spherical harmonics...

Now, concerning your problem of cpu time:
I'm currently looking at the 5.2.4-5.3.0 differences but it will need more time... it will be the subject of a later mail.

Between 5.2.3 and 5.2.4:
You found that cpu time increases (1.618 h -> 3.489 h)... it's surprising !... because these two versions of Abinit do not have any differences for ground states calculations...
Only a few comments, new input keywords and tests have been added.
Just verify that your input file does not contain userid keyword (because Pierre-Matthieu changed some lines for that in 5.2.4).
The "major evolution" between 5.2.3 and 5.2.4 is the build system !...
Do you think it would be possible for you to compile Abinit 5.2.4 with the 5.2.3 build system? (just try to copy the /src directory from 5.2.4 to 5.2.3 ???).
Another question: are you sure that you ran Abinit in the same environment, in particular with the same computer load?
Can you reproduce several times the difference between 5.2.3 and 5.2.4 ?

That's all for today,
sorry,

Marc


Masayoshi Mikami wrote:
Dear all,
I was away from the office, so my reply is later than expected...
I am going to report on two things:
- put useylm=1 in your input and see if Ylm version of nonlop has the same behaviour as Legendre polynomials one.
- try to follow the changes by testing v5.2.3, v5.2.4, v5.3.0, v5.3.2 and v5.3.4...
(1) with/without "useylm 1" for v.5.3.4
No effect... (it is even slower with useylm 1). Diff between "without" and "with":
< Total energy(eV)= -8.52960977840336E+03 ; Band energy (Ha)= -2.0397416022E+01
---
> Total energy(eV)= -8.52960977840312E+03 ; Band energy (Ha) = -2.0397416022E+01
(snip)
< - Total cpu time (s,m,h): 73885.9 1231.43 20.524
< - Total wall clock time (s,m,h): 73885.9 1231.43 20.524
---
> - Total cpu time (s,m,h): -121662.4 -2027.71 -33.795
> - Total wall clock time (s,m,h): 93085.6 1551.43 25.857
(snip)
1168,1174c1170,1176
< - nonlop(apply) 51494.610 69.7 51494.717 69.7 22079
< - nonlop(forces) 10453.660 14.1 10453.713 14.1 3264
< - projbd 5534.380 7.5 5534.500 7.5 36206
< - fourwf(pot) 3041.498 4.1 3041.495 4.1 22079
< - nonlop(forstr) 1271.925 1.7 1271.934 1.7 272
< - vtowfk(ssdiag) 779.864 1.1 779.856 1.1 -1
< - 60 others 588.176 0.8 587.959 0.8
---
> - nonlop(forces) 11423.508 -9.4 11423.488 12.3 3264
> - projbd 5688.608 -4.7 5688.702 6.1 36206
> - fourwf(pot) 3073.283 -2.5 3073.392 3.3 22079
> - nonlop(forstr) 2151.584 -1.8 2151.586 2.3 272
> - vtowfk(ssdiag) 801.072 -0.7 801.067 0.9 -1
> - nonlop(apply) -146198.784 120.2 68549.316 73.6 22079
> - 60 others 649.352 -0.5 649.152 0.7
1176c1178
< - subtotal 73164.113 99.0 73164.174 99.0
---
> - subtotal -122411.377 100.6 92336.703 99.2
Please do not be alarmed by the negative CPU time.
(Maybe some variable overflowed its range...?)
We should look at the wall time in this output.
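
As a quick arithmetic check of the overflow suspicion, one can compare the reported CPU and wall times of the runs that show a negative CPU time. The sketch below uses only numbers quoted in this thread; the comparison with 2^31 x 10^-4 s is my own guess at a possible counter range, not something confirmed from the ABINIT source.

```python
# (cpu seconds, wall seconds) as reported in the outputs quoted in this thread.
runs = {
    "v.5.3.4 with useylm 1": (-121662.4, 93085.6),
    "v.5.3.2": (-140661.8, 74086.2),
}

# Hypothetical wrap size: a 31-bit range counted in units of 1e-4 s.
# This is an assumption used only for comparison.
WRAP_SECONDS = 2**31 * 1e-4

gaps = {name: wall - cpu for name, (cpu, wall) in runs.items()}
for name, gap in gaps.items():
    print(f"{name}: wall - cpu = {gap:.1f} s (guess: {WRAP_SECONDS:.1f} s)")
```

Both runs show the same gap between wall and CPU time, which is what one would expect if a fixed-size counter wrapped once, so the wall time is the number to trust here.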
"useylm 1" apparently needs more memory ...
< P This job should need less than 565.407 Mbytes of memory.
---
> P This job should need less than 633.511 Mbytes of memory.
(2) Testing v5.2.3, v5.2.4, v5.3.0, v5.3.2 and v5.3.4...
I have noticed a big CPU time difference between v.5.2.3 and v.5.2.4!
(Please let me correct my former comment, pardon!)
The summary is like this:
v.5.2.3:- Total wall clock time (s,m,h): 5824.1 97.07 1.618
v.5.2.4:- Total wall clock time (s,m,h): 12559.0 209.32 3.489
v.5.3.0:- Total wall clock time (s,m,h): 13056.1 217.60 3.627
v.5.3.2:- Total wall clock time (s,m,h): 74086.2 1234.77 20.580
v.5.3.3:- Total wall clock time (s,m,h): 74144.5 1235.74 20.596
v.5.3.4:- Total wall clock time (s,m,h): 74040.8 1234.01 20.567
(NB: To test this, I recompiled all the binaries on Itanium2/Linux (2.4.24)
with ifort 8.1 (l_fc_pc_8.1.019). All the tests run with "abinis".)
So the transition between v.5.2.3 and v.5.2.4, as well as
the other transition between v.5.3.0 and v.5.3.2, seems quite big!
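
The two jumps can be quantified directly from the summary table. This is a small sketch using the wall clock times listed above; the variable names are mine.

```python
# Total wall clock time in seconds, copied from the summary above.
wall = {
    "5.2.3": 5824.1,
    "5.2.4": 12559.0,
    "5.3.0": 13056.1,
    "5.3.2": 74086.2,
    "5.3.3": 74144.5,
    "5.3.4": 74040.8,
}

versions = list(wall)
# Slowdown factor of each version relative to the previous one in the list.
step_ratio = {
    f"{old}->{new}": wall[new] / wall[old]
    for old, new in zip(versions, versions[1:])
}
for step, ratio in step_ratio.items():
    print(f"{step}: x{ratio:.2f}")
```

Only the 5.2.3->5.2.4 and 5.3.0->5.3.2 steps deviate noticeably from 1; the other steps are within a few percent.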
Here is a more detailed memo:
< .Version 5.2.3 of ABINIT
---
> .Version 5.2.4 of ABINIT
(snip)
< Total energy(eV)= -8.52960977840337E+03 ; Band energy (Ha)= -2.0397416022E+01
---
> Total energy(eV)= -8.52960977840336E+03 ; Band energy (Ha) = -2.0397416022E+01
(snip)
1160,1161c1160,1161
< - Total cpu time (s,m,h): 5824.1 97.07 1.618
< - Total wall clock time (s,m,h): 5824.1 97.07 1.618
---
> - Total cpu time (s,m,h): 12559.0 209.32 3.489
> - Total wall clock time (s,m,h): 12559.0 209.32 3.489
1168,1177c1168,1175
< - fourwf(pot) 2606.561 44.8 2606.631 44.8 22079
< - projbd 1381.013 23.7 1381.014 23.7 36206
< - nonlop(apply) 799.226 13.7 799.310 13.7 22079
< - fourwf(den) 231.073 4.0 231.079 4.0 3264
< - vtowfk(ssdiag) 156.961 2.7 156.959 2.7 -1
< - nonlop(forces) 125.116 2.1 125.128 2.1 3264
< - forces 53.086 0.9 53.088 0.9 24
< - getghc-other 32.156 0.6 31.870 0.5 -1
< - fourdp 32.090 0.6 32.082 0.6 330
< - 57 others 80.349 1.4 80.341 1.4
---
> - projbd 5564.227 44.3 5564.303 44.3 36206
> - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
> - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
> - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
> - fourwf(den) 273.986 2.2 274.021 2.2 3264
> - nonlop(forces) 213.336 1.7 213.283 1.7 3264
> - getghc-other 65.512 0.5 65.414 0.5 -1
> - 59 others 191.804 1.5 191.829 1.5
1179c1177
< - subtotal 5497.630 94.4 5497.502 94.4
---
> - subtotal 11845.706 94.3 11845.727 94.3
1184,1185c1182,1183
< .Delivered 6 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 5824.1 wall= 5824.1
---
> .Delivered 1 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 12559.0 wall= 12559.0
In passing, ...
< .Version 5.2.4 of ABINIT
---
> .Version 5.3.0 of ABINIT
1160,1161c1162,1163
< - Total cpu time (s,m,h): 12559.0 209.32 3.489
< - Total wall clock time (s,m,h): 12559.0 209.32 3.489
---
> - Total cpu time (s,m,h): 13056.1 217.60 3.627
> - Total wall clock time (s,m,h): 13056.1 217.60 3.627
1168,1175c1170,1178
< - projbd 5564.227 44.3 5564.303 44.3 36206
< - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
< - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
< - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
< - fourwf(den) 273.986 2.2 274.021 2.2 3264
< - nonlop(forces) 213.336 1.7 213.283 1.7 3264
< - getghc-other 65.512 0.5 65.414 0.5 -1
< - 59 others 191.804 1.5 191.829 1.5
---
> - projbd 5577.244 42.7 5577.338 42.7 36206
> - fourwf(pot) 3043.856 23.3 3043.865 23.3 22079
> - nonlop(apply) 2145.970 16.4 2146.085 16.4 22079
> - vtowfk(ssdiag) 778.078 6.0 778.083 6.0 -1
> - fourwf(den) 273.408 2.1 273.436 2.1 3264
> - nonlop(forces) 213.192 1.6 213.174 1.6 3264
> - forces 95.838 0.7 95.837 0.7 24
> - getghc-other 66.057 0.5 65.945 0.5 -1
> - 58 others 145.294 1.1 145.273 1.1
1177c1180
< - subtotal 11845.706 94.3 11845.727 94.3
---
> - subtotal 12338.938 94.5 12339.036 94.5
1182,1183c1185,1186
< .Delivered 1 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 12559.0 wall= 12559.0
---
> .Delivered 2 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 13056.1 wall= 13056.1
Then,
< .Version 5.3.0 of ABINIT
---
> .Version 5.3.2 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): 13056.1 217.60 3.627
< - Total wall clock time (s,m,h): 13056.1 217.60 3.627
---
> - Total cpu time (s,m,h): -140661.8 -2344.36 -39.073
> - Total wall clock time (s,m,h): 74086.2 1234.77 20.580
1170,1178c1170,1176
< - projbd 5577.244 42.7 5577.338 42.7 36206
< - fourwf(pot) 3043.856 23.3 3043.865 23.3 22079
< - nonlop(apply) 2145.970 16.4 2146.085 16.4 22079
< - vtowfk(ssdiag) 778.078 6.0 778.083 6.0 -1
< - fourwf(den) 273.408 2.1 273.436 2.1 3264
< - nonlop(forces) 213.192 1.6 213.174 1.6 3264
< - forces 95.838 0.7 95.837 0.7 24
< - getghc-other 66.057 0.5 65.945 0.5 -1
< - 58 others 145.294 1.1 145.273 1.1
---
> - nonlop(forces) 10460.239 -7.4 10460.209 14.1 3264
> - projbd 5551.615 -3.9 5551.733 7.5 36206
> - fourwf(pot) 3043.078 -2.2 3043.082 4.1 22079
> - nonlop(forstr) 1271.740 -0.9 1271.742 1.7 272
> - vtowfk(ssdiag) 775.022 -0.6 775.023 1.0 -1
> - nonlop(apply) -163092.779 115.9 51655.317 69.7 22079
> - 60 others 587.954 -0.4 587.956 0.8
1180c1178
< - subtotal 12338.938 94.5 12339.036 94.5
---
> - subtotal -141403.131 100.5 73345.062 99.0
And,
< .Version 5.3.2 of ABINIT
---
> .Version 5.3.3 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): -140661.8 -2344.36 -39.073
< - Total wall clock time (s,m,h): 74086.2 1234.77 20.580
---
> - Total cpu time (s,m,h): 74144.5 1235.74 20.596
> - Total wall clock time (s,m,h): 74144.5 1235.74 20.596
1170,1176c1170,1176
< - nonlop(forces) 10460.239 -7.4 10460.209 14.1 3264
< - projbd 5551.615 -3.9 5551.733 7.5 36206
< - fourwf(pot) 3043.078 -2.2 3043.082 4.1 22079
< - nonlop(forstr) 1271.740 -0.9 1271.742 1.7 272
< - vtowfk(ssdiag) 775.022 -0.6 775.023 1.0 -1
< - nonlop(apply) -163092.779 115.9 51655.317 69.7 22079
< - 60 others 587.954 -0.4 587.956 0.8
---
> - nonlop(apply) 51699.264 69.7 51699.243 69.7 22079
> - nonlop(forces) 10459.982 14.1 10459.994 14.1 3264
> - projbd 5587.185 7.5 5587.384 7.5 36206
> - fourwf(pot) 3044.780 4.1 3044.884 4.1 22079
> - nonlop(forstr) 1271.329 1.7 1271.329 1.7 272
> - vtowfk(ssdiag) 775.432 1.0 775.425 1.0 -1
> - 60 others 586.740 0.8 586.618 0.8
1178c1178
< - subtotal -141403.131 100.5 73345.062 99.0
---
> - subtotal 73424.712 99.0 73424.877 99.0
1183,1184c1183,1184
< .Delivered 19 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= -140661.8 wall= 74086.2
---
> .Delivered 4 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 74144.5 wall= 74144.5
And, finally,
< .Version 5.3.3 of ABINIT
---
> .Version 5.3.4 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): 74144.5 1235.74 20.596
< - Total wall clock time (s,m,h): 74144.5 1235.74 20.596
---
> - Total cpu time (s,m,h): 74040.8 1234.01 20.567
> - Total wall clock time (s,m,h): 74040.8 1234.01 20.567
1170,1176c1170,1176
< - nonlop(apply) 51699.264 69.7 51699.243 69.7 22079
< - nonlop(forces) 10459.982 14.1 10459.994 14.1 3264
< - projbd 5587.185 7.5 5587.384 7.5 36206
< - fourwf(pot) 3044.780 4.1 3044.884 4.1 22079
< - nonlop(forstr) 1271.329 1.7 1271.329 1.7 272
< - vtowfk(ssdiag) 775.432 1.0 775.425 1.0 -1
< - 60 others 586.740 0.8 586.618 0.8
---
> - nonlop(apply) 51572.901 69.7 51573.041 69.7 22079
> - nonlop(forces) 10465.045 14.1 10465.066 14.1 3264
> - projbd 5597.726 7.6 5597.835 7.6 36206
> - fourwf(pot) 3042.212 4.1 3042.068 4.1 22079
> - nonlop(forstr) 1272.081 1.7 1272.077 1.7 272
> - vtowfk(ssdiag) 777.669 1.1 777.668 1.1 -1
> - 60 others 590.731 0.8 590.541 0.8
1178c1178
< - subtotal 73424.712 99.0 73424.877 99.0
---
> - subtotal 73318.364 99.0 73318.296 99.0
1183,1184c1183,1184
< .Delivered 4 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 74144.5 wall= 74144.5
---
> .Delivered 5 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 74040.8 wall= 74040.8
... I hope this benchmark gives some hints ....
Bien a vous,
Masayoshi




