forum - Re: [abinit-forum] v.5.3.x is slower than v.5.2.x ?!

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: Masayoshi Mikami <mmikami@rc.m-kagaku.co.jp>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] v.5.3.x is slower than v.5.2.x ?!
  • Date: Mon, 7 May 2007 15:59:08 +0900

Dear all,

I was away from the office, so my reply is later than expected...
I am going to report on the two tests Marc suggested:
- put useylm=1 in your input and see if Ylm version of nonlop has the same behaviour as Legendre polynomials one.
- try to follow the changes by testing v5.2.3, v5.2.4, v5.3.0, v5.3.2 and v5.3.4...

(1) with/without "useylm 1" for v.5.3.4
Not effective... (it is even slower with useylm 1). Diff of "without" (<) and "with" (>):
< Total energy(eV)= -8.52960977840336E+03 ; Band energy (Ha)= -2.0397416022E+01
---
> Total energy(eV)= -8.52960977840312E+03 ; Band energy (Ha)= -2.0397416022E+01
(snip)
< - Total cpu time (s,m,h): 73885.9 1231.43 20.524
< - Total wall clock time (s,m,h): 73885.9 1231.43 20.524
---
> - Total cpu time (s,m,h): -121662.4 -2027.71 -33.795
> - Total wall clock time (s,m,h): 93085.6 1551.43 25.857
(snip)
1168,1174c1170,1176
< - nonlop(apply) 51494.610 69.7 51494.717 69.7 22079
< - nonlop(forces) 10453.660 14.1 10453.713 14.1 3264
< - projbd 5534.380 7.5 5534.500 7.5 36206
< - fourwf(pot) 3041.498 4.1 3041.495 4.1 22079
< - nonlop(forstr) 1271.925 1.7 1271.934 1.7 272
< - vtowfk(ssdiag) 779.864 1.1 779.856 1.1 -1
< - 60 others 588.176 0.8 587.959 0.8
---
> - nonlop(forces) 11423.508 -9.4 11423.488 12.3 3264
> - projbd 5688.608 -4.7 5688.702 6.1 36206
> - fourwf(pot) 3073.283 -2.5 3073.392 3.3 22079
> - nonlop(forstr) 2151.584 -1.8 2151.586 2.3 272
> - vtowfk(ssdiag) 801.072 -0.7 801.067 0.9 -1
> - nonlop(apply) -146198.784 120.2 68549.316 73.6 22079
> - 60 others 649.352 -0.5 649.152 0.7
1176c1178
< - subtotal 73164.113 99.0 73164.174 99.0
---
> - subtotal -122411.377 100.6 92336.703 99.2

Please do not be alarmed by the negative CPU time.
(Maybe some variable goes beyond its representable range...?)
We should look at the wall-clock time in this output instead.
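
A quick arithmetic check of the overflow guess. In every row with a negative CPU time, the gap between wall time and CPU time is about 214748 s, which is close to 2^31/10^4 s: the amount that one wraparound of a signed 32-bit counter ticking 10^4 times per second would lose. That counter layout is purely an assumption for illustration; ABINIT's actual timing routine is not shown in this thread. The figures are copied from the diffs in this message.

```python
# Assumption for illustration only: CPU time held in a signed 32-bit integer
# at 10**4 ticks per second. One wraparound then loses 2**31 / 10**4 seconds.
WRAP_S = 2**31 / 10**4  # about 214748.36 s

# (label, reported CPU time in s, wall-clock time in s), copied from the diffs above
negative_cpu_rows = [
    ("v5.3.4 useylm 1, total", -121662.4, 93085.6),
    ("v5.3.4 useylm 1, nonlop(apply)", -146198.784, 68549.316),
    ("v5.3.2, total", -140661.8, 74086.2),
    ("v5.3.2, nonlop(apply)", -163092.779, 51655.317),
]


def unwrap(cpu_s: float) -> float:
    """Add back one assumed 32-bit wraparound to a negative CPU time."""
    return cpu_s + WRAP_S if cpu_s < 0 else cpu_s


for label, cpu, wall in negative_cpu_rows:
    print(f"{label:32s} wall - cpu = {wall - cpu:9.1f} s, "
          f"unwrapped cpu = {unwrap(cpu):8.1f} s vs wall {wall:8.1f} s")
```

All four gaps agree with 2^31/10^4 s to within 0.4 s, which fits the overflow guess without proving it; either way, the wall-clock column is unaffected and is the one to compare.
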

"useylm 1" apparently needs more memory ...
< P This job should need less than 565.407 Mbytes of memory.
---
> P This job should need less than 633.511 Mbytes of memory.

(2) Testing v5.2.3, v5.2.4, v5.3.0, v5.3.2 and v5.3.4 (plus v5.3.3)...
I noticed a big CPU time difference between v.5.2.3 and v.5.2.4!
(please let me correct my earlier comment; my apologies!)

Here is the summary:
v.5.2.3:- Total wall clock time (s,m,h): 5824.1 97.07 1.618
v.5.2.4:- Total wall clock time (s,m,h): 12559.0 209.32 3.489
v.5.3.0:- Total wall clock time (s,m,h): 13056.1 217.60 3.627
v.5.3.2:- Total wall clock time (s,m,h): 74086.2 1234.77 20.580
v.5.3.3:- Total wall clock time (s,m,h): 74144.5 1235.74 20.596
v.5.3.4:- Total wall clock time (s,m,h): 74040.8 1234.01 20.567

(NB: To test this, I recompiled all the binaries on Itanium2/Linux (2.4.24)
with ifort 8.1 (l_fc_pc_8.1.019). All the tests were run with "abinis".)

So the transition from v.5.2.3 to v.5.2.4, as well as
the transition from v.5.3.0 to v.5.3.2, seems to cause a big slowdown!
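
The summary table can be turned into step-by-step slowdown factors. The 1.5x threshold used below to flag a "big jump" is an arbitrary choice for this sketch, and the times are the wall-clock totals listed above.

```python
# Wall-clock totals (s) from the summary above, in release order.
wall_s = {
    "5.2.3": 5824.1,
    "5.2.4": 12559.0,
    "5.3.0": 13056.1,
    "5.3.2": 74086.2,
    "5.3.3": 74144.5,
    "5.3.4": 74040.8,
}


def step_ratios(times: dict[str, float]) -> list[tuple[str, str, float]]:
    """Ratio of each version's wall time to the previous version's."""
    versions = list(times)  # dicts keep insertion order (Python 3.7+)
    return [(a, b, times[b] / times[a]) for a, b in zip(versions, versions[1:])]


for old, new, r in step_ratios(wall_s):
    flag = "  <-- big jump" if r > 1.5 else ""  # arbitrary threshold
    print(f"v{old} -> v{new}: x{r:.2f}{flag}")
print(f"v5.2.3 -> v5.3.4 overall: x{wall_s['5.3.4'] / wall_s['5.2.3']:.1f}")
```

The two flagged steps are v5.2.3 to v5.2.4 (about x2.16) and v5.3.0 to v5.3.2 (about x5.67); overall, v5.3.4 takes about 12.7 times as long as v5.2.3.
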

Here are more detailed notes:
< .Version 5.2.3 of ABINIT
---
> .Version 5.2.4 of ABINIT
(snip)
< Total energy(eV)= -8.52960977840337E+03 ; Band energy (Ha)= -2.0397416022E+01
---
> Total energy(eV)= -8.52960977840336E+03 ; Band energy (Ha)= -2.0397416022E+01
(snip)
1160,1161c1160,1161
< - Total cpu time (s,m,h): 5824.1 97.07 1.618
< - Total wall clock time (s,m,h): 5824.1 97.07 1.618
---
> - Total cpu time (s,m,h): 12559.0 209.32 3.489
> - Total wall clock time (s,m,h): 12559.0 209.32 3.489
1168,1177c1168,1175
< - fourwf(pot) 2606.561 44.8 2606.631 44.8 22079
< - projbd 1381.013 23.7 1381.014 23.7 36206
< - nonlop(apply) 799.226 13.7 799.310 13.7 22079
< - fourwf(den) 231.073 4.0 231.079 4.0 3264
< - vtowfk(ssdiag) 156.961 2.7 156.959 2.7 -1
< - nonlop(forces) 125.116 2.1 125.128 2.1 3264
< - forces 53.086 0.9 53.088 0.9 24
< - getghc-other 32.156 0.6 31.870 0.5 -1
< - fourdp 32.090 0.6 32.082 0.6 330
< - 57 others 80.349 1.4 80.341 1.4
---
> - projbd 5564.227 44.3 5564.303 44.3 36206
> - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
> - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
> - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
> - fourwf(den) 273.986 2.2 274.021 2.2 3264
> - nonlop(forces) 213.336 1.7 213.283 1.7 3264
> - getghc-other 65.512 0.5 65.414 0.5 -1
> - 59 others 191.804 1.5 191.829 1.5
1179c1177
< - subtotal 5497.630 94.4 5497.502 94.4
---
> - subtotal 11845.706 94.3 11845.727 94.3
1184,1185c1182,1183
< .Delivered 6 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 5824.1 wall= 5824.1
---
> .Delivered 1 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 12559.0 wall= 12559.0
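
The per-routine diffs can also be compared mechanically. The sketch below parses a subset of the v5.2.3 (<) and v5.2.4 (>) timing rows copied from the diff above and ranks the routines by wall-time ratio. The column layout assumed by the regular expression (name, CPU s, CPU %, wall s, wall %, calls) is inferred from these excerpts, not taken from ABINIT documentation, and `parse_wall`/`compare` are hypothetical helpers.

```python
import re

# Assumed layout of a timing row (inferred from the excerpts above):
#   "- <name> <cpu s> <cpu %> <wall s> <wall %> [<calls>]"
# Rows whose name contains a space, such as "57 others", do not match.
ROW = re.compile(r"-\s+(\S+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)")

V523 = """\
< - fourwf(pot) 2606.561 44.8 2606.631 44.8 22079
< - projbd 1381.013 23.7 1381.014 23.7 36206
< - nonlop(apply) 799.226 13.7 799.310 13.7 22079
< - vtowfk(ssdiag) 156.961 2.7 156.959 2.7 -1
< - nonlop(forces) 125.116 2.1 125.128 2.1 3264
< - forces 53.086 0.9 53.088 0.9 24
"""
V524 = """\
> - projbd 5564.227 44.3 5564.303 44.3 36206
> - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
> - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
> - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
> - nonlop(forces) 213.336 1.7 213.283 1.7 3264
"""


def parse_wall(block: str) -> dict[str, float]:
    """Map routine name -> wall-clock seconds for each timing row in `block`."""
    times = {}
    for line in block.splitlines():
        m = ROW.match(line.lstrip("<> "))  # drop the diff markers first
        if m:
            times[m.group(1)] = float(m.group(4))
    return times


def compare(old: dict[str, float], new: dict[str, float]) -> list[tuple[str, float]]:
    """(routine, new/old wall-time ratio) for routines present in both, largest first."""
    common = [name for name in old if name in new]
    return sorted(((n, new[n] / old[n]) for n in common), key=lambda t: t[1], reverse=True)


for name, ratio in compare(parse_wall(V523), parse_wall(V524)):
    print(f"{name:16s} x{ratio:.2f}")
```

With these rows, projbd grows about 4.0x, nonlop(apply) about 2.7x and vtowfk(ssdiag) about 2.1x, while fourwf(pot) grows only about 1.17x, so in this excerpt the v5.2.4 slowdown is spread over several routines rather than confined to nonlop.
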


In passing, ...
< .Version 5.2.4 of ABINIT
---
> .Version 5.3.0 of ABINIT
1160,1161c1162,1163
< - Total cpu time (s,m,h): 12559.0 209.32 3.489
< - Total wall clock time (s,m,h): 12559.0 209.32 3.489
---
> - Total cpu time (s,m,h): 13056.1 217.60 3.627
> - Total wall clock time (s,m,h): 13056.1 217.60 3.627
1168,1175c1170,1178
< - projbd 5564.227 44.3 5564.303 44.3 36206
< - fourwf(pot) 3050.735 24.3 3050.726 24.3 22079
< - nonlop(apply) 2149.281 17.1 2149.324 17.1 22079
< - vtowfk(ssdiag) 336.827 2.7 336.827 2.7 -1
< - fourwf(den) 273.986 2.2 274.021 2.2 3264
< - nonlop(forces) 213.336 1.7 213.283 1.7 3264
< - getghc-other 65.512 0.5 65.414 0.5 -1
< - 59 others 191.804 1.5 191.829 1.5
---
> - projbd 5577.244 42.7 5577.338 42.7 36206
> - fourwf(pot) 3043.856 23.3 3043.865 23.3 22079
> - nonlop(apply) 2145.970 16.4 2146.085 16.4 22079
> - vtowfk(ssdiag) 778.078 6.0 778.083 6.0 -1
> - fourwf(den) 273.408 2.1 273.436 2.1 3264
> - nonlop(forces) 213.192 1.6 213.174 1.6 3264
> - forces 95.838 0.7 95.837 0.7 24
> - getghc-other 66.057 0.5 65.945 0.5 -1
> - 58 others 145.294 1.1 145.273 1.1
1177c1180
< - subtotal 11845.706 94.3 11845.727 94.3
---
> - subtotal 12338.938 94.5 12339.036 94.5
1182,1183c1185,1186
< .Delivered 1 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 12559.0 wall= 12559.0
---
> .Delivered 2 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 13056.1 wall= 13056.1

Then,
< .Version 5.3.0 of ABINIT
---
> .Version 5.3.2 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): 13056.1 217.60 3.627
< - Total wall clock time (s,m,h): 13056.1 217.60 3.627
---
> - Total cpu time (s,m,h): -140661.8 -2344.36 -39.073
> - Total wall clock time (s,m,h): 74086.2 1234.77 20.580
1170,1178c1170,1176
< - projbd 5577.244 42.7 5577.338 42.7 36206
< - fourwf(pot) 3043.856 23.3 3043.865 23.3 22079
< - nonlop(apply) 2145.970 16.4 2146.085 16.4 22079
< - vtowfk(ssdiag) 778.078 6.0 778.083 6.0 -1
< - fourwf(den) 273.408 2.1 273.436 2.1 3264
< - nonlop(forces) 213.192 1.6 213.174 1.6 3264
< - forces 95.838 0.7 95.837 0.7 24
< - getghc-other 66.057 0.5 65.945 0.5 -1
< - 58 others 145.294 1.1 145.273 1.1
---
> - nonlop(forces) 10460.239 -7.4 10460.209 14.1 3264
> - projbd 5551.615 -3.9 5551.733 7.5 36206
> - fourwf(pot) 3043.078 -2.2 3043.082 4.1 22079
> - nonlop(forstr) 1271.740 -0.9 1271.742 1.7 272
> - vtowfk(ssdiag) 775.022 -0.6 775.023 1.0 -1
> - nonlop(apply) -163092.779 115.9 51655.317 69.7 22079
> - 60 others 587.954 -0.4 587.956 0.8
1180c1178
< - subtotal 12338.938 94.5 12339.036 94.5
---
> - subtotal -141403.131 100.5 73345.062 99.0
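
For the v5.3.0 to v5.3.2 step the CPU column is corrupted, so the sketch below uses the wall-clock seconds from the diff above to estimate how much of the extra 74086.2 - 13056.1 = 61030.1 s each routine accounts for. Only routines listed separately in both versions are included; nonlop(forstr) is left out because it is not listed separately in v5.3.0. `share_of_slowdown` is a helper introduced for this sketch.

```python
# Wall-clock seconds copied from the v5.3.0 (<) vs v5.3.2 (>) diff above.
total_old, total_new = 13056.1, 74086.2
routines = {  # name: (v5.3.0 wall, v5.3.2 wall)
    "nonlop(apply)": (2146.085, 51655.317),
    "nonlop(forces)": (213.174, 10460.209),
    "projbd": (5577.338, 5551.733),
    "fourwf(pot)": (3043.865, 3043.082),
    "vtowfk(ssdiag)": (778.083, 775.023),
}


def share_of_slowdown(old: float, new: float) -> float:
    """Fraction of the total wall-time increase attributable to one routine."""
    return (new - old) / (total_new - total_old)


for name, (old, new) in routines.items():
    print(f"{name:16s} x{new / old:6.2f}  "
          f"{100 * share_of_slowdown(old, new):5.1f} % of the added wall time")
```

nonlop(apply) and nonlop(forces) alone account for about 81 % and 17 % of the added wall time, while projbd, fourwf(pot) and vtowfk(ssdiag) stay within 1 % of their v5.3.0 values.
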

And,
< .Version 5.3.2 of ABINIT
---
> .Version 5.3.3 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): -140661.8 -2344.36 -39.073
< - Total wall clock time (s,m,h): 74086.2 1234.77 20.580
---
> - Total cpu time (s,m,h): 74144.5 1235.74 20.596
> - Total wall clock time (s,m,h): 74144.5 1235.74 20.596

1170,1176c1170,1176
< - nonlop(forces) 10460.239 -7.4 10460.209 14.1 3264
< - projbd 5551.615 -3.9 5551.733 7.5 36206
< - fourwf(pot) 3043.078 -2.2 3043.082 4.1 22079
< - nonlop(forstr) 1271.740 -0.9 1271.742 1.7 272
< - vtowfk(ssdiag) 775.022 -0.6 775.023 1.0 -1
< - nonlop(apply) -163092.779 115.9 51655.317 69.7 22079
< - 60 others 587.954 -0.4 587.956 0.8
---
> - nonlop(apply) 51699.264 69.7 51699.243 69.7 22079
> - nonlop(forces) 10459.982 14.1 10459.994 14.1 3264
> - projbd 5587.185 7.5 5587.384 7.5 36206
> - fourwf(pot) 3044.780 4.1 3044.884 4.1 22079
> - nonlop(forstr) 1271.329 1.7 1271.329 1.7 272
> - vtowfk(ssdiag) 775.432 1.0 775.425 1.0 -1
> - 60 others 586.740 0.8 586.618 0.8
1178c1178
< - subtotal -141403.131 100.5 73345.062 99.0
---
> - subtotal 73424.712 99.0 73424.877 99.0
1183,1184c1183,1184
< .Delivered 19 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= -140661.8 wall= 74086.2
---
> .Delivered 4 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 74144.5 wall= 74144.5

And, finally,
< .Version 5.3.3 of ABINIT
---
> .Version 5.3.4 of ABINIT
(snip)
1162,1163c1162,1163
< - Total cpu time (s,m,h): 74144.5 1235.74 20.596
< - Total wall clock time (s,m,h): 74144.5 1235.74 20.596
---
> - Total cpu time (s,m,h): 74040.8 1234.01 20.567
> - Total wall clock time (s,m,h): 74040.8 1234.01 20.567
1170,1176c1170,1176
< - nonlop(apply) 51699.264 69.7 51699.243 69.7 22079
< - nonlop(forces) 10459.982 14.1 10459.994 14.1 3264
< - projbd 5587.185 7.5 5587.384 7.5 36206
< - fourwf(pot) 3044.780 4.1 3044.884 4.1 22079
< - nonlop(forstr) 1271.329 1.7 1271.329 1.7 272
< - vtowfk(ssdiag) 775.432 1.0 775.425 1.0 -1
< - 60 others 586.740 0.8 586.618 0.8
---
> - nonlop(apply) 51572.901 69.7 51573.041 69.7 22079
> - nonlop(forces) 10465.045 14.1 10465.066 14.1 3264
> - projbd 5597.726 7.6 5597.835 7.6 36206
> - fourwf(pot) 3042.212 4.1 3042.068 4.1 22079
> - nonlop(forstr) 1272.081 1.7 1272.077 1.7 272
> - vtowfk(ssdiag) 777.669 1.1 777.668 1.1 -1
> - 60 others 590.731 0.8 590.541 0.8
1178c1178
< - subtotal 73424.712 99.0 73424.877 99.0
---
> - subtotal 73318.364 99.0 73318.296 99.0
1183,1184c1183,1184
< .Delivered 4 WARNINGs and 1 COMMENTs to log file.
< +Overall time at end (sec) : cpu= 74144.5 wall= 74144.5
---
> .Delivered 5 WARNINGs and 1 COMMENTs to log file.
> +Overall time at end (sec) : cpu= 74040.8 wall= 74040.8

... I hope this benchmark gives some hints ....

Best regards,
Masayoshi

On 2007/04/27, at 10:43, Masayoshi Mikami wrote:

Dear Marc and Matthieu,

Thank you very much for your comments!
Matthieu, yes, I did not change the compiler etc.
between v.5.3.x and v.5.2.x.
(I did not see any big difference in config.log (and no -O0 ;-)),
which is consistent with the speed of "fourwf(pot)"
not changing much), whereas nonlop is
very slow with v.5.3.4.

Please give me some time to run the jobs Marc suggested.

I should find a test case that I can disclose here...
(please let me think...). Still, I can give some hints:
my model contains a relatively large number of atoms
(over 20) and a relatively large nbands (over 70),
due to pseudopotentials with a "shallow core (s & p)".

Best regards, and have a nice weekend,
Masayoshi

On 2007/04/26, at 19:23, Marc Torrent wrote:

Hi Masayoshi and Matthieu,

You called for a PAW developer; here I am...

I diffed the abinit-5.2.4/src/03nonlocal and abinit-5.3.4/src/13nonlocal directories and didn't see any change in the nonlop routine (and its children) when it uses Legendre polynomials (which is the case for Masayoshi's tests, because he uses norm-conserving pseudopotentials).
Of course, the spherical-harmonics version of nonlop has changed, but it is not called AT ALL in that case; moreover, it has been made faster (at least on our machines).
The only interaction between nonlop_pl and nonlop_ylm is through the driver "nonlop", which in v5.3.x has new arguments (like the <p_i|cnk> projected scalars), but the latter are not passed to nonlop_pl. Could a memory problem occur? I don't think so, because these <p_i|cnk> arrays are zero-dimensioned in the norm-conserving case.

Masayoshi, could you perform the following tests?
- put useylm=1 in your input and see if Ylm version of nonlop has the same behaviour as Legendre polynomials one.
- try to follow the changes by testing v5.2.3, v5.2.4, v5.3.0, v5.3.2 and v5.3.4...

This could help us track down the problem...


Cheers,
Marc


Matthieu Verstraete wrote:
This is horrendous! An order of magnitude slower!!!?
Mikami-san, I trust that your compilation options were the same, with linking to the same libs, the same compiler version, etc.??? Has anyone else reproduced this on other machines?
The main changes to nonlop were probably PAW-related. Are you using PAW, and do the PAW developers have any comment (maybe they haven't touched it at all)?
Matthieu








