Re: [abinit-forum] parallelism over bands in ABINIT

  • From: Francois Bottin <Francois.Bottin@cea.fr>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] parallelism over bands in ABINIT
  • Date: Fri, 10 Nov 2006 18:25:19 +0100

Dear Dr. Dumont,

Thank you for your comments (in this forum as well as in the developer one). The band-FFT parallelization is a new feature of ABINIT (it appeared in version 5.2.2) and is not yet documented.
Sorry for this. We would like to add some clarifications about your observations; we hope they will be useful to the whole community.

Guillaume Dumont wrote:

Dear Dr. Geneste,

I have also had this kind of problem with abinit-4.6.5, but I can't recall what the error message was exactly. However, I have tested the new band and FFT parallelism available in abinit 5.2.3 and it seems to work well. If you already have a `hostname`.ac file, you only have to add the following lines to it:

enable_parallel="yes"
enable_mpi="yes"
with_mpi_cppflags="-DMPI_FFT"

Up to now, this does not work with MPI-IO, so the wavefunctions are not written correctly and restarting is not possible.

enable_smart_config="no"
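
For reference, a complete `hostname`.ac file combining these options might look like the sketch below. The compiler wrappers (FC/CC) are only an assumption to make the example self-contained; adapt them to your own MPI installation:

  # hostname.ac -- build options for band-FFT parallel ABINIT 5.2.3
  # (the mpif90/mpicc wrappers below are an example, not part of the original instructions)
  FC="mpif90"
  CC="mpicc"
  enable_parallel="yes"
  enable_mpi="yes"
  with_mpi_cppflags="-DMPI_FFT"
  enable_smart_config="no"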

Then in your abinit input file you have to use the following input variables:

fftalg 401

When fftalg=401, the efficient 3-dimensional FFT of Goedecker et al. (Comput. Phys. Comm. 154, 105 (2003)) is used, with zero padding and cache optimization. The input variable fftalg=400 (without zero padding) also works.

wfoptalg 4
fft_opt_lob 2

The band-FFT parallelization only works if the LOBPCG blocked eigenvalue solver is used.

npband x
npfft y
iprcch 0

iprcch has to be equal to 0 (this restriction will be removed in the next release). With respect to iprcch=2 (which is the default), there are two drawbacks:
i) At each step, the density residual is not taken into account to correct the forces.
ii) At the beginning of each Broyden, moldyn, ... step, the prediction of the density is wrong.
These points have to be kept in mind when molecular dynamics is performed, since the forces are not necessarily well converged and the number of moldyn steps can be very large!


where x and y are integers that can take values between 1 and the number of processors you are using (8 in your case), with the following restrictions (see the worked example below):

x*y = npband*npfft = number of CPUs
nband has to be a multiple of npband (nband % npband = 0)
BOTH ngfft(2) and ngfft(3) have to be multiples of npfft (ngfft(2) % npfft = 0 and ngfft(3) % npfft = 0)

In my own experience, I had some trouble if either npband or npfft was set to 1, so you may want to avoid these situations.

All our tests work with npband=npfft=1 and give the same result as the sequential (abinis) run. Please send me your input.
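
As an illustration of the restrictions above, an input fragment for an 8-processor run could look as follows. The values of nband and ngfft are hypothetical and are chosen only so that the divisibility rules are satisfied:

  # band-FFT parallelization over 8 processors: npband*npfft = 4*2 = 8
  wfoptalg 4      # LOBPCG blocked eigensolver (required for band-FFT)
  fft_opt_lob 2
  fftalg 401      # Goedecker 3D-FFT with zero padding and cache optimization
  iprcch 0
  npband 4        # nband must be a multiple of npband: 64 % 4 = 0
  npfft 2         # ngfft(2) and ngfft(3) must be multiples of npfft
  nband 64        # hypothetical value, divisible by npband
  ngfft 40 40 40  # hypothetical value, ngfft(2)=ngfft(3)=40 divisible by npfft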


Hope this helps

In addition:
i) THIS VERSION (Abinit 5.2.3) DOES NOT WORK IN PAW (expected for the next release).
ii) THIS VERSION (Abinit 5.2.3) IS NOT TESTED FOR MAGNETISM.
iii) THIS VERSION (Abinit 5.2.3) DOES NOT WORK FOR BAND-ONLY PARALLELIZATION (expected for the next release).
iv) During the implementation of the band-FFT parallelization, some new restrictions have been found:

  1) Even though useylm=0 is the default, useylm=1 can be used in the norm-conserving (NC) case, so that nonlop_ylm is used rather than nonlop_pl. Put nloalg=4 in your input file.
  2) Mixing can be performed on the density (iscf>10, the default in PAW) as well as on the potential (iscf<10, the default in NC). In this version the mixing on the density is not tested. Put iscf<10 in your input file (this drawback will be removed in the next release).
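
To make these two points concrete, the corresponding lines in a norm-conserving input file would simply be (iscf 7 is just one example of a potential-mixing scheme with iscf<10):

  nloalg 4   # use the Ylm version of the nonlocal operator (nonlop_ylm)
  iscf 7     # iscf < 10: mixing performed on the potential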
Concerning the scaling, we have found in Abinit 5.2.3 a linear scaling up to 100 processors for the following system (gold): natom=108, ecut=24 Ha, nband=648, ngfft=108 108 108.
This result could be machine dependent, according to the interconnect (tests on Quadrics and InfiniBand give the same scaling), the processors, the cache (see the FFT), etc.

Some new features are implemented (for the next release), such as ScaLAPACK, extensive use of LAPACK and BLAS, etc., and a -O3 optimization flag (which did not work on all supercomputers).
Under these conditions (see the attached figure):
i) a superlinear scaling is achieved up to 200 processors;
ii) a speedup of about 130 is obtained around 100 processors (cache effect);
iii) LOBPCG is linear up to 432 processors, whereas the overall FFT time is 1.5 times larger at 432 processors (coming from communications: around 50%);
iv) in our case the lobpcg routine represents 90% of the total time and the FFT around 10% (which explains our results);
v) the band-FFT parallelization is found to be faster than the band-only and FFT-only parallelizations. ACTION: the npband input variable has to be larger than npfft (by a factor of 4 to 6).
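
For instance, on a hypothetical 96-processor run this recommendation would lead to a distribution such as the following (the numbers only illustrate the suggested npband/npfft ratio, not a tested case):

  npband 24   # npband/npfft = 6, favouring the band distribution
  npfft 4     # 24*4 = 96 processors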

These results will be published as soon as possible.

All observations (scaling, bugs, ...) are welcome. Feel free to send your questions by private reply (for particular issues) or to post them on the forum (for general ones).
Best regards
Francois Bottin and Gilles Zerah

--
##############################################################
Francois Bottin tel: 01 69 26 41 73
CEA/DIF fax: 01 69 26 70 77
BP 12 Bruyeres-le-Chatel email: Francois.Bottin@cea.fr
##############################################################

Attachment: bandFFT.eps
Description: PostScript document



