forum@abinit.org, The ABINIT Users Mailing List (closed), list archive

Re: [abinit-forum] parallelism over bands in ABINIT


  • From: "Guillaume Dumont" <dumont.guillaume@gmail.com>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] parallelism over bands in ABINIT
  • Date: Fri, 10 Nov 2006 15:24:48 -0500

Dear Dr. Zerah and Bottin,

First of all, thanks for this detailed reply. As I mentioned in my second post (the one on the developers' mailing list), the code works well for npband=npfft=1, but behaves abnormally when EITHER npband OR npfft is set to one and the other variable is set to the actual number of processors. I have attached a case in which an MPI error occurs (ncpus=4, npband=4, npfft=1).

Your scaling results are very impressive! I would like to see one or two output files including the detailed time analysis (timopt -1) to compare with my own runs and see where the main timing differences are. Could you also provide your input files for the gold case, so that I can test it on my machine and see whether I get similar performance?

I have checked my output files and it seems that lobpcg represents around 90% of the total time.

Regards,


On 11/10/06, Francois Bottin <Francois.Bottin@cea.fr> wrote:
Dear Dr. Dumont,

Thank you for your comments (in this forum as well as on the developers' list). The band-FFT parallelization is a new feature of ABINIT (it appeared in version 5.2.2) and is not yet documented; sorry for that. We would like to add some clarifications about your observations, which we hope will be useful to the whole community.

Guillaume Dumont wrote:

> Dear Dr. Geneste,
>
> I have also had this kind of problem with abinit-4.6.5 but I can't
> recall what the error message exactly was. But I've tested the new
> band and fft parallelism available in abinit 5.2.3 and it seems to
> work well. If you already have a `hostname`.ac file you only have to
> add the following lines to it:
>
> enable_parallel="yes"
> enable_mpi="yes"
> with_mpi_cppflags="-DMPI_FFT"

Up to now, this does not work with MPI-IO. So the wavefunctions are not
written correctly and restarting is not possible.

> enable_smart_config="no"
>
> Then in your abinit input file you have to use the following input
> variables:
>
> fftalg 401

When fftalg=401, one uses the efficient 3dim-FFT of Goedecker et al.
(Comput. Phys. Comm. 154, 105 (2003)) with zero padding and cache
optimization. The input variable fftalg=400 (without zero padding) also
works.

> wfoptalg 4
> fft_opt_lob 2

The band-FFT parallelization only works if the LOBPCG blocked eigenvalue
solver is used.

> npband x
> npfft y
> iprcch 0

iprcch has to be equal to 0 (this restriction will be removed in the
next release). With respect to iprcch=2 (which is the default), there are
two drawbacks:
i) At each step, the density residual is not taken into account to
correct the forces.
ii) At the beginning of each Broyden, molecular dynamics ... step, the
predicted density is wrong.
These points have to be kept in mind when molecular dynamics is performed,
since the forces are not necessarily well converged and the number of
moldyn steps can be very large!

>
> where x and y are integers that can take values between 1 and the
> number of processors you are using (8 in your case) with the following
> restrictions:
>
> x*y = npband*npfft = number of cpus
> nband has to be a multiple of npband (nband % npband = 0)
> BOTH ngfft(2) and ngfft(3) have to be multiples of npfft
> (ngfft(2) % npfft = 0 and ngfft(3) % npfft = 0)
>
> In my own experience, I had some trouble if either npband or npfft is
> set to 1, so you can try to avoid these situations.

All our tests work with npband=npfft=1 and give the same result as the
sequential (abinis) run. Please send me your input.
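For readers setting up these runs, here is a minimal Python sketch of the npband/npfft restrictions quoted above. It is not part of ABINIT; the function name and signature are hypothetical, and ngfft is passed as a Python tuple (n1, n2, n3), so the Fortran ngfft(2) and ngfft(3) are ngfft[1] and ngfft[2] here.

```python
def check_band_fft_grid(ncpus, npband, npfft, nband, ngfft):
    """Check the band-FFT distribution restrictions discussed in this thread.

    ngfft is the FFT grid (n1, n2, n3); Fortran ngfft(2)/ngfft(3) are
    ngfft[1]/ngfft[2] in Python's 0-based indexing.
    Returns a list of violated restrictions (empty list = valid choice).
    Hypothetical helper, not ABINIT code.
    """
    problems = []
    # the processor grid must use every CPU
    if npband * npfft != ncpus:
        problems.append("npband*npfft must equal the number of CPUs")
    # bands are distributed in equal blocks over npband processors
    if nband % npband != 0:
        problems.append("nband must be a multiple of npband")
    # FFT planes are distributed over npfft processors
    if ngfft[1] % npfft != 0 or ngfft[2] % npfft != 0:
        problems.append("ngfft(2) and ngfft(3) must be multiples of npfft")
    return problems
```

For the gold case below (nband=648, ngfft=108 108 108), npband=6 with npfft=2 on 12 processors passes all three checks.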

>
> Hope this helps

In addition:
i) THIS VERSION (Abinit 5.2.3) DOES NOT WORK IN PAW (expected for the
next release).
ii) THIS VERSION (Abinit 5.2.3) IS NOT TESTED FOR MAGNETISM.
iii) THIS VERSION (Abinit 5.2.3) DOES NOT WORK FOR BAND-ONLY
PARALLELIZATION (expected for the next release).
iv) During the implementation of the band-FFT parallelization, some new
restrictions have been found:

    * 1) Even though useylm=0 is the default, useylm=1 can be used in the
      norm-conserving (NC) case, so that nonlop_ylm is used rather than
      nonlop_pl. Put nloalg=4 in your input file.
    * 2) Mixing can be performed on the density (iscf>10, the default in
      PAW) as well as on the potential (iscf<10, the default in NC). In
      this version the mixing on the density is not tested. Put iscf<10
      in your input file (this restriction will be removed in the next
      release).
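Pulling the settings discussed in this thread together, a band-FFT run in 5.2.3 might use an input fragment like the following. This is a sketch, not an official example: the npband/npfft values are illustrative (here for 12 processors) and must satisfy the restrictions above for your own nband and ngfft.

```
# band-FFT parallelization, ABINIT 5.2.3, norm-conserving case (sketch)
wfoptalg    4     # LOBPCG blocked eigensolver (required for band-FFT)
fft_opt_lob 2
fftalg      401   # Goedecker 3dim-FFT, zero padding + cache optimization
npband      6     # npband*npfft = number of CPUs
npfft       2     # npband should be about 4-6 times npfft
iprcch      0     # required in this version
nloalg      4     # so that nonlop_ylm (useylm=1) is used in the NC case
iscf        7     # mixing on the potential (iscf<10); density mixing untested
```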

Concerning the scaling, with Abinit 5.2.3 we have found LINEAR SCALING
UP TO 100 PROCESSORS for the following system (gold):
natom=108, ecut=24 Ha, nband=648, ngfft=108 108 108
This result could be machine dependent, according to the interconnect
(tests on Quadrics and InfiniBand give the same scaling), the processors,
the cache (see the FFT)...

Some new features will be implemented (next release), such as ScaLAPACK,
more extensive use of LAPACK and BLAS..., and the -O3 optimization flag
(which did not work on all supercomputers).
Under these conditions (see the attached figure):
i) SUPERLINEAR SCALING IS ACHIEVED UP TO 200 PROCESSORS;
ii) a speedup of 130 is obtained around 100 processors (cache effect);
iii) LOBPCG is linear up to 432 processors, whereas the FFT overall time
is 1.5 times larger at 432 processors (coming from communications:
around 50%);
iv) in our case the lobpcg routine represents 90% of the total time and
the FFT around 10% (which explains our results);
v) the band-FFT parallelization is found to be faster than the band-only
and FFT-only parallelizations. ACTION: the npband input variable has to
be larger than the npfft one (4 to 6 times).
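The recommendation in point v) can be turned into a small search. The following Python sketch (a hypothetical helper, not part of ABINIT; ngfft is a tuple (n1, n2, n3), so Fortran ngfft(2)/ngfft(3) are ngfft[1]/ngfft[2]) enumerates the valid factorizations of the CPU count and prefers the one whose npband/npfft ratio is closest to the suggested 4-6 range:

```python
def suggest_band_fft_split(ncpus, nband, ngfft, target_ratio=5.0):
    """Suggest an (npband, npfft) pair for a given CPU count.

    Enumerates factorizations of ncpus that satisfy the divisibility
    restrictions from this thread (nband % npband == 0, and ngfft(2),
    ngfft(3) both multiples of npfft), then picks the pair whose
    npband/npfft ratio is closest to target_ratio (the thread suggests
    npband about 4-6 times npfft). Returns None if no pair qualifies.
    Hypothetical helper, not ABINIT code.
    """
    best = None
    for npfft in range(1, ncpus + 1):
        if ncpus % npfft != 0:
            continue                      # not a factorization of ncpus
        npband = ncpus // npfft
        if nband % npband != 0:
            continue                      # bands would not split evenly
        if ngfft[1] % npfft != 0 or ngfft[2] % npfft != 0:
            continue                      # FFT planes would not split evenly
        score = abs(npband / npfft - target_ratio)
        if best is None or score < best[0]:
            best = (score, npband, npfft)
    return None if best is None else (best[1], best[2])
```

For the gold case above (nband=648, ngfft=108 108 108) on 12 processors, this picks npband=6, npfft=2.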

These results will be published as soon as possible.

All observations (scaling, bugs ...) are welcome. Feel free to send your
questions by reply or to post them on the forum (depending on whether they
are particular or general).
Best regards
Francois Bottin and Gilles Zerah

--
##############################################################
Francois Bottin                    tel: 01 69 26 41 73
CEA/DIF                            fax: 01 69 26 70 77
BP 12 Bruyeres-le-Chatel         email: Francois.Bottin@cea.fr
##############################################################






--
Guillaume Dumont
=========================
guillaume.dumont.1@umontreal.ca
dumont.guillaume@gmail.com
(514) 341 5298
(514) 343 6111 ext. 13279

Attachment: fft-paral.tar.gz
Description: GNU Zip compressed data



