forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
List archive
- From: "Anglade Pierre-Matthieu" <anglade@gmail.com>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] parallelism over bands in ABINIT
- Date: Tue, 28 Nov 2006 08:44:15 +0100
PS: one of the simplest ways to discover possible memory leaks is to make a run with a binary compiled with g95. At the end of the run it will report the names of the routines where memory was not deallocated.
On 11/28/06, Anglade Pierre-Matthieu <anglade@gmail.com> wrote:
>Is there a reason why the code has such a great memory need? Why did the code run for 2 scf
>cycles and then crash?
At the early development stage of band parallelism there were a lot of memory leaks in lobpcgxx. It is possible that some of them remain. Have you checked for this?
A very simple way in f90 to get rid of memory leaks when the memory scheme is complex is to add, at the end of the routines, statements like
if(allocated(XX)) deallocate(XX)
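As a minimal sketch of that guard in context (the program, the routine name work, the flag need_buffer and the array size are hypothetical and not taken from ABINIT), a routine that allocates a work array only on some code paths can release it safely on exit:

```fortran
program guard_demo
  implicit none

  ! Call the routine once with the buffer allocated and once without;
  ! the guard at the end of the routine is valid in both cases.
  call work(.true.)
  call work(.false.)

contains

  subroutine work(need_buffer)
    logical, intent(in) :: need_buffer
    real, allocatable :: xx(:)   ! hypothetical work array

    ! The array is allocated only on some paths, as can happen when
    ! the memory scheme is complex.
    if (need_buffer) then
      allocate(xx(1000))
      xx = 0.0
    end if

    ! ... the actual computation would go here ...

    ! Guard: deallocate only if the array is currently allocated, so
    ! the statement never fails on an unallocated array.
    if (allocated(xx)) deallocate(xx)
  end subroutine work

end program guard_demo
```

A plain deallocate(xx) on the second call would be an error because xx was never allocated; the allocated() test avoids that.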
regards
PMA--
On 11/27/06, Guillaume Dumont <dumont.guillaume@gmail.com> wrote:
Oops I forgot the attachments...
On 11/27/06, Guillaume Dumont <dumont.guillaume@gmail.com> wrote:
Dear Dr Bottin,
I tried to reproduce your superlinear scaling up to 144 CPUs. Here are the results. The scaling is superlinear up to 54 CPUs for your gold case. However, keeping the number of processors constant, some sets of npband and npfft do not give the superlinear behavior (see graph speedup.eps).
In the superlinear regime most of the time is spent in the lobpcgxx routine, but as the number of processors increases, more and more time is spent in gstate->kpgsph.
I also noticed that the memory requirement is proportional to the number of processors (memory.eps). This causes problems in cases that need more memory than is accessible to a single processor. For example, I tried to run a total energy calculation on a 216-atom GaAsN supercell with nband 480 and ngfft 180 180 180. I was able to run it on 32 processors; it did 2 scf cycles and then crashed with an error message indicating that the memory need exceeded the available memory.
Is there a reason why the code has such a great memory need? Why did the code run for 2 scf cycles and then crash? Shouldn't it allocate all the memory before doing the calculation? (Memory leaks?)
This calculation needs a little more than 4 GB on a single processor run.
To answer your other questions:
In the cases of both Au and GaAsN systems? For gold, the code is two
times faster (if I remember correctly) with the -O3 flag compilation.
I did not test the gold case with the -O2 flag, but I'll let you know when I do.
Does the lobpcg part in these two systems weigh equally? In Au, the
lobpcg part corresponds approximately to the total time. Its perfect
scaling gives the superlinear behaviour of ABINIT.
Does your FFT part (fourwf) strongly increase (more than 2 times)
between 1 and 32 processors? And what is its weight? Even if this FFT is
strongly optimized, the scaling does not remain linear.
Unfortunately some of the calculations were done with timopt 2 instead of -1 or -2, so I cannot answer this question yet.
Regards,
--
Guillaume Dumont
=========================
guillaume.dumont.1@umontreal.ca
dumont.guillaume@gmail.com
(514) 341 5298
(514) 343 6111 ext. 13279
Pierre-Matthieu Anglade