- From: BOTTIN Francois <francois.bottin@cea.fr>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] Large-cell calculations
- Date: Mon, 23 Nov 2009 09:21:16 +0100
- Organization: CEA-DAM
Hi,
You want to launch Abinit on 1500 band/FFT-processors (and 2 kpt-processors).
You set npband=150, so I guess you use npfft=10 (what is the value of bandpp?).
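For reference, here is a minimal sketch of the input lines such a distribution corresponds to (the values below are only my guess at your setup; bandpp in particular is assumed):

  # hypothetical input fragment for the band/FFT/k-point parallelization
  paral_kgb 1     # activate the kgb parallelization (needed for npband/npfft)
  npkpt     2     # processors distributed over k-points
  npband  150     # processors distributed over bands
  npfft    10     # processors distributed over FFTs
  bandpp    1     # assumed value; bands treated together per band-processor
  # total cores = npkpt * npband * npfft = 2 * 150 * 10 = 3000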
Two remarks about that:
1) Did you perform any scaling test before launching Abinit on 1500 band/FFT cores?
I suspect your calculation is not large enough to benefit from more than 200 or 300 cores.
My advice is to test various distributions on 50, 100, 200 and 300 procs and see whether you keep on scaling (a sketch of such a test follows the reference below)!
See our paper about that:
F. Bottin, S. Leroux, A. Knyazev, G. Zerah, "Large scale ab initio calculations based on three levels of parallelization", Comput. Mater. Sci. 42, 329 (2008) (available on arXiv).
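As an illustration of such a test, here is one possible set of distributions at fixed npkpt 2 (these particular splits are only suggestions; npband must divide nband, so I picked divisors of 600):

  # one run per line, npkpt 2 throughout (illustrative only)
  # 50 cores:   npband 25   npfft 1   (2 x 25 x 1 = 50)
  # 100 cores:  npband 50   npfft 1   (2 x 50 x 1 = 100)
  # 200 cores:  npband 50   npfft 2   (2 x 50 x 2 = 200)
  # 300 cores:  npband 75   npfft 2   (2 x 75 x 2 = 300)

Then compare the wall time per SCF step between runs: when doubling the cores no longer reduces it, you have reached the scaling limit.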
2) I suspect the diagonalisation is responsible for the crash. Indeed, the current
implementation of ScaLAPACK is not satisfactory and scales very poorly with the
number of band/FFT processors, so on 1500 procs I guess this part diverges.
We are working to fix this problem, so we are very interested in your report.
By performing the scaling tests proposed above (using timopt -3), you could see
(i) whether your calculation ends normally with a small number of band/FFT processors, and
(ii) whether this part causes the trouble with a large number.
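Concretely, a sketch of the timing part (timopt is the standard ABINIT timing variable; which timers to compare is only my suggestion):

  timopt -3   # print the detailed timing analysis at the end of the run

Comparing the timers of the LOBPCG/diagonalisation routines between the small and large runs should show whether that part dominates.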
Good continuation,
Regards,
Francois
PS: Yann, do you launch Abinit on 1000 band/FFT-cores? Or more?
Latévi Max LAWSON DAKU wrote:
Dear Abinit users,
I'm writing to ask for advice on performing ground-state
calculations on a large-cell system, using norm-conserving
pseudopotentials (NC PSPs). Here are a few characteristics of the system.
- the primitive cubic cell contains 324 atoms, generated from
29 atoms by symmetry operations.
- the system is spin-polarised: (nband, nsppol) = (600, 2)
I would like to use the band/FFT/k-point parallelisation, and several
distributions have been proposed for the number of processors.
The retained distributions read (npkpt, npband, npfft) = (2, 150, x),
with x a positive integer.
But the calculations tend to be extremely slow even when using
3000 cpus (i.e. x = 10, since 2 x 150 x 10 = 3000). In this last case,
the code actually segfaults after having printed:
ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
**** In vtorho for isppol= 1
starting lobpcg, with nblockbd,mpi_enreg%nproc_band 1 150
condition number of the Gram matrix= 59476636.484308563
Lobpcgccwf: restart performed
Perhaps the compilation went wrong... The machine on which I would
like to perform the calculations is a Cray XT5. I've compiled ABINIT 5.8.4p
with MPI and ScaLAPACK enabled, using the linalg library shipped
with ABINIT. I'm going to try to recompile it with another linalg library.
Thank you in advance for your help and/or advice.
Best regards,
Max
--
##############################################################
Francois Bottin tel: 01 69 26 41 73
CEA/DIF fax: 01 69 26 70 77
BP 12 Bruyeres-le-Chatel email: Francois.Bottin@cea.fr
##############################################################
- [abinit-forum] Large-cell calculations, Latévi Max LAWSON DAKU, 11/21/2009
- Re: [abinit-forum] Large-cell calculations, Yann Pouillon, 11/21/2009
- Re: [abinit-forum] Large-cell calculations, Latévi Max LAWSON DAKU, 11/22/2009
- Re: [abinit-forum] Large-cell calculations, 6671011, 11/22/2009
- Re: [abinit-forum] Large-cell calculations, Latévi Max LAWSON DAKU, 11/22/2009
- Re: [abinit-forum] Large-cell calculations, BOTTIN Francois, 11/23/2009
- Re: [abinit-forum] Large-cell calculations, Yann Pouillon, 11/25/2009
- Re: [abinit-forum] Large-cell calculations, BOTTIN Francois, 11/25/2009
- Re: [abinit-forum] Large-cell calculations, Yann Pouillon, 11/25/2009
- Re: [abinit-forum] Large-cell calculations, BOTTIN Francois, 11/25/2009
- Re: [abinit-forum] Large-cell calculations, Yann Pouillon, 11/25/2009
- Re: [abinit-forum] Large-cell calculations, Yann Pouillon, 11/21/2009