- From: Deyu Lu <dylu@ucdavis.edu>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] question about parallelization in outkss.F90
- Date: Fri, 22 Sep 2006 13:03:33 -0700
Dear all:
Coming back to the question I posted last week regarding the parallelization of outkss.F90, here I summarize what I have learned. As I am not very familiar with MPI programming, please correct me if I have misunderstood anything.
As far as the code itself is concerned (version 5.1 or 5.2), I don't think it actually runs in parallel; two major issues are involved.
Abinit assigns the k points to different nodes. Say there are 4 k points and 2 nodes; the k points are then distributed as node 0 (k1, k2) and node 1 (k3, k4). During each iteration over the k points, the pseudopotentials, eigenvalues and wavefunctions are computed and stored in the .KSS file. The output is carried out by node 0. Each time, array variables such as the eigenvalues and wavefunctions are allocated inside the loop for the current k point only.
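To make this concrete, here is a small standalone sketch (simplified, not the actual ABINIT code; the program name and owner formula are only for illustration) of the block distribution I am describing:

    ! Minimal sketch: contiguous block assignment of k points to ranks,
    ! matching the example above (node 0: k1, k2; node 1: k3, k4).
    program kpt_block_distribution
      implicit none
      integer, parameter :: nkpt = 4, nproc = 2
      integer :: ik, owner

      do ik = 1, nkpt
        owner = (ik - 1) * nproc / nkpt   ! integer division -> blocks of nkpt/nproc
        print '(a,i0,a,i0)', 'k point ', ik, ' -> rank ', owner
      end do
    end program kpt_block_distribution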
Issue 1: At the end of each iteration, "MPI_BARRIER(spaceComm,ierr)" is called. Therefore, nodes that could go directly to the next iteration have to wait. For example, at k=1 or 2, node 1 is idling while node 0 is working; at k=3 or 4, vice versa.
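In a simplified form (my reconstruction, not the real outkss.F90), the loop looks roughly like this; the barrier at the end of every iteration is what forces the idling:

    ! Assumed structure of the k-point loop (illustration only).
    program kpt_loop_barrier
      use mpi
      implicit none
      integer, parameter :: nkpt = 4
      integer :: ik, owner, me, nproc, ierr

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

      do ik = 1, nkpt
        owner = (ik - 1) * nproc / nkpt      ! block distribution, as above
        if (me == owner) then
          ! ... diagonalize and send/write the data for this k point ...
          print '(a,i0,a,i0)', 'rank ', me, ' works on k point ', ik
        end if
        ! Issue 1: every rank must reach this barrier before anyone moves on,
        ! so the non-owning ranks idle while the owner of k point ik works.
        call MPI_Barrier(MPI_COMM_WORLD, ierr)
      end do

      call MPI_Finalize(ierr)
    end program kpt_loop_barrier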
Issue 2: Even if issue 1 were solved, the parallelization would still be in trouble. Under the current k-point distribution scheme (k1(0), k2(0), k3(1), k4(1)), k4 cannot start before both k1 and k2 are finished, because the memory of node 1 is used to store the data of k3, which cannot be released until the data of k1 and k2 have all been written.
A better scheme for the k-point distribution would be (k1(0), k2(1), k3(0), k4(1)). Indeed, there is an array "mpi_enreg%proc_distrb" that is responsible for this assignment. I tried reordering it, and the code did run in parallel. Unfortunately, this simple trick is not a solution: it turns out that when "mpi_enreg%proc_distrb" is modified, the construction of the Hamiltonian is also affected and the results can no longer be trusted.
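For comparison, a round-robin assignment can be written as a simple owner table (illustration only; the real mpi_enreg%proc_distrb has a different shape and, as noted above, is also used when building the Hamiltonian, so it cannot simply be reordered):

    ! Round-robin k-point assignment: k1->0, k2->1, k3->0, k4->1.
    program kpt_round_robin
      implicit none
      integer, parameter :: nkpt = 4, nproc = 2
      integer :: ik
      integer :: owner(nkpt)

      do ik = 1, nkpt
        owner(ik) = mod(ik - 1, nproc)
        print '(a,i0,a,i0)', 'k point ', ik, ' -> rank ', owner(ik)
      end do
    end program kpt_round_robin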
This is what I have found so far. Comments and suggestions are appreciated.
Best
Deyu Lu
On Mon, 2006-09-18 at 23:01 +0200, dylu@ucdavis.edu wrote:
> Dear abinit users:
> I have a question regarding the parallelization of outkss.F90, i.e.,
> the program that writes the KSS file for a subsequent GW calculation. My system
> has 8 water molecules placed periodically in SC cells, and 32 k-points (from a
> 4*4*4 grid) are used for sampling. To get about 150 bands for each k-point
> at ecut=30 Ha, the partial diagonalization takes a couple of hours
> (IBM AIX/power4+ with abinit 5.1.2 or 5.2.2).
> To finish the job within the time limit of the supercomputer center
> (18 hours), I was hoping to run the job in parallel. However, running 10
> hours with 32 cpus (4 nodes) produced wavefunctions for 4 k-points, and running
> 18 hours with 8 cpus (1 node) produced wavefunctions for 10 k-points. I'm kind
> of confused by the scaling of the parallelization. I did some tests using
> tgw_1.in under tests/tutorials with DATASET 1 only (KSS part, no GW, also
> setting nstep=1). With 2 cpus, the scaling is perfect; but with more, say
> 4, cpus, the scaling is pretty bad.
> I have the feeling that sending the big wavefunction matrix between cpus could
> take quite some time, but it may not be the only factor. I hope people with
> more experience with MPI programming and familiarity with the source code can
> give me some suggestions.
>
> Thanks
> Deyu Lu
>
> ----------------------------- INPUT FILE -----------------------------
> acell 3*11.732
> ecut 30
> ixc 11
>
> #Definition of the atom types
> ntypat 2 # There are two types of atoms
> znucl 1 8 # The keyword "znucl" refers to the atomic number of each atom type
>
> #Definition of the k-point grid
> kptopt 1
> ngkpt 4 4 4
> nshiftk 1
> shiftk 0.5 0.5 0.5
>
> istwfk 32*1
> symmorphi 0
> nstep 100 # Maximal number of SCF cycles
> nband 34
> diemac 3.5 # Model macroscopic dielectric constant, used to precondition the SCF cycle
> diemix 0.5 # Mixing factor for the model dielectric preconditioner
>
> prtden 1
> nbandkss 150 # Number of bands in the KSS file
> npwkss 4457 # 14.915 Ha
>
> #Definition of the atoms
> natom 24
> typat 16*1 8*2
> toldfe 1.0d-8
>
> xangst -1.770915920 1.256839825 -1.079211626
> -1.156326844 2.649702174 -1.720622083
> -1.180102192 -2.885145227 -0.015882150
> 0.268978367 -2.217143196 0.420187286
> 0.815634914 0.556314882 1.500261495
> 0.432901744 1.858703091 2.612469201
> -1.687632397 -1.257557546 1.839956586
> -1.153134808 0.254543639 2.195829065
> 0.988552101 -3.115473480 -1.580657088
> 0.463591779 -2.106859074 -2.941603535
> 2.721494031 2.989959718 -0.706831304
> 1.837223196 2.019236764 0.195625936
> 2.745692539 0.034601097 -0.228571029
> 2.981538651 0.272971339 1.348134702
> -2.171870314 -0.590821640 -2.364215811
> -1.429592969 -0.888051080 -0.748716697
> -1.838512467 2.369269478 -1.016291640
> -0.728936694 -1.987319326 0.495856040
> 0.167830160 0.901308139 2.252217434
> -2.021407178 -0.356002232 2.335649903
> 0.335619632 -3.074532487 -2.523892064
> 1.777997704 3.008397218 -0.245787912
> 2.241855244 0.459793227 0.602171653
> -2.131412127 -0.322996196 -1.201685489