Re: [abinit-forum] MPI communicators


  • From: Michel Cote <Michel.Cote@umontreal.ca>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] MPI communicators
  • Date: Mon, 22 Oct 2007 11:58:24 -0400

With the latest version of the code, you will need to set MPI_GROUP_MAX and MPI_COMM_MAX to about twice the number of k-points. The way ABINIT is coded, an MPI group is created for each k-point, so the number of groups can become quite large (on the order of thousands), and there is no easy way to change that at the moment.
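
For illustration only, here is a minimal MPI sketch in C of the pattern described above. It is not the actual ABINIT source; NKPT and the one-rank-per-k-point assignment are made-up assumptions. The point is that every rank ends up holding one group per k-point (and one communicator per k-point it owns) until the end of the run, so any implementation-level cap on live groups/communicators has to scale with the number of k-points.

/* Sketch (not the actual ABINIT source): one MPI group and one
 * communicator are created per k-point and kept for the whole run.
 * On MPI implementations that cap the number of live groups and
 * communicators (e.g. via MPI_GROUP_MAX / MPI_COMM_MAX on SGI MPT),
 * the caps must then grow with the number of k-points. */
#include <mpi.h>
#include <stdlib.h>

#define NKPT 1000   /* hypothetical number of k-points */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    MPI_Group *kpt_group = malloc(NKPT * sizeof *kpt_group);
    MPI_Comm  *kpt_comm  = malloc(NKPT * sizeof *kpt_comm);

    for (int ikpt = 0; ikpt < NKPT; ++ikpt) {
        /* Hypothetical layout: k-point ikpt is handled by a single rank.
         * All ranks pass the same group, so the call is a valid collective. */
        int ranks[1] = { ikpt % nprocs };
        MPI_Group_incl(world_group, 1, ranks, &kpt_group[ikpt]);
        /* Ranks outside the group receive MPI_COMM_NULL.  Keeping all of
         * these objects alive is what eventually exhausts the limit. */
        MPI_Comm_create(MPI_COMM_WORLD, kpt_group[ikpt], &kpt_comm[ikpt]);
    }

    /* ... per-k-point work would use kpt_comm[ikpt] here ... */

    /* Everything is released only at the very end of the run. */
    for (int ikpt = 0; ikpt < NKPT; ++ikpt) {
        if (kpt_comm[ikpt] != MPI_COMM_NULL) MPI_Comm_free(&kpt_comm[ikpt]);
        MPI_Group_free(&kpt_group[ikpt]);
    }
    free(kpt_group);
    free(kpt_comm);
    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}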

Michel Côté

On 2007-10-18, at 10:03, a.cote@ucl.ac.uk wrote:

Dear abinit users,

I wonder if anyone has a solution to this problem. I am trying to run
a phonon calculation on both a Cray and an IBM machine. In both cases,
it crashes complaining about the number of MPI communicators.
On an SGI machine this problem was solved by setting MPI_GROUP_MAX and MPI_COMM_MAX
to large values, but it is not possible to set those explicitly on other
machines.

In earlier correspondence with Michel Cote I was told that the new
version contained a fix by X. Gonze that frees most of the unused
communicators, so I was hoping it would work... alas, it doesn't.
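
For reference, here is a minimal sketch in C of the general "free after use" technique such a fix would rely on. It is not the actual ABINIT code or the patch by X. Gonze; NKPT and the rank assignment are made-up assumptions. Releasing each per-k-point group and communicator inside the loop keeps the number of live MPI objects small, however many k-points there are.

/* General technique only (not the actual ABINIT fix): the per-k-point
 * group and communicator are freed as soon as they are no longer needed,
 * so at most one of each is alive at any time. */
#include <mpi.h>

#define NKPT 1000   /* hypothetical number of k-points */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    MPI_Group world_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    for (int ikpt = 0; ikpt < NKPT; ++ikpt) {
        int ranks[1] = { ikpt % nprocs };   /* hypothetical k-point owner */

        MPI_Group kpt_group;
        MPI_Comm  kpt_comm;
        MPI_Group_incl(world_group, 1, ranks, &kpt_group);
        MPI_Comm_create(MPI_COMM_WORLD, kpt_group, &kpt_comm);

        /* ... work for this k-point ... */

        /* Free immediately so no MPI_COMM_MAX-style limit is reached. */
        if (kpt_comm != MPI_COMM_NULL) MPI_Comm_free(&kpt_comm);
        MPI_Group_free(&kpt_group);
    }

    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}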

Here is the error from the log. If anyone has any idea how to fix this,
I would really appreciate it. The error is the same for v. 5.3.5 and 5.4.4.

-P-0000 ================================================================================
-P-0000 == DATASET  2 ==================================================================
-P-0000
dtsetcopy : copying area  algalch    the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  istwfk     the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  kberry     the actual size () of the index ()  differs from its standard size ()
  dtsetcopy : allocated densty=  T
dtsetcopy : copying area  kpt        the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  kptns      the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  mixalch    the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  occ_orig   the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  shiftk     the actual size () of the index ()  differs from its standard size ()
dtsetcopy : copying area  wtk        the actual size () of the index ()  differs from its standard size ()
 mkfilename : getwfk/=0, take file _WFK from output of DATASET   1.

 getdim_nloc : deduce lmnmax  =   8, lnmax  =   2,
                      lmnmaxso=   8, lnmaxso=   2.
  distrb2: enter
  mpi_enreg%parareel=            0
  mpi_enreg%paralbd=            1
  mpi_enreg%paral_compil_respfn=            0
  distrb2: exit
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(217): MPI_Comm_create(MPI_COMM_WORLD, group=0xc8000066, new_comm=0x33c5178) failed
MPI_Comm_create(120): Too many communicators

...repeats a few times, with different numbers for group and new_comm, and then:

[NID 11612]Apid 20033: initiated application termination
Application 20033 exit codes: 13
Application 20033 resources: utime 0, stime 0

I even tried running the first calculation separately, outside the multi-dataset run, and
then reading from the WFK file, but again it crashes.

Thank you very much,
Alex

_____________________________________
Michel Cote
Departement de physique
Universite de Montreal




