BUG? Re: MPI_Comm_create(): Too many communicators


  • From: delaire@caltech.edu
  • To: forum@abinit.org
  • Subject: BUG? Re: MPI_Comm_create(): Too many communicators
  • Date: Thu, 1 Dec 2005 09:32:58 +0100

Hello all,

I posted a report a week ago regarding an MPI error I am getting (cf. the
original email below). In the meantime, we rebooted all the nodes of our
Opteron/Debian cluster in order to clean up any MPI communicators that might
have been allocated and never freed. I restarted the same job (abinip-4.6.5 /
mpich2) from this clean state, but I am still getting the same error about the
program trying to use "too many communicators".
Note that mpich2 works flawlessly on our cluster for other applications.

Is this a bug in some Abinit routine that does not free its MPI communicators?
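For illustration only (this is not ABINIT code), a minimal MPI program like the
following, compiled with mpicc against mpich2, reproduces the same failure
mode: communicators created repeatedly and never freed eventually exhaust MPI's
internal communicator table, and MPI_Comm_create aborts with "Too many
communicators".

------
/* leak.c -- minimal illustration (not ABINIT code) of the failure mode:
 * communicators created in a loop and never freed eventually exhaust
 * MPI's internal communicator/context table, and MPI_Comm_create aborts
 * with "Too many communicators".  Build with: mpicc leak.c -o leak       */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Group world_group;
    MPI_Comm  newcomm;
    int       i;

    MPI_Init(&argc, &argv);
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Each pass allocates a new communicator that is never released. */
    for (i = 0; i < 100000; i++) {
        MPI_Comm_create(MPI_COMM_WORLD, world_group, &newcomm);
        /* missing: MPI_Comm_free(&newcomm); */
    }

    MPI_Group_free(&world_group);
    MPI_Finalize();
    return 0;
}
------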

thanks for your help,
regards,
Olivier.

PS:

Pierre-Matthieu: I did try the ipcs command; unfortunately, it appears that the
kernel on our compute nodes is not configured to support it. Only the head
node's kernel is.

Flavien: I am getting the same error no matter how many nodes I try to run on.

Thanks to both of you anyway.
----- original email -----

Hello,

I am encountering MPI errors when running parallel Abinip jobs under mpich2.
This happens when I run my newly g95-compiled Abinip-4.6.5 with a g95-compiled
mpich2 on our Linux/Debian cluster.

First the good:
the g95-compiled Abinip-4.6.5 passes all parallel tests in Test_paral (tests
A through J, with an adapted Run script). The only differences in the fldiff
reports are the date/timing lines and some minor differences in the numerical
results.

Now the bad:
when I try to run larger jobs in parallel (these jobs have run in serial
without problems before), I get the error:
MPI_Comm_create(number): Too many communicators
when the program hits the response-function part of the calculation. This has
happened for several different input files. Note that this problem does not
seem to occur for the ground-state part of the calculations.

Here is what I found online on the meaning of this MPI error message:

------
0032-160 Too many communicators (number) in string, task number

Explanation: MPI is unable to create a new communicator because the maximum
number of simultaneous communicators would be exceeded.

User Response: Be sure to free unneeded communicators with MPI_Comm_free so
that they can be reused.

Error Class: MPI_ERR_COMM
------
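If I read that description correctly, the remedy on the application side would
look like the sketch below (my own illustration, not ABINIT source): every
communicator obtained from MPI_Comm_create is released with MPI_Comm_free once
the work on it is done, so that it can be reused on the next pass. The routine
names here (run_iteration, do_work_on_subcomm) are hypothetical.

------
#include <mpi.h>

void do_work_on_subcomm(MPI_Comm comm);       /* hypothetical worker routine */

/* Hypothetical per-iteration routine: the sub-communicator is freed as soon
 * as it is no longer needed, so repeated calls do not accumulate
 * communicators and run into the "Too many communicators" limit.           */
void run_iteration(MPI_Group group)
{
    MPI_Comm subcomm;

    MPI_Comm_create(MPI_COMM_WORLD, group, &subcomm);

    /* Ranks that are not members of 'group' receive MPI_COMM_NULL. */
    if (subcomm != MPI_COMM_NULL) {
        do_work_on_subcomm(subcomm);
        MPI_Comm_free(&subcomm);              /* release the communicator */
    }
}
------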

Has anyone encountered this error before? Is there a way to make sure that
MPI communicators get freed, or a way to set parallelization options so that
this error is avoided?

I am appending below the input file I'm running as well as the tail of the
log file.
Thanks for any help,
Olivier.

log
#################################
-P-0000 cgwf3: WARNING -
-P-0000 New trial energy at line 3 = -5.546092E+02
-P-0000 is higher than former: -5.546092E+02
-P-0000
-P-0000 leave_test : synchronization done...
vtorho3: loop on k-points and spins done in parallel
vtorho3 : MPI_ALLREDUCE, buffer of size 915896 bytes
ETOT 52 -8.70108545352650E-02-3.865E-12 2.117E-15 5.405E-09

At SCF step 52 vres2 = 5.40E-09 < tolvrs= 1.00E-08 =>converged.
-P-0000 leave_test : synchronization done...
nstdy3: loop on k-points and spins done in parallel
-P-0000 leave_test : synchronization done...
================================================================================

----iterations are completed or convergence reached----

outwf : write wavefunction to file bccV_ph2o_DS3_1WF1
-P-0000 leave_test : synchronization done...
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xfd3ad8) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xf88e58) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xf88e58) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xfa9938) failed
MPI_Comm_create(120): Too many communicators
rank 2 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 1 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
rank 0 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
#################################


