forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
- From: delaire@caltech.edu
- To: forum@abinit.org
- Subject: BUG? Re: MPI_Comm_create(): Too many communicators
- Date: Thu, 1 Dec 2005 09:32:58 +0100
Hello all,
I posted a report a week ago about an MPI error I am getting (cf.
original email below). In the meantime, we rebooted all the nodes on our
Opteron/Debian cluster to clean up any MPI communicators that might have
been allocated and never freed. I started the same job
(abinit-4.6.5 / mpich2) from this configuration, and I am still getting the
same error about the program trying to use "too many communicators".
Note that mpich2 works flawlessly on our cluster for other applications.
Is this a bug in some Abinit routine not freeing MPI communicators?
thanks for your help,
regards,
Olivier.
PS:
Pierre-Matthieu: I did try the ipcs command. Unfortunately, it appears
that the kernel on our compute nodes is not configured for it; only
the head node is.
Flavien: I am getting the same error no matter how many nodes I try to run on.
Thanks to both of you, however.
-
-
-
Hello,
I am encountering some MPI errors when running parallel Abinit jobs under
mpich2. This is when I run my newly g95-compiled Abinit-4.6.5 with
g95-compiled mpich2 on our Debian Linux cluster.
First the good:
the g95-compiled Abinit-4.6.5 passes all parallel tests in Test_paral (tests
A through J, with an adapted Run script). The only differences I get in the
fldiff reports are in date/timing and some minor differences in numerical
results.
Now the bad:
when I try to run larger jobs in parallel (these jobs have run in serial
without problems before), I get an error:
MPI_Comm_create(number): Too many communicators
when the program hits the response-function part of the calculation. This has
happened for several different input files. Note that this problem does not
seem to occur for the ground-state part of the calculations.
Here is what I found online on the meaning of this MPI error message:
------
0032-160 Too many communicators (number) in string, task number
Explanation: MPI is unable to create a new communicator because the maximum
number of simultaneous communicators would be exceeded.
User Response: Be sure to free unneeded communicators with MPI_Comm_free so
that they can be reused.
Error Class: MPI_ERR_COMM
------
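To illustrate what that explanation describes, here is a toy model in Python. It is not Abinit or mpich2 code: the class CommunicatorPool, the function run_loop and the limit of 8 are all made up for illustration, and the real limit depends on the MPI implementation. It only shows that creating one communicator per iteration without freeing it runs out of slots, while pairing each create with a free does not.

```python
class CommunicatorPool:
    """Toy stand-in for an MPI library's finite table of communicator slots."""

    def __init__(self, limit):
        self.free_ids = list(range(limit))  # slots still available
        self.in_use = set()                 # slots currently handed out

    def create(self):
        # Plays the role of MPI_Comm_create: fails once every slot is taken.
        if not self.free_ids:
            raise RuntimeError("Too many communicators")
        cid = self.free_ids.pop()
        self.in_use.add(cid)
        return cid

    def free(self, cid):
        # Plays the role of MPI_Comm_free: the slot becomes reusable.
        self.in_use.remove(cid)
        self.free_ids.append(cid)


def run_loop(pool, iterations, free_each_time):
    """Create one communicator per iteration; return the iterations completed."""
    completed = 0
    for _ in range(iterations):
        try:
            cid = pool.create()
        except RuntimeError:
            break
        # ... work on the sub-communicator would go here ...
        if free_each_time:
            pool.free(cid)
        completed += 1
    return completed


# Arbitrary limit of 8 slots, 100 iterations requested in both cases.
leaky = run_loop(CommunicatorPool(limit=8), iterations=100, free_each_time=False)
paired = run_loop(CommunicatorPool(limit=8), iterations=100, free_each_time=True)
print(leaky, paired)
```

In this sketch the loop that never frees stops as soon as the pool is exhausted, while the loop that frees after each use completes every iteration. If a routine called repeatedly in the response-function part behaved like the first loop, it would produce the kind of failure shown in the log below, but that is only a guess on my part.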
Has anyone encountered this error before? Is there a way to make sure that
MPI communicators get freed, or a way to set parallelization options so that
this error would be prevented?
I am appending below the input file I'm running as well as the tail of the
log file.
Thanks for any help,
Olivier.
log
#################################
-P-0000 cgwf3: WARNING -
-P-0000 New trial energy at line 3 = -5.546092E+02
-P-0000 is higher than former: -5.546092E+02
-P-0000
-P-0000 leave_test : synchronization done...
vtorho3: loop on k-points and spins done in parallel
vtorho3 : MPI_ALLREDUCE, buffer of size 915896 bytes
ETOT 52 -8.70108545352650E-02-3.865E-12 2.117E-15 5.405E-09
At SCF step 52 vres2 = 5.40E-09 < tolvrs= 1.00E-08 =>converged.
-P-0000 leave_test : synchronization done...
nstdy3: loop on k-points and spins done in parallel
-P-0000 leave_test : synchronization done...
================================================================================
----iterations are completed or convergence reached----
outwf : write wavefunction to file bccV_ph2o_DS3_1WF1
-P-0000 leave_test : synchronization done...
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xfd3ad8) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xf88e58) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xf88e58) failed
MPI_Comm_create(120): Too many communicators
aborting job:
Fatal error in MPI_Comm_create: Other MPI error, error stack:
MPI_Comm_create(222): MPI_Comm_create(MPI_COMM_WORLD, group=0xc80101f8,
new_comm=0xfa9938) failed
MPI_Comm_create(120): Too many communicators
rank 2 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 2: killed by signal 9
rank 1 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 1: killed by signal 9
rank 0 in job 33 strongmad_33980 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
#################################