forum - Re: [abinit-forum] 4.3.3 on Hitachi-sr11000 - MPI_COMM_WORLD==0

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: Marc Torrent <marc.torrent@cea.fr>
  • To: forum@abinit.org
  • Subject: Re: [abinit-forum] 4.3.3 on Hitachi-sr11000 - MPI_COMM_WORLD==0
  • Date: Fri, 15 Oct 2004 13:27:55 +0200

Eric (and Masao Arai),

In fact you were right when you said:
"mpif.h should match whatever the MPI library was compiled with"
And my answer also was right when I said that Abinit assumes that MPI_COMM_WORLD is not 91.
After a discussion with Mireille Boulet, we decided to produce a patch in order to make Abinit work with all versions of mpif.h.
So Mireille has written that patch and it will soon be available in the
- Abinit v4.3.4 release (coming soon)
- Abinit v4.4.3 release
and 4.5.x releases of course.

I hope it will help you,

Best wishes,

Marc


Eric Roman wrote:
Marc,

I don't understand this advice.

mpif.h should match whatever the MPI library was compiled with. Nothing else.
The MPI library was built expecting MPI_COMM_WORLD to be zero. That's why
it's defined to be zero in the header file that ships with the libraries.

If the MPI library is compiled thinking MPI_COMM_WORLD is 0, and it gets
91, the MPI library will see an uninitialized communicator, and return
MPI_ERR_COMM.

Since the MPI library has been compiled already, MPI_COMM_WORLD is just a
constant in the binary, (0 in his case)! If the headers don't match the
MPI libraries, there will be problems. This is like copying say, math.h or
unistd.h from a Solaris Sparc system to a Linux x86 system, and expecting it
to work. It might compile. Things might even run in some cases. But only
by sheer coincidence.
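
The mismatch described above can be sketched without any MPI installation. This is only a mock, not real MPI: the module `mock_mpi_lib`, the routine `mock_comm_rank`, and the error value 5 are hypothetical names and numbers chosen for illustration. The "library" has 0 baked in as its world communicator, and the caller takes 91 from a header that does not match.

```fortran
! Mock only: no MPI library is used. All names and the error value
! are hypothetical and chosen for illustration.
module mock_mpi_lib
  implicit none
  private
  public :: mock_comm_rank, mock_success, mock_err_comm
  ! Handle value the "library" was built with (0, as on the SR11000).
  integer, parameter :: lib_comm_world = 0
  integer, parameter :: mock_success = 0
  integer, parameter :: mock_err_comm = 5   ! hypothetical error code
contains
  ! Accepts only the handle value the library itself was compiled with.
  subroutine mock_comm_rank(comm, rank, ierr)
    integer, intent(in)  :: comm
    integer, intent(out) :: rank, ierr
    rank = -1
    if (comm == lib_comm_world) then
      rank = 0
      ierr = mock_success
    else
      ierr = mock_err_comm
    end if
  end subroutine mock_comm_rank
end module mock_mpi_lib

program demo_mismatch
  use mock_mpi_lib
  implicit none
  ! Value taken from a header that does not match the library.
  integer, parameter :: wrong_header_world = 91
  ! Value taken from the header shipped with the library.
  integer, parameter :: right_header_world = 0
  integer :: rank, ierr

  call mock_comm_rank(wrong_header_world, rank, ierr)
  if (ierr == mock_err_comm) print *, 'mismatched header: communicator rejected'

  call mock_comm_rank(right_header_world, rank, ierr)
  if (ierr == mock_success) print *, 'matching header: rank =', rank
end program demo_mismatch
```

In this mock the first call is rejected and the second succeeds, because the value the caller passes is only meaningful when it comes from the header that belongs to the library.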

AFAIK, there is no part of the MPI specification that says MPI_COMM_WORLD
is nonzero. In fact, MPI_COMM_WORLD is zero in LAM MPI, IBM's MPI,
Hitachi's MPI, and perhaps some versions of MPICH. This assumption breaks
compatibility with those platforms. I can't run the newer abinit versions
in parallel on our local IBM SP because of this assumption. abinit won't
run in parallel with LAM MPI because of this assumption. And now abinit
won't run in parallel on this Hitachi machine.

The assumption that MPI_COMM_WORLD is non-zero is a mistake. There's
nothing definitive about MPICH's version of mpif.h. (In MPICH 2 they set
MPI_COMM_WORLD to 114085068). When compiling, you have to use the headers
that match your binaries. And when writing code, the only thing you can
count on is what's inside the MPI specification. Nowhere does the MPI spec
say that MPI_COMM_WORLD is nonzero.

This is a bug, and one that I'd like to fix. Would you be able to put me in
touch with Mireille Boulet?

Best Wishes,
Eric

On Thu, Oct 07, 2004 at 05:36:14PM +0200, Marc Torrent wrote:

Dear Arai Masao,

It seems that you're using an "old" version of mpif...
Abinit's parallelization has recently been revised (by Mireille Boulet, here at the CEA-Bruyeres-le-Chatel, France) and these modifications use the new version of "mpif.h", where MPI_COMM_WORLD is 91 and MPI_COMM_SELF is 92.
(In fact, in older versions of mpif.h, MPI_COMM_WORLD=0 and MPI_COMM_SELF=1.)

If you want Abinit v4.3.3 to run on your Hitachi, you have to use a newer version of mpif.h (e.g. v1.2.5.2).


Regards,

Marc Torrent
CEA-Bruyeres-le-Chatel
France


Arai Masao wrote:

Dear all,

I am a collaborator of Wang Yuan Xu who asked some questions
about abinit on Hitachi SR-11000. Thank you for your kind replies.

I found a possible reason why abinit-4.3.3 does not work on this machine
in parallel mode.

On the SR11000, MPI_COMM_WORLD is defined as 0 in /usr/include/mpif.h.
---- mpif.h -----
integer*4 MPI_COMM_WORLD,MPI_COMM_SELF
parameter (MPI_COMM_WORLD=0,MPI_COMM_SELF=1)
------------------

Unfortunately, zero has a special meaning in Src_1managempi/xdef_comm.f and
Src_1managempi/xfuncmpi.f.

--- xdef_comm.f ----
subroutine xcomm_world(spaceComm)
[lines deleted]
integer :: spaceComm
# if defined MPI
spaceComm = MPI_COMM_WORLD
# else
spaceComm = 0
# endif
--------------------------------

--- xfuncmpi.f ----
subroutine xsum_mpi_int(xval,spaceComm,ier)
[lines deleted]
# if defined MPI || defined MPI_FFT
integer , allocatable :: xsum(:)
if (spaceComm /= 0) then <<<<<<<<<<<< #1
!Accumulate xval on all proc. in spaceComm
[lines deleted]
end subroutine xsum_mpi_int
-------

With the line indicated by "#1", if spaceComm is set to MPI_COMM_WORLD (=0),
the MPI routines do not work properly.
The version 4.2 does not have such checks. So, abinit-4.2 works properly
even on SR11000.

If we remove the comparison between spaceComm and 0, the parallel mode
seems to work properly. Is it safe to remove this comparison?
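
One possible way to keep a "no communicator" check without giving 0 a special meaning is to compare against a dedicated null handle instead. With a real MPI library that role is played by MPI_COMM_NULL from the matching mpif.h. The sketch below is a mock that calls no MPI routine: the module `comm_guard`, both function names, and the value -1 for the null handle are hypothetical, and this is not the Abinit patch.

```fortran
! Mock only: no MPI library is used. All names are hypothetical and
! not taken from the Abinit sources.
module comm_guard
  implicit none
  ! World handle as defined in the SR11000 mpif.h quoted above.
  integer, parameter :: mock_comm_world = 0
  ! Stand-in for MPI_COMM_NULL; the value -1 is an assumption.
  integer, parameter :: mock_comm_null = -1
contains
  ! Guard as in the 4.3.3 excerpt: 0 is treated as "no communicator".
  logical function runs_reduction_old(spaceComm)
    integer, intent(in) :: spaceComm
    runs_reduction_old = (spaceComm /= 0)
  end function runs_reduction_old

  ! Guard against a dedicated null handle: no assumption is made
  ! about the numeric value of the world communicator.
  logical function runs_reduction_new(spaceComm)
    integer, intent(in) :: spaceComm
    runs_reduction_new = (spaceComm /= mock_comm_null)
  end function runs_reduction_new
end module comm_guard

program demo_guard
  use comm_guard
  implicit none
  ! With the world handle equal to 0, the old guard skips the reduction.
  print *, 'old guard runs reduction: ', runs_reduction_old(mock_comm_world)
  ! The null-handle guard still runs it for the world communicator.
  print *, 'new guard runs reduction: ', runs_reduction_new(mock_comm_world)
  ! It skips the reduction only for the null handle itself.
  print *, 'new guard, null handle:   ', runs_reduction_new(mock_comm_null)
end program demo_guard
```

In this mock the old guard evaluates to false for the world communicator, which is the behaviour seen on the SR11000, while the null-handle guard evaluates to true. The serial build would then need to set spaceComm to the same null value rather than 0.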

--
Masao ARAI
National Institute for Materials Science (NIMS)
Computational Materials Science Center
First-Principles Simulation Group (II)
mail: arai.masao@nims.go.jp






