forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
- From: Eric Roman <ESRoman@berkeley.edu>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] 4.3.3 on Hitachi-sr11000 - MPI_COMM_WORLD==0
- Date: Thu, 7 Oct 2004 15:19:05 -0700
Marc,
I don't understand this advice.
mpif.h should match whatever the MPI library was compiled with. Nothing else.
The MPI library was built expecting MPI_COMM_WORLD to be zero. That's why
it's defined to be zero in the header file that ships with the libraries.
If the MPI library is compiled thinking MPI_COMM_WORLD is 0, and it gets
91, the MPI library will see an uninitialized communicator, and return
MPI_ERR_COMM.
Since the MPI library has already been compiled, MPI_COMM_WORLD is just a
constant in the binary (0 in his case)! If the headers don't match the
MPI libraries, there will be problems. This is like copying, say, math.h or
unistd.h from a Solaris SPARC system to a Linux x86 system and expecting it
to work. It might compile. Things might even run in some cases. But only
by sheer coincidence.
AFAIK, there is no part of the MPI specification that says MPI_COMM_WORLD
is nonzero. In fact, MPI_COMM_WORLD is zero in LAM MPI, IBM's MPI,
Hitachi's MPI, and perhaps some versions of MPICH. Assuming it is nonzero
breaks compatibility with those platforms. I can't run the newer abinit
versions in parallel on our local IBM SP because of this assumption. abinit
won't run in parallel with LAM MPI because of this assumption. And now abinit
won't run in parallel on this Hitachi machine.
The assumption that MPI_COMM_WORLD is nonzero is a mistake. There's
nothing definitive about MPICH's version of mpif.h. (In MPICH 2 they set
MPI_COMM_WORLD to 114085068.) When compiling, you have to use the headers
that match your binaries. And when writing code, the only thing you can
count on is what's inside the MPI specification. Nowhere does the MPI spec
say that MPI_COMM_WORLD is nonzero.
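
To make the failure concrete, here is a rough sketch in plain C (no MPI
needed, so it compiles anywhere). Everything in it is an assumption for
illustration: comm_handle, comm_ctx, and the function names are hypothetical
and are not abinit or MPI code, and the two handle values are simply the
numbers quoted in this thread. It contrasts a test on the handle's numeric
value, like the one in xsum_mpi_int, with keeping the serial/parallel state
in a separate flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a communicator handle. Real handle values are chosen by each
   MPI implementation; the two below are only the numbers quoted in this
   thread. */
typedef int comm_handle;
enum { WORLD_IS_ZERO = 0, WORLD_IS_91 = 91 };

/* Fragile: uses the value 0 to mean "serial run", like `spaceComm /= 0`. */
bool should_reduce_fragile(comm_handle comm) {
    return comm != 0;
}

/* Sturdier (hypothetical): keep the serial/parallel state in its own flag
   and never inspect the numeric value of the handle. */
typedef struct {
    bool parallel;     /* true when the code was started under MPI */
    comm_handle comm;  /* only meaningful when parallel is true */
} comm_ctx;

comm_ctx ctx_parallel(comm_handle world) {
    comm_ctx ctx = { true, world };
    return ctx;
}

comm_ctx ctx_serial(void) {
    comm_ctx ctx = { false, 0 };
    return ctx;
}

bool should_reduce(const comm_ctx *ctx) {
    return ctx->parallel;
}

/* Self-check: the fragile test skips the reduction when the world handle is
   0; the flag-based test does not depend on the handle value at all. */
void check_sentinel_demo(void) {
    comm_ctx zero = ctx_parallel(WORLD_IS_ZERO);
    comm_ctx ninety_one = ctx_parallel(WORLD_IS_91);
    comm_ctx serial = ctx_serial();

    assert(should_reduce_fragile(WORLD_IS_91));
    assert(!should_reduce_fragile(WORLD_IS_ZERO)); /* the bug */
    assert(should_reduce(&zero));
    assert(should_reduce(&ninety_one));
    assert(!should_reduce(&serial));
}
```

Calling check_sentinel_demo() from a main() passes silently. In real MPI code
the same idea would mean comparing against MPI_COMM_NULL, which the standard
does define, or carrying a flag as above, rather than comparing against 0.
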
This is a bug, and one that I'd like to fix. Would you be able to put me in
touch with Mireille Boulet?
Best Wishes,
Eric
On Thu, Oct 07, 2004 at 05:36:14PM +0200, Marc Torrent wrote:
> Dear Arai Masao,
>
> It seems that you're using an "old" version of mpif...
> Abinit's parallelization has been recently revised (by Mireille Boulet,
> here at the CEA-Bruyeres-le-Chatel, France) and these modifications are
> using the new version of the "mpif.h" where MPI_COMM_WORLD is 91 and
> MPI_COMM_SELF is 92.
> (In fact, in older versions of mpif.h, MPI_COMM_WORLD=0 and MPI_COMM_SELF=1).
>
> If you want Abinit v433 to run on your Hitachi, you have to use a newer
> version of mpif.h (e.g. v1.2.5.2).
>
>
> Regards,
>
> Marc Torrent
> CEA-Bruyeres-le-Chatel
> France
>
>
> Arai Masao wrote:
> >Dear all,
> >
> >I am a collaborator of Wang Yuan Xu who asked some questions
> >about abinit on Hitachi SR-11000. Thank you for your kind replies.
> >
> >I found a possible reason why abinit-4.3.3 does not work on this machine
> >in parallel mode.
> >
> >On the SR11000, MPI_COMM_WORLD is defined as 0 in /usr/include/mpif.h.
> >---- mpif.h -----
> > integer*4 MPI_COMM_WORLD,MPI_COMM_SELF
> > parameter (MPI_COMM_WORLD=0,MPI_COMM_SELF=1)
> >------------------
> >
> >Unfortunately, zero has a special meaning in
> >Src_1managempi/xdef_comm.f and
> >Src_1managempi/xfuncmpi.f.
> >
> >--- xdef_comm.f ----
> > subroutine xcomm_world(spaceComm)
> > [ lines deleted]
> > integer :: spaceComm
> ># if defined MPI
> > spaceComm = MPI_COMM_WORLD
> ># else
> > spaceComm = 0
> ># endif
> >--------------------------------
> >
> >--- xfuncmpi.f ----
> > subroutine xsum_mpi_int(xval,spaceComm,ier)
> > [lines deleted]
> ># if defined MPI || defined MPI_FFT
> > integer , allocatable :: xsum(:)
> > if (spaceComm /= 0) then <<<<<<<<<<<< #1
> > !Accumulate xval on all proc. in spaceComm
> > [lines deleted]
> > end subroutine xsum_mpi_int
> >-------
> >
> >With the line indicated by "#1", if spaceComm is set to
> >MPI_COMM_WORLD (=0),
> >the MPI routines do not work properly.
> >Version 4.2 does not have such checks, so abinit-4.2 works properly
> >even on the SR11000.
> >
> >If we remove the comparison between spaceComm and 0, the parallel mode
> >seems to work properly. Is it safe to remove this comparison?
> >
> >--
> >Masao ARAI
> >National Institute for Materials Science (NIMS)
> >Computational Materials Science Center
> >First-Principles Simulation Group (II)
> >mail: arai.masao@nims.go.jp
> >
> >
--
Eric Roman Department of Physics
510-642-7302 UC Berkeley