
forum - AW: [abinit-forum] running MPI

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: "Ludwig, Christian" <ludwigc@uni-mainz.de>
  • To: "forum@abinit.org" <forum@abinit.org>
  • Subject: AW: [abinit-forum] running MPI
  • Date: Mon, 2 Nov 2009 15:47:13 +0100
  • Accept-language: de-DE

Just a quick update: I found the cause of my own problem. mpich uses rsh to
start remote processes, so having ssh set up properly did not help. I
recompiled mpich with the option -rsh=ssh, recompiled abinit, and now it
seems to work.
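
Since the failure came down to which remote-shell command is used to reach each host, a small check like the one below can show whether a given launcher works before mpirun is involved. This is only a sketch: check_hosts is a hypothetical helper, not part of mpich or abinit, and it assumes the launcher accepts "<host> <command>" the way rsh and ssh do.

```shell
#!/bin/sh
# Hypothetical helper (not part of mpich or abinit): for each host, try to
# run the trivial command "true" through the given remote-shell launcher.
# Usage: check_hosts "<launcher command>" host1 host2 ...
check_hosts() {
    launcher=$1
    shift
    failures=0
    for host in "$@"; do
        # $launcher is intentionally unquoted so that a launcher with
        # options, e.g. "ssh -o BatchMode=yes", splits into separate words.
        if $launcher "$host" true >/dev/null 2>&1 </dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failures=$((failures + 1))
        fi
    done
    # Return success only if every host could be reached.
    [ "$failures" -eq 0 ]
}
```

For example, check_hosts rsh master node1 followed by check_hosts "ssh -o BatchMode=yes" master node1 would show whether rsh fails on hosts where ssh succeeds. BatchMode=yes makes ssh fail instead of asking for a password.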


________________________________________
From: forum-owner@abinit.org [forum-owner@abinit.org] on behalf of Ludwig,
Christian [ludwigc@uni-mainz.de]
Sent: Wednesday, 28 October 2009 15:52
To: forum@abinit.org
Subject: [abinit-forum] running MPI

Hello,

I have been working with a parallel abinit on an IBM p590 for a while. Since
it is a single machine with a bunch of processors, I just need to create a
file named machines containing

localhost:4

to run a job on 4 processors. Then I start the job with

mpirun -np 4 -machinefile machines abinip < job.files >& log
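
As a sanity check on the machinefile described above, the number of slots it provides can be added up and compared with the value passed to -np. The sketch below is an assumption-laden helper, not an mpich tool: count_slots is a hypothetical name, it treats a bare host name as one slot, and it skips blank lines and lines starting with #.

```shell
#!/bin/sh
# Hypothetical helper: sum the slots in a machinefile whose lines have the
# form "host" or "host:n". Assumes a bare host name means one slot and that
# blank lines and lines starting with "#" should be ignored.
count_slots() {
    awk -F: '
        /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
        { total += (NF > 1 ? $2 : 1) }     # "host:n" gives n, "host" gives 1
        END { print total + 0 }            # print 0 for an empty file
    ' "$1"
}
```

With the file shown above, count_slots machines prints 4, which matches the -np 4 on the mpirun command line.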

This is all working, and I wanted to try the same on an x86 cluster. Before I
continue, let me say that the master can ssh to itself and to all nodes
without a password prompt.
I installed mpich and abinit on the master and the directories are mirrored
to all nodes. When I execute mpirun on the master with a machinefile of

master:4

I get the error message

master: Connection refused
p0_3193: p4_error: Child process exited while making connection to remote
process on master: 0

written to the log file. With a machinefile of

node1:4

I get

Host address mismatch for 192.168.199.1
p0_3292: p4_error: Child process exited while making connection to remote
process on node1: 0

192.168.199.1 is not the IP address of the host, and I do not know where MPI
gets it from. Finally, I tried executing mpirun while logged in to node1 and
got

Permission denied.
p0_28472: p4_error: Child process exited while making connection to remote
process on node1: 0


Hopefully one of you can give me a hint on how to make this work.

Cheers,
Christian



