
forum - AW: [abinit-forum] running MPI

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: "Ludwig, Christian" <ludwigc@uni-mainz.de>
  • To: "forum@abinit.org" <forum@abinit.org>
  • Subject: AW: [abinit-forum] running MPI
  • Date: Mon, 2 Nov 2009 17:54:14 +0100
  • Accept-language: de-DE

I have just run into another problem. On the master the parallel run works fine,
but when I start a job on a node from the master, I get

m_3016: p4_error: rm_start: net_conn_to_listener failed: 14777
p0_11620: p4_error: Child process exited while making connection to remote
process on node: 0

When running from a node with 4 procs and with the option -np 4, I get

Could not find enough machines for architecture LINUX

even though there should be enough.
With -np 1, my log file looks like this:
ABINIT

Give name for formatted input file:
abinit.in
Give name for formatted output file:
abinit.out
Give root name for generic input files:
abi
Give root name for generic output files:
abo
Give root name for generic temporary files:
tmp
-P-0000 leave_test : synchronization done...

and nothing more happens; the process idles at 0% CPU.
This last case confuses me most, since the run seems to start correctly in
principle, but then it does not do anything.

Any help is greatly appreciated.

Cheers,
Christian


________________________________________
From: forum-owner@abinit.org [forum-owner@abinit.org] on behalf of Ludwig,
Christian [ludwigc@uni-mainz.de]
Sent: Monday, 2 November 2009 15:47
To: forum@abinit.org
Subject: AW: [abinit-forum] running MPI

Just a quick update: I found the cause of my problem. mpich was using rsh, so
having ssh set up properly did not help. I recompiled mpich with the option
-rsh=ssh, recompiled abinit, and now it seems to work.
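
A minimal sketch of how one might check which remote shell actually works for
every host in a machinefile, before launching mpirun. The function names
machinefile_hosts and check_remote_shell are hypothetical helpers, not part of
mpich or abinit; the machinefile is assumed to hold one "host" or "host:slots"
entry per line.

```shell
# Print the host names found in a machinefile, dropping ":slots",
# "#" comments and blank lines.
machinefile_hosts() {
    sed -e 's/#.*//' -e 's/:.*//' -e '/^[[:space:]]*$/d' "$1"
}

# Run the trivial command "true" on every host through the given remote
# shell and report whether it succeeded without any interaction.
# Usage: check_remote_shell MACHINEFILE COMMAND [ARGS...]
check_remote_shell() {
    mf=$1
    shift
    for h in $(machinefile_hosts "$mf"); do
        if "$@" "$h" true </dev/null >/dev/null 2>&1; then
            echo "$h: ok"
        else
            echo "$h: FAILED"
        fi
    done
}

# Example calls (commented out because they need real hosts):
# check_remote_shell machines rsh
# check_remote_shell machines ssh -o BatchMode=yes
```

If the rsh check fails while the ssh check succeeds, that would be consistent
with the situation described in this message.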


________________________________________
From: forum-owner@abinit.org [forum-owner@abinit.org] on behalf of Ludwig,
Christian [ludwigc@uni-mainz.de]
Sent: Wednesday, 28 October 2009 15:52
To: forum@abinit.org
Subject: [abinit-forum] running MPI

Hello,

I have been working with a parallel abinit on an IBM p590 for a while. Since
it is a single machine with many processors, I only need to create a file
named machines containing

localhost:4

to run a job on 4 processors. Then I start the job with

mpirun -np 4 -machinefile machines abinip < job.files >& log
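
The single-machine setup described above can be sketched as a small shell
snippet. This is only an illustration: the helper name write_machinefile is
hypothetical, and the launch line is left as a comment because it needs
mpirun, abinip and job.files to be present.

```shell
# Write a machinefile with one "host:slots" entry per line.
# Usage: write_machinefile FILE ENTRY [ENTRY...]
write_machinefile() {
    out=$1
    shift
    : > "$out"                      # create or truncate the file
    for entry in "$@"; do
        printf '%s\n' "$entry" >> "$out"
    done
}

# One machine, four processes, as in the post.
write_machinefile machines localhost:4

# Launch line from the post (">&" sends stdout and stderr to the log file):
# mpirun -np 4 -machinefile machines abinip < job.files >& log
```

For the cluster case, the same helper would take entries such as master:4 or
node1:4 instead of localhost:4.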

This all works, and I wanted to try the same on an x86 cluster. Before I
continue, let me say that the master can ssh to itself and to all nodes
without a password prompt.
I installed mpich and abinit on the master, and the directories are mirrored
to all nodes. When I execute mpirun on the master with a machinefile of

master:4

I get the error message

master: Connection refused
p0_3193: p4_error: Child process exited while making connection to remote
process on master: 0

written to the log file. With a machinefile of

node1:4

I get

Host address mismatch for 192.168.199.1
p0_3292: p4_error: Child process exited while making connection to remote
process on node1: 0

192.168.199.1 is not the IP of the host, and I do not know where MPI gets it
from. Finally, I tried to execute mpirun while logged in on node1 and got

Permission denied.
p0_28472: p4_error: Child process exited while making connection to remote
process on node1: 0
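
One way to see which names an address such as 192.168.199.1 maps to locally is
to look it up in a hosts file. The sketch below is only an assumption about
where to start looking, not a confirmed diagnosis from this thread;
names_for_ip is a hypothetical helper, and the file defaults to /etc/hosts.

```shell
# Print every host name listed for a given IP address in a hosts file.
# Usage: names_for_ip IP [HOSTS_FILE]
names_for_ip() {
    # Strip "#" comments first, then print all names on lines whose
    # first field equals the requested address.
    sed 's/#.*//' "${2:-/etc/hosts}" |
        awk -v ip="$1" '$1 == ip { for (i = 2; i <= NF; i++) print $i }'
}

# Example call (output depends on the local hosts file):
# names_for_ip 192.168.199.1
```

Running the same lookup on the master and on node1 would show whether the two
machines agree about that address.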


Hopefully one of you can give me a hint on how to make it work.

Cheers,
Christian



