forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
- From: "Ludwig, Christian" <ludwigc@uni-mainz.de>
- To: "forum@abinit.org" <forum@abinit.org>
- Subject: AW: [abinit-forum] running MPI
- Date: Mon, 2 Nov 2009 17:54:14 +0100
- Accept-language: de-DE
I have run into another problem. On the master the parallel run works fine,
but when I try to start a job on a node from the master, I get
m_3016: p4_error: rm_start: net_conn_to_listener failed: 14777
p0_11620: p4_error: Child process exited while making connection to remote
process on node: 0
When running on a node with 4 processors using the option -np 4, I get
Could not find enough machines for architecture LINUX
even though there should be enough.
When using -np 1, my log file looks like this:
ABINIT
Give name for formatted input file:
abinit.in
Give name for formatted output file:
abinit.out
Give root name for generic input files:
abi
Give root name for generic output files:
abo
Give root name for generic temporary files:
tmp
-P-0000 leave_test : synchronization done...
and nothing more happens. The process idles at 0% CPU.
This last case confuses me most: the run seems to start correctly, but then
it does not actually do anything.
Any help is greatly appreciated.
Cheers,
Christian
________________________________________
From: forum-owner@abinit.org [forum-owner@abinit.org] on behalf of Ludwig,
Christian [ludwigc@uni-mainz.de]
Sent: Monday, 2 November 2009 15:47
To: forum@abinit.org
Subject: AW: [abinit-forum] running MPI
Just a quick update: I found the cause of my problem. MPICH was using rsh, so
having ssh set up properly did not help. I recompiled MPICH with the option
-rsh=ssh, recompiled ABINIT, and now it seems to work.
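In this thread, MPICH started remote processes with rsh until it was rebuilt with -rsh=ssh, so a working passwordless ssh setup was not actually being used. A minimal Python sketch of a pre-flight check, assuming ssh is the launcher and that the machinefile uses the host or host:n format shown later in the thread: it asks every listed host to run true over ssh with BatchMode=yes, so a host that would prompt for a password is reported as failing instead of hanging. The helper names are hypothetical, and the run parameter exists only so the check can be exercised without real hosts. It verifies ssh itself, not which remote shell your MPICH build uses.

```python
import subprocess


def ssh_check_command(host: str) -> list[str]:
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # which is the situation a non-interactive launcher like mpirun is in.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def hosts_from_machinefile(text: str) -> list[str]:
    """Unique hostnames from 'host' or 'host:nprocs' lines, in file order."""
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if line:
            host = line.partition(":")[0].strip()
            if host not in hosts:
                hosts.append(host)
    return hosts


def unreachable_hosts(text: str, run=subprocess.run) -> list[str]:
    """Hosts where a non-interactive 'ssh host true' does not succeed."""
    failed = []
    for host in hosts_from_machinefile(text):
        result = run(ssh_check_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

Calling unreachable_hosts(open("machines").read()) on the master should return an empty list before mpirun is worth trying.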
________________________________________
From: forum-owner@abinit.org [forum-owner@abinit.org] on behalf of Ludwig,
Christian [ludwigc@uni-mainz.de]
Sent: Wednesday, 28 October 2009 15:52
To: forum@abinit.org
Subject: [abinit-forum] running MPI
Hello,
I have been running a parallel ABINIT on an IBM p590 for a while. Since it is
a single machine with many processors, all I need in order to run a job on
4 processors is a file named machines containing
localhost:4
I then start the job with
mpirun -np 4 -machinefile machines abinip < job.files >& log
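The machinefile above lists one host per line, optionally followed by :n for the number of processes to start there. As a hedged illustration (not from the original thread, and not a reproduction of how mpirun itself assigns processes), this Python sketch parses such a file and checks that it offers at least as many slots as -np requests, a quick sanity check when mpirun reports "Could not find enough machines". The names parse_machinefile and check_np are hypothetical; counting a bare hostname as one slot and skipping # comments are assumptions about the format.

```python
def parse_machinefile(text: str) -> list[tuple[str, int]]:
    """Parse lines of the form 'host' or 'host:nprocs' into (host, nprocs)."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        host, sep, count = line.partition(":")
        # Assumption: a host listed without ':n' provides one process slot.
        nprocs = int(count) if sep else 1
        if nprocs < 1:
            raise ValueError(f"invalid process count in line: {raw!r}")
        entries.append((host.strip(), nprocs))
    return entries


def total_slots(entries: list[tuple[str, int]]) -> int:
    return sum(n for _, n in entries)


def check_np(text: str, np: int) -> int:
    """Return the slot count, or raise if fewer than np slots are listed."""
    slots = total_slots(parse_machinefile(text))
    if slots < np:
        raise ValueError(f"-np {np} requested but the machinefile lists only {slots} slots")
    return slots
```

With the file from this message, check_np("localhost:4\n", 4) returns 4.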
All of this works, and now I want to try the same on an x86 cluster. Before I
continue, let me say that the master can ssh to itself and to all nodes
without a password prompt.
I installed MPICH and ABINIT on the master, and the directories are mirrored
to all nodes. When I execute mpirun on the master with a machinefile of
master:4
I get the error message
master: Connection refused
p0_3193: p4_error: Child process exited while making connection to remote
process on master: 0
written to the log file. With a machinefile of
node1:4
I get
Host address mismatch for 192.168.199.1
p0_3292: p4_error: Child process exited while making connection to remote
process on node1: 0
192.168.199.1 is not the IP address of that host; I do not know where MPI
gets it from.
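One place an unexpected address can come from (an assumption here, not confirmed in the thread) is name resolution on the machine itself, for example an /etc/hosts entry that maps a hostname to a different interface. A small Python sketch that resolves each hostname with socket.gethostbyname (IPv4 only) and reports names that resolve to something other than the address you expect; running it on the master and on each node would show whether they disagree. The function names are hypothetical, and the expected-address table is yours to fill in; no real addresses from this cluster are assumed.

```python
import socket


def resolved_addresses(hosts, resolve=socket.gethostbyname):
    """Map each hostname to the IPv4 address this machine resolves it to."""
    result = {}
    for host in hosts:
        try:
            result[host] = resolve(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            result[host] = f"unresolved ({exc})"
    return result


def mismatches(expected, resolve=socket.gethostbyname):
    """Return {host: actual} for hosts not resolving to the expected address."""
    actual = resolved_addresses(expected, resolve)
    return {host: actual[host] for host, ip in expected.items() if actual[host] != ip}
```

An empty result from mismatches({"node1": "<expected address of node1>"}) means node1 resolves as intended on the machine where it was run.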
Finally, I tried running mpirun while logged in on node1 and got
Permission denied.
p0_28472: p4_error: Child process exited while making connection to remote
process on node1: 0
Hopefully one of you can give me a hint on how to make it work.
Cheers,
Christian