forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
List archive
- From: "Zhang Ting" <zhangting1980323@gmail.com>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] parallel bail out
- Date: Thu, 24 Jan 2008 19:43:47 +0800
Hi, I think the problem may be the write permissions on the disk. As far as I remember, you can add a "umask" setting to your shell startup file .bashrc, so that files created by abinip can be read and written from the other machines.
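For example, a minimal sketch of such an entry might look like the following. The value 002 is only an assumption for illustration (it leaves new files group-writable); the right value depends on which users and groups the jobs run under on the other machines.

```shell
# Entry for ~/.bashrc on each node (sketch; the value 002 is an assumed example).
# umask clears the given permission bits on newly created files:
# 002 removes only "write" for others, so new files stay group-writable.
umask 002
```

After starting a new shell, running `umask` with no arguments prints the mask currently in effect.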
Regards
Zhang Ting
Peking Univ.
Jan, 24th, 2008
2008/1/22, Toby D. Young <tyoung@ippt.gov.pl>:
Hello users,
I am having trouble running my input file on a cluster system with 4
processors; that's two two-processor machines. I'm quite a newbie at
abinip.
I can see from the log file two worrying messages:
p1_7266: p4_error: alloc_p4_msg failed: 0
which (after some other output) continues with
p2_6059: p4_error: interrupt SIGx: 13
p2_6059: (955.289062) net_send: could not write to fd=5, errno = 32
mpiexec: Warning: tasks 0-1 exited with status 1.
Googling did not help much, except to suggest that this may be an
mpiexec problem with writing to disk(?). Has anyone had this problem, or
does anyone know how to get around it?
I haven't had any trouble running the parallel tests / tutorials.
Thanks in advance.
Best,
Toby
=====================
complete message
p1_7266: (949.214844) xx_shmalloc: returning NULL; requested 3280880 bytes
p1_7266: (949.214844) p4_shmalloc returning NULL; request = 3280880 bytes
You can increase the amount of memory by setting the environment variable P4_GLOBMEMSIZE (in bytes); the current size is 4194304
p1_7266: p4_error: alloc_p4_msg failed: 0
=====================
iter Etot(hartree) deltaE(h) residm vres2 diffor maxfor
getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 80 80 128
ecut(hartree)= 50.000 => boxcut(ratio)= 2.05901
ewald : nr and ng are 3 and 25
ITER STEP NUMBER 1
vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
-P-0000 leave_test : synchronization done...
vtorho: loop on k-points and spins done in parallel
p2_6059: p4_error: interrupt SIGx: 13
p2_6059: (955.289062) net_send: could not write to fd=5, errno = 32
mpiexec: Warning: tasks 0-1 exited with status 1.
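The shmalloc message in the log names the environment variable P4_GLOBMEMSIZE and reports a current size of 4194304 bytes. A minimal sketch of raising it is given here; the value 16777216 is an arbitrary assumption chosen for illustration, not a recommended setting, and the sketch assumes the variable has to be visible to the MPI processes on every node.

```shell
# Sketch: raise the p4 shared-memory pool before launching the job.
# The log reports the current size as 4194304 bytes (4 MiB);
# 16777216 bytes (16 MiB) is an assumed, illustrative value.
export P4_GLOBMEMSIZE=16777216
```

Putting this line in .bashrc on each machine is one way to set it everywhere; whether that is enough depends on how the MPI launcher passes the environment to the remote processes.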
--
Toby D. Young - Adiunkt (Assistant Professor)
Department of Computational Science
Institute of Fundamental Technological Research
Polish Academy of Sciences
Room 206, Swietokrzyska 21
00-049 Warsaw, POLAND
- parallel bail out, Toby D. Young, 01/22/2008
- Re: [abinit-forum] parallel bail out, Josef Zwanziger, 01/22/2008
- Re: [abinit-forum] parallel bail out, Zhang Ting, 01/24/2008
- Re: [abinit-forum] parallel bail out, Toby D. Young, 01/24/2008