forum@abinit.org
Subject: The ABINIT Users Mailing List ( CLOSED )
- From: "Toby D. Young" <tyoung@ippt.gov.pl>
- To: forum@abinit.org
- Subject: Re: [abinit-forum] parallel bail out
- Date: Thu, 24 Jan 2008 13:28:18 +0100 (CET)
Thanks for the replies to my problem.
Yes, it is indeed a memory problem, but with mpiexec, not abinip.
Adding "setenv P4_GLOBMEMSIZE 512000000" to the tcsh script solved the
problem. As I understand it, this increases the amount of memory allowed
to be requested from the cluster's nodes (to 512MB).
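
For anyone hitting the same error, here is a minimal sketch of the fix in a
job script. It is written for sh/bash rather than tcsh (the tcsh form is the
setenv line quoted above). The launch line is only a commented placeholder:
the process count, binary path and file names are assumptions, not taken from
the actual script. The variable should be exported before mpiexec is started.

```shell
#!/bin/sh
# Raise the MPICH ch_p4 shared-memory pool from its current size
# (4194304 bytes according to the log) to the value that worked in this thread.
P4_GLOBMEMSIZE=512000000   # in bytes
export P4_GLOBMEMSIZE

# Show the setting in MiB for a quick sanity check (integer division).
echo "P4_GLOBMEMSIZE=${P4_GLOBMEMSIZE} bytes (about $((P4_GLOBMEMSIZE / 1048576)) MiB)"

# tcsh equivalent of the export above:
#   setenv P4_GLOBMEMSIZE 512000000

# Placeholder launch line (process count and file names are assumptions):
#   mpiexec -n 4 ./abinip < run.files > log
```

The failed request in the log was 3280880 bytes against a pool of 4194304
bytes, so a much smaller increase might also be enough; 512000000 is simply
the value that was tried here.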
Best,
Toby
-----
Toby D. Young - Adiunkt (Assistant Professor)
Department of Computational Science
Institute of Fundamental Technological Research
Polish Academy of Sciences
Room 206, ul. Swietokrzyska 21
00-049 Warszawa, POLAND
On Thu, 24 Jan 2008, Zhang Ting wrote:
> Hi, I think the problem may be caused by write permissions on disk. As I
> recall, you can add a "umask" setting to your environment file .bashrc, so
> that files created by abinip can be read and written by other machines.
>
>
> Regards
>
> Zhang Ting
> Peking Univ.
> Jan, 24th, 2008
>
> 2008/1/22, Toby D. Young <tyoung@ippt.gov.pl>:
> >
> >
> >
> >
> > Hello users,
> >
> > I am having trouble running my input file on a cluster system with 4
> > processors; that's two two-processor machines. I'm quite a newbie at
> > abinip.
> >
> > I can see from the log file two worrying messages:
> >
> > p1_7266: p4_error: alloc_p4_msg failed: 0
> >
> > which (after some other output) continues with
> >
> > p2_6059: p4_error: interrupt SIGx: 13
> > p2_6059: (955.289062) net_send: could not write to fd=5, errno = 32
> > mpiexec: Warning: tasks 0-1 exited with status 1.
> >
> > Googling did not give me much help, except that this may be an
> > mpiexec problem writing to disk(?). Has anyone had this problem, or
> > does anyone know how to get round it?
> >
> > I haven't had any trouble running the parallel tests / tutorials.
> >
> > Thanks in advance.
> > Best,
> > Toby
> >
> >
> >
> > =====================
> > complete message
> >
> >
> > p1_7266: (949.214844) xx_shmalloc: returning NULL; requested 3280880 bytes
> > p1_7266: (949.214844) p4_shmalloc returning NULL; request = 3280880 bytes
> > You can increase the amount of memory by setting the environment variable
> > P4_GLOBMEMSIZE (in bytes); the current size is 4194304
> > p1_7266: p4_error: alloc_p4_msg failed: 0
> > =====================
> >
> > iter Etot(hartree) deltaE(h) residm vres2 diffor maxfor
> >
> > getcut: wavevector= 0.0000 0.0000 0.0000 ngfft= 80 80 128
> > ecut(hartree)= 50.000 => boxcut(ratio)= 2.05901
> >
> > ewald : nr and ng are 3 and 25
> >
> > ITER STEP NUMBER 1
> > vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
> > -P-0000 leave_test : synchronization done...
> > vtorho: loop on k-points and spins done in parallel
> > p2_6059: p4_error: interrupt SIGx: 13
> > p2_6059: (955.289062) net_send: could not write to fd=5, errno = 32
> > mpiexec: Warning: tasks 0-1 exited with status 1.
> >
> >
> >
> > --
> >
> > Toby D. Young - Adiunkt (Assistant Professor)
> > Department of Computational Science
> > Institute of Fundamental Technological Research
> > Polish Academy of Sciences
> > Room 206, Swietokrzyska 21
> > 00-049 Warsaw, POLAND
> >
>
- parallel bail out, Toby D. Young, 01/22/2008
- Re: [abinit-forum] parallel bail out, Josef Zwanziger, 01/22/2008
- Re: [abinit-forum] parallel bail out, Zhang Ting, 01/24/2008
- Re: [abinit-forum] parallel bail out, Toby D. Young, 01/24/2008