forum@abinit.org
Subject: The ABINIT Users Mailing List (CLOSED)
- From: "Narayani Choudhury" <n.choudhury@gl.ciw.edu>
- To: <forum@abinit.org>
- Subject: MPI error
- Date: Sat, 13 Dec 2003 22:32:34 -0500 (EST)
- Importance: Normal
Hi,
Recently, we compiled ABINIT413 on our CRAY SV-1 16-node cluster. Both the
sequential and parallel versions were compiled, and all sequential and
parallel tests were successful.
However, when running MPI jobs (often large jobs requiring several
hours), the job frequently hangs and leaves dead CPUs. The error
apparently occurs just before a large binary file is written. The
error message, when one is given, is:
Register parity error
Beginning of Traceback:
Traceback aborted; possible stack corruption.
MPI: MPI_COMM_WORLD rank 8 has terminated without calling MPI_Finalize()
MPI: aborting job
[2] Exit 255 mpirun -np 10 ./abinip < p.files > log
Has anyone else encountered this problem? On restart, the calculation
moves ahead but fails at some other point with an identical error.
Does this point to a hardware or a software problem?
Thanks,
Narayani
-----------------------------------------
Narayani Choudhury
Geophysical Laboratory,
Carnegie Institution of Washington,
5251, Broad Branch Road, N.W.,
Washington D.C. 20015
Phone: +1-202-478-8945
email: n.choudhury@gl.ciw.edu