forum - MPI error

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: "Narayani Choudhury" <n.choudhury@gl.ciw.edu>
  • To: <forum@abinit.org>
  • Subject: MPI error
  • Date: Sat, 13 Dec 2003 22:32:34 -0500 (EST)
  • Importance: Normal

Hi,

Recently, we compiled ABINIT413 on our CRAY SV-1 16-node cluster. Both the
sequential and parallel versions were built, and all sequential and
parallel tests were successful.

However, when running MPI jobs (often large jobs requiring several
hours), the job frequently hangs and leaves dead CPUs. The failure
usually appears to occur just before a large binary file is written. The
error message sometimes given is:

Register parity error
Beginning of Traceback:
Traceback aborted; possible stack corruption.
MPI: MPI_COMM_WORLD rank 8 has terminated without calling MPI_Finalize()
MPI: aborting job

[2] Exit 255 mpirun -np 10 ./abinip < p.files > log

Has anyone else encountered this problem? On restart, the calculation
moves ahead but fails at some other point with an
identical error. Does this point to a hardware or a software problem?


Thanks,
Narayani
-----------------------------------------
Narayani Choudhury
Geophysical Laboratory,
Carnegie Institution of Washington,
5251, Broad Branch Road, N.W.,
Washington D.C. 20015
Phone: +1-202-478-8945
email: n.choudhury@gl.ciw.edu
