
forum - Parallel Problem on Cluster

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: "Jin Zhang" <jin.zhang.pku@gmail.com>
  • To: forum@abinit.org
  • Subject: Parallel Problem on Cluster
  • Date: Mon, 4 Sep 2006 13:34:04 +0800

Dear all,

I ran into a parallel-execution problem on a cluster managed by OpenPBS and MPICH-1.2.6.
All nodes run RHEL (kernel 2.4.21-20) and have dual 3.2 GHz Xeon CPUs.
The cluster is equipped with InfiniBand, but I used Gigabit Ethernet instead.
In any case, that does not matter much here, because abinip does not compile with InfiniBand :)


abinip (4.6.5) is compiled with ifort-8.1, and the serial version runs fine.
Also, if I specify only ONE processor for abinip, it finishes the test without any errors.

I examined the log file and suspect that the problem is caused by network I/O or something similar.
However, I have no idea how to fix it, so I decided to turn to this forum for help.

The test job I used is Parallel Tutorial No. 1, unmodified.

Below are the job file written for submission to PBS and the error part of the log file.
---------------------------------------------------------------------------
PBS-job-file:
#PBS -l nodes=2
#PBS -N PARA_TEST
#PBS -j oe
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE | tee list

export PATH=$PBS_O_PATH
/usr/voltaire/mpi/bin/mpirun -machinefile ./list -np 2 \
    ~/bin/abinip_ifort81 < ./tparal_1.files > ./log
---------------------------------------------------------------------------

LOG file(Partial):
(At the very beginning)
ABINIT
forrtl: severe (24): end-of-file during read, unit 5, file stdin
Image           PC                Routine  Line     Source
abinip_ifort81  0000000000FEEE8A  Unknown  Unknown  Unknown
abinip_ifort81  0000000000FEE76A  Unknown  Unknown  Unknown
abinip_ifort81  0000000000FBC684  Unknown  Unknown  Unknown
abinip_ifort81  0000000000F8409D  Unknown  Unknown  Unknown
abinip_ifort81  0000000000F845FD  Unknown  Unknown  Unknown
abinip_ifort81  0000000000F9FA62  Unknown  Unknown  Unknown
abinip_ifort81  0000000000529B51  Unknown  Unknown  Unknown
abinip_ifort81  00000000004003FB  Unknown  Unknown  Unknown
abinip_ifort81  000000000040029E  Unknown  Unknown  Unknown
abinip_ifort81  0000000000FF734E  Unknown  Unknown  Unknown
abinip_ifort81  00000000004001AA  Unknown  Unknown  Unknown
ABINIT
Give name for formatted input file:

Give name for formatted input file:
/iodisk2/home/nano/abinit/abinit-4.6.5-64Bit/Tutorial/Test/tparal_1.in
Give name for formatted output file:
/iodisk2/home/nano/abinit/abinit-4.6.5-64Bit/Tutorial/Test/tparal_1.out
Give root name for generic input files:
/iodisk2/home/nano/abinit/abinit-4.6.5-64Bit/Tutorial/Test/tparal_1i
Give root name for generic output files:
/iodisk2/home/nano/abinit/abinit-4.6.5-64Bit/Tutorial/Test/tparal_1o
Give root name for generic temporary files:
/iodisk2/home/nano/abinit/abinit-4.6.5-64Bit/Tutorial/Test/tparal_1
-P-0000 leave_test : synchronization done...

............
............
ewald : nr and ng are 3 and 11
rhohxc_coll : enter with option, nspden 1 1

ITER STEP NUMBER 1

vtorho : nnsclo_now= 2, note that nnsclo,dbl_nnsclo,istep= 0 0 1
Cleaning up all processes ...done.
Timeout alarm signaled

(The end of file)
--------------------------------------------------------------------------

As you can see, one of the two abinip processes started by mpich complains about an end-of-file error.
If I specify "-np 4", three of them do.
The output also stops at ITER STEP NUMBER 1. I'm not sure whether this means the truly parallel parts never got a chance to run.
(The last two lines were probably written by mpich and PBS.)
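
To check the pattern described above (one failing process with "-np 2", three with "-np 4"), a minimal shell sketch like the following could count the end-of-file errors in a log file. The function name count_stdin_eof is hypothetical; the match string is copied from the error message in the log excerpt, and ./log is the log file name used in the job file.

```shell
#!/bin/sh
# Hypothetical helper (not part of ABINIT or MPICH): count how many
# times the "end-of-file during read, unit 5" error appears in a log.
count_stdin_eof() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" keeps the exit status zero when nothing matches.
    grep -c 'end-of-file during read, unit 5' "$1" || true
}

# Usage (run by hand after a job finishes), comparing the count
# with the -np value given to mpirun:
# count_stdin_eof ./log
```

If the count is always one less than the -np value, that would at least confirm that all processes but one fail at the same read from stdin.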

What a long post! I hope it is not too much of a bother.
Any suggestions are appreciated.
Thanks in advance!


Best,
Jin Zhang
@Physics Department, Peking Univ.



