Re: Re: [abinit-forum] parallelism 5.3.2


  • From: delaire@caltech.edu
  • To: forum@abinit.org
  • Subject: Re: Re: [abinit-forum] parallelism 5.3.2
  • Date: Fri, 6 Jul 2007 00:48:48 +0200

Hi Matthieu,

Actually, I have no easy way to turn off the 'nolocal' option for mpirun (our
MPI launching system was developed in-house by our administrator; it launches
jobs with mpiexec).

Could you please tell me what routine is causing the hanging/slow
writing/crashing in parallel mode for larger problems?
Maybe we can take a look at it?

Also, more generally, does abinis/abinip write anything out to /temp?
Our local disks on the compute nodes are ramdisks, which might be an issue...

thanks,
Olivier

####################

> Thanks for your input. The problem we've been having sounds very similar,
> although our cluster here is based on rather different hardware/OS:
> dual-core Opteron with 2GB/cpu, Debian, PGI-7.0 compiler, MPICH2.

Yup, it feels more like an MPI bug than anything platform-related, but that's
just intuition (yuk).

> I will look into this "-nolocal" setting for mpirun. How did you force MPI
> not to run in this mode?

Well, there may be defaults, but in my case you just run mpirun without the
option. To turn it on, I add the flag:

/usr/local/mpich/bin/mpirun -nolocal -np 4 -machinefile ../machines abinip < etc...
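
To go back to the default behaviour you just drop the flag, i.e. the same
command (same illustrative paths and arguments as above) without -nolocal:

/usr/local/mpich/bin/mpirun -np 4 -machinefile ../machines abinip < etc...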

As an update, running mpirun from a child (daughter/son) node also works
(without the nolocal), but the writing is abominably slow: abinip hangs, then
runs at 100% of the CPU and writes for a few seconds every few minutes, and
the rest of the time, nothing. Eventually it does work, though.

> Also, have you investigated the dependence of the crashiness on the size of
> your problem? In my case, it seems to depend on the kpt grid size (see my
> post from 6/29).

Absolutely: I tried to make it crash by increasing the cutoff or other
parameters from a working minimal input file, but it always went fine, from 4
to 30 procs with 32 kpoints. Only when I went to a 4x4x4 kpt grid (with 4
shifts) did it start crashing.
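
For the record, that sampling looks something like this in the input file (an
illustrative fragment only; these shiftk values are the usual four shifts, not
necessarily the exact ones I used):

  ngkpt   4 4 4
  nshiftk 4
  shiftk  0.5 0.5 0.5
          0.5 0.0 0.0
          0.0 0.5 0.0
          0.0 0.0 0.5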

> Finally, do you / should one systematically use paral_rf=1 for parallel RF
> calculations?

Not necessarily. k-point or band parallelism will decrease your memory
footprint, which is nice. paral_rf is very efficient since the calculations
for the different perturbations are independent, but it doesn't reduce the
footprint unless it is used simultaneously with another (kpt, band)
parallelism; I have no idea how/if that works, or how to activate it. The
added parallelism may be automatic if nproc > nperts.
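
To be concrete, a minimal sketch of what I mean (just assuming the paral_rf
input variable as above, and reusing the launch line from earlier; values are
illustrative):

  # in the input file: run the independent perturbations in parallel
  paral_rf 1

  # launched over several processors so the perturbations get distributed, e.g.
  /usr/local/mpich/bin/mpirun -np 16 -machinefile ../machines abinip < etc...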

bye

Matthieu


