forum - Re: Re: [abinit-forum] abinit5.2.2 parallel job crashed on IBM AIX/power4+

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: Riad Shaltaf <shaltaf@pcpm.ucl.ac.be>
  • To: forum@abinit.org
  • Subject: Re: Re: [abinit-forum] abinit5.2.2 parallel job crashed on IBM AIX/power4+
  • Date: Thu, 14 Sep 2006 15:30:17 +0200
  • Organization: PCPM

Dear Deyu Lu,

There is evidently nothing wrong with your input files, which I checked
after your previous e-mail. As far as I know, the diagonalization is
performed in parallel, but the results are written to the KSS file
sequentially. For a large KSS calculation you may therefore wait some
time for the file to be written, so in that case it is better, I think,
to write to the local hard disk of the machine.
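
As a rough illustration of how k-points might be shared among processes, here is a minimal Python sketch. It assumes a contiguous block distribution (8 consecutive k-points per process for 32 k-points on 4 cpus), which is consistent with the quoted logs but is not taken from the ABINIT source. The function names are hypothetical, and the sketch says nothing about the timing of the diagonalization or of the KSS write.

```python
# Hypothetical helper names; this is an illustration, not ABINIT code.
# Assumption: k-points are split into contiguous blocks, one block per rank.

def owner_of_kpoint(ikpt, nkpt, nproc):
    """Rank that owns 1-based k-point `ikpt` under a contiguous block split."""
    block = -(-nkpt // nproc)  # ceiling division: k-points per rank
    return (ikpt - 1) // block


def kpoints_of_rank(rank, nkpt, nproc):
    """List of 1-based k-points assigned to `rank`."""
    return [k for k in range(1, nkpt + 1)
            if owner_of_kpoint(k, nkpt, nproc) == rank]


if __name__ == "__main__":
    nkpt, nproc = 32, 4  # values quoted in the message below
    for rank in range(nproc):
        mine = kpoints_of_rank(rank, nkpt, nproc)
        print(f"rank {rank}: k-points {mine[0]}-{mine[-1]}")
```

Under this assumption, a run restricted to k-points 1-14 would give work only to ranks 0 and 1. That would match LOG_0001 starting its diagonalization at k-point 9, and LOG_0002 and LOG_0003 showing no diagonalization lines at all.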

Riad

On Thu, 2006-09-14 at 03:00 +0200, dylu@ucdavis.edu wrote:
> Riad:
> Thank you for your input. The question now is about parallelization. I
> switched to kssform=1, which requires a partial diagonalization of the
> Hamiltonian. To be safer, I used an abinit 4.6.5 binary installed on Alpha
> EV6.8CB/Tru64 Unix machines. I submitted the job with 4 cpus, but I
> noticed that the diagonalization part is not quite parallel among the 32 k
> points, although 8 k-points are distributed to each cpu.
> As shown in the log files below, it seems to me that one cpu started the
> diagonalization only after the previous one finished (1 after 0, 2 after
> 1, ...; the job shown was run for kpts 1-14 only). I'm kind of worried
> that as my system grows larger, the direct/partial diagonalization may take
> too long if the job is not running truly in parallel.
>
> Best
> Deyu Lu
>
> ----------------------------------------------------------------------------
> From main LOG file:
> Writing out eigenvalues/vectors for ikpt= 13.
> Occupation numbers for ikpt= 13:
> 13 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
> 2.0000
> 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
> 2.0000
> 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
> 2.0000
> 2.0000 2.0000 2.0000 2.0000 2.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> 0.0000
> 0.0000
> -P-0000
> -P-0000 k-point 14
> ---------------------------------------------------------------------------
> From LOG_0001:
> -P-0001 leave_test : synchronization done...
> -P-0001
> -P-0001 k-point 1
> -P-0001
> -P-0001 k-point 2
> -P-0001
> -P-0001 k-point 3
> -P-0001
> -P-0001 k-point 4
> -P-0001
> -P-0001 k-point 5
> -P-0001
> -P-0001 k-point 6
> -P-0001
> -P-0001 k-point 7
> -P-0001
> -P-0001 k-point 8
> -P-0001
> -P-0001 k-point 9
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 9 - Size of mat.= 2440 - # bnds= 100
> -P-0001
> -P-0001 k-point 10
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 10 - Size of mat.= 2442 - # bnds= 100
> -P-0001
> -P-0001 k-point 11
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 11 - Size of mat.= 2442 - # bnds= 100
> -P-0001
> -P-0001 k-point 12
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 12 - Size of mat.= 2440 - # bnds= 100
> -P-0001
> -P-0001 k-point 13
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 13 - Size of mat.= 2445 - # bnds= 100
> -P-0001
> -P-0001 k-point 14
> -P-0001 Calculating <G|H|G'> elements
> -P-0001 Begin partial diago for ikpt= 14 - Size of mat.= 2440 - # bnds= 100
> ---------------------------------------------------------------------------
> From LOG_0002:
> -P-0002 leave_test : synchronization done...
> -P-0002
> -P-0002 k-point 1
> -P-0002
> -P-0002 k-point 2
> -P-0002
> -P-0002 k-point 3
> -P-0002
> -P-0002 k-point 4
> -P-0002
> -P-0002 k-point 5
> -P-0002
> -P-0002 k-point 6
> -P-0002
> -P-0002 k-point 7
> -P-0002
> -P-0002 k-point 8
> -P-0002
> -P-0002 k-point 9
> -P-0002
> -P-0002 k-point 10
> -P-0002
> -P-0002 k-point 11
> -P-0002
> -P-0002 k-point 12
> -P-0002
> -P-0002 k-point 13
> -P-0002
> -P-0002 k-point 14
> ----------------------------------------------------------------------------
> From LOG_0003:
> -P-0003 k-point 1
> -P-0003
> -P-0003 k-point 2
> -P-0003
> -P-0003 k-point 3
> -P-0003
> -P-0003 k-point 4
> -P-0003
> -P-0003 k-point 5
> -P-0003
> -P-0003 k-point 6
> -P-0003
> -P-0003 k-point 7
> -P-0003
> -P-0003 k-point 8
> -P-0003
> -P-0003 k-point 9
> -P-0003
> -P-0003 k-point 10
> -P-0003
> -P-0003 k-point 11
> -P-0003
> -P-0003 k-point 12
> -P-0003
> -P-0003 k-point 13
> -P-0003
> -P-0003 k-point 14
>
>
--
Riad Shaltaf UCL/SE/FSA/MAPR/PCPM
Tel: +32 (0)10 47 28 50 Bâtiment Boltzmann, a+1
Fax: +32 (0)10 47 34 52 1 place Croix du Sud
Mel: shaltaf@pcpm.ucl.ac.be 1348 Louvain-la-Neuve (Belgique)
