forum - Re: RE : [abinit-forum] Band/FFT parallelism on large systems

forum@abinit.org

Subject: The ABINIT Users Mailing List ( CLOSED )

  • From: DELAVEAU Muriel <muriel.delaveau@cea.fr>
  • To: forum@abinit.org
  • Subject: Re: RE : [abinit-forum] Band/FFT parallelism on large systems
  • Date: Wed, 24 Jun 2009 10:10:35 +0200
  • Organization: CEA-DAM

In the up-to-date version of trunk 5.9.0, I ran into some trouble when I compile the code with ifort; it disappears if I compile
the file src/51_manage_mpi/xderivewrite.F90 without any optimisation.
It might be the optimiser, or a memory problem somewhere in the code. The problem also disappears when I change
the location of (displace(:)-1) * nboct * n1,
like this:

subroutine xderiveWrite_dp2d_mpio_displ(wff,xval,n1,n2,ierr,spaceComm,displace)

use defs_basis
use defs_datatypes


implicit none

#if defined MPI_IO
#ifndef __VMS
include 'mpif.h'
#endif
integer :: statux(MPI_STATUS_SIZE)
#endif
type(wffile_type),intent(inout) :: wff
integer,intent(in) :: n1,n2,spaceComm
integer,intent(out) :: ierr
real(dp),intent(in):: xval(:,:)
integer,intent(in):: displace(:)

integer(abinit_offset) :: nboct,totoct
integer(abinit_offset) :: posit
integer(abinit_offset),allocatable :: dispoct(:)
integer :: i1,i2
real(dp), allocatable :: val(:)

ierr=0

#if defined MPI_IO

allocate(dispoct(n2))
allocate(val(n1))

nboct = wff%nbOct_dp
! Byte displacement of each block, computed once, before the write loop
dispoct(:) = (displace(:)-1) * nboct * n1

do i2=1,n2
  posit = wff%offwff + dispoct(i2)
  ! Copy column i2 of xval into the contiguous buffer val before writing it
  do i1=1,n1
    val(i1) = xval(i1,i2)
  enddo
  call MPI_FILE_WRITE_AT(wff%fhwff,posit,val,n1,MPI_DOUBLE_PRECISION,statux,ierr)
  ! call MPI_FILE_WRITE_AT(wff%fhwff,posit,xval(1,i2),n1,MPI_DOUBLE_PRECISION,statux,ierr)
enddo


! total offset
nboct = nboct * n1 * n2
call MPI_ALLREDUCE(nboct,totoct,1,MPI_INTEGER8,MPI_SUM,spaceComm,ierr)

wff%offwff = wff%offwff + totoct

deallocate(dispoct)
deallocate(val)

#endif

end subroutine xderiveWrite_dp2d_mpio_displ
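
The displacement arithmetic in the routine above can be tried in isolation. The following is only a minimal sketch, not part of ABINIT and not a diagnosis of the crash: the kind offset_kind stands in for abinit_offset (assumed here to be an 8-byte integer), and the values of n1 and displace are hypothetical, chosen so that one byte offset exceeds the 32-bit integer range.

```fortran
program offset_sketch
  ! Illustration of the expression (displace(:)-1) * nboct * n1.
  ! All numerical values are hypothetical, not taken from a real run.
  use, intrinsic :: iso_fortran_env, only: int32, int64
  implicit none

  ! Stand-in for abinit_offset (assumed to be an 8-byte integer kind)
  integer, parameter :: offset_kind = int64

  integer(offset_kind) :: nboct
  integer(offset_kind), allocatable :: dispoct(:)
  integer :: n1, i2
  integer :: displace(3)

  nboct = 8_offset_kind      ! bytes per double-precision value
  n1 = 100000000             ! hypothetical number of values per block
  displace = [1, 2, 4]       ! hypothetical 1-based block indices

  allocate(dispoct(size(displace)))

  ! (displace-1) is a default integer; multiplying it by nboct first
  ! promotes the whole product to offset_kind, so no 32-bit overflow occurs.
  dispoct(:) = (displace(:) - 1) * nboct * n1

  do i2 = 1, size(displace)
    print '(a,i0,a,i0)', 'block ', displace(i2), ' starts at byte ', dispoct(i2)
  end do

  ! Check whether the last offset would still fit in a 32-bit integer
  if (dispoct(3) > int(huge(1_int32), offset_kind)) then
    print '(a)', 'the last offset does not fit in a 32-bit integer'
  end if

  deallocate(dispoct)
end program offset_sketch
```

With these hypothetical values the third offset is 3 * 8 * 100000000 = 2400000000 bytes, which is larger than 2147483647, the largest value a 32-bit integer can hold. Keeping nboct in the wider kind is what makes the product safe in this sketch.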


Anyway, I am working on it to see whether I can reproduce your problem on my kind of workstation.

Muriel

David Waroquiers wrote:
Hello,

I tried with the latest 5.9.0 (revision 485) and it crashed (even with few
bands: nband 256) on 8 CPUs. Note that I decreased the number of
steps to speed up the test (it does not matter anyway).

The end of the log file is at the end of the message.

Any suggestions?

David

scprqt: WARNING -
nstep= 5 was not enough SCF cycles to converge;
potential residual= 4.649E+00 exceeds tolvrs= 1.000E-12

ioarr: writing density data
ioarr: file name is asio2test_para8o_DS1_DEN
ioarr: data written to disk file asio2test_para8o_DS1_DEN
-P-0000 leave_test : synchronization done...
================================================================================

----iterations are completed or convergence reached----

outwf : write wavefunction to file asio2test_para8o_DS1_WFK
-P-0000 leave_test : synchronization done...
[node008:27384] *** Process received signal ***
[node008:27385] *** Process received signal ***
[node008:27385] Signal: Segmentation fault (11)
[node008:27385] Signal code: Address not mapped (1)
[node008:27385] Failing at address: 0xc2c37e54
[node008:27389] *** Process received signal ***
[node008:27389] Signal: Segmentation fault (11)
[node008:27389] Signal code: Address not mapped (1)
[node008:27389] Failing at address: 0xb2ee4574
[node008:27390] *** Process received signal ***
[node008:27390] Signal: Segmentation fault (11)
[node008:27390] Signal code: Address not mapped (1)
[node008:27390] Failing at address: 0xb83e1914
[node008:27391] *** Process received signal ***
[node008:27391] Signal: Segmentation fault (11)
[node008:27391] Signal code: Address not mapped (1)
[node008:27391] Failing at address: 0xb83c32e4
[node008:27386] *** Process received signal ***
[node008:27386] Signal: Segmentation fault (11)
[node008:27386] Signal code: Address not mapped (1)
[node008:27386] Failing at address: 0xc9cb7dd4
[node008:27387] *** Process received signal ***
[node008:27387] Signal: Segmentation fault (11)
[node008:27387] Signal code: Address not mapped (1)
[node008:27387] Failing at address: 0xb67bd5a4
[node008:27388] *** Process received signal ***
[node008:27388] Signal: Segmentation fault (11)
[node008:27388] Signal code: Address not mapped (1)
[node008:27388] Failing at address: 0xc2f74d94
[node008:27384] Signal: Segmentation fault (11)
[node008:27384] Signal code: Address not mapped (1)
[node008:27384] Failing at address: 0xbdd70364
[node008:27389] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27389]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27389]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27389]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27389]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27389]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27389]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27389]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27389]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27389] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27389]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27389] *** End of error message ***
[node008:27388] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27388]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27388]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27388]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27388]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27388]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27388]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27388]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27388]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27388] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27388]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27388] *** End of error message ***
[node008:27391] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27391]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27391]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27391]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27391]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27391]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27391]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27391]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27391]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27391] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27391]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27391] *** End of error message ***
[node008:27390] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27390]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27390]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27390]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27390]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27390]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27390]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27390]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27390]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27390] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27390]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27390] *** End of error message ***
[node008:27386] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27386]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27386]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27386]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27386]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27386]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27386]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27386]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27386]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27386] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27386]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27386] *** End of error message ***
[node008:27385] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27385]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27385]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27385]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27385]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27385]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27385]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27385]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27385]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27385] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27385]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27385] *** End of error message ***
[node008:27387] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27384] [ 0] /lib64/libpthread.so.0 [0x3db920e4c0]
[node008:27384]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27384]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27384]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27384]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27384]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27384]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27387]
[ 1]
/home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(xderivewrite_int2d_mpio_displ_+0x939)
[0x148614f]
[node008:27384]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27384]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27384] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27384]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27384] *** End of error message ***
[node008:27387]
[ 2] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(writewf_
+0xe96) [0x11fe522]
[node008:27387]
[ 3] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(rwwf_
+0x4e36) [0x11fd682]
[node008:27387]
[ 4] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(outwf_
+0x2983) [0x62d93b]
[node008:27387]
[ 5] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(gstate_
+0x16f5e) [0x46deae]
[node008:27387]
[ 6] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(driver_
+0xaec4) [0x451a64]
[node008:27387]
[ 7] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(MAIN__
+0x2f7d) [0x44348d]
[node008:27387]
[ 8] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(main+0x2a)
[0x440502]
[node008:27387] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4)
[0x3db861d974]
[node008:27387]
[10] /home/pcpm/waroquiers/590_r485bis/abinit/5.9/bin/abinip(mpi_bcast_
+0x49) [0x440429]
[node008:27387] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 6 with PID 27390 on node node008 exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

On Mon, 2009-06-22 at 12:01 +0200, TORRENT Marc wrote:
Hi again David,

To simplify...
Could you test with the latest 5.9/trunk/5.9.0-public?
The treatment of MPI_IO WFK has been changed for 5.9, and it is more
convenient for us to debug this latest branch.
Thanks

Marc


Marc.TORRENT@cea.fr wrote:
Hi David,

1) Did you try with the latest corrections contained in the latest 5.8.3
bzr revision or 5.8.4 (at least revision 507)? Muriel Delaveau and
I made several improvements to the writing of the WFK file with
MPI-IO; these corrections improve the portability of the code and
have been merged in revision 507. We found that the code
crashed on several architectures because of incorrect treatment
of buffers, and we hope to have corrected that.

2) If you want to be able to use the file with anaddb, you have to either:
- use the latest 5.8.3 branch (or 5.8.4), or
- use the --enable-mpi-io-buggy option when building the code
(unnecessary after 5.8.3 rev507); but you could have buffer problems in
that case.

We tested the new changes with ifort and gcc43, with mpich and open-mpi.
In band-fft, the memory is split for the wfk but not for other
quantities, especially if you use PAW. We plan to correct that soon.

Marc
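
As a rough illustration of the remark that the wavefunction memory is split in band-FFT mode, the short sketch below computes how many bands, and roughly how much wavefunction data, each process would hold. It is not ABINIT's actual bookkeeping: it assumes that the bands are shared evenly among the npband processes, and it takes nband 4480, the npband values 16, 32 and 64, and the file size of about 4 GB (counted here as 4096 MB) from the original message quoted below.

```fortran
program band_split_sketch
  ! Assumption: bands, and hence the wavefunction data, are shared evenly
  ! among the npband processes. This is an illustration only.
  implicit none

  integer, parameter :: nband = 4480                   ! nband2 in the input file
  integer, parameter :: npband_list(3) = [16, 32, 64]  ! npband values tried
  integer, parameter :: wfk_mb = 4096                  ! "4 GB" taken as 4096 MB
  integer :: i, npband

  do i = 1, size(npband_list)
    npband = npband_list(i)
    if (mod(nband, npband) /= 0) then
      ! An uneven split is outside the scope of this sketch
      print '(a,i0,a)', 'npband = ', npband, ': bands do not divide evenly'
    else
      print '(a,i0,a,i0,a,i0,a)', 'npband = ', npband, ': ', nband/npband, &
        ' bands per process, about ', wfk_mb/npband, ' MB of wavefunctions each'
    end if
  end do
end program band_split_sketch
```

Under this even-split assumption, 16 processes would each hold 280 bands (about 256 MB), 32 processes 140 bands (about 128 MB), and 64 processes 70 bands (about 64 MB). As noted above, other quantities are not split in the same way, so the total memory per process is larger than these figures.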



-------- Original message --------
From: David Waroquiers [mailto:david.waroquiers@uclouvain.be]
Date: Fri. 19/06/2009 12:29
To: forum@abinit.org
Subject: [abinit-forum] Band/FFT parallelism on large systems

Hello all,

I have tried to use the band/FFT parallelism on a large supercell
(a-SiO2, 72 atoms and 108 atoms). I encountered a problem when using a lot of
bands (4480 bands). The calculation reaches convergence but crashes at the end
of the run, when it is supposed to write the WFK file (outwf call). I tried to
run the calculation with 16, 32 and 64 processors.

I have tried with fewer bands (640) and it works. Do you have any idea how to
overcome this problem? The WFK file is supposed to be 4 GB, and the available
memory on the clusters is more than that. By the way, in the band/FFT
parallelism approach, the memory for the WFK is split across the different
CPUs, isn't it?

I encountered another problem when using cut3d to analyse the WFK file
generated with the band/FFT parallelism. It does not recognise the file as a
valid WFK file (about the same message as when band/FFT parallelism did not
allow restarting with a different number of processors, before version 5.8 if
I am right). Any idea about this too?

My input file is given below, and the log messages follow the input
file. I am using public version 5.8.3, revision 485, and the machines used are
the "green" clusters at UCL: 102 dual quad-core Xeon L5420/2.5GHz in Dell
Blade M1000e with 16 GB (or 32 GB for some nodes) per node of 8 processors.

Thanks a lot,

David Waroquiers
PhD Student
UCL - PCPM - ETSF



My input file :
# Amorphous SiO2 : Generation of the WFK file needed for the KSS (for GW corrections)
# Dataset 1 : GS calculation (_DEN generation)
# Dataset 2 : GS calculation with many bands (_WFK generation)
ndtset 2
jdtset 1 2
timopt 2
# Dataset 1 : _DEN file generation (Density)
tolvrs1 1.0d-12
prtden1 1
nstep1 5 #5 for testing
iscf1 7
npulayit1 7
nband1 256
# Dataset 2 : _WFK file (Wavefunction)
tolwfr2 1.0d-12
nband2 4480
nbdbuf2 384
istwfk2 1
iscf2 7
nstep2 5 #5 for testing
getden2 1
# Options for Band/FFT Parallelism
paral_kgb 1
wfoptalg 14
nloalg 4
fftalg 401
iprcch 4
intxc 0
istwfk 1
fft_opt_lob 2
npfft 1
npband 16 #32 #64
# K-point mesh
kptopt 0
kpt 0.0 0.0 0.0
# System definition
# Unit cell
acell 1.9465690950E+01 1.9465690950E+01 1.9465690950E+01
rprim 1 0 0
      0 1 0
      0 0 1
# Atom types
ntypat 2
znucl 8 14
# Atoms and coordinates
natom 72 typat 48*1 24*2 xcart 1.8342971905E+01 1.0013093348E+01 4.9948115472E+00 1.8450118788E+01 5.1100335358E+00 1.1410341879E+01 3.0243029960E+00 1.7006888337E+01 1.0689037523E+01 6.3068666011E+00 1.4446482399E+01 7.9505060279E+00 1.9178811503E+01 4.4712567836E-01 3.4641995090E+00 9.5178783093E+00 1.2762912471E+01 1.4947329016E+01 1.7402433472E+01 4.7067303120E+00 1.5833402903E+00 1.0623164695E+01 2.7299953166E+00 8.7471659694E+00 1.2931573871E+01 1.8128981231E+01 6.7007362518E+00 1.8660924236E+01 1.4792395464E+01 3.1319031106E+00 7.0217232014E+00 6.3190579071E+00 2.1266991430E+00 4.1181163909E-01 5.0929210080E+00 5.7193503290E+00 7.6209880479E+00 1.5443775482E+00 6.1023412080E-01 1.7923134211E+01 9.4056919719E+00 1.3628670860E+01 1.4710748045E+01 9.1118601940E+00 1.7566857742E+01 1.0411995344E+01 1.0041061607E+00 1.5870123306E+01 1.0980496920E+01 1.3629862231E+01 6.8821852197E+00 1.2756648650E+01 9.3922889131E+00 1.2966781879E+01 1.3710153187E+01 2.2151381385E+00 1.9176017166E+01 5.9015247795E+00 1.8254646045E+01 1.6364133902E+01 3.5889689987E+00 8.6729161022E+00 4.9047876611E+00 1.4649631278E+01 1.1782133781E+01 2.4189697381E+00 1.3094524372E+01 1.5574388332E+01 1.1017906884E+01 1.8122798453E+00 1.5904671691E+01 1.5390374184E+01 8.0934509994E+00 9.9606459884E+00 5.6351418737E+00 1.0388873243E+01 1.1258002356E+01 1.9535431306E+01 1.7801695829E+01 1.5681701759E+01 1.1954743795E+01 2.9395289639E+00 3.6212308778E+00 1.4808160737E+00 1.3785141980E+01 3.1146153451E+00 4.7897808777E+00 6.6125694236E+00 3.8955369666E+00 1.1802613942E+01 1.0543336669E+00 8.8480531151E+00 9.2302571597E+00 1.5034376672E+01 1.3207034271E+01 1.5126390258E+01 1.9223920516E+01 6.5595988246E-01 1.3020475817E+01 6.6553078921E+00 5.1934209327E+00 6.9894256581E+00 1.4918361618E+01 3.1596212425E+00 1.4324193688E+01 8.0804273193E+00 7.9884008127E+00 1.4307619386E+01 5.7570753518E+00 1.3551949199E+01 1.8079850277E+01 1.2833144388E+01 6.9576781789E+00 1.8702339976E+00 2.7890960157E+00 
1.7032376017E+01 6.7568473875E-01 8.6760457768E+00 1.0859908527E+01 1.0407253204E+01 6.3690907257E+00 2.2769273004E-01 8.2629069843E+00 1.4623475391E+01 1.7952319809E+01 1.5406783784E+01 1.5775821227E+01 1.3896960139E+01 6.9539101570E+00 1.5477566296E+01 1.7519166868E+00 9.5117606862E+00 3.1098755647E+00 7.2414373656E+00 1.3444441571E+01 2.9576688783E+00 1.7045648497E+01 5.6738016905E+00 9.7659864282E+00 1.6334927247E+01 1.8709494220E+01 5.6015780233E+00 4.6820174692E+00 1.6849684158E+01 1.3193293623E+01 1.6296954721E+00 7.4269549058E+00 4.0153579861E+00 1.6089810803E+01 1.7511617105E+01 1.5080013653E+01 1.5674902127E+01 1.3378910449E+01 1.3468917372E+01 1.2396666756E+00 1.6246453919E+01 3.1443097800E-01 3.4518653529E+00 3.0738155414E+00 1.6864054813E+01 1.2620177700E+01 4.3720388857E+00 1.3252228290E+01 1.5343974821E+01 7.9284144425E+00 8.4872425534E+00 1.9054897865E+01 1.7815010425E+01 1.5170448087E+01 1.0186883021E+01 1.4748027393E+01 7.5516402653E+00 3.0013700719E+00 8.9766200084E+00 6.4090355722E+00 7.4843741588E+00 4.9295671605E+00 4.3705611827E-01 7.6893073781E+00 1.2001555962E+01 8.4238741309E+00 1.2232714786E+01 7.6995337657E+00 5.8387974184E+00 5.9155119378E+00 1.4039991791E+01 1.3107235988E+01 9.8055489044E+00 6.4400593019E-01 8.3270647814E-01 1.7227458132E+01 1.2775664290E+01 1.4372625432E+01 4.2560000137E+00 1.9730406948E+00 5.7914453145E+00 4.0664955533E+00 3.9036518542E-01 9.7815513593E+00 1.0257448955E+01 1.3164763822E+01 1.6979663973E+01 2.5757556368E+00 1.2070399003E+01 9.1476280310E-01 8.1192625454E+00 6.2498371664E+00 8.8902943261E+00 1.3433615492E+01 1.7894990037E+01 4.7238007437E+00 1.7074503731E+01 8.1487422033E+00 1.1337419675E+00 1.7170180156E+01 3.2442179093E+00
# Energy cutoff for the planewaves
ecut 32.0
# Parameters for the SCF cycles
nstep 5
diemac 4.0
ixc 11





Here is the end of the message log for the 72 atoms cell with 4480
bands run on 16 processors :
================================================================================
----iterations are completed or convergence reached----
outwf : write wavefunction to file asio2test_para16_4096o_DS2_WFK
-P-0000 leave_test : synchronization done...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 15 in communicator MPI_COMM_WORLD with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 6159 on
node node054 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[green:29817] 15 more processes have sent help message help-mpi-api.txt / mpi-abort
[green:29817] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages




Here is the end of the log file I got from the 108 atoms cell (with
3200 bands) run on 32 processors :
================================================================================
----iterations are completed or convergence reached----
outwf : write wavefunction to file as108nr_004o_DS2_WFK
-P-0000 leave_test : synchronization done...
[node090:26797] *** Process received signal ***
[node090:26797] Signal: Segmentation fault (11)
[node090:26797] Signal code: Address not mapped (1)
[node090:26797] Failing at address: 0x2
[node090:26797] [ 0] /lib64/libpthread.so.0 [0x395300e4c0]
[node090:26797] [ 1] /cvos/shared/apps/openmpi/intel/64/1.3.1/lib64/libmpi.so.0(ompi_ddt_add+0x6c1) [0x2b389ea74951]
[node090:26797] [ 2] /cvos/shared/apps/openmpi/intel/64/1.3.1/lib64/libmpi.so.0(ompi_ddt_create_indexed_block+0x1b3) [0x2b389ea750c3]
[node090:26797] [ 3] /cvos/shared/apps/openmpi/intel/64/1.3.1/lib64/libmpi.so.0(MPI_Type_create_indexed_block+0xb8) [0x2b389ea9d848]
[node090:26797] [ 4] /cvos/shared/apps/openmpi/intel/64/1.3.1/lib64/libmpi_f77.so.0(mpi_type_create_indexed_block_f+0x38) [0x2b389e828780]
[node090:26797] [ 5] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(wffwritecg_+0xc29) [0x10823a9]
[node090:26797] [ 6] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(writewf_+0x1ef9) [0x107f0b7]
[node090:26797] [ 7] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(rwwf_+0x3e88) [0x107d1b4]
[node090:26797] [ 8] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(outwf_+0x2616) [0x5f5a72]
[node090:26797] [ 9] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(gstate_+0x15074) [0x4675cc]
[node090:26797] [10] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(driver_+0x740e) [0x44ea52]
[node090:26797] [11] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(MAIN__+0x52a6) [0x4448f6]
[node090:26797] [12] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip(main+0x2a) [0x43f642]
[node090:26797] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x395281d974]
[node090:26797] [14] /home/pcpm/waroquiers/583/abinit/5.8/bin/abinip [0x43f569]
[node090:26797] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 8 with PID 26797 on node node090
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------




