FDMNES (2015-10-28) with openmpi on Debian GNU/Linux 8 Jessie

Summary

A memorandum on compiling FDMNES (2015-10-28) on Debian GNU/Linux 8.0 (Jessie).

The fdmnes executable compiled with MUMPS and openmpi turned out to be very slow. This is most likely because the compilation procedure and options used here are poor.

Environment

OS

Debian GNU/Linux 8.0 Jessie

Compiler

gfortran 4.9

CPU

Intel Core i7-4770K (4 physical cores with HT, i.e. 8 threads)

Compiling FDMNES (sequential) with the original Gaussian solver

Editing mat_solve_gaussian.f90

integer:: i, i_newind, ia, ib, icheck, ie, igrph, ii, ipr, isp, ispin, ispinin, iv, j, jj, k, lb1i, lb1r, lb2i, lb2r, lm, lmaxso, &
  lms, MPI_host_num_for_mumps, mpirank0, natome, nbm, nbtm, ngrph, nicm, nim, nligne, nligne_i, nligneso, nlmagm, nlmmax, &
  nlmomax, nlmsam, nlmso, nlmso_i, nphiato1, nphiato7, npoint, &
  npsom, nsm, nso1, nsort, nsort_c, nsort_r, nsortf, nspin, nspino, nspinp, nspinr, nstm, nvois

gfortran rejects free-form source lines longer than 132 characters by default, and line 15 (the first line of the declaration above) exceeds that limit. Just split it like this.

integer:: i, i_newind, ia, ib, icheck, ie, igrph, ii, ipr, isp, ispin, ispinin, iv, j, jj, k, lb1i, lb1r, &
  lb2i, lb2r, lm, lmaxso, &
  lms, MPI_host_num_for_mumps, mpirank0, natome, nbm, nbtm, ngrph, nicm, nim, nligne, nligne_i, nligneso, nlmagm, nlmmax, &
  nlmomax, nlmsam, nlmso, nlmso_i, nphiato1, nphiato7, npoint, &
  npsom, nsm, nso1, nsort, nsort_c, nsort_r, nsortf, nspin, nspino, nspinp, nspinr, nstm, nvois
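
Alternatively (untested here), you can keep the original line and instead let gfortran accept over-long lines by adding its -ffree-line-length-none flag to the compile flags in the Makefile below:

FFLAGS = -c  -O$(OPTLVL) -ffree-line-length-none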

Makefile

The Makefile for the sequential FDMNES build with the original Gaussian solver can look like this (remember that make requires recipe lines to begin with a tab character).

FC = gfortran
OPTLVL = 3

EXEC = ../fdmnes_gauss
FFLAGS = -c  -O$(OPTLVL)

OBJ_GAUSS = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
            minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
            not_mpi.o mat_solve_gaussian.o sub_util.o

all: $(EXEC)

$(EXEC): $(OBJ_GAUSS)
     $(FC) -o $@ $^

%.o: %.f90
     $(FC) -o $@ $(FFLAGS) $<

clean:
     rm -f *.o $(EXEC)
     rm -f *.mod

Please note that mpih.f should be copied from the include directory into the build directory before running make.
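
Assuming the include directory sits next to the source directory (the exact path depends on how you unpacked the FDMNES archive), the build then looks like:

$ cp ../include/mpih.f .   # assumed location; adjust to your layout
$ make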

Execute FDMNES (sequential)

I ran fdmnes with the first example file Sim/Test_stand/Cu.

$ time ./fdmnes_gauss
...
real    0m40.395s
user    0m40.060s
sys     0m0.308s

Everything goes well.

Compiling FDMNES (openmpi) with the original Gaussian solver

Install openmpi

$ sudo apt-get install libopenmpi-dev openmpi-bin
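
To check that the MPI compiler wrapper and launcher are installed (both are standard OpenMPI commands):

$ mpif90 --version
$ mpirun --version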

Makefile

The Makefile for the parallel FDMNES build with the original Gaussian solver can look like this.

FC = mpif90
OPTLVL = 3

EXEC = ../fdmnes_gauss_openmpi
FFLAGS = -c  -O$(OPTLVL)

OBJ_GAUSS = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
            minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
            mat_solve_gaussian.o sub_util.o

all: $(EXEC)

$(EXEC): $(OBJ_GAUSS)
     $(FC) -o $@ $^

%.o: %.f90
     $(FC) -o $@ $(FFLAGS) $<

clean:
     rm -f *.o $(EXEC)
     rm -f *.mod

Please note that not_mpi.o is not needed any more; it presumably only provides dummy MPI stubs for the sequential build, and the real MPI library now supplies those routines.

Execute FDMNES (parallel) with the original Gaussian solver

$ time mpirun -np 8 ./fdmnes_gauss_openmpi
...
real    0m23.965s
user    2m59.052s
sys     0m4.216s

Everything goes well.

Compiling MUMPS (sequential)

  1. Download MUMPS from http://mumps-solver.org/.

  2. Install dependencies

  3. Make all libraries

Install dependencies and make libraries

$ sudo apt-get install libmetis-dev libscotch-dev
$ cd /path/to/MUMPS_5.0.1
$ cp Make.inc/Makefile.debian.SEQ Makefile.inc
$ make all
$ cp libseq/libmpiseq.a lib
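
If the build succeeds, the lib directory should now contain the static libraries that the FDMNES link step below expects: libdmumps.a, libzmumps.a, libmumps_common.a, libpord.a, and the copied libmpiseq.a. A quick check:

$ ls lib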

Please note that you should not install libparmetis-dev and libptscotch-dev; at least on my machine, linking against these libraries failed.

Compiling FDMNES (sequential) with MUMPS

Makefile

The Makefile for the sequential FDMNES build with MUMPS can look like this.

FC = gfortran
OPTLVL = 3

EXEC = ../fdmnes

BIBDIR = /path/to/MUMPS_5.0.1/lib

FFLAGS = -O$(OPTLVL) -c

OBJ = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
      minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
      mat_solve_mumps.o

all: $(EXEC)

$(EXEC): $(OBJ)
     $(FC) -o $@ $^ -L$(BIBDIR) -ldmumps -lzmumps -lmumps_common -lmpiseq -lmetis -lpord \
                            -lesmumps -lscotch -lscotcherr -lpthread -llapack -lblas

%.o: %.f90
     $(FC) -o $@ $(FFLAGS) $<

clean:
     rm -f *.o $(EXEC)
     rm -f *.mod

Execute FDMNES (sequential) with MUMPS

$ time ./fdmnes
...
real    0m11.262s
user    0m25.988s
sys     0m58.796s

The calculation goes well; HOWEVER, the %system CPU usage is VERY LARGE.

Performance of self-compiled FDMNES (sequential) with MUMPS and fdmnes_linux64

I ran fdmnes_linux64 and the self-compiled fdmnes with the first example file Sim/Test_stand/Cu.

$ time ./fdmnes_linux64
...
real    0m8.335s
user    0m23.800s
sys     0m0.488s

$ time ./fdmnes
...
real    0m11.262s
user    0m25.988s
sys     0m58.796s

The CPU usages of fdmnes_linux64 and the self-compiled fdmnes are about 400% (4 cores) and 800% (8 threads), respectively. The self-compiled fdmnes is a bit slower than fdmnes_linux64, which can probably be ascribed to the difference in compilers: gfortran for the self-compiled fdmnes versus ifort for fdmnes_linux64.

In addition, the %system CPU usage is really high (about 70%) for the self-compiled fdmnes. I am not sure why, but it probably lowers the performance.
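
The 800% CPU usage of a nominally sequential binary suggests that a multithreaded BLAS is running underneath. As an unverified guess, if the system BLAS is OpenBLAS, restricting it to a single thread may reveal whether spinning BLAS threads cause the high %system time (OPENBLAS_NUM_THREADS is OpenBLAS's standard thread-count variable):

$ time OPENBLAS_NUM_THREADS=1 ./fdmnes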

Compiling FDMNES (openmpi) with MUMPS

Install dependencies and make libraries

$ cd /path/to/MUMPS_5.0.1
$ cp Make.inc/Makefile.debian.PAR Makefile.inc
$ make all

As with the sequential build, do not install libparmetis-dev and libptscotch-dev; at least on my machine, linking against these libraries failed.

Makefile

The Makefile for the parallel FDMNES build with MUMPS and openmpi can look like this.

FC = mpif90
OPTLVL = 3

EXEC = ../fdmnes_openmpi

BIBDIR = /path/to/MUMPS_5.0.1/lib

FFLAGS = -O$(OPTLVL) -c

OBJ = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
      minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
      mat_solve_mumps.o

all: $(EXEC)

$(EXEC): $(OBJ)
     $(FC) -o $@ $^ -L$(BIBDIR) -ldmumps -lzmumps -lmumps_common -lmetis -lpord \
                            -lesmumps -lscotch -lscotcherr -lpthread -llapack -lblas \
                            -lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi \
                            -lblacsCinit-openmpi -lmpi -lmpi_f77

%.o: %.f90
     $(FC) -o $@ $(FFLAGS) $<

clean:
     rm -f *.o $(EXEC)
     rm -f *.mod

Please note that you need all the headers provided in the include directory except for mpif.h, since openmpi supplies its own mpif.h.
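
A minimal sketch of that step, assuming the same include directory from which mpih.f was copied earlier (the exact location depends on how the sources are laid out):

$ cp ../include/*.h .   # assumed location of the include directory
$ rm -f mpif.h          # openmpi supplies its own mpif.h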

Execute FDMNES (parallel) with MUMPS

$ time mpirun -np 8 ./fdmnes_openmpi
real    0m32.581s
user    2m5.808s
sys     2m6.408s

The calculation seems to go well; HOWEVER, the %system CPU usage is TOO LARGE again.

Performance of self-compiled FDMNES (sequential and parallel) with MUMPS and fdmnes_linux64

I ran fdmnes_linux64 and the self-compiled executables with the first example file Sim/Test_stand/Cu.

$ time ./fdmnes_linux64
real    0m8.335s
user    0m23.800s
sys     0m0.488s

$ time ./fdmnes
real    0m11.211s
user    0m26.224s
sys     0m58.484s

$ time mpirun -np 8 ./fdmnes_openmpi
real    0m32.581s
user    2m5.808s
sys     2m6.408s

Unfortunately, the fdmnes built with openmpi and MUMPS (which I had hoped would be the fastest) is the SLOWEST executable... The %system CPU usage of fdmnes_openmpi is again really high (about 50%).
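
One plausible but unverified explanation is oversubscription: 8 MPI ranks, each possibly driving a multithreaded BLAS, compete for only 8 hardware threads, and the resulting contention shows up as system time. A follow-up experiment along these lines might be worth trying (assuming OpenBLAS as the system BLAS):

$ time OPENBLAS_NUM_THREADS=1 mpirun -np 4 ./fdmnes_openmpi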