FDMNES (2015-10-28) with openmpi on Debian GNU/Linux 8 Jessie
Summary
A memorandum of compiling FDMNES (2015-10-28) on Debian GNU/Linux 8.0.
The fdmnes executable compiled with MUMPS and OpenMPI turned out to be far too slow; this is most likely because my compilation procedure and options are poor.
Environment
- OS: Debian GNU/Linux 8.0 Jessie
- Compiler: gfortran 4.9
- CPU: Intel Core i7-4770K (4 physical cores with HT, i.e. 8 threads)
Compiling FDMNES (sequential) with the original Gaussian solver
Editing mat_solve_gaussian.f90
  integer:: i, i_newind, ia, ib, icheck, ie, igrph, ii, ipr, isp, ispin, ispinin, iv, j, jj, k, lb1i, lb1r, lb2i, lb2r, lm, lmaxso, &
    lms, MPI_host_num_for_mumps, mpirank0, natome, nbm, nbtm, ngrph, nicm, nim, nligne, nligne_i, nligneso, nlmagm, nlmmax, &
    nlmomax, nlmsam, nlmso, nlmso_i, nphiato1, nphiato7, npoint, &
    npsom, nsm, nso1, nsort, nsort_c, nsort_r, nsortf, nspin, nspino, nspinp, nspinr, nstm, nvois
Just split line 15 like this (presumably the first line of the statement would otherwise exceed gfortran's default 132-character limit for free-form source):
  integer:: i, i_newind, ia, ib, icheck, ie, igrph, ii, ipr, isp, ispin, ispinin, iv, j, jj, k, lb1i, lb1r, &
    lb2i, lb2r, lm, lmaxso, &
    lms, MPI_host_num_for_mumps, mpirank0, natome, nbm, nbtm, ngrph, nicm, nim, nligne, nligne_i, nligneso, nlmagm, nlmmax, &
    nlmomax, nlmsam, nlmso, nlmso_i, nphiato1, nphiato7, npoint, &
    npsom, nsm, nso1, nsort, nsort_c, nsort_r, nsortf, nspin, nspino, nspinp, nspinr, nstm, nvois
Makefile
A Makefile for the sequential FDMNES with the original Gaussian solver can look like this:
FC = gfortran
OPTLVL = 3
EXEC = ../fdmnes_gauss
FFLAGS = -c -O$(OPTLVL)

OBJ_GAUSS = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
            minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
            not_mpi.o mat_solve_gaussian.o sub_util.o

all: $(EXEC)

$(EXEC): $(OBJ_GAUSS)
	$(FC) -o $@ $^

%.o: %.f90
	$(FC) -o $@ $(FFLAGS) $?

clean:
	rm -f *.o $(EXEC)
	rm -f *.mod
mpif.h should be copied from the include directory.
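With mpif.h in place, the build itself is just make. The snippet below is only a sketch; the paths (/path/to/fdmnes/prog and ../include) are my assumptions about where the FDMNES sources and the bundled include directory live.

# Paths are assumptions; adjust them to your FDMNES source tree.
$ cd /path/to/fdmnes/prog
$ cp ../include/mpif.h .   # dummy mpif.h shipped with FDMNES (assumed location)
$ make                     # produces ../fdmnes_gauss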
Execute FDMNES (sequential)
I ran fdmnes with the first example file Sim/Test_stand/Cu.
Everything goes well.
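For reference, fdmnes reads the list of input files from fdmfile.txt in the current working directory, so running the Cu example is just the following (the directory layout is an assumption based on the FDMNES distribution):

# Assumed layout: fdmfile.txt and the Sim/ tree sit next to the executable.
$ cd /path/to/fdmnes
$ time ./fdmnes_gauss   # runs the calculation(s) listed in fdmfile.txt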
Compiling FDMNES (openmpi) with the original Gaussian solver
Install openmpi
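On Debian 8 the OpenMPI runtime and the mpif90 wrapper come from the standard repositories; the package names below should be sufficient.

$ sudo apt-get install openmpi-bin libopenmpi-dev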
Makefile
A Makefile for the parallel FDMNES with the original Gaussian solver can look like this:
FC = mpif90
OPTLVL = 3
EXEC = ../fdmnes_gauss_openmpi
FFLAGS = -c -O$(OPTLVL)

OBJ_GAUSS = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
            minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
            mat_solve_gaussian.o sub_util.o

all: $(EXEC)

$(EXEC): $(OBJ_GAUSS)
	$(FC) -o $@ $^

%.o: %.f90
	$(FC) -o $@ $(FFLAGS) $?

clean:
	rm -f *.o $(EXEC)
	rm -f *.mod
Please note that not_mpi.o is no longer needed.
Execute FDMNES (parallel) with the original Gaussian solver
Everything goes well.
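For completeness, the parallel executable is started through mpirun; in the sketch below, -np 8 matches the 8 hardware threads of my machine.

$ cd /path/to/fdmnes
$ time mpirun -np 8 ./fdmnes_gauss_openmpi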
Compiling MUMPS (sequential)
Download MUMPS from http://mumps-solver.org/.
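Assuming the downloaded archive is named MUMPS_5.0.1.tar.gz, unpack it somewhere convenient:

$ tar xzf MUMPS_5.0.1.tar.gz
$ cd MUMPS_5.0.1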
Install dependencies and make libraries
$ sudo apt-get install libmetis-dev libscotch-dev
$ cd /path/to/MUMPS_5.0.1
$ cp Make.inc/Makefile.debian.SEQ Makefile.inc
$ make all
$ cp libseq/libmpiseq.a lib
Please note that you should not install libparmetis-dev and libptscotch-dev; at least on my machine, linking against these libraries failed.
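Before moving on, it is worth checking that the libraries the FDMNES Makefile will link against really ended up in lib (the exact file list may differ depending on which arithmetics make all built):

$ ls lib
# expect libdmumps.a, libzmumps.a, libmumps_common.a, libpord.a
# plus the libmpiseq.a copied from libseq above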
Compiling FDMNES (sequential) with MUMPS
Makefile
A Makefile for the sequential FDMNES with MUMPS can look like this:
FC = gfortran
OPTLVL = 3
EXEC = ../fdmnes
BIBDIR = /path/to/MUMPS_5.0.1/lib
FFLAGS = -O$(OPTLVL) -c

OBJ = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
      minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
      mat_solve_mumps.o

all: $(EXEC)

$(EXEC): $(OBJ)
	$(FC) -o $@ $^ -L$(BIBDIR) -ldmumps -lzmumps -lmumps_common -lmpiseq -lmetis -lpord \
	-lesmumps -lscotch -lscotcherr -lpthread -llapack -lblas

%.o: %.f90
	$(FC) -o $@ $(FFLAGS) $?

clean:
	rm -f *.o $(EXEC)
	rm -f *.mod
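The Makefile above does not say where gfortran finds the MUMPS include files (dmumps_struc.h, zmumps_struc.h) used by mat_solve_mumps.f90, nor the dummy mpif.h of the sequential MPI stub. One way, purely an assumption on my part, is to copy them next to the FDMNES sources before running make:

# Assumed step: make the MUMPS include files visible to gfortran.
$ cd /path/to/fdmnes/prog
$ cp /path/to/MUMPS_5.0.1/include/*.h .
$ cp /path/to/MUMPS_5.0.1/libseq/mpif.h .   # dummy MPI header for the sequential build
$ make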
Execute FDMNES (sequential) with MUMPS
The calculation goes well; however, the %system CPU usage is very large.
Performance of the self-compiled FDMNES (sequential) with MUMPS vs. fdmnes_linux64
I ran fdmnes_linux64 and self-compiled fdmnes with the first example file Sim/Test_stand/Cu.

$ time ./fdmnes_linux64
...
real    0m8.335s
user    0m23.800s
sys     0m0.488s

$ time ./fdmnes
...
real    0m11.262s
user    0m25.988s
sys     0m58.796s
The CPU usage of fdmnes_linux64 and the self-compiled fdmnes is about 400% (4 cores) and 800% (8 hardware threads), respectively. The self-compiled fdmnes is a bit slower than fdmnes_linux64, which can probably be attributed to the difference in compilers: gfortran for fdmnes versus ifort for fdmnes_linux64.
In addition, the %system CPU usage of the self-compiled fdmnes is very high (about 70% of its total CPU time). I am not sure why, but it presumably lowers the performance.
Compiling FDMNES (openmpi) with MUMPS
Install dependencies and make libraries
As in the sequential case, do not install libparmetis-dev and libptscotch-dev; at least on my machine, linking against these libraries failed.
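For the parallel build, MUMPS itself has to be rebuilt against the real MPI. The sketch below shows what I assume this looks like with the Debian PAR template shipped in Make.inc and the ScaLAPACK/BLACS packages of Jessie; the template and package names are assumptions, so adjust as needed.

$ sudo apt-get install libmetis-dev libscotch-dev libscalapack-mpi-dev libblacs-mpi-dev
$ cd /path/to/MUMPS_5.0.1
$ cp Make.inc/Makefile.debian.PAR Makefile.inc
$ make all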
Makefile
A Makefile for the parallel FDMNES with MUMPS can look like this:
FC = mpif90
OPTLVL = 3
EXEC = ../fdmnes_openmpi
BIBDIR = /path/to/MUMPS_5.0.1/lib
FFLAGS = -O$(OPTLVL) -c

OBJ = main.o clemf0.o coabs.o convolution.o dirac.o fdm.o fprime.o general.o lecture.o mat.o metric.o \
      minim.o optic.o potential.o selec.o scf.o spgroup.o sphere.o tab_data.o tddft.o tensor.o \
      mat_solve_mumps.o

all: $(EXEC)

$(EXEC): $(OBJ)
	$(FC) -o $@ $^ -L$(BIBDIR) -ldmumps -lzmumps -lmumps_common -lmetis -lpord \
	-lesmumps -lscotch -lscotcherr -lpthread -llapack -lblas \
	-lscalapack-openmpi -lblacs-openmpi -lblacsF77init-openmpi \
	-lblacsCinit-openmpi -lmpi -lmpi_f77

%.o: %.f90
	$(FC) -o $@ $(FFLAGS) $?

clean:
	rm -f *.o $(EXEC)
	rm -f *.mod
Please note that you need all the headers provided in the include directory except for mpif.h; the real mpif.h comes from OpenMPI via mpif90.
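It is not obvious whether the include directory here means the one bundled with FDMNES or MUMPS_5.0.1/include; the sketch below assumes the latter.

$ cd /path/to/fdmnes/prog
$ cp /path/to/MUMPS_5.0.1/include/*.h .   # dmumps_struc.h, zmumps_struc.h, etc.
$ rm -f mpif.h                            # hypothetical cleanup: the real mpif.h must come from OpenMPI
$ make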
Execute FDMNES (parallel) with MUMPS
The calculation looks to go well; however, the %system CPU usage is again far too large.
Performance of the self-compiled FDMNES (sequential and parallel) with MUMPS vs. fdmnes_linux64
I ran fdmnes_linux64 and self-compiled fdmnes with the first example file Sim/Test_stand/Cu.

$ time ./fdmnes_linux64
real    0m8.335s
user    0m23.800s
sys     0m0.488s

$ time ./fdmnes
real    0m11.211s
user    0m26.224s
sys     0m58.484s

$ time mpirun -np 8 ./fdmnes_openmpi
real    0m32.581s
user    2m5.808s
sys     2m6.408s
Unfortunately, the fdmnes built with OpenMPI and MUMPS, which I had hoped would be the fastest, is the slowest executable of all... The %system CPU usage of fdmnes_openmpi is again very high (about 50% of the total CPU time).