Build Trilinos 10.8.x on Ubuntu 11.04 using cmake

This is another set of learning notes recording what I have done, for future reference.

As I showed in a previous post, matrix inversion using Armadillo C++ is no faster than Matlab/Octave, so the benefit of translating Matlab scripts to C++ may lie in the potential to parallelize the C++ code. A natural question, then, is why not translate the m-files with a parallel library or parallel programming model in mind from the start, and save another round of trouble. At least two libraries exist for this purpose: PETSc and Trilinos. One can translate the Matlab scripts within the framework of either library. PETSc has been around since 1999 and installs easily on Ubuntu 11.04, whereas Trilinos started in 2003 and requires some knowledge of compiling. I followed this blog, which outlined the installation of Trilinos 10.6.x.

Step 1. Install all the math libraries Trilinos requires.

libatlas-base-dev				install
libatlas-dev					install
libatlas-doc					install
libatlas3gf-base				install
libblas-dev					install
libblas-doc					install
libblas3gf					install
liblapack-dev					install
liblapack-doc					install
liblapack3gf					install
libmumps-4.9.2					install
libmumps-dev					install
libmumps-scotch-4.9.2				install
libmumps-scotch-dev				install
libmumps-seq-4.9.2				install
libopenmpi-dev					install
libopenmpi1.3					install
libsuitesparse-dev				install
libsuperlu3					install
libsuperlu3-dev					install
libumfpack5.4.0					install

Copy the above content to a file named "math.list" and use this command to install all the missing libraries:
cat math.list |sudo dpkg --set-selections && sudo apt-get dselect-upgrade

Step 2. Build and install. I modified the configure script from Nuno Sucena Almeida:

# uncompress trilinos package
~$ tar xf trilinos-10.8.4-Source.tar.gz
# get into source and create build directory
~$ cd trilinos-10.8.4-Source
~/trilinos-10.8.4-Source$ mkdir build && cd build
# copy configure script to build directory and run it
~/trilinos-10.8.4-Source/build$ cp ~/configure-trilinos-ubuntu.sh .
build$ chmod u+x configure-trilinos-ubuntu.sh
build$ ./configure-trilinos-ubuntu.sh
# start build process with 3 cores
build$ make -j3
build$ make install

Here is the content of configure-trilinos-ubuntu.sh:

#!/bin/bash
# (C) 2009-2011 Nuno Sucena Almeida
# (C) 2011 Nov Modified by JunweiHuang.info

# 1. mkdir BUILD && cd BUILD
# 2. rm -f CMakeCache.txt && ./configure-trilinos-ubuntu.sh
# 3. time make -j2 2>&1 |tee build.log

EXTRA_ARGS=$@
TRILINOS_HOME=~/data/trilinos/src/

INSTALL_DIR=~/data/trilinos/10.8.3

# SWIG was built from source, since Trilinos 10.8.3 needs SWIG version > 2.0
SWIG_EXECUTABLE=/usr/local/bin/swig

#SUPERLU was not included
UMFPACK_INCLUDE=/usr/include/suitesparse
CMAKE_EXECUTABLE=/usr/bin/cmake

( time ${CMAKE_EXECUTABLE} \
  -D CMAKE_CXX_FLAGS:STRING="-pipe" \
  -D CMAKE_C_FLAGS:STRING="-pipe" \
  -D CMAKE_Fortran_FLAGS:STRING="-pipe" \
  -D CMAKE_INSTALL_PREFIX:PATH=${INSTALL_DIR} \
  -D PyTrilinos_INSTALL_PREFIX:PATH=${INSTALL_DIR} \
  -D CMAKE_BUILD_TYPE:STRING=DEBUG \
  -D TPL_ENABLE_MPI:BOOL=ON \
  -D TPL_ENABLE_UMFPACK:STRING=ON \
  -D TPL_UMFPACK_INCLUDE_DIRS:STRING=${UMFPACK_INCLUDE} \
  -D TPL_ENABLE_MUMPS:STRING=ON \
  -D BUILD_SHARED_LIBS:STRING=ON \
  -D Trilinos_ENABLE_ALL_PACKAGES:BOOL=ON \
  -D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES:BOOL=ON \
  -D Trilinos_ENABLE_Didasko:BOOL=ON \
  -D Trilinos_ENABLE_PyTrilinos:BOOL=ON \
  -D Trilinos_ENABLE_TESTS:BOOL=ON \
  -D SWIG_EXECUTABLE:FILEPATH=${SWIG_EXECUTABLE} \
  -D DART_TESTING_TIMEOUT:STRING=600 \
  $EXTRA_ARGS \
  ${TRILINOS_HOME} ) 2>&1 |tee log

The end.

Matlab vs Octave vs Armadillo

Out of curiosity, I conducted a toy comparison between Matlab, Octave and Armadillo, to convince myself that translating some Matlab programs to Armadillo C++ will pay off. Here are the details:
1. Matlab (2010b) running on an Intel Core i7-2600 (4 cores, 8 threads), Windows 7, 8 GB RAM;
2. Matlab (2011a) running on an AMD Phenom II (4 cores, 4 threads), Windows 7, 6 GB RAM;
3. GNU Octave 3.2.4 running on an AMD Athlon 64 X2 Dual Core 5000+, Ubuntu 11.04, 4 GB RAM;
4. Armadillo C++ 2.3.91 running on an AMD Athlon 64 X2 Dual Core 5000+, Ubuntu 11.04, 4 GB RAM.
For C++, I compiled using "g++ testArmadillo.cpp -o testArmadillo -larmadillo -O2".

The test program on Matlab/Octave is

tic;A=randn(5000);B=randn(5000);C=A\B;toc

The test program on Armadillo C++ is

#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main(int argc, char** argv)
  {
  wall_clock timer;
  timer.tic();
  mat A = randu(5000,5000);
  mat B = randu(5000,5000);
  mat C;

  //cout << A*trans(B) << endl;
  C=solve(A,B);
  cout<<"Elapsed time is "<<timer.toc()<<" seconds."<<endl;
  return 0;
  }

Test results
1: Elapsed time is 11.648498 seconds.
2: Elapsed time is 12.639551 seconds.
3: Elapsed time is 42.7492 seconds.
4: Elapsed time is 40.4321 seconds.

Assuming the 4-core CPUs run twice as fast as the 2-core CPU, tests 3 and 4 would come in at about 21 and 20 seconds, respectively. Does C++ really run faster than Matlab/Octave on the same machine? I am not convinced yet. Possibly the real benefit of translating Matlab scripts to C++ is the potential to parallelize the C++ program so that it can run on multiple workstations or a cluster.

Harvest AMD multi-core performance using ACML under Ubuntu 11.04

In the last post I mentioned that I selected uBLAS from the Boost C++ library for matrix inversion and other linear algebra operations. In my case, uBLAS was a quick solution that sacrifices performance. To further improve performance, LAPACK is a better choice, and AMD's vendor implementation of LAPACK, i.e. ACML, can take advantage of the system processor and improve performance dramatically.

With Ubuntu 11.04, the ACML version that works is 4.4.0, and I installed it following this blog. Here is a note for myself.
Step 1: Visit AMD download archives (link) and download version 4.4.0 for 64bit linux.
Step 2: tar -zxvf acml-4-4-0-gfortran-64bit.tgz
sudo ./install-acml-4-4-0-gfortran-64bit.sh
Step 3: Press enter to read the licence agreement (this is proprietary software)
and then type ‘accept’ when the prompt comes.
Enter /usr/local/lib/acml as the alternative path for the library.
Step 4: Configure acml as the default path. To do this, we will use the alternatives system in debian. It allows us to set up symbolic links to prioritize the packages that provide a particular library or executable interface. We are going to tell Ubuntu to use the multithreaded ACML to provide LAPACK and BLAS.

sudo update-alternatives --install /usr/lib/libblas.so.3gf libblas.so.3gf /usr/local/lib/acml/gfortran64_mp/lib/libacml_mp.so 60
sudo update-alternatives --install /usr/lib/liblapack.so.3gf liblapack.so.3gf /usr/local/lib/acml/gfortran64_mp/lib/libacml_mp.so 60

Step 5: "Lastly, libacml_mv.so is used by libacml_mp.so, so it needs to be loadable in your path. You could add /usr/local/lib/acml/gfortran64_mp/lib to the path, but that's a little bit tricky. Since we are already using symbolic links, we will make a symbolic link to it in your /usr/local/lib so it can be loaded dynamically."

sudo ln -s /usr/local/lib/acml/gfortran64_mp/lib/libacml_mv.so /usr/local/lib/libacml_mv.so

Step 6: Not there yet! If your system LD_LIBRARY_PATH does not include /usr/local/lib, you should add it. In my case, I added a file "acml.conf" under /etc/ld.so.conf.d/; its content is just the path of the ACML library, in this case /usr/local/lib/acml/gfortran64_mp/lib. After that, remember to run "sudo ldconfig".

Step 7: I tested in Octave by running the examples from the blog. Watching the CPU info in top, I did observe that B=A*A occupied both cores at 100%.

A=randn(5000);
B=A*A;

Matrix inversion for complex numbers in C++

To implement the Discrete Wavenumber Method (DWM) in C++, I selected the Boost library + FFTW, because DWM needs support for complex numbers, matrix inversion and FFT, all of which can be found in Boost, FFTW and the STL. However, I later realized that Boost uBLAS does not handle matrices with a large condition number. For example, this 6x6 matrix A =
(((0,0.000109621),(0.0167211,-0.000109621),(0.250979,2.90726e-15),(-0.250979,-38.2832),(-0,-0),(0,0)),
((-9.33491e-05,0),(9.33491e-05,-0.0213859),(2.90726e-15,0.213724),(-48.9633,-0.213724),(0,0),(0,0)),
((0,-0),(0,0),(0,0),(0,0),(-1.26981e-18,0.000109621),(0.0167211,-0.000109621)),
((-0,-4.86612e+08),(5.4689e+10,4.86612e+08),(-1.11411e+12,0.00657373),(1.11411e+12,-1.25212e+14),(0,-0),(0,0)),
((3.96682e+08,0),(-3.96682e+08,-6.05079e+10),(0.0105204,-9.08209e+11),(-1.38534e+14,9.08209e+11),(0,-0),(0,0)),
((0,-0),(0,0),(0,-0),(0,0),(-5.73716e-07,-3.17712e+08),(1.59949e+10,3.17712e+08)))
will raise a Boost "internal logic error" stemming from this check: "BOOST_UBLAS_CHECK (detail::expression_type_check (prod (triangular_adaptor<const_matrix_type, unit_lower> (m), e), cm1), internal_logic ());". Google says it is due to the ill-conditioning of the matrix: the uBLAS authors want to ensure the matrix is invertible within machine precision (epsilon ~ 1.11e-16 for double). Indeed, adding "#define BOOST_UBLAS_NDEBUG 1" to the code to skip the check produced a wrong inverse matrix. Interestingly, the determinant is not zero (-2.9762e+003 + 2.4557e+016i), and Matlab can invert the above matrix with a warning, which is actually desirable at this stage. Therefore the analytic solution, i.e. directly calculating the inverse matrix via cofactors, may be a way out. I modified the C++ code from http://chi3x10.wordpress.com/2008/05/28/calculate-matrix-inversion-in-c/ and share it here for those in the middle of C++ coding with the Boost library. Caution: do make sure your inverse matrix multiplied by the original matrix is not far from the identity matrix.


#include <boost/numeric/ublas/matrix.hpp>

using boost::numeric::ublas::matrix;

/*
 * Analytical matrix inversion routine
 * (adapted from chi3x10's double** version to boost ublas matrices;
 *  works for real or complex T; assumes square matrices of order >= 2)
 */

// copy into dest the minor of src with the given row and column removed
template <class T>
int GetMinor(matrix<T>& src, matrix<T>& dest, int row, int col)
{
    // indicate which col and row is being copied to dest
    int colCount = 0, rowCount = 0, order = src.size1();

    for (int i = 0; i < order; i++)
    {
        if (i != row)
        {
            colCount = 0;
            for (int j = 0; j < order; j++)
            {
                if (j != col)
                {
                    dest(rowCount, colCount) = src(i, j);
                    colCount++;
                }
            }
            rowCount++;
        }
    }
    return 1;
}

// recursive Laplace (cofactor) expansion along the first row
template <class T>
T CalcDeterminant(matrix<T>& mat)
{
    // stop the recursion when the matrix is a single element
    int order = mat.size1();
    T det;

    if (order == 1)
    {
        det = mat(0, 0);
        return det;
    }

    det = T(0.0);

    // allocate the cofactor matrix
    matrix<T> minor_mat(order - 1, order - 1);

    for (int i = 0; i < order; i++)
    {
        // minor of element (0, i), with alternating sign
        GetMinor(mat, minor_mat, 0, i);
        det += (i % 2 == 1 ? -mat(0, i) : mat(0, i)) * CalcDeterminant(minor_mat);
    }
    return det;
}

template <class T>
void MatrixInversion(matrix<T>& A, matrix<T>& Y)
{
    int order = A.size1();
    // get the determinant of A
    T detA, det;
    detA = CalcDeterminant(A);
    detA = T(1.0) / detA;

    // memory allocation
    matrix<T> minor_a(order - 1, order - 1);

    for (int j = 0; j < order; j++)
    {
        for (int i = 0; i < order; i++)
        {
            // Y(i,j) = cofactor(j,i) / det(A)   (transpose of the cofactor matrix)
            GetMinor(A, minor_a, j, i);
            det = detA * CalcDeterminant(minor_a);
            Y(i, j) = ((i + j) % 2 == 1) ? -det : det;
        }
    }
}

It works for small square matrices; it has NOT been tested on large ones (the recursive cofactor expansion is O(n!) anyway). If your matrix is well scaled within machine precision, please use a "fancier" inversion algorithm instead.

Using openmpi on Eclipse

Eclipse is a great IDE for programming/debugging multiple languages, such as C/C++, Java, etc. To use Open MPI on Eclipse, you need to install PTP (http://eclipse.org/ptp/downloads.php), or directly install the Eclipse IDE for Parallel Application Developers.

Once I installed PTP on an existing Eclipse Indigo, I could directly create an "Open MPI C++ project". However, some MPI functions were still not recognized by Eclipse, such as "MPI_Init". That is because Eclipse does not know the include path of Open MPI. There are two ways to add it: (1) Project -> Properties -> C/C++ Build -> Settings; under GCC C++ Compiler and GCC C Compiler -> Includes, add the location of the Open MPI include path (mine is /usr/lib/openmpi/include); or (2) change it globally: Window -> Preferences -> Parallel Tools -> Parallel Language Development Tools -> MPI Include Paths, click New and add your Open MPI include path.

Done.