Computation of qP, qSV, qSH traveltime, rays and slowness vectors in orthorhombic media

The direct traveltime field of qP, qSV, qSH can be calculated by solving a Hamilton-Jacobi equation. With the traveltime field in 3D, the computation of rays and slowness vectors is straightforward. Note that the orthorhombic medium has a isotropic block at the center where the point source is located. The orthorhombic medium outside of the isotropic medium is obtained by perturbing an arbitrarily strong transversal isotropic medium. In other word, the orthorhombic is weak but the anisotropy can be arbitrarily strong. Here is the video:

Helmer cluster reborn on 21/12/2012

18 cores home make cluster (see here) has been in service since March 07 2012. Some show cases have been demonstrated here, here, and here. All are successful. Today (Dec 21, 2012) this 18-core cluster is upgraded to its maximum capability to host 36 cores. More exciting high-performance results will be posted soon. Stay tuned.

36-core helmer cluster after being upgraded

Helmer Cluster show case 3: 24 micro earthquake released energy within 100 ms around a tunnel

The analytic solutions do not exist in the modeling situation where a tunnel dipping toward the east and ends somewhere within the model (See Fig. 1a). The total CPU time on my 18 core cluster is about 15 hours.

Fig1a: Tunnel Model with acquisition geometry

Fig1a: Tunnel Model with acquisition geometry

Fig. 1b provides a closer view of the tunnel and the spatial distribution of the 24 micro-earthquake sources. The “delay” on the figure is the original time delayed with respect to the first source. Source types are: ISO-explosion, DC-double couple, X-linear vector along X direction, and Z-linear vector along vertical direction. The frequencies ranges from 50 Hz to 300 Hz.

Fig1b: A close view of the tunnel and the spatial distribution of the micro-earthquakes

Fig1b: A close view of the tunnel and the spatial distribution of the micro-earthquakes

Fig. 1c and 1d shows the two components seismic traces acquired by the surface receivers and the borehole receivers, respectively. It is easy to observe that the second phase immediately following the first P-wave phase is not necessarily the shear-wave phase of the same earthquake. Instead, it can be a P-wave phase of a late earthquake.

Fig1c: Seismic gather from the surface receiver array. Fig 1d: Seismic gather from the borehole receiver array.

Fig1c: Seismic gather from the surface receiver array. Fig 1d: Seismic gather from the borehole receiver array.


Fig 1e and 1f shows the waveform from the middle channel from the surface receiver array (channel 76) and from the borehole receiver array ( channel 51). Can you identify all the phases in this noise-free environment? I actually lied when I say I used 24 sources. Can you find out how many sources I really used?
Fig1e: The waveform of the two components surface receiver. Fig 1f: The waveform of the two components borehole receiver.

Fig1e: The waveform of the two components surface receiver. Fig 1f: The waveform of the two components borehole receiver.

Here is a movie showing you what is really going on inside the earth, which in reality you never observe.

Helmer Cluster show case 2: a 4.65 Magnitude earthquake shakes Great Toronto Area

Earthquakes in GTA? Yes, I am kidding. But what if a 4.65 magnitude nuclear explosion occurred at 3km deep underground secret lab, located close to the junction of Kirby road and Kipling Ave, Northwest of Vaughan (see google).

Here is a terrain map of the location. The black star indicates the hypocentre location.

Using SPECFEM3D with P=5, grid interval= 200m, half duration 0.2 second (frequency ~2~3Hz), the computation on my 18 core cluster took 1.5 hours. Here are some snapshots:
T=1.5 second

T=2.5 second

T=4.0 second

Notice the converted shear wave distribution behind the P-wave front? At some locations, the shear wave energy is amplified by the local topography and can severely damage the local surface structures. Good news: the modeled topography is 10 times amplified from the real topography. So the results here have little indication to the real case.

Coming soon:
Parallel hybrid spectral element and finite element simulation of airborne electromagnetic survey over a heterogeneous earth model with multi-scale lossy objects. Stay tuned.

Helmer Cluster show case 1: seismic wave propagation in the presence of a fluid filled fracture

It is a show case for both my 18 core cluster and the performance of SPECFEM2D (link) software package. I remember someone did that already using variable grid finite difference. But for irregular fracture, finite element is an optimum solution.

The model is 30 m by 30 m box with a fracture of 1 cm thick and 10 m long. See figure as follows:

It is simply generated by gmsh software package:

// Gmsh project created on Tue Apr 03 22:42:21 2012
lc=0.5;
lf=0.05;
Point(1) = {0, 0, 0, lc};
Point(2) = {0, 30, 0, lc};
Point(3) = {30, 30, 0, lc};
Point(4) = {30, 0, 0, lc};
Line(1) = {2, 1};
Line(2) = {1, 4};
Line(3) = {4, 3};
Line(4) = {3, 2};
Point(5) = {13.5, 15, 0, lf};
Point(6) = {13.5, 5, 0, lf};
Point(7) = {13.51, 15, 0, lf};
Point(8) = {13.51, 5, 0, lf};
Line(5) = {5, 7};
Line(6) = {5, 6};
Line(7) = {6, 8};
Line(8) = {8, 7};
Line Loop(9) = {6, 7, 8, -5};
Line Loop(10) = {1, 2, 3, 4, -6, -7, -8, 5};
Plane Surface(11) = {9};
Plane Surface(10) = {10};
Physical Line("abs") = {4, 1, 2, 3}; # absorbing boundary
Physical Surface("M1") = {10}; # material 1
Physical Surface("M2") = {11}; # material 2

and exported to SPECFEM2D mesh format using a Matlab script:

clear all

filehead='fracture';
fname=[filehead,'.msh'];
Fmesh=[filehead,'_mesh'];
Fmat=[filehead,'_mat'];
Fnodes=[filehead,'_nodesCoords'];
Fsur_abs=[filehead,'_surface_abs'];
Fsur_free=[filehead,'_surface_free'];

%%
% import gmsh file
fid=fopen(fname,'r');

tline=fgetl(fid);
while ~feof(fid)
    if strcmp(tline, '$PhysicalNames')
        tline=fgetl(fid);
        phyNum=str2num(tline);
        phyNames(phyNum,1)=struct('dimension',[], 'number',[], 'name',[]);
        phyNames(phyNum,1).dimension=[];phyNames(phyNum,1).number=[];phyNames(phyNum,1).name=[];
        for i=1:phyNum
            tline=fgetl(fid);
            C = textscan(tline,'%d %d %s');
            phyNames(i,1).dimension=C{1}(1);
            phyNames(i,1).number=C{2}(1);
            phyNames(i,1).name=C{3}{1}(2:end-1);
        end
    elseif strcmp(tline,'$Nodes')
        tline=fgetl(fid);
        nodeNum=str2num(tline);
        Nodes(nodeNum,1)=struct('coord',[]);
        Nodes(nodeNum,1).coord=[];

        for i=1:nodeNum
            tline=fgetl(fid);
            C = textscan(tline,'%d %f %f %f');
            Nodes(i,1).coord=[C{2} C{3} C{4}];
        end
    elseif strcmp(tline,'$Elements')
        tline=fgetl(fid);
        meshNum=str2num(tline);
        elements(meshNum,1)=struct('idx',[],'type',[],'numTags',[],'tags',[],'mesh',[]);
        elements(meshNum,1).idx=[];elements(meshNum,1).type=[];elements(meshNum,1).numTags=[];
        elements(meshNum,1).tags=[];elements(meshNum,1).mesh=[];

        for i=1:meshNum
            tline=fgetl(fid);
            C = textscan(tline,'%d');
            elements(i).idx=C{1}(1);
            elements(i).type=C{1}(2);
            elements(i).numTags=C{1}(3);
            for j=1:elements(i).numTags
                elements(i).tags(j)=C{1}(j+3);
            end
            if elements(i).type==1 % two node line
                elements(i).mesh=[C{1}(end-1:end)]';
            elseif elements(i).type==8 % three nodes line 2nd order
                elements(i).mesh=[C{1}(end-2:end)]';
            elseif elements(i).type==3 % 4-node quadrangle
                elements(i).mesh=[C{1}(end-3:end)]';
            elseif elements(i).type==10 % 9-node second order quadrangle
                elements(i).mesh=[C{1}(end-8:end)]';
            else
                error('Only 1st and 2nd order quadrangle type element are supported');
            end
        end

    end
    tline=fgetl(fid);

end
fclose(fid);

%%
%export nodes to SPECFEM2D
fid=fopen(Fnodes,'w');
fprintf(fid,'%dn',length(Nodes));
for i=1:length(Nodes)
    fprintf(fid,'%12.3f%12.3fn',Nodes(i).coord(1),Nodes(i).coord(2));
end
fclose(fid);

%%
% export mesh to SPECFEM2D
fid=fopen(Fmesh,'w');
fprintf(fid,'%8d%8dn',0,0);
j3=0;j10=0;
for i=1:length(elements)
    if elements(i).type==3
        fprintf(fid,'%8d%8d%8d%8dn',elements(i).mesh);
        j3=j3+1;
    elseif elements(i).type==10
        fprintf(fid,'%8d%8d%8d%8d%8d%8d%8d%8d%8dn',elements(i).mesh);
        j10=j10+1;
    end
end
frewind(fid);
if j3>0
    fprintf(fid,'%8d%8dn',j3,4);
end
if j10>0
    fprintf(fid,'%8d%8dn',j10,9);
end
fclose(fid);

%%
% export material to SPECFEM2D
fid=fopen(Fmat,'w');
for i=1:length(phyNames)
    allNames(i)=phyNames(i).number;
end
for i=1:length(elements)
    if elements(i).type==3 || elements(i).type==10
        mstr=phyNames(allNames==elements(i).tags(1)).name;
        fprintf(fid,'%8dn',str2num(mstr(2:end)));
    end
end
fclose(fid);

%%
% export boundary conditions to SPECFEM2D
for i=1:length(phyNames)
    allNames(i)=phyNames(i).number;
end

meshMatrix=[];j=1;
for i=1:length(elements)
    if elements(i).type==3 || elements(i).type==10
        meshMatrix=[meshMatrix;elements(i).mesh];
    end
end

fabs=fopen(Fsur_abs,'w');fprintf(fabs,'%8d n',0);
ffree=fopen(Fsur_free,'w');fprintf(ffree,'%8d n',0);
ja=0;jf=0;
for i=1:length(elements)
    if elements(i).type==1 || elements(i).type==8   %take edge elements only
        tmp=elements(i).mesh;
        idx=[];
        for j=1:length(meshMatrix(:,1))
            itag=0;
            for k=1:length(tmp)
                if find(meshMatrix(j,:)==tmp(k))
                    itag=itag+1;
                end
            end
            if length(tmp)==itag
                idx=[idx;j];
            end
        end
        if isempty(idx) || length(idx)>1
            error(['No match or Non edge nodes used in line element, please check element ',num2str(i)]);
        end
        mstr=phyNames(allNames==elements(i).tags(1)).name;
        if strcmp(mstr,'free') %unless specified, all boundaries are absorbing
            jf=jf+1;
            if elements(i).type==1
                fprintf(ffree,'%8d%8d%8d%8dn',idx,2,elements(i).mesh);
            elseif elements(i).type==3
                fprintf(ffree,'%8d%8d%8d%8d%8dn',idx,2,elements(i).mesh);
            else
                error('Other type of edge elements are not implemented.');
            end
        else
            ja=ja+1;
            if elements(i).type==1
                fprintf(fabs,'%8d%8d%8d%8dn',idx,2,elements(i).mesh);
            elseif elements(i).type==3
                fprintf(fabs,'%8d%8d%8d%8d%8dn',idx,2,elements(i).mesh);
            else
                error('Other type of edge elements are not implemented.');
            end
        end
    end
end
frewind(ffree);fprintf(ffree,'%8d n',jf);
frewind(fabs);fprintf(fabs,'%8d n',ja);
fclose(ffree);fclose(fabs);

A full simulation 50000 time steps (1d-8 second each step) is completed in 1 hour. Here are a few snapshots:
Time step: 12000

Time step: 17000

Time step: 30000

Next show case of my cluster: Electromagnetic wave diffusion in the presence of irregular orebody using Galerkin Discontinuous high order finite element method. Stick around and know more about what we can do with a home brewed cluster.

How to make yourself a supercomputer at home

I just made myself a 18 core cluster with a price close to a high profile workstation and cheaper than cloud price (e.g., 8 core, Cluster Compute Quadruple Extra Large Instance from Amazon cost $1450/yr). With the same amount of budget, you can own more than 8 cores forever. Here is how: follow the steps on this weblog and build a cluster like this.

Compile a trilino program 101

Offical trilino tutorials start from ex1.cpp, a program shows the basic usage of communicators.

// @HEADER
// ***********************************************************************
//
//                      Didasko Tutorial Package
//                 Copyright (2005) Sandia Corporation
//
// Under terms of Contract DE-AC04-94AL85000, there is a non-exclusive
// license for use of this work by or on behalf of the U.S. Government.
//
// This library is free software; you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as
// published by the Free Software Foundation; either version 2.1 of the
// License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
// USA
//
// Questions about Didasko? Contact Marzio Sala (marzio.sala _AT_ gmail.com)
//
// ***********************************************************************
// @HEADER

// Basic definition of communicator.
// This code should be run with at least two processes
// However, note that the SAME code works even if Epetra
// has been configured without MPI!

#include "Didasko_ConfigDefs.h"
#if defined(HAVE_DIDASKO_EPETRA)

#include "Epetra_ConfigDefs.h"
#ifdef HAVE_MPI
#include "mpi.h"
#include "Epetra_MpiComm.h"
#else
#include "Epetra_SerialComm.h"
#endif

int main(int argc, char *argv[])
{

#ifdef HAVE_MPI
  MPI_Init(&argc, &argv);
  // define an Epetra communicator
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
  Epetra_SerialComm Comm;
#endif

  // get the proc ID of this process
  int MyPID = Comm.MyPID();

  // get the total number of processes
  int NumProc = Comm.NumProc();

  // output some information to std output
  cout << Comm << endl;

  // ======================== //
  // now some basic MPI calls //
  // ------------------------ //

  int    ivalue;
  double dvalue, dvalue2;
  double* dvalues;  dvalues  = new double[NumProc];
  double* dvalues2; dvalues2 = new double[NumProc];
  int root = 0;

  // equivalent to MPI_Barrier

  Comm.Barrier();

  if (MyPID == root) dvalue = 12.0;

  // On input, the root processor contains the list of values
  // (in this case, a single value). On exit, all processes will
  // have he same list of values. Note that all values must be allocated
  // vefore the broadcast

  // equivalent to  MPI_Broadcast

  Comm.Broadcast(&dvalue, 1, root);

  // as before, but with integer values. As C++ can bind to the appropriate
  // interface based on argument typing, the type of data is not required.

  Comm.Broadcast(&ivalue, 1, root);

  // equivalent MPI_Allgather

  Comm.GatherAll(dvalues, dvalues2, 1);

  // equivalent to MPI_Allreduce with MPI_SUM

  dvalue = 1.0*MyPID;

  Comm.SumAll( &dvalue, dvalues, 1);

  // equivalent to MPI_Allreduce with MPI_SUM

  Comm.MaxAll( &dvalue, dvalues, 1);

  // equiavant to MPI_Scan with MPI_SUM

  dvalue = 1.0 * MyPID;

  Comm.ScanSum(&dvalue, &dvalue2, 1);

  cout << "On proc " << MyPID << " dvalue2  = " << dvalue2 << endl;

  delete[] dvalues;
  delete[] dvalues2;

  // ======================= //
  // Finalize MPI and return //
  // ----------------------- //

#ifdef HAVE_MPI
  MPI_Finalize();
#endif

  return( EXIT_SUCCESS );

} /* main */

#else

#include 
#include 
#ifdef HAVE_MPI
#include "mpi.h"
#endif

int main(int argc, char *argv[])
{
#ifdef HAVE_MPI
  MPI_Init(&argc,&argv);
#endif

  puts("Please configure Didasko with:n"
       "--enable-epetra");

#ifdef HAVE_MPI
  MPI_Finalize();
#endif
  return 0;
}
#endif


To compile,


junwei@Junbuntu:~/tmptest$ mpiCC -I/home/junwei/data/trilinos/10.8.3/include -L/home/junwei/data/trilinos/10.8.3/lib ex1.cpp -o ex1 -lepetra

Then run,

junwei@Junbuntu:~/tmptest$ mpirun -np 1 ./ex1
Epetra::MpiComm  Processor 0 of 1 total processors
On proc 0 dvalue2  = 0
junwei@Junbuntu:~/tmptest$ mpirun -np 2 ./ex1
Epetra::MpiComm  Processor 0 of 2 total processors
On proc 0 dvalue2  = 0
Epetra::MpiComm  Processor 1 of 2 total processors
On proc 1 dvalue2  = 1
junwei@Junbuntu:~/tmptest$ mpirun -np 3 ./ex1
Epetra::MpiComm  Processor 0 of 3 total processors
Epetra::MpiComm  Processor 2 of 3 total processors
Epetra::MpiComm  Processor 1 of 3 total processors
On proc 1 dvalue2  = 1
On proc 0 dvalue2  = 0
On proc 2 dvalue2  = 3
junwei@Junbuntu:~/tmptest$ mpirun -np 4 ./ex1
Epetra::MpiComm  Processor 0 of 4 total processors
Epetra::MpiComm  Processor 1 of 4 total processors
Epetra::MpiComm  Processor 2 of 4 total processors
Epetra::MpiComm  Processor 3 of 4 total processors
On proc 0 dvalue2  = 0
On proc 1 dvalue2  = 1
On proc 2 dvalue2  = 3
On proc 3 dvalue2  = 6

Matlab vs Octave vs Armadillo

Out of curiosity, I just conducted a toy comparison between Matlab, Octave and Armadillo, to convince myself that it is going to be paid off by translating some Matlab programs to Armadillo C++. Here are the details:
1. Matlab (2010b) running on Intel(TM) core i7-2600 (4 cores, 8 threads), Windows 7, 8 GB RAM;
2. Matlab (2011a) running on AMD(TM) Phenom II (4 cores, 4 threads), Window 7, 6 GB RAM;
3. GNU Octave, version 3.2.4, running on AMD Athlon(tm) 64 X2 Dual Core Processor 5000+, Ubuntu 11.04, 4 GB RAM;
4. Aramdillo C++ 2.3.91, running on AMD Athlon(tm) 64 X2 Dual Core Processor 5000+, Ubuntu 11.04, 4 GB RAM;
For C++, I compiled using “g++ testArmadillo.cpp -o testArmadillo -larmadillo -O2”

The test program on Matlab/Octave is

tic;A=randn(5000);B=randn(5000);C=AB;toc

The test program on Aramdillo C++ is

#include 
#include 

using namespace std;
using namespace arma;

int main(int argc, char** argv)
  {
  wall_clock timer;
  timer.tic();
  mat A = randu(5000,5000);
  mat B = randu(5000,5000);
  mat C;

  //cout << A*trans(B) << endl;
  C=solve(A,B);
  cout<<"Elapsed time is "<

Test results
1: Elapsed time is 11.648498 seconds.
2: Elapsed time is 12.639551 seconds.
3: Elapsed time is 42.7492 seconds.
4: Elapsed time is 40.4321 seconds.

Assuming 4 core CPU speeds up twice as the 2 core CPU, 3 and 4 shall be 21 seconds and 20 seconds, respectively. Does C++ really run faster than Matlab/Octave running on the same machine? I am not convinced yet. Possibly, the benefit of translating Matlab script to C++ is the potential of parallelization of the C++ program so that it can run on multiple workstations or a cluster.