# Category Archives: Scientific Programming

# Computation of qP, qSV, qSH traveltime, rays and slowness vectors in orthorhombic media

The direct traveltime fields of the qP, qSV, and qSH waves can be calculated by solving a Hamilton-Jacobi equation. Once the 3D traveltime field is known, computing rays and slowness vectors is straightforward. Note that the orthorhombic medium has an isotropic block at the center, where the point source is located. Outside this isotropic block, the orthorhombic medium is obtained by perturbing a transversely isotropic medium of arbitrary strength. In other words, the orthorhombic perturbation is weak, but the background anisotropy can be arbitrarily strong. Here is the video:

# Another Helmer Cluster (video only)

# Helmer cluster reborn on 21/12/2012

The 18-core homemade cluster (see here) has been in service since March 7, 2012. Some show cases have been demonstrated here, here, and here. All were successful. Today (Dec 21, 2012) this 18-core cluster was upgraded to its maximum capacity of 36 cores. More exciting high-performance results will be posted soon. Stay tuned.

# Helmer Cluster show case 3: 24 micro earthquake released energy within 100 ms around a tunnel

No analytic solution exists for this modeling situation, in which a tunnel dips toward the east and ends somewhere within the model (see Fig. 1a). The total CPU time on my 18-core cluster was about 15 hours.

Fig. 1b provides a closer view of the tunnel and the spatial distribution of the 24 micro-earthquake sources. The “delay” in the figure is the origin time delay with respect to the first source. The source types are: ISO (explosion), DC (double couple), X (linear vector along the X direction), and Z (linear vector along the vertical direction). The frequencies range from 50 Hz to 300 Hz.

Figs. 1c and 1d show the two-component seismic traces acquired by the surface receivers and the borehole receivers, respectively. It is easy to observe that the second phase immediately following the first P-wave phase is not necessarily the shear-wave phase of the same earthquake. Instead, it can be a P-wave phase of a later earthquake.

Figs. 1e and 1f show the waveforms from the middle channel of the surface receiver array (channel 76) and of the borehole receiver array (channel 51). Can you identify all the phases in this noise-free environment? I actually lied when I said I used 24 sources. Can you figure out how many sources I really used?

Here is a movie showing what is really going on inside the earth, which in reality you can never observe.

# Helmer Cluster show case 2: a 4.65 magnitude earthquake shakes the Greater Toronto Area

Earthquakes in the GTA? Yes, I am kidding. But what if a 4.65 magnitude nuclear explosion occurred in a secret underground lab 3 km deep, located close to the junction of Kirby Road and Kipling Ave, northwest of Vaughan (see Google)?

Here is a terrain map of the location. The black star indicates the hypocentre location.

Using SPECFEM3D with P=5, a grid interval of 200 m, and a half duration of 0.2 s (frequency ~2–3 Hz), the computation on my 18-core cluster took 1.5 hours. Here are some snapshots:

T=1.5 second

T=2.5 second

T=4.0 second

Notice the converted shear-wave distribution behind the P-wave front? At some locations, the shear-wave energy is amplified by the local topography and could severely damage surface structures. Good news: the modeled topography is amplified 10 times relative to the real topography, so these results say little about the real case.

**Coming soon:**

Parallel hybrid spectral-element and finite-element simulation of an airborne electromagnetic survey over a heterogeneous earth model with multi-scale lossy objects. Stay tuned.

# Helmer Cluster show case 1: seismic wave propagation in the presence of a fluid filled fracture

This is a show case for both my 18-core cluster and the performance of the SPECFEM2D (link) software package. I remember someone has already done this using a variable-grid finite-difference method, but for an irregular fracture, finite elements are the better solution.

The model is a 30 m by 30 m box with a fracture 1 cm thick and 10 m long, as shown in the following figure:

The geometry is generated with the gmsh software package:

```
// Gmsh project created on Tue Apr 03 22:42:21 2012
lc=0.5;
lf=0.05;
Point(1) = {0, 0, 0, lc};
Point(2) = {0, 30, 0, lc};
Point(3) = {30, 30, 0, lc};
Point(4) = {30, 0, 0, lc};
Line(1) = {2, 1};
Line(2) = {1, 4};
Line(3) = {4, 3};
Line(4) = {3, 2};
Point(5) = {13.5, 15, 0, lf};
Point(6) = {13.5, 5, 0, lf};
Point(7) = {13.51, 15, 0, lf};
Point(8) = {13.51, 5, 0, lf};
Line(5) = {5, 7};
Line(6) = {5, 6};
Line(7) = {6, 8};
Line(8) = {8, 7};
Line Loop(9) = {6, 7, 8, -5};
Line Loop(10) = {1, 2, 3, 4, -6, -7, -8, 5};
Plane Surface(11) = {9};
Plane Surface(10) = {10};
Physical Line("abs") = {4, 1, 2, 3};  // absorbing boundary
Physical Surface("M1") = {10};        // material 1
Physical Surface("M2") = {11};        // material 2
```

and exported to SPECFEM2D mesh format using a Matlab script:

```matlab
clear all
filehead='fracture';
fname=[filehead,'.msh'];
Fmesh=[filehead,'_mesh'];
Fmat=[filehead,'_mat'];
Fnodes=[filehead,'_nodesCoords'];
Fsur_abs=[filehead,'_surface_abs'];
Fsur_free=[filehead,'_surface_free'];
%% import gmsh file
fid=fopen(fname,'r');
tline=fgetl(fid);
while ~feof(fid)
    if strcmp(tline, '$PhysicalNames')
        tline=fgetl(fid);
        phyNum=str2num(tline);
        phyNames(phyNum,1)=struct('dimension',[], 'number',[], 'name',[]);
        for i=1:phyNum
            tline=fgetl(fid);
            C = textscan(tline,'%d %d %s');
            phyNames(i,1).dimension=C{1}(1);
            phyNames(i,1).number=C{2}(1);
            phyNames(i,1).name=C{3}{1}(2:end-1);
        end
    elseif strcmp(tline,'$Nodes')
        tline=fgetl(fid);
        nodeNum=str2num(tline);
        Nodes(nodeNum,1)=struct('coord',[]);
        for i=1:nodeNum
            tline=fgetl(fid);
            C = textscan(tline,'%d %f %f %f');
            Nodes(i,1).coord=[C{2} C{3} C{4}];
        end
    elseif strcmp(tline,'$Elements')
        tline=fgetl(fid);
        meshNum=str2num(tline);
        elements(meshNum,1)=struct('idx',[],'type',[],'numTags',[],'tags',[],'mesh',[]);
        for i=1:meshNum
            tline=fgetl(fid);
            C = textscan(tline,'%d');
            elements(i).idx=C{1}(1);
            elements(i).type=C{1}(2);
            elements(i).numTags=C{1}(3);
            for j=1:elements(i).numTags
                elements(i).tags(j)=C{1}(j+3);
            end
            if elements(i).type==1       % two-node line
                elements(i).mesh=[C{1}(end-1:end)]';
            elseif elements(i).type==8   % three-node line, 2nd order
                elements(i).mesh=[C{1}(end-2:end)]';
            elseif elements(i).type==3   % 4-node quadrangle
                elements(i).mesh=[C{1}(end-3:end)]';
            elseif elements(i).type==10  % 9-node second-order quadrangle
                elements(i).mesh=[C{1}(end-8:end)]';
            else
                error('Only 1st and 2nd order quadrangle type elements are supported');
            end
        end
    end
    tline=fgetl(fid);
end
fclose(fid);
%% export nodes to SPECFEM2D
fid=fopen(Fnodes,'w');
fprintf(fid,'%d\n',length(Nodes));
for i=1:length(Nodes)
    fprintf(fid,'%12.3f%12.3f\n',Nodes(i).coord(1),Nodes(i).coord(2));
end
fclose(fid);
%% export mesh to SPECFEM2D
fid=fopen(Fmesh,'w');
fprintf(fid,'%8d%8d\n',0,0);
j3=0; j10=0;
for i=1:length(elements)
    if elements(i).type==3
        fprintf(fid,'%8d%8d%8d%8d\n',elements(i).mesh);
        j3=j3+1;
    elseif elements(i).type==10
        fprintf(fid,'%8d%8d%8d%8d%8d%8d%8d%8d%8d\n',elements(i).mesh);
        j10=j10+1;
    end
end
frewind(fid);
if j3>0
    fprintf(fid,'%8d%8d\n',j3,4);
end
if j10>0
    fprintf(fid,'%8d%8d\n',j10,9);
end
fclose(fid);
%% export material to SPECFEM2D
fid=fopen(Fmat,'w');
for i=1:length(phyNames)
    allNames(i)=phyNames(i).number;
end
for i=1:length(elements)
    if elements(i).type==3 || elements(i).type==10
        mstr=phyNames(allNames==elements(i).tags(1)).name;
        fprintf(fid,'%8d\n',str2num(mstr(2:end)));
    end
end
fclose(fid);
%% export boundary conditions to SPECFEM2D
meshMatrix=[];
for i=1:length(elements)
    if elements(i).type==3 || elements(i).type==10
        meshMatrix=[meshMatrix;elements(i).mesh];
    end
end
fabs=fopen(Fsur_abs,'w');   fprintf(fabs,'%8d \n',0);
ffree=fopen(Fsur_free,'w'); fprintf(ffree,'%8d \n',0);
ja=0; jf=0;
for i=1:length(elements)
    if elements(i).type==1 || elements(i).type==8  % take edge elements only
        tmp=elements(i).mesh;
        idx=[];
        for j=1:length(meshMatrix(:,1))
            itag=0;
            for k=1:length(tmp)
                if find(meshMatrix(j,:)==tmp(k))
                    itag=itag+1;
                end
            end
            if length(tmp)==itag
                idx=[idx;j];
            end
        end
        if isempty(idx) || length(idx)>1
            error(['No match or non-edge nodes used in line element, please check element ',num2str(i)]);
        end
        mstr=phyNames(allNames==elements(i).tags(1)).name;
        if strcmp(mstr,'free')  % unless specified, all boundaries are absorbing
            jf=jf+1;
            if elements(i).type==1
                fprintf(ffree,'%8d%8d%8d%8d\n',idx,2,elements(i).mesh);
            elseif elements(i).type==8
                fprintf(ffree,'%8d%8d%8d%8d%8d\n',idx,2,elements(i).mesh);
            else
                error('Other types of edge elements are not implemented.');
            end
        else
            ja=ja+1;
            if elements(i).type==1
                fprintf(fabs,'%8d%8d%8d%8d\n',idx,2,elements(i).mesh);
            elseif elements(i).type==8
                fprintf(fabs,'%8d%8d%8d%8d%8d\n',idx,2,elements(i).mesh);
            else
                error('Other types of edge elements are not implemented.');
            end
        end
    end
end
frewind(ffree); fprintf(ffree,'%8d \n',jf);
frewind(fabs);  fprintf(fabs,'%8d \n',ja);
fclose(ffree); fclose(fabs);
```

A full simulation of 50,000 time steps (1e-8 s per step) completes in 1 hour. Here are a few snapshots:

Time step: 12000

Time step: 17000

Time step: 30000

The next show case of my cluster: electromagnetic wave diffusion in the presence of an irregular orebody using a high-order discontinuous Galerkin finite-element method. Stick around to learn more about what a home-brewed cluster can do.

# How to make yourself a supercomputer at home

I just built myself an 18-core cluster at a price close to that of a high-end workstation and cheaper than the cloud (e.g., the 8-core Cluster Compute Quadruple Extra Large Instance from Amazon costs about $1450/yr). For the same budget, you can own more than 8 cores outright. Here is how: follow the steps on this weblog and build a cluster like this.

# Compile a Trilinos program 101

The official Trilinos tutorials start from ex1.cpp, a program that shows the basic usage of communicators.

```cpp
// @HEADER
// ***********************************************************************
//
//                      Didasko Tutorial Package
//                 Copyright (2005) Sandia Corporation
//
// Under terms of Contract DE-AC04-94AL85000, there is a non-exclusive
// license for use of this work by or on behalf of the U.S. Government.
//
// This library is free software; you can redistribute it and/or modify
// it under the terms of the GNU Lesser General Public License as
// published by the Free Software Foundation; either version 2.1 of the
// License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful, but
// WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
// USA
//
// Questions about Didasko? Contact Marzio Sala (marzio.sala _AT_ gmail.com)
//
// ***********************************************************************
// @HEADER

// Basic definition of communicator.
// This code should be run with at least two processes.
// However, note that the SAME code works even if Epetra
// has been configured without MPI!

#include "Didasko_ConfigDefs.h"
#if defined(HAVE_DIDASKO_EPETRA)
#include "Epetra_ConfigDefs.h"
#ifdef HAVE_MPI
#include "mpi.h"
#include "Epetra_MpiComm.h"
#else
#include "Epetra_SerialComm.h"
#endif

int main(int argc, char *argv[])
{
#ifdef HAVE_MPI
  MPI_Init(&argc, &argv);
  // define an Epetra communicator
  Epetra_MpiComm Comm(MPI_COMM_WORLD);
#else
  Epetra_SerialComm Comm;
#endif

  // get the proc ID of this process
  int MyPID = Comm.MyPID();
  // get the total number of processes
  int NumProc = Comm.NumProc();

  // output some information to std output
  cout << Comm << endl;

  // ======================== //
  // now some basic MPI calls //
  // ------------------------ //

  int ivalue;
  double dvalue, dvalue2;
  double* dvalues  = new double[NumProc];
  double* dvalues2 = new double[NumProc];
  int root = 0;

  // equivalent to MPI_Barrier
  Comm.Barrier();

  if (MyPID == root) dvalue = 12.0;

  // On input, the root processor contains the list of values
  // (in this case, a single value). On exit, all processes will
  // have the same list of values. Note that all values must be
  // allocated before the broadcast.
  // equivalent to MPI_Broadcast
  Comm.Broadcast(&dvalue, 1, root);

  // as before, but with integer values. As C++ can bind to the appropriate
  // interface based on argument typing, the type of data is not required.
  Comm.Broadcast(&ivalue, 1, root);

  // equivalent to MPI_Allgather
  Comm.GatherAll(dvalues, dvalues2, 1);

  // equivalent to MPI_Allreduce with MPI_SUM
  dvalue = 1.0*MyPID;
  Comm.SumAll(&dvalue, dvalues, 1);

  // equivalent to MPI_Allreduce with MPI_MAX
  Comm.MaxAll(&dvalue, dvalues, 1);

  // equivalent to MPI_Scan with MPI_SUM
  dvalue = 1.0 * MyPID;
  Comm.ScanSum(&dvalue, &dvalue2, 1);
  cout << "On proc " << MyPID << " dvalue2 = " << dvalue2 << endl;

  delete[] dvalues;
  delete[] dvalues2;

  // ======================= //
  // Finalize MPI and return //
  // ----------------------- //

#ifdef HAVE_MPI
  MPI_Finalize();
#endif
  return(EXIT_SUCCESS);
} /* main */

#else

#include <cstdlib>
#include <cstdio>
#ifdef HAVE_MPI
#include "mpi.h"
#endif

int main(int argc, char *argv[])
{
#ifdef HAVE_MPI
  MPI_Init(&argc, &argv);
#endif
  puts("Please configure Didasko with:\n"
       "--enable-epetra");
#ifdef HAVE_MPI
  MPI_Finalize();
#endif
  return 0;
}
#endif
```

To compile,

```
junwei@Junbuntu:~/tmptest$ mpiCC -I/home/junwei/data/trilinos/10.8.3/include -L/home/junwei/data/trilinos/10.8.3/lib ex1.cpp -o ex1 -lepetra
```

Then run,

```
junwei@Junbuntu:~/tmptest$ mpirun -np 1 ./ex1
Epetra::MpiComm Processor 0 of 1 total processors
On proc 0 dvalue2 = 0
junwei@Junbuntu:~/tmptest$ mpirun -np 2 ./ex1
Epetra::MpiComm Processor 0 of 2 total processors
On proc 0 dvalue2 = 0
Epetra::MpiComm Processor 1 of 2 total processors
On proc 1 dvalue2 = 1
junwei@Junbuntu:~/tmptest$ mpirun -np 3 ./ex1
Epetra::MpiComm Processor 0 of 3 total processors
Epetra::MpiComm Processor 2 of 3 total processors
Epetra::MpiComm Processor 1 of 3 total processors
On proc 1 dvalue2 = 1
On proc 0 dvalue2 = 0
On proc 2 dvalue2 = 3
junwei@Junbuntu:~/tmptest$ mpirun -np 4 ./ex1
Epetra::MpiComm Processor 0 of 4 total processors
Epetra::MpiComm Processor 1 of 4 total processors
Epetra::MpiComm Processor 2 of 4 total processors
Epetra::MpiComm Processor 3 of 4 total processors
On proc 0 dvalue2 = 0
On proc 1 dvalue2 = 1
On proc 2 dvalue2 = 3
On proc 3 dvalue2 = 6
```

# Matlab vs Octave vs Armadillo

Out of curiosity, I conducted a toy comparison between Matlab, Octave, and Armadillo, to convince myself that translating some Matlab programs to Armadillo C++ would pay off. Here are the details:

1. Matlab (2010b) running on an Intel Core i7-2600 (4 cores, 8 threads), Windows 7, 8 GB RAM;

2. Matlab (2011a) running on an AMD Phenom II (4 cores, 4 threads), Windows 7, 6 GB RAM;

3. GNU Octave 3.2.4 running on an AMD Athlon 64 X2 Dual Core 5000+, Ubuntu 11.04, 4 GB RAM;

4. Armadillo C++ 2.3.91 running on an AMD Athlon 64 X2 Dual Core 5000+, Ubuntu 11.04, 4 GB RAM.

For C++, I compiled using “g++ testArmadillo.cpp -o testArmadillo -larmadillo -O2”

The test program on Matlab/Octave is

```matlab
tic; A=randn(5000); B=randn(5000); C=A\B; toc
```

The test program in Armadillo C++ is

```cpp
#include <iostream>
#include <armadillo>

using namespace std;
using namespace arma;

int main(int argc, char** argv)
{
    wall_clock timer;
    timer.tic();
    mat A = randu<mat>(5000,5000);
    mat B = randu<mat>(5000,5000);
    mat C;
    //cout << A*trans(B) << endl;
    C = solve(A,B);
    cout << "Elapsed time is " << timer.toc() << " seconds." << endl;
    return 0;
}
```

Test results:

1: Elapsed time is 11.648498 seconds.

2: Elapsed time is 12.639551 seconds.

3: Elapsed time is 42.7492 seconds.

4: Elapsed time is 40.4321 seconds.

Assuming a 4-core CPU runs twice as fast as a 2-core CPU, setups 3 and 4 would scale to about 21 and 20 seconds, respectively. Does C++ really run faster than Matlab/Octave on the same machine? I am not convinced yet. Possibly, the real benefit of translating Matlab scripts to C++ is the potential to parallelize the program so that it can run on multiple workstations or a cluster.