Why a C++ program built on Open MPI 1.4.3 failed to execute the system() function

It cost me a full day to find out why a system() call in a C++ program failed.

Background: 30 processes each write a moderately sized file, and the files are then merged into a single file.

if(my_rank==0) {
   int sysflag;
   if (!system(NULL)) {printf("\nError!...command processor is unavailable\n");exit(1);}
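   for (int i=0;i<nprocs;i++) {   // part of this line was lost; the loop bound name nprocs (number of processes) and the first branch are reconstructed from context
      if (i==0) {
         sprintf(cmd,"cat PE%d_loc_stackX.3d > stackX.bin ",i);sysflag=system(cmd);//printf("system returned %d\n", sysflag);
      } else {
         sprintf(cmd,"cat PE%d_loc_stackX.3d >> stackX.bin ",i);sysflag=system(cmd);//printf("system returned %d\n", sysflag);
         sprintf(cmd,"rm PE%d_loc_stackX.3d",i);sysflag=system(cmd); //printf("system returned %d\n", sysflag);
      }
   }
}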

The “command processor is unavailable” error always occurred, even though the same “save-individually-and-merge-later” approach works in another program. An extensive web search suggests that Open MPI sometimes does not support fork(), which system() uses. However, that is not the cause in my case, since system() works in my other programs.

Options for writing the output:
(a) all slave processes send their data to the master, which writes a single file;
(b) every process writes its own file, and the files are merged into a single file afterwards;
(c) use MPI I/O.

My choice:
(b). It is faster than (a) and easier to implement than (c).
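
For illustration, the merge step of option (b) can also be done inside the program with standard file streams, so no child process has to be spawned. This is only a minimal sketch, not the code I actually ran: the function name merge_rank_files and the parameter num_files are hypothetical, and it assumes the per-process files follow the PE<i>_loc_stackX.3d naming used above.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Concatenate PE0_loc_stackX.3d ... PE<num_files-1>_loc_stackX.3d into `merged`
// and delete each part once it has been copied.
// Returns false as soon as a file cannot be opened or written.
// (merge_rank_files and num_files are hypothetical names for this sketch.)
bool merge_rank_files(int num_files, const std::string& merged) {
    std::ofstream out(merged, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    for (int i = 0; i < num_files; ++i) {
        const std::string part = "PE" + std::to_string(i) + "_loc_stackX.3d";
        {
            std::ifstream in(part, std::ios::binary);
            if (!in) return false;
            // Skip empty parts: streaming an empty buffer would set failbit on `out`.
            if (in.peek() != std::ifstream::traits_type::eof()) {
                out << in.rdbuf();
            }
            if (!out) return false;
        }  // `in` is closed here, before the part is removed
        std::remove(part.c_str());
    }
    return true;
}
```

Rank 0 would call something like merge_rank_files(30, "stackX.bin") after all processes have finished writing, for example after an MPI_Barrier.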

My guess:
My understanding is that system() spawns a sub-process that costs the same amount of memory as the calling process. In my case, each node ran 5 processes that together already consumed 90% of its memory, so there was not enough memory left to run another sub-process, and the call to system() failed. In addition, system() does not throw an exception on failure; it only reports the failure through its return value. So the C++ program does not execute the command passed to system(), and it looks as if nothing happened.
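
Because system() fails silently, checking its return value makes this kind of problem visible much earlier. Below is a minimal sketch, assuming a POSIX system (it uses the WIFEXITED and WEXITSTATUS macros from <sys/wait.h>); the helper name run_checked is hypothetical.

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/wait.h>  // POSIX: WIFEXITED, WEXITSTATUS

// Run `cmd` through system() and report any failure explicitly.
// Returns true only if the command ran and exited with status 0;
// otherwise `error` receives a short description.
// (run_checked is a hypothetical helper name for this sketch.)
bool run_checked(const std::string& cmd, std::string& error) {
    if (std::system(nullptr) == 0) {
        error = "no command processor available";
        return false;
    }
    errno = 0;
    const int status = std::system(cmd.c_str());
    if (status == -1) {
        // The child process could not be created, e.g. fork() failed for lack of memory.
        error = std::string("system() failed: ") + std::strerror(errno);
        return false;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        error = "command did not exit successfully: " + cmd;
        return false;
    }
    error.clear();
    return true;
}
```

With a helper like this, each cat or rm command can be checked, and the error text can be printed before the program exits.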

My solution:
Reducing the number of processes per node from 5 to 4 solved the problem.
