Hi All,
I implemented an algorithm with MPI and tried to measure its execution time.
The execution time varies too much. I measured each part of the algorithm separately and found the problem: in my code there is a receive after a send inside a loop, and this receive sometimes causes a delay of 2 seconds. It does not happen in every iteration, only irregularly. Can somebody explain this to me?
To show the problem I wrote the following code. As you can see in the output, I sometimes get a delay of more than 0.2 seconds.
I can't use MPI_Sendrecv because there are some computations between the send and the receive in my real code.
I am using the sock channel (TCP/IP sockets) of MPICH2.
#include <stdio.h>
#include <mpi.h>

/* Ping-pong between ranks 0 and 1; run with exactly 2 processes. */
int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Status status;
    double timer;
    int buffer[1000] = {0};  /* zeroed so no uninitialized data is sent */
    int iter;

    MPI_Init(&argc, &argv);                /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* get number of processes */

    for (iter = 0; iter < 30; iter++)
    {
        if (rank == 0) {
            MPI_Send(buffer, 1000, MPI_INT, 1, 123, MPI_COMM_WORLD);
            MPI_Barrier(MPI_COMM_WORLD);   /* both ranks synchronize here */
            timer = MPI_Wtime();           /* execution timer is started */
            MPI_Recv(buffer, 1000, MPI_INT, 1, 234, MPI_COMM_WORLD, &status);
            timer = MPI_Wtime() - timer;   /* execution timer is stopped */
            printf("Execution time is %f iteration is %d \n", timer, iter);
        }
        if (rank == 1) {
            MPI_Recv(buffer, 1000, MPI_INT, 0, 123, MPI_COMM_WORLD, &status);
            MPI_Barrier(MPI_COMM_WORLD);
            MPI_Send(buffer, 1000, MPI_INT, 0, 234, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}
root@rck00:~> mpirun -np 2 /shared/sendrecv/./a.out
Execution time is 0.000412 iteration is 0
Execution time is 0.000175 iteration is 1
Execution time is 0.250426 iteration is 2
Execution time is 0.038946 iteration is 3
Execution time is 0.258253 iteration is 4
Execution time is 0.038870 iteration is 5
Execution time is 0.008447 iteration is 6
Execution time is 0.008509 iteration is 7
Execution time is 0.000234 iteration is 8
Execution time is 0.000245 iteration is 9
Execution time is 0.000141 iteration is 10
Execution time is 0.001203 iteration is 11
Execution time is 0.002454 iteration is 12
Execution time is 0.000123 iteration is 13
Execution time is 0.033437 iteration is 14
Execution time is 0.000118 iteration is 15
Execution time is 0.000128 iteration is 16
Execution time is 0.000145 iteration is 17
Execution time is 0.001204 iteration is 18
Execution time is 0.006472 iteration is 19
Execution time is 0.000119 iteration is 20
Execution time is 0.000127 iteration is 21
Execution time is 0.033959 iteration is 22
Execution time is 0.000118 iteration is 23
Execution time is 0.000131 iteration is 24
Execution time is 0.043315 iteration is 25
Execution time is 0.000111 iteration is 26
Execution time is 0.000130 iteration is 27
Execution time is 0.253596 iteration is 28
Execution time is 0.079988 iteration is 29
root@rck00:~> mpirun -np 2 /shared/sendrecv/./a.out
Execution time is 0.000423 iteration is 0
Execution time is 0.005989 iteration is 1
Execution time is 0.000100 iteration is 2
Execution time is 0.000093 iteration is 3
Execution time is 0.000092 iteration is 4
Execution time is 0.258258 iteration is 5
Execution time is 0.040296 iteration is 6
Execution time is 0.048177 iteration is 7
Execution time is 0.000223 iteration is 8
Execution time is 0.000223 iteration is 9
Execution time is 0.000145 iteration is 10
Execution time is 0.001117 iteration is 11
Execution time is 0.001528 iteration is 12
Execution time is 0.000136 iteration is 13
Execution time is 0.000141 iteration is 14
Execution time is 0.000148 iteration is 15
Execution time is 0.000150 iteration is 16
Execution time is 0.000159 iteration is 17
Execution time is 0.000153 iteration is 18
Execution time is 0.000142 iteration is 19
Execution time is 0.005847 iteration is 20
Execution time is 0.000146 iteration is 21
Execution time is 0.000111 iteration is 22
Execution time is 0.253392 iteration is 23
Execution time is 0.328551 iteration is 24
Execution time is 0.038838 iteration is 25
Execution time is 0.000207 iteration is 26
Execution time is 0.000221 iteration is 27
Execution time is 0.033603 iteration is 28
Execution time is 0.000225 iteration is 29
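One way to put numbers on the spread in runs like the two above is to store each iteration's time in an array and report the minimum, median, and maximum instead of reading the individual lines. The following is only a minimal sketch in plain C (it does not need MPI). The names time_summary, summarize_times and MAX_SAMPLES are made up for this illustration and are not part of MPI or MPICH2.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SAMPLES 1024  /* hypothetical upper bound on samples per run */

/* Hypothetical summary record; not part of MPI or MPICH2. */
typedef struct {
    double min;
    double median;
    double max;
} time_summary;

/* Comparison callback for qsort: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sorts a copy of the samples (the caller's array is left untouched)
 * and returns the smallest, middle, and largest value. */
time_summary summarize_times(const double *samples, int n)
{
    double sorted[MAX_SAMPLES];
    time_summary s;

    assert(n > 0 && n <= MAX_SAMPLES);
    memcpy(sorted, samples, (size_t)n * sizeof(double));
    qsort(sorted, (size_t)n, sizeof(double), cmp_double);

    s.min = sorted[0];
    s.max = sorted[n - 1];
    /* For an even count, take the mean of the two middle values. */
    s.median = (n % 2) ? sorted[n / 2]
                       : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
    return s;
}
```

In the test program, rank 0 would write timer into a double times[30] array on each iteration and call summarize_times(times, 30) after the loop. A median close to the minimum together with a much larger maximum would indicate occasional outliers rather than a uniformly slow link.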
thanks in advance,
Yilmaz
Hello Yilmaz,
One of the advantages of the MPICH2 channels in RCKMPI is that their jitter is very small compared to that of the sock channel when running on the SCC.
The exact reason for the large variance in transfer times when communicating through sockets on the SCC is still unknown to me, but perhaps someone from Intel or the community can give you some insight.
- Isaías
Hello Yilmaz, Isaías,
Was any root cause discovered for the wild variation in timing? I am running into similar problems, and I guess you or someone at Intel would know more.
The answer pretty much determines whether I can use the SCC for time-sensitive experiments (at the very least, I need consistent timings).
Devendra

