As expected, the run time decreases as the code runs on more processors. These simulations were run on a grid of dimension 128*128*64. The parameters can be seen here.
Each doubling of the number of processors multiplies the run time by
0.5827, not exactly the ideal factor of 0.5 (the red curve), because the
communication between processors adds an overhead that grows with the
number of processors.
This is excellent parallel scaling behavior, especially considering the
non-local nature of FFT.
In addition, the improvement is linear on a log scale: the coefficient of determination (R^2) is close to one, i.e. a linear regression on a log scale fits the timings very well.
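The log-scale regression described above can be sketched as follows. This is only an illustration: the timings are synthetic values generated from the 0.5827 factor, not the measured ones, and the names `fit_log_log`, `procs`, `times` and `t1` are hypothetical. The fit is done on log2(time) against log2(processors), so the slope converts directly into a per-doubling time factor.

```python
import math

def fit_log_log(procs, times):
    """Least-squares fit of log2(time) = a + b * log2(procs).

    Returns the intercept a, the slope b and the coefficient of
    determination R^2 of the fit.
    """
    xs = [math.log2(p) for p in procs]
    ys = [math.log2(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return a, b, r2

# Synthetic timings (NOT the measured data): a perfect power law in which
# every doubling of the processor count multiplies the time by 0.5827.
procs = [1, 2, 4, 8, 16, 32, 64]
t1 = 100.0  # hypothetical single-processor run time
times = [t1 * 0.5827 ** math.log2(p) for p in procs]

a, slope, r2 = fit_log_log(procs, times)
factor = 2.0 ** slope       # time multiplier per doubling of processors
efficiency = 0.5 / factor   # ideal factor divided by observed factor

print(f"slope = {slope:.4f}, factor per doubling = {factor:.4f}")
print(f"R^2 = {r2:.6f}, efficiency per doubling = {efficiency:.3f}")
```

With real timings, `times` would be replaced by the measured run times; an R^2 close to one then indicates that the run time follows a power law in the number of processors.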
Finally, the number of processors is limited by the grid dimension in the Z direction because of the FFTW library (see its documentation).
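To illustrate why the Z dimension caps the useful number of processors, here is a minimal sketch that assumes the grid is split into slabs along Z with a simple block distribution (block size ceil(nz / nprocs)). The function `slab_sizes` is hypothetical, and both the layout and the value nz = 64 are assumptions made for the example, not taken from the code or from FFTW itself.

```python
import math

def slab_sizes(nz, nprocs):
    """Number of Z planes owned by each process under a block distribution.

    Assumed layout: process `rank` owns planes
    [rank * block, min((rank + 1) * block, nz)) with block = ceil(nz / nprocs).
    """
    block = math.ceil(nz / nprocs)
    sizes = []
    for rank in range(nprocs):
        start = min(rank * block, nz)
        end = min(start + block, nz)
        sizes.append(end - start)
    return sizes

nz = 64  # assumed size of the Z dimension
for nprocs in (32, 64, 128):
    sizes = slab_sizes(nz, nprocs)
    idle = sizes.count(0)
    print(f"{nprocs} processes: largest slab = {max(sizes)}, idle = {idle}")
```

Under this assumption, any process beyond the first nz receives no planes at all, so adding processors past that point cannot reduce the run time.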