OK, I replicated your experiment. The only thing I can add at this point is that I'm getting slightly better performance with a 64k buffer size.
When I look at the time output, although 28.3 seconds elapsed, the process only spent 6.7 seconds of CPU time (user + sys) on the task.
The rest of the real time went to other tasks running in the background, unless I am misreading something.
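To make the split explicit: the gap between real time and user + sys is the time the dd process was not running on a CPU, whatever the cause (I/O wait, other tasks, scheduling). Here is a minimal POSIX sh sketch. off_cpu_time is a hypothetical helper name. It only does the subtraction and cannot tell you which cause dominates:

```sh
# off_cpu_time REAL CPU   (hypothetical helper)
# REAL is the "real" (wall-clock) figure from `time`; CPU is user + sys.
# Prints the CPU time and the part of the elapsed time that was NOT charged
# to the process as CPU time (I/O wait, other tasks, scheduling, ...).
off_cpu_time() {
    awk -v real="$1" -v cpu="$2" 'BEGIN {
        printf "cpu=%.1fs off_cpu=%.1fs\n", cpu, real - cpu
    }'
}

off_cpu_time 28.3 6.7   # prints: cpu=6.7s off_cpu=21.6s
```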
> Let's see what the others and the Intel guys say.
For what it's worth, the performance increase for 64k vs 1M wasn't that large for me.
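On the 64k vs 1M comparison, a fair test keeps the total amount written the same, so the count has to shrink as bs grows. dd's k and M suffixes are 1024-based, so bs=64k count=8000 and bs=1M count=500 both write 524,288,000 bytes. A small sketch follows. dd_count is a hypothetical helper:

```sh
# dd_count TOTAL_BYTES BS_BYTES   (hypothetical helper)
# Number of dd blocks needed to write TOTAL_BYTES using blocks of BS_BYTES.
# Assumes TOTAL_BYTES is an exact multiple of BS_BYTES.
dd_count() {
    echo $(( $1 / $2 ))
}

total=$(( 8000 * 64 * 1024 ))                                  # the original run: bs=64k count=8000
echo "bs=64k count=$(dd_count "$total" $(( 64 * 1024 )))"      # bs=64k count=8000
echo "bs=1M count=$(dd_count "$total" $(( 1024 * 1024 )))"     # bs=1M count=500
```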
You refer to 17x MB again; when I calculate 512M / 6.7 s, I get 76.4 MB/sec ...
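The arithmetic behind those figures, as a sketch. throughput is a hypothetical helper that divides the size by whichever time you choose, so the result depends on whether you use user + sys or real time:

```sh
# throughput MEGABYTES SECONDS   (hypothetical helper)
# Prints MEGABYTES / SECONDS rounded to one decimal place.
throughput() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f MB/s\n", mb / s }'
}

throughput 512 6.7    # over user + sys time: 76.4 MB/s
throughput 512 28.3   # over real time:       18.1 MB/s
```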
Personally, I would look into quieting some of the background tasks to reduce the overall time.
For writing 512 MB in 64k blocks, dd takes 27 seconds, which is 19.5 MB/sec.
# time dd if=/dev/zero bs=64k count=8000 of=xx
27 seconds is the total time for me. It reports 6.25 sec sys (which is kernel time); the remaining 20 seconds are probably spent waiting for I/O, which is not counted as CPU time by the kernel. There is no other task active.
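For reference, bs=64k is 65,536 bytes in dd, so count=8000 writes 524,288,000 bytes (500 MiB). GNU dd reports its rate in decimal MB/s (10^6 bytes per second). The sketch below shows that calculation. dd_rate is a hypothetical helper. With the elapsed time rounded to 27 s it gives about 19.4 MB/s, so the 19.5 MB/sec above presumably reflects dd's unrounded elapsed time:

```sh
# dd_rate BS_KIB COUNT SECONDS   (hypothetical helper)
# Bytes written by `dd bs=<BS_KIB>k count=<COUNT>` (k = 1024 bytes) and the
# resulting rate in decimal MB/s (10^6 bytes per second).
dd_rate() {
    awk -v bs="$1" -v n="$2" -v s="$3" 'BEGIN {
        bytes = bs * 1024 * n
        printf "%d bytes, %.1f MB/s\n", bytes, bytes / s / 1000000
    }'
}

dd_rate 64 8000 27    # prints: 524288000 bytes, 19.4 MB/s
```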
So, when you check journalctl, you are saying there are no other tasks? Mine appeared to be quite chatty.
I am not claiming that this accounts for the whole 20-second difference. Rather, I am challenging your claim that there are no other tasks,
and only to the extent that there may be other ways for you to make some gains, which was your objective.