I am running some tests with the rdtsc instruction. My goal is lightweight instrumentation that adds as few extra clock cycles as possible, because I have to read the Time Stamp Counter every time a function is called or returns. I have measured that rdtsc takes about 84 clock cycles on my computer. I have also read that, without cpuid, rdtsc may not give accurate results because of out-of-order execution. But cpuid is very expensive (about 200 cycles), and that overhead would ruin my timing application.
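For reference, here is a minimal sketch of one way a figure like the 84 cycles could be obtained: read the counter twice back to back and keep the smallest difference over many runs. It assumes GCC or Clang on x86, where `<x86intrin.h>` provides `__rdtsc()`. The helper name `min_rdtsc_delta` is made up for illustration, and this is not necessarily how the number above was measured.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang on x86 */

/* Hypothetical helper: returns the smallest gap, in TSC ticks, observed
 * between two back-to-back counter reads. No serializing instruction is
 * used, so this measures plain rdtsc only. */
static uint64_t min_rdtsc_delta(int iterations)
{
    assert(iterations > 0);

    uint64_t best = UINT64_MAX;
    for (int i = 0; i < iterations; i++) {
        uint64_t start = __rdtsc();
        uint64_t end = __rdtsc();
        uint64_t delta = end - start;

        /* Keep the minimum: larger values are likely inflated by
         * interrupts, cache misses, or context switches. */
        if (delta < best)
            best = delta;
    }
    return best;
}
```

Calling `min_rdtsc_delta(100000)` from `main` and printing the result gives a rough lower bound on the cost of one read. The value depends on the CPU and on whether the code runs inside a virtual machine.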
Is it true that using rdtsc without cpuid would give me bad results? How bad would they be? Is there any solution? I have read that "jmp $+2" also puts the CPU back in order and does not cost 200 clocks.
I hope you understand what I am asking; sorry for my English.