Intel VTune Analyzer is a nice tool, but it's commercial and not cheap :-) I haven't worked with gprof, but it looks like it's . This Stack Overflow thread lists quite a few: unix - What can I use to profile C++ code in Linux? - Stack Overflow
Thanks for the pointers; there are lots of discussions on Stack Overflow. I tried the perf tools in Eclipse, and there seems to be a lot there. (perf seems to be a longtime favorite of Linus: http://marc.info/?l=git&m=126262088816902&w=2).
I also wanted to run tests directly on the Galileo, so I instrumented the code with my own timers / counters / clocks. Based on those results, I was able to rework some code (I/O, network, USB, and computation) for better performance.
I'd like to carry performance tests forward for comparison on future multi-core/multi-threaded chips as well.
Grumble grumble - performance - grumble...
I am very happy with the single-core Quark; it handles everything I have thrown at it, and fast too.
But in the back of my mind, I would love to see a multi-core Atom SoC on the Galileo board.
My interest in this board goes well beyond "just an embedded device". It performs very well as a general-purpose small computer, with the added benefit of pin compatibility with Arduino shields. I would like to see a faster solution than the current Cypress multiplexer, though.
My approach was to precede code with a call to startClock() and follow it with a call to stopClock(), using clock() and gettimeofday() for measurement. I calculated the differences (microsecond math), stored them in arrays, and aggregated the totals. The rest of the effort went into formatting / displaying the results. The downside of this approach is that it's knitted fairly tightly into the sources. I hit big roadblocks in image processing, which I largely overcame after running these timings.
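For anyone who wants to try the same thing, here is a minimal sketch of a startClock()/stopClock() pair. It is not the code from my project: the slot-based signatures, the Timer struct, MAX_TIMERS, MAX_SAMPLES and reportClock() are hypothetical names for illustration. It reads both clock() (CPU time) and gettimeofday() (wall time), does the microsecond math, stores per-call samples in an array, and keeps aggregate totals.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical sizes: one slot per instrumented code region. */
#define MAX_TIMERS 8
#define MAX_SAMPLES 256

typedef struct {
    clock_t cpu_start;              /* CPU time at startClock() */
    struct timeval wall_start;      /* wall-clock time at startClock() */
    long long wall_us[MAX_SAMPLES]; /* per-call wall durations (first MAX_SAMPLES calls) */
    int count;                      /* number of completed start/stop pairs */
    long long wall_total_us;        /* aggregated wall time, microseconds */
    double cpu_total_s;             /* aggregated CPU time, seconds */
} Timer;

static Timer timers[MAX_TIMERS];

/* Mark the start of a timed region. */
void startClock(int id) {
    assert(id >= 0 && id < MAX_TIMERS);
    timers[id].cpu_start = clock();
    gettimeofday(&timers[id].wall_start, NULL);
}

/* Mark the end of a timed region; returns elapsed wall time in microseconds. */
long long stopClock(int id) {
    assert(id >= 0 && id < MAX_TIMERS);
    struct timeval now;
    gettimeofday(&now, NULL);
    clock_t cpu_now = clock();
    Timer *t = &timers[id];

    /* Microsecond math: whole seconds scaled up, plus the usec difference
       (which is negative when the seconds field has rolled over). */
    long long us = (long long)(now.tv_sec - t->wall_start.tv_sec) * 1000000LL
                 + (long long)(now.tv_usec - t->wall_start.tv_usec);

    if (t->count < MAX_SAMPLES)
        t->wall_us[t->count] = us;
    t->count++;
    t->wall_total_us += us;
    t->cpu_total_s += (double)(cpu_now - t->cpu_start) / CLOCKS_PER_SEC;
    return us;
}

/* Print aggregated totals for one slot. */
void reportClock(int id, const char *label) {
    assert(id >= 0 && id < MAX_TIMERS);
    const Timer *t = &timers[id];
    double avg = t->count ? (double)t->wall_total_us / t->count : 0.0;
    printf("%-10s calls=%d wall_total=%lld us avg=%.1f us cpu_total=%.6f s\n",
           label, t->count, t->wall_total_us, avg, t->cpu_total_s);
}
```

Usage is just startClock(0); before a region, stopClock(0); after it, and reportClock(0, "save"); when you want the totals. Reading both clocks is what makes I/O choke points visible: in a single-threaded program, wall time far above CPU time means the code is waiting (e.g. on SD writes) rather than computing. gettimeofday() follows the system clock, which NTP can step, so clock_gettime(CLOCK_MONOTONIC, ...) is the safer choice if that matters.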
I was grabbing images from a webcam, analyzing them, and storing them to SD (and USB drives) before serving them up via NGINX (all on the Galileo). It worked well, but saving files was a big delay / choke point. After profiling and seeing the big bottlenecks, I developed an "adaptive rate" approach: the full image is still analyzed, but a compressed image (greatly reduced in size, but still recognizable) is saved unless the full-size image is needed.
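The decision logic behind an adaptive save can be sketched roughly like this. It is a hypothetical illustration, not my actual code: planSave(), SavePlan, the JPEG-quality field and the QUALITY_* constants are all assumptions. Full-size frames are written only when they are needed; otherwise the quality of the reduced copy steps down when the last save blew a time budget and creeps back up when saves are comfortably fast.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuning constants, not from the original project. */
#define QUALITY_MIN 20
#define QUALITY_MAX 90
#define QUALITY_STEP 10

typedef struct {
    bool full_size; /* true: write the original frame as-is */
    int quality;    /* JPEG quality for the reduced copy (0 when full_size) */
} SavePlan;

/* Quality used for reduced copies; adapted from call to call. */
static int g_quality = QUALITY_MAX;

/* Decide how to save the next frame, given how long the last save took. */
SavePlan planSave(bool full_needed, long long last_save_us, long long budget_us) {
    assert(budget_us > 0);
    SavePlan plan;
    if (full_needed) {
        plan.full_size = true;
        plan.quality = 0; /* unused: original frame is written unchanged */
        return plan;
    }
    if (last_save_us > budget_us && g_quality - QUALITY_STEP >= QUALITY_MIN)
        g_quality -= QUALITY_STEP; /* too slow: shrink the output */
    else if (last_save_us < budget_us / 2 && g_quality + QUALITY_STEP <= QUALITY_MAX)
        g_quality += QUALITY_STEP; /* plenty of headroom: improve quality */
    plan.full_size = false;
    plan.quality = g_quality;
    return plan;
}
```

The last_save_us value would come from timing the previous write (for example with the stopClock() approach above), so the profiling data feeds straight back into the save policy. Analysis always runs on the full frame; only what goes to storage changes.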
Dart is introducing Zones, which could be useful for profiling / performance management. Is anyone interested in working towards getting Dart running on the Galileo? Zones | Dart: Structured web apps