Problems running multiple processes on an i7 720QM

idata · 02-18-2010

I purchased a laptop from Dell with this processor so that I could run several computationally intensive programs simultaneously. Unfortunately, rather than assigning each process to its own core, it tends to work on only one core, so it runs much slower than the six-year-old desktop it was meant to replace. Dell states that the processor controls core selection, but I see nothing in the BIOS that allows me to override any such added "efficiencies". Any ideas how to rectify this?

idata · 02-18-2010

Hi

Here are a couple of suggestions you might try to isolate the symptoms you saw:

You didn't say whether each of the "programs" you ran was a single-threaded or a multi-threaded app. For simplicity, let me assume you ran several single-threaded apps. You also didn't say whether you're running Windows or Linux; I'll assume your brand-new laptop came with Windows 7.

1. The first anomaly to verify and remedy is your observation that all the processes tend to run on one core. Your i7 720QM has 4 cores and 8 logical processors. Task Manager (TM) can show the CPU utilization of all 8 logical processors, updated about once per second. If TM shows that only one logical CPU is working (probably pinned at 100% utilization) while the other 7 logical processors idle at low utilization, you probably want to focus on how the programs are started: something in the launch may be placing an unusual constraint on the OS scheduler and creating this bottleneck.

I expect the OS to start a new process by inheriting the attributes of its parent; by default, that should make all 8 logical processors available to the OS scheduler for each runnable task. You can verify this in TM's Processes tab: highlight a process, right-click it and choose "Set Affinity". The number of checkmarks tells you how many logical processors are available to run the highlighted process. The dialog box should show eight boxes that can be checked or unchecked. Your observation that all processes run on only one core would imply that the dialog box may be showing only one checkmark. You want to make sure all eight boxes are checked!

2. Once you have verified that all of your programs were started in the normal manner, with all 8 logical processors available to the OS scheduler, the question is how the OS scheduler chooses among them. With TM displaying utilization for all 8 logical CPUs, you can open a new command prompt, start each of your programs in turn, and watch which logical processor the OS scheduler allocates to each new task.
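The Set Affinity check in item 1 can be read as a number. Here is a small sketch that assumes only what is stated later in this thread: bit n of a process's affinity mask corresponds to the checkbox for CPU n in the dialog. The function names are hypothetical and purely illustrative. With all eight boxes checked, the mask is 0xFF.

```python
def enabled_cpus(mask: int) -> list[int]:
    """Logical CPU numbers whose bits are set in an affinity mask.

    Bit n corresponds to the "CPU n" checkbox in Task Manager's
    Set Affinity dialog.
    """
    return [n for n in range(mask.bit_length()) if (mask >> n) & 1]


def allows_all(mask: int, logical_cpus: int = 8) -> bool:
    """True if every one of the first `logical_cpus` processors is allowed."""
    full = (1 << logical_cpus) - 1  # 0xFF when logical_cpus == 8
    return (mask & full) == full
```

For example, `enabled_cpus(0xFF)` returns `[0, 1, 2, 3, 4, 5, 6, 7]`. A process stuck on a single CPU would instead show a mask such as `0x1`, and `allows_all(0x1)` is `False`.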

If you run into more complex situations involving a multithreaded app and its associated process affinity mask, you can email me offline at shihjong.kuo@intel.com

MWong22 · 02-19-2010

+1000 points.

Perfectly answered!

idata · 02-19-2010

Thank you, this helps clear up a lot of confusion on my part, but it doesn't solve the poor performance. The process in question is a 32-bit, single-threaded .exe.

When launched once, it is clearly spread across four to eight of the logical processors. Each subsequent launch is likewise spread across all the processors, instead of each instance being assigned to its own processor as happens with the other processors/OSes in our lab. Unfortunately, the BIOS does not let me turn off Hyper-Threading.

idata · 02-19-2010

Also, I am wondering whether recompiling the Fortran code with a current Intel compiler would alleviate this, or whether it would still produce a single-threaded program.

idata · 02-25-2010

Hi

I believe you are describing the phenomenon of thread migration. This happens even when you launch only one instance of your app.

I think there are three different things that may interest you:

1. Can you stop thread migration and what's the best way to do that?

2. Once you have solved the thread migration issue, does performance/throughput scale with the number of cores as you run one to four instances of your app?

3. Can your app benefit from HT if you run two instances on the two sibling logical CPUs of the same core?

The second part of item 1 is a bit complicated when you consider field deployment, so I'll start with a lab technique. You can simply uncheck seven of the eight logical processors as you start each instance of your app, making sure that the only checked logical CPU for each instance is distinct and corresponds to a different core from the previous instances.
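The same lab technique can also be driven from a command prompt instead of the dialog. This sketch makes two assumptions. First, it assumes Hyper-Threading siblings are numbered adjacently (CPUs 0/1 are core 0, 2/3 are core 1, and so on). That layout is common but not guaranteed, so verify it on your own machine. Second, it assumes your cmd.exe supports `start /affinity <hexmask>`, a switch added in Windows 7. The executable name `myapp.exe` and both helper functions are hypothetical.

```python
# ASSUMPTION: HT siblings are enumerated adjacently (CPU 0/1 = core 0,
# CPU 2/3 = core 1, ...). Verify on your own machine before relying on it.
def one_cpu_per_core_masks(cores: int = 4, threads_per_core: int = 2) -> list[int]:
    """One single-bit affinity mask per physical core (its first sibling)."""
    return [1 << (core * threads_per_core) for core in range(cores)]


def start_commands(exe: str, instances: int,
                   cores: int = 4, threads_per_core: int = 2) -> list[str]:
    """cmd.exe lines that launch each instance pinned to a different core."""
    masks = one_cpu_per_core_masks(cores, threads_per_core)
    if instances > len(masks):
        raise ValueError("more instances than physical cores")
    # start /affinity expects the mask as a hexadecimal number.
    return [f"start /affinity {m:X} {exe}" for m in masks[:instances]]


if __name__ == "__main__":
    # "myapp.exe" is a placeholder for your own program.
    for line in start_commands("myapp.exe", 4):
        print(line)
```

Under the adjacency assumption, the four instances get masks 1, 4, 10 and 40 (hex), which are CPUs 0, 2, 4 and 6. That gives one logical CPU on each physical core.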

The mapping from the CPU numbers in the Set Affinity dialog box to distinct processor cores is not trivial in general. You can download the reference code I posted on the Intel Software Network (http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/) and compile it to run on your machine. It makes the OS affinity mask value to use for each core much easier to identify. The positions of the non-zero bits in each mask should correspond to the CPU numbers you see in the dialog box.
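Once a topology tool has told you which affinity mask belongs to each core, picking one logical CPU per core is a single bit operation: keep only the lowest set bit of each core's mask. This is a hedged sketch. The function names are hypothetical, and the per-core mask values used below are illustrative inputs, not actual output of the Intel reference code.

```python
def lowest_cpu_mask(core_mask: int) -> int:
    """Keep only the lowest set bit: selects one logical CPU of that core."""
    if core_mask <= 0:
        raise ValueError("core mask must have at least one bit set")
    return core_mask & -core_mask


def cpu_number(single_cpu_mask: int) -> int:
    """Bit position of a single-bit mask = CPU # in the Set Affinity dialog."""
    return single_cpu_mask.bit_length() - 1


def pick_one_cpu_per_core(core_masks: list[int]) -> list[int]:
    """One single-CPU mask per core, given each core's full affinity mask."""
    seen = 0
    for m in core_masks:
        if m & seen:  # distinct cores must not share logical CPUs
            raise ValueError("core masks overlap; check the topology output")
        seen |= m
    return [lowest_cpu_mask(m) for m in core_masks]
```

With adjacent siblings, `pick_one_cpu_per_core([0x3, 0xC, 0x30, 0xC0])` gives `[0x1, 0x4, 0x10, 0x40]` (CPUs 0, 2, 4, 6). With siblings numbered four apart, `[0x11, 0x22, 0x44, 0x88]` gives `[0x1, 0x2, 0x4, 0x8]` (CPUs 0 to 3). The same four-instance launch needs different checkboxes on the two layouts, which is why the mapping is worth checking.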

You should be able to observe the core-count performance scaling of your app as you launch one to four instances (each instance affinitized to a unique core to avoid migration). If the scaling becomes significantly sub-linear before you reach four instances on four separate cores, one of the first things to check is how much memory traffic each instance of your app generates relative to the sustainable memory bandwidth your platform can support.
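The scaling check is simple arithmetic. This sketch assumes each instance does the same fixed amount of work and that you record the wall-clock time for n concurrent, pinned instances to all finish. Aggregate speedup is then n*t1/tn and parallel efficiency is speedup/n, so an efficiency of 1.0 means perfectly linear scaling. The function name and input layout are illustrative assumptions.

```python
def core_scaling(elapsed: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map n -> (speedup, efficiency) from measured wall-clock times.

    elapsed[n] is the time in seconds for n concurrent instances, each pinned
    to its own core and each doing the same work, to all finish.
    Must include n = 1 as the baseline.
    """
    t1 = elapsed[1]
    result = {}
    for n, tn in sorted(elapsed.items()):
        speedup = n * t1 / tn               # aggregate throughput vs. 1 instance
        result[n] = (speedup, speedup / n)  # efficiency 1.0 = perfectly linear
    return result
```

Feed it your own measured times. An efficiency that falls well below 1.0 before n = 4 is the significantly sub-linear case, where memory bandwidth is the first suspect.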

If you see perfect or nearly perfect linear scaling with core count up to four instances, you can pursue other angles, such as whether HT benefits two instances of your app or whether you get excessive L1 evictions when running two instances on the same core.
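The HT question can be expressed the same way. This sketch uses the same fixed-work assumption: time one instance running alone on a core, then time two instances pinned to the two sibling logical CPUs of that core, and compare aggregate throughput. The function name is hypothetical.

```python
def ht_throughput_gain(t_alone: float, t_siblings: float) -> float:
    """Aggregate throughput of two instances on HT siblings vs. one alone.

    t_alone:    seconds for one instance running alone on a core
    t_siblings: seconds until both instances sharing that core have finished
    Returns > 1.0 if HT adds throughput, about 1.0 if it adds nothing, and
    < 1.0 if running the two together is slower than running them in turn.
    """
    if t_alone <= 0 or t_siblings <= 0:
        raise ValueError("times must be positive")
    return 2.0 * t_alone / t_siblings
```

A gain close to 1.0 means the two instances are mostly competing for the same core resources (for example, the L1 evictions mentioned above) rather than overlapping useful work.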

idata · 02-24-2010

I think the in-house Intel experts will do a much better job of helping you overcome this situation! Ask them during the Intel Live Chat event on the 26th of Feb: http://www.intellivechat.com/facebook.html