I'm looking for help from someone knowledgeable about the memory subsystem of the quad-core Xeon X5470 (Dell PowerEdge 1950 server).
In an application that regularly dispatches a group of tasks, we have observed long stalls when the number of pages accessed by these tasks exceeds a certain amount (tens of thousands of pages). We understand that because the kernel dispatches so many tasks to all the cores simultaneously, there is going to be a high rate of TLB misses. However, there seems to be a threshold beyond which the stalls increase non-linearly.

We are wondering whether a queue in the chipset or memory controller is overflowing with memory requests (page-table walks triggered by the TLB misses), and whether the cores that are denied access use an exponential backoff to deal with the congestion. This would explain the non-linear increase in stall time.

Any help from the experts in understanding what is happening would be appreciated. Thanks.
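
For anyone who wants to poke at this, below is a minimal sketch of the kind of test that might expose the threshold. It is not our application, only an illustration: it touches one byte in each page of an anonymous mapping and reports the cost per page as the number of pages grows. The function names (`touch_pages`, `sweep`) and the page counts are made up for the example, and because Python adds a lot of interpreter overhead, the absolute numbers mean little. A C version run on all cores at once would be much closer to what we actually see.

```python
import mmap
import time

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the OS (commonly 4096 bytes on x86)


def touch_pages(buf, num_pages, page_size=PAGE_SIZE):
    """Write one byte in each of num_pages pages; return elapsed seconds."""
    start = time.perf_counter()
    for i in range(num_pages):
        # One access per page, so every access needs a different translation.
        buf[i * page_size] = 1
    return time.perf_counter() - start


def sweep(page_counts, repeats=3):
    """Return {page_count: best observed ns per page} for each working-set size."""
    results = {}
    for n in page_counts:
        buf = mmap.mmap(-1, n * PAGE_SIZE)  # anonymous mapping of n pages
        try:
            touch_pages(buf, n)  # first pass pays for page faults; discard it
            best = min(touch_pages(buf, n) for _ in range(repeats))
        finally:
            buf.close()
        results[n] = best / n * 1e9
    return results


if __name__ == "__main__":
    # Hypothetical sizes chosen to straddle "tens of thousands of pages".
    for n, ns in sweep([1_000, 10_000, 50_000]).items():
        print(f"{n:>7} pages: {ns:8.1f} ns/page")
```

If the per-page cost stays roughly flat as the page count grows, the stall is probably not from TLB misses alone. A sharp knee would be the interesting case, especially one that only appears when several copies run at the same time.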