The only way to split the load of a single-threaded application across multiple execution units is to use special middleware that does the job for you. Such technology has been developed for some enterprise-level applications: for example, Microsoft has a special driver for Excel that distributes calculations over a grid of nodes joined in a Microsoft HPC Cluster, which can be a number of servers, desktop PCs during office off-hours, or even processes on the same multicore/multisocket server. But that is a special solution for one (or a limited number of) applications, and it costs a lot of money. There is no magic key that works for any application on any system.
Actually, I just remembered that graphics drivers do something similar when multiple GPUs are installed. Since no multi-threading model exists for GPUs, they are made to work as one really fast processor. If it's possible to write such software for GPUs, why hasn't anyone done it for desktop CPUs yet? I guess people truly are moving on to multithreaded programming.
Multi-GPU systems are not a good example. In graphics rendering there *is* middleware - the graphics driver itself - and GPU vendors like AMD and NVIDIA only managed to produce adequate results once they started supporting nearly every game individually, via driver profile systems. Also, multi-GPU setups share the same working set: every card/GPU loads the entire rendering data into its own memory, so applying the same technique to CPUs would multiply an application's system memory consumption by the number of execution units.
GPGPU technologies like OpenCL and CUDA do little to hide the GPU's internal architecture: the programmer simply faces a heap of execution units, and then the usual multithreaded programming begins.