There has long been an ongoing battle over which operating system is superior, and with virtualization technology this battle is soon coming to an end. The truth is that no operating system is superior to the others. It is, for example, well known that Windows has some severe flaws at the low level when you look at things "under the hood", but it is unmatched when it comes to the abundance of software and computer games. It is also well known that ZFS, which is found in Solaris-based operating systems, is a file system that is unmatched in terms of reliability and safety against data corruption. That is a growing concern, as larger and denser storage hardware has become less reliable in the past few years (many more hard drives have failed on me than 10 years ago). I'm very concerned about these issues and I can no longer trust a hard drive in a Windows environment to reliably keep my data. Linux has many advantages in terms of resource efficiency and stability. This list of operating systems and their advantages/disadvantages can go on...
So why should I have to choose? Why can't I take advantage of all of these benefits and get the best of all worlds? The answer is that I can, through virtualization. In the past few years the world has seen exciting development in the Xen community, and really powerful extensions that enhance the capabilities of virtualization, such as Intel VT-x / AMD-V and Intel VT-d / AMD-Vi (IOMMU), have become widespread in desktop hardware, whereas they have been commonplace in enterprise-level hardware for quite some time.
So it is quite evident that the role of an operating system is going to change considerably in the future. The operating system that runs on-the-metal is going to become a simple hypervisor that manages simple virtual machines. The operating systems as they are today will shrink into so-called wrappers that merely supply the frameworks required to run a particular piece of software (such as .NET, Visual Runtime etc.).
So there will be a separation between the hardware and the operating systems by an abstraction layer where different wrappers (that used to be operating systems) share the underlying hardware with each other. There will no longer be a question whether you use Windows, MacOS or Linux. You just use whatever you prefer as a base OS and use whatever is needed to run the applications you want, which in reality could mean that you run several operating systems simultaneously on the very same machine.
This separation has already begun, and ZFS is a good example of that. ZFS looks at the hard drives as a storage pool, and the user is not concerned with the physical characteristics of the partitions or where the sectors begin and end. I didn't like it at first but later found this approach to be ingenious. So I see it as a natural step that the rest of the hardware will undergo the same transition. I also think a lot can be done with the UEFI framework in this regard.
The latest advancement in virtualization technology is the set of IOMMU extensions, which allow virtual machines to run directly on selected parts of the host hardware. This means that I can run, say, Linux on-the-metal while playing Crysis 2 on a virtual machine that runs directly on the GPUs. Here's a video showing Unigine Heaven running on a virtual Windows machine inside Ubuntu on a dual GPU setup:
This is called PCI passthrough, where PCI devices are passed through to the virtual machine, or VGA passthrough, where the VGA BIOS mappings are also sorted out. In another setup I may want to run Windows on-the-metal and pass through a whole hard disk controller to a Solaris machine where I run a secured storage pool with redundancy (e.g. raidz3). For ZFS to give proper protection against data corruption it is imperative that it runs directly on the hardware and not through a virtualized abstraction layer. There is currently no support for IOMMU passthrough on Windows hosts, but that will change eventually; our hopes lie with Hyper-V, VirtualBox and VMware.
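To get a feel for what can be passed through on a given machine, here is a minimal Python sketch that lists the IOMMU groups a Linux host exposes. It assumes a reasonably recent Linux kernel that publishes the groups under /sys/kernel/iommu_groups; it is not Xen-specific, and the function name list_iommu_groups is just something I made up for illustration.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device addresses]} read from sysfs.

    Assumes the Linux layout <root>/<group>/devices/<address>.
    Returns an empty dict if the directory is missing, for example when
    the IOMMU is absent or disabled in the firmware or kernel.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


if __name__ == "__main__":
    # Print one line per group, e.g. the PCI addresses of a GPU and its audio function.
    for number, devices in sorted(list_iommu_groups().items()):
        print(f"group {number}: {', '.join(devices)}")
```

Devices that end up in the same group typically have to be handed to the same virtual machine together, so a GPU that shares a group with unrelated devices is harder to pass through cleanly.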
However, there is a lot to be done and the purpose of my post in these forums is to address this. For PCI passthrough and VGA passthrough to work, it is a requirement that the hardware supports function level reset (FLR), a feature that allows a device function to be reset and reinitialized at any time on a running machine (i.e. at function level). FLR is standard on Quadro FX cards, and nVidia supplies patches that enable FLR on GeForce cards upon request.
Another issue is that current virtualization technologies only support passthrough of entire GPUs to virtual machines. GPUs can currently only be shared through emulation, which makes it impossible to run applications that rely on hardware-accelerated 3D (such as DirectX games). This situation is pretty much where virtualization was before the VT-x/AMD-V extensions were introduced: CPU instructions had to be emulated on the VM, which severely degraded performance. When VT-x/AMD-V came, virtual machines could run directly on the CPU with almost no overhead at all.
So I would like to suggest similar extensions that allow GPUs to be shared among several virtual machines, just like CPUs can be shared via VT-x/AMD-V.
So my suggestions in short:
Awesome POST
Thanks, I hope hardware developers will share your opinion. It has been said that a picture says more than a thousand words:
Some advantages of using the technology I discussed above on a desktop computer:
For people who are interested in learning more about virtualization technology and issues related to silent data corruption (data gets corrupted on your hard drive without you knowing it), I provide links to research papers:
Additional reading about virtualization:
In Xen terminology, the privileged host operating system is called dom0 (or domain 0), whereas the guest virtual machines are called domUs.
There are several different issues that have been worked on with the IOMMU extensions. One is passthrough of single-function vs multi-function devices. The problem used to be to get the entire multi-function device passed through to the domU, which is now resolved. Link: http://www.valinux.co.jp/documents/tech/presentlib/2009/jls/multi-function_b.pdf
For more information about VT-d and IOMMU, the following paper is a recommended read:
http://developer.amd.com/assets/IOMMU-ben-yehuda.pdf
More on VGA passthrough:
http://staff.science.uva.nl/~delaat/sne-2008-2009/p22/report.pdf
The Xen community maintains the following documentation resource pages on this subject:
http://wiki.xensource.com/xenwiki/XenVGAPassthrough
http://wiki.xensource.com/xenwiki/XenPCIpassthrough
http://wiki.xensource.com/xenwiki/XenUSBPassthrough
Additional information about data corruption (thanks Kebabbert for the links and info!):
Here is a whole PhD dissertation showing that normal file systems are unreliable:
http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169
Dr. Prabhakaran stated in this paper that he found that ALL the file systems shared
...ad hoc failure handling and a great deal of illogical inconsistency in failure policy...such inconsistency leads to substantially different detection and recovery strategies under similar fault scenarios, resulting in unpredictable and often undesirable fault-handling strategies.
We observe little tolerance to transient failures;...none of the file systems can recover from partial disk failures, due to a lack of in-disk redundancy.
Regarding shortcomings in hardware RAID:
http://www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf
Detecting and recovering from data corruption requires protection techniques beyond those provided by the disk drive. In fact, basic protection schemes such as RAID [13] may also be unable to detect these problems.
...
As we discuss later, checksums do not protect against all forms of corruption
http://www.cs.wisc.edu/adsl/Publications/corrupt-mysql-icde10.pdf
Recent work has shown that even with sophisticated RAID protection strategies, the "right" combination of a single fault and certain repair activities (e.g., a parity scrub) can still lead to data loss [19].
CERN discusses how their data was corrupted in spite of hardware RAID:
http://storagemojo.com/2007/09/19/cerns-data-corruption-research/
Here is a whole site that only talks about the shortcomings of RAID-5:
Shortcomings of RAID-6:
http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf
The paper explains that the best RAID-6 can do is use probabilistic methods to distinguish between single and dual-disk corruption, e.g. "there is a 95% chance it is single-disk corruption, so I am going to fix it assuming that, but there is a 5% chance I am going to actually corrupt more data, I just can't tell". I wouldn't want to rely on a RAID controller that takes gambles :-)
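To make the gamble a bit more concrete, here is a rough Python sketch of how a single corrupted data disk can be located from the P and Q syndromes. It works on one byte per disk and uses GF(2^8) with the polynomial 0x11d and generator 2, which is my reading of the linked paper; the function names are my own, and a real controller or md driver is of course far more involved.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d with generator 2.
EXP = [0] * 255
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D


def gf_mul(a, b):
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def syndromes(data):
    """Return (P, Q) for one byte position across the data disks."""
    p = q = 0
    for index, byte in enumerate(data):
        p ^= byte                      # P is the plain XOR parity
        q ^= gf_mul(EXP[index], byte)  # Q weights disk i by g^i
    return p, q


def guess_corrupt_disk(data, stored_p, stored_q):
    """Guess which single data disk is corrupt, or return None.

    None means either that everything is consistent or that the pattern
    does not look like a single data-disk error (P or Q itself is bad,
    or more than one disk is involved).
    """
    p, q = syndromes(data)
    dp, dq = p ^ stored_p, q ^ stored_q
    if dp == 0 or dq == 0:
        return None
    z = (LOG[dq] - LOG[dp]) % 255  # solves g^z = dq / dp
    return z if z < len(data) else None


if __name__ == "__main__":
    data = [0x11, 0x22, 0x33, 0x44]
    p, q = syndromes(data)
    data[2] ^= 0x5A                         # silently corrupt disk 2
    print(guess_corrupt_disk(data, p, q))   # prints 2
```

When two disks are corrupted, the same computation still produces some z. If that z happens to fall within the range of real disk numbers, the array cannot tell the situation apart from a single-disk error and will "repair" the wrong data.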
In other words, RAID-5 and RAID-6 are not safe at all, and if you care about your data you should migrate to other solutions. In the past the disks were small and you were much less likely to run into problems. Today, when the hard drives are big and RAID clusters are even bigger, you are much more likely to run into problems. Assume that there is a 0.00001% chance of running into a problem on any given read or write; if the hard drives are large and fast enough, you will run into problems quite frequently.
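The arithmetic behind that last point can be sketched in a few lines of Python. I am treating the 0.00001% figure as a per-operation probability and the operations as independent, both of which are simplifying assumptions on my part; the function name is made up for illustration.

```python
import math


def chance_of_at_least_one_error(per_op_probability, operations):
    """P(at least one error) = 1 - (1 - p)^n for n independent operations."""
    # log1p/expm1 keep precision when p is tiny and n is huge.
    return -math.expm1(operations * math.log1p(-per_op_probability))


if __name__ == "__main__":
    p = 0.00001 / 100  # the 0.00001% figure, expressed as a fraction
    for n in (10**4, 10**6, 10**7, 10**8):
        chance = chance_of_at_least_one_error(p, n)
        print(f"{n:>11,} operations: {chance:.4f}")
```

Under these assumptions, ten million operations already give roughly a 63% chance of hitting at least one problem, and a busy array gets through that many operations quickly.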

