We have been buying systems based on the S3XXX series of motherboards for a few years now. A continual problem with the later models of these boards has been the failure of the onboard RAID controller. We use the controller in RAID mode to create a simple hard drive mirror while running 2 additional non-RAID hard drives for on-system backups. We have deployed about 20 of these motherboards in our low-end, basic Windows 2003/2008 servers.
In the past 6 months alone, we have had well over a dozen RAID crashes on these machines. Only one of these was from a possibly bad hard drive. The rest of the instances happened due to this RAID controller being an utter piece of junk!
Most of these breaks occur upon a server reboot. If a server is force-rebooted (due to a lock-up), we have seen approximately a 50% chance of the array coming up ‘degraded’. However, we have seen arrays break on clean shutdowns and restarts as well. We also have a number of instances where we believe the array broke while the OS was running, which caused the OS to lock up.
The controllers and arrays are configured per Intel documentation. In all cases our servers have matching hard drives – Hitachi, Western Digital or Seagate. In most cases the firmware versions on the drives also match.
We have tried updating the motherboard BIOS, the controller firmware, the hard drive firmware, the OS and the software driver to the latest versions, and sometimes to ‘known good’ versions as recommended by Intel Tech Support. We have also tried turning off various caches on the controller, in the OS and on the hard disks. NOTHING HAS HELPED!
My techs have contacted Intel technical support a number of times. We are getting the runaround: ‘try this, do that, stand on one leg and try it again.’ Bottom line, the issue is unresolved, and we have a ton of man-hours invested in chasing our tails and in having production servers down due to this problem!
About 3 weeks ago, we decided to test one of our problem servers by purchasing an entry-level Adaptec PCI controller. Since installing that controller (with factory firmware and cache settings) we have stress-tested the server in every way we could think of – we yanked the power, we forced shutdowns, we ran 2 days of burn-in tests and abused the server beyond regular use. The array, with all the same hardware and software (plus the Adaptec controller), had no issues whatsoever!
I am very frustrated with the situation – a company like Intel should test products better. Light online research has revealed that we are only one end-user, amongst many others, having this same exact issue. Intel has failed to resolve this issue. At this point we feel that the only proper resolution is for Intel to either provide or compensate for the cost of a known good modular RAID controller to the end users that have this problem.
Today we are facing thousands of dollars in lost time, lost productivity and the purchase price of new controllers for our servers. My techs have been working with Intel support for MONTHS on this and we have not come to a resolution. Unless we have a definitive solution to this problem by the end of this week, we will be filing a civil lawsuit for damages resulting from an advertised component that does not function.