I was using Intel Matrix Storage Console to manage a RAID 5 volume on my "ICH8R/ICH9R/ICH10R/DO/PCH SATA RAID Controller" (at least that's how it reads in Device Manager), and I was running a RAID 5 volume without issue until I experienced a drive failure a couple of weeks ago. All three drives are Seagate, and I sent the failed drive in under warranty. After an extended wait I finally have the refurbished drive back from Seagate. I put the drive back into my machine, wired it up as normal, and now the Intel controller utility (when I first start the machine, following a successful POST) shows 2 member drives and one non-RAID member drive. I entered the utility, and it doesn't give me an option to rebuild the array, only to delete it, which would DELETE the volume AND its data, which, in my mind, defeats the purpose of RAID 5 altogether.
Now we come to the point of the issue. The definition in Intel Matrix Storage Console's help for a "failed" array is as follows: "A RAID 5 volume is reported as 'Failed' when more than one member has failed. If this occurs, please follow the procedure shown below. This procedure deletes the failed RAID 5 volume and creates a new RAID 5 volume; it does not recover the failed RAID 5 volume and its data. After the new RAID 5 volume has been created, you must restore the data from backups and install any software that was on the RAID 5 volume."
While I understand the definition, only 1 of my 3 member drives has failed, and honestly, I cannot reasonably afford to lose the data on my array. Intel Matrix Storage Console's help also gives this procedure to recover a degraded array: "In the device pane, right-click on the new non-RAID hard drive and select 'Rebuild to this Hard Drive'." I have no such option. All I have is "Mark as Spare" and "Activate Port LED", since the utility has flagged this as a "FAILED" array.
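For anyone wondering why a single failed member should be recoverable at all, here is a minimal Python sketch of the XOR parity idea behind RAID 5. It is only an illustration of the general technique, not how the Intel driver is implemented; the function names and the 4-byte "blocks" are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the single missing block of a stripe from all surviving blocks."""
    return xor_blocks(surviving_blocks)


# One stripe on a hypothetical 3-drive array: two data blocks plus one parity block.
data_a = b"RAID"
data_b = b"five"
parity = xor_blocks([data_a, data_b])

# The drive holding data_b fails: the two remaining members are enough to rebuild it.
recovered = rebuild_missing([data_a, parity])
assert recovered == data_b
```

With only one block of a stripe left there is nothing to XOR it against, which is why the help text treats more than one failed member as unrecoverable.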
I've attached a screenshot of exactly how my utility appears, and my System Report exported from Intel Matrix Storage Console follows. I really need to find a way to recover my data, please help me!!!
This really isn't a fix for what's going on with the Intel Matrix software, but here is how I was able to recover the data on my failed RAID 5 array. I had just about given up and was getting ready to wipe the drives when I decided to look at a data recovery app I had purchased some time ago for an accidentally formatted drive. This software (Zero Assumption Recovery) can also recover data from failed RAID arrays (both RAID 0 and RAID 5). In order to use it, I first had to take the remaining original 3 drives of the array and delete the RAID array that was set up (I did this through the boot-time interface, as some people have suggested it won't affect the data on the drives). Once you boot back into Windows and start up the ZAR software, it will be able to see the 3 drives that made up the array, and after quite a few hours it will have scanned them and should have recovered your data. The trial lets you run the full scan, but only gives you the ability to restore 4 items or folders (can't remember). The software didn't fully reconstruct the folder path that I had on the drive before, but I was able to find all of my files that weren't in the original folder as well. Since my array was 3TB, I had used GPT instead of MBR, and I'm not sure if this caused the folder structure to not be fully recognized. To avoid this issue in the future, I have built a new server with a dedicated RAID controller (LSI MegaRAID SAS 6260-4i) and 4 2TB drives in a RAID 5 configuration. On my system with the Intel Matrix RAID controller, I switched from a RAID 5 configuration to a RAID 10. I went from 3TB to 2TB in storage, but it should be a bit faster and more reliable since no calculations are needed for the parity. Hopefully this method will work for you guys as well so you can recover any of the data you have lost.
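The capacity trade-off mentioned above can be sketched with the usual rules of thumb: RAID 5 gives up one drive's worth of space to parity, and RAID 10 gives up half to mirroring. This is a rough Python sketch assuming equal-size drives; the function name is made up, and the 1.5TB and 1TB drive sizes in the usage lines are illustrative guesses, not the sizes actually used in the post.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Approximate usable capacity in TB for equal-size drives."""
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of space is consumed by parity.
        return (drives - 1) * drive_tb
    if level == "raid10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        # Every drive is mirrored, so half the raw space is usable.
        return drives // 2 * drive_tb
    raise ValueError(f"unsupported level: {level}")


# The new server described above: 4 x 2TB drives in RAID 5.
print(usable_capacity_tb("raid5", 4, 2))     # 6

# Illustrative sizes only (assumed, not stated in the post).
print(usable_capacity_tb("raid5", 3, 1.5))   # 3.0
print(usable_capacity_tb("raid10", 4, 1))    # 2
```

Real-world figures come out a little lower once formatting overhead and the TB vs TiB difference are taken into account.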
FYI, the link to the ZAR site is http://www.z-a-recovery.com/
Thanks very much for the reply!
Intel is useless - I did as you described and everything was restored! The only difference is that I used the DiskInternals RAID recovery software: http://www.diskinternals.com/raid-recovery/
Looks like Intel's onboard RAID implementation is crap and should be avoided. I have never had any issues like this with proper RAID setups (Adaptec etc.)
Had a similar problem last month and also a couple of days ago on an ICH8R chipset. Believe the "failed" situation was caused by an intermittent problem with a cd burner while trying to burn a CD, but below is the procedure I used to make the array clean again without recovery software...
When the system boots, hit Ctrl-I to enter the BIOS utility. A warning will pop up at the top of the screen. I don't remember the exact verbiage, but it says in effect that the array failed and asks whether you want to fix it... Answer yes and the array status will change from failed to degraded. (You may have to physically unplug power for 30 secs or so first to get the controller to totally reset - I don't have enough permutations of the problem to be sure of this...)
Boot the system, open the Matrix Storage Manager, right click on the drive with the red x, and select "mark as normal"
Wait for raid to rebuild.
I believe the problem in my situation was caused by a flaky or dying DVD burner. Historically, I have found that hard errors on a CD or DVD burner will really hang the OS (at least XP) at a very low level, to the point that it won't respond to Ctrl-Alt-Del and the only solution is the reset button or a power cycle. I'm thinking that this low-level hang is messing with the low-level OS driver software that controls the ICH chipset, and the delay is making it think a drive or array went offline and "failed". I replaced the DVD burner two days ago, and it will probably take 6-8 weeks to be pretty sure the problem is solved.
While your situation might not be caused by the same thing, I believe that any low-level software driver hang in the OS could produce a failed array and this is probably a pitfall of having an OS software driven RAID controller. Would be nice if Intel would confirm this as a possible limitation...
Well, I have had many problems with different motherboards with the ICH8R chipset when using RAID 1, 5 and 10!
I always get various disk errors, yet all the disks tested OK after being removed from the array. On a PC with 2 HDDs in RAID 1, I changed the disks 5 times!
With another PC, with RAID 10 and 4 disks, I can't remember how many disks I changed either. It is very strange: disk 1 fails, then after a while (some reboots) that disk is OK and the 4th disk fails. After some time all disks are OK, but when restarting, all disks appear offline!
Too many errors to be true!
It is URGENT to get a solution; we paid for something that is not working!
I'm not sure if this is going to help, but I'll give it a go:
I had similar problems with RAID 1. Drive 2 failed regularly. This happened both with Seagate Barracuda ST3xxxxxxAS drives and with Fujitsu 2.5" MHW20xxBK drives.
In all these cases I found only one drive that was really bad (thermal tripped). In the other cases the drives were all found to be okay.
After reformatting I could use the drives again.
Repairing failed drives: use SeaTools to do a destructive test and format (Seagate drives), or use the appropriate tools from other manufacturers; then format the drive again if the tool doesn't have an option to format the drive.
Read the logs (applications and system) to find out about possible causes.
I found many references to the HP CUE service having caused problems, as did Windows Search.
The error reports also show both disk and ftdisk having failed (after an error on drive 0 in iastor0).
Peculiarly, the logs show that the culprit was not the second drive: the first drive momentarily went wrong, causing the other drive to fail!!!
After disabling both Windows Search and the HP CUE service these errors didn't show up again.
In all cases I also disabled the write-back cache on all drives in the Storage Manager (one of the first actions I took, but it didn't solve the problem until I disabled the HP CUE service and Windows Search).
Note: HP CUE service is part of a software bundle with various HP printers. The CUE service is used for detecting new devices, which is not a necessity.
The program (digital monitor in the system task bar) is a notorious one and on the HP forum well known for its bad manners (both HP not doing a thing about it and the program itself).
Disabling the Search program may not have been necessary, though the result seems to show that this program also has its bad habits. The Search Companion still works after disabling it, but the new Search program can't be used (no big deal, I presume).
Hopefully this will help anyone running into these problems to overcome them.
WelmoedJ, thank you for your reply.
I used different disks too: Seagate, Western Digital, Maxtor, Samsung... One of them was a Western Digital Enterprise drive, which is supposed not to break easily, but even that was not successful.
So the problem IS NOT the disks, definitely!
I do not use HP CUE. I also have the write-back cache disabled on all drives in the Storage Manager (it has never been enabled). Search was enabled, and I'll try disabling it next time, even though I do not see how it could interfere and I want to use the whole system. Search is an important feature.
After reading many opinions around, I think there is a problem in that chip, some bug killing the communication...
Today I removed the RAID 10 and installed Windows on a single disk. The other 3 disks are out of the array too (but in use), and let's see if they fail... but the security is OFF, and that is not the purpose!
Intel instructs us after a RAID 5 drive failure to add the replacement drive as a hot spare in the boot-time RAID Option ROM (CTRL i) utility and once Windows has booted, the array will automatically be rebuilt. My problem though is that when I replaced my drive and booted my server, the 4 drive RAID 5 array's status was "FAILED" and showed 2 member drives missing, although just one drive failed and both were listed (the replacement drive and the other array member reported as missing) as available to be added as spares.
This behavior is documented in the Rapid Storage Technology 10.8.0.1003 release notes found here: http://downloadmirror.intel.com/20624/eng/iata_10.8.0.1003_ReleaseNotes.htm
Apparently, that release resolves defects 3234763 and 3235920 and directs those looking for more details to the following documents: IBP #487724 “Intel® Rapid Storage Technology: RAID 5 Volume Data Loss Exposure” and IBP #487730 “Intel® Rapid Storage Technology: 4TB Disk Data Loss Exposure”. I have looked everywhere for these documents or any other citation of them, without success. Supposedly, the RST 10.5 and 10.6 updates introduced this bug (they've since been removed from the site), and I was running 10.5 after upgrading from Matrix Storage 8.9 when my failure occurred.