Important update.
Intel Matrix Storage Manager 8.9 has been replaced by Intel Rapid Storage Technology; the current version is 9.6.
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18859&lang=eng
64-bit Intel® RST Driver Files
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18861&lang=eng
The other thread, “Random drive fails with new Rapid Storage Technology 9.6?”, was started after Rapid Storage Technology fixed the problem for most of us here. It is for anyone who still has the same problem with Rapid Storage Technology but has no problems with Matrix Storage Manager 8.8.
http://communities.intel.com/thread/8139?start=0&tstart=0
Matrix Storage Manager 8.8, should you need it:
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17412&lang=eng
64-bit Floppy
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17415&lang=eng
32-bit Floppy
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17413&lang=eng
Setup:
XP Professional SP3
Dual Intel ICH8R RAID 5
3 x ST380811AS 160GB RAID 5 OS
3 x ST3320620AS 640GB RAID 5 Storage
So far the problem has only happened on the OS array, when using Matrix Storage Manager 8.9.0.1023.
http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProductID=2101&DwnldID=17882&strOSs=44&OSFullName=Windows* XP Professional&lang=eng
Here's what happened: I downloaded and installed the newest Matrix Storage Manager 8.9.0.1023, rebooted, then pressed Winkey + L to lock the computer overnight. The next day a drive on port 1 was listed as failed in the console. I know I should have replaced the drive, but instead I did a rebuild and ran a Volume Verification and Repair, and got no errors.
Thinking it was fine, I pressed Winkey + L to lock the computer overnight again, but the next day a drive failed, and this time it was on port 2 in the console!
I have now reinstalled Matrix Storage Manager 8.8.0.1009, rebuilt the array and run a Volume Verification and Repair with no errors. I'll see if a drive fails tonight.
So could there be a problem with the new Matrix Storage Manager RAID driver, or could my OS array be about to fail?
The only way this will get resolved, or even looked at, is if everyone here who went from Matrix Storage Manager 8.8 to 8.9, had a random drive failure, and found the drive fine after going back to 8.8 for a week or more, reports it to Intel here:
http://supportmail.intel.com/scripts-emf/welcome.aspx?id=40
Line: Chipset Software
Product: Intel(R) Matrix Storage Manager
So far no drive has failed since I went back to 8.8.0.1009.
I will give it a few days, then reinstall 8.9.0.1012 and see if a drive fails again.
Odd, I'm sure it said 8.9.0.1012 when I downloaded it? Either way, I did a hash check and it's the same as 8.9.0.1023, so same problem.
Just a note about error recovery in hard drives. A regular desktop-class drive is configured to be very persistent when it comes to data errors; this can cause problems in a RAID configuration. There's a good article about it on Wikipedia:
Time-Limited Error Recovery
Modern hard drives feature an ability to recover from some read/write errors by internally remapping sectors and other forms of self test and recovery. The process for this can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. RAID controllers are designed to recognize a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from parity data. This is a long process, degrades performance, and if a second drive should fail under the resulting additional workload, it can be catastrophic.
If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as "failed" by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.
http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery
The article primarily addresses Western Digital's implementation of this feature, but it also mentions Seagate (Error Recovery Control [ERC]) and Samsung & Hitachi (Command Completion Time Limit [CCTL]). Here's another short article that specifically addresses Seagate:
What is Error Recovery Control?
http://www.hddoctor.net/what-is-error-recovery-control/
Western Digital has a utility to enable/disable TLER in their drives' firmware. However, it doesn't look like Seagate is so accommodating; their solution is to buy enterprise-class drives.
It could be that recent releases of the Intel Matrix Storage Manager software are less tolerant of these error-recovery delays. Some people have reported problems with 8.8 and 8.9 that were solved by dropping back to 8.7.
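To make the TLER argument concrete, here is a toy timing model (a sketch, not how IMSM actually works internally). All numbers are assumptions: 30 s for a desktop drive's retry loop, 7 s as a commonly quoted TLER cap, and 8 s or 5 s as hypothetical controller timeouts. The `Drive` class and `controller_verdict` function are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    name: str
    recovery_seconds: float            # time the firmware would spend retrying a bad sector
    erc_limit: Optional[float] = None  # TLER/ERC/CCTL cap in seconds; None = desktop behaviour

    def response_time(self) -> float:
        """Seconds until the drive answers the controller, with data or an error."""
        if self.erc_limit is None:
            return self.recovery_seconds
        return min(self.recovery_seconds, self.erc_limit)


def controller_verdict(drive: Drive, timeout: float) -> str:
    """What a RAID controller with a `timeout`-second command limit would do (toy model)."""
    if drive.response_time() > timeout:
        # Drive stayed silent too long: the controller assumes it is dead.
        return "mark drive failed, rebuild array"
    if drive.erc_limit is not None and drive.recovery_seconds > drive.erc_limit:
        # Drive gave up early and reported the error; the controller repairs from parity.
        return "recover sector from parity"
    return "read OK"


if __name__ == "__main__":
    desktop = Drive("desktop-class", recovery_seconds=30)
    tler = Drive("TLER-enabled", recovery_seconds=30, erc_limit=7)
    for timeout in (8, 5):  # hypothetical controller timeouts, not measured IMSM values
        for d in (desktop, tler):
            print(f"timeout={timeout}s {d.name}: {controller_verdict(d, timeout)}")
```

With the assumed 8 s timeout only the desktop drive gets dropped; shrink the timeout to 5 s and even the TLER drive is marked failed, which is one way a less tolerant driver release could turn a slow sector into a "failed" drive.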
Thanks for the reply.
I'm not having any problems with 8.8.0.1009 so far, if that means anything.
What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight, or do you think it's a coincidence that two random drives failed overnight?
Setup
Asus P6T deluxe v1
4x WD WD15001AALS
Vista 64
I have been using the above system for 8 months, updating to the latest version of IMSM each time a new version was released. The last update was in April 2009, when 8.8.0.1009 was released. All versions of IMSM worked fine until I updated to 8.9.0.1023 on 18 July 2009; within 4 hours I had two major system lockups where I had to use the reset button and go through the volume rebuild process each time. Since nothing else had changed on the system, I quickly decided to go back to 8.8.0.1009, and since then I have had no problems: the system is as it was before I updated to 8.9.0.1023. I wasted a whole day on this problem.
I complained to Intel and got a quick reply, but unfortunately it was not much use. In summary they said that 8.9 had been fully tested, and had passed those tests, that they had no other reports of problems, and therefore there was nothing wrong with it! They told me to update the HDD firmware, the Intel Option ROM and the MB BIOS!
Because I downloaded 8.9 as soon as it appeared on the Intel website, I checked again to see if it had been withdrawn or changed. I noticed that some of the download documentation referred to 8.9.0.1012 and in other places it referred to 8.9.0.1023. There was obviously some confusion inside Intel. I contacted customer support again and asked whether 8.9.0.1023 was the correct version. They replied that 8.9.0.1023 was the correct version, that it was the version that downloaded, and that the mention of 8.9.0.1012 was simply a mistake on the website. They also reiterated that there is nothing wrong with 8.9.0.1023.
Over the years I have probably used all versions of IMSM and IAA and have only had a problem with one other version, which they acknowledged. With other users now having the same problem, Intel have obviously dropped the ball again, and they need to recognize it! Everyone who has the problem should complain to Intel, and perhaps they will then acknowledge the problem and do something about it. Otherwise we may find the same issue being rolled over into 8.10.
System:-
Asus P6T Deluxe v1
4x WD5001AALS in RAID10
Vista 64
I built the above system 8 months ago and have used each IMSM as soon as it was posted on the Intel website. Version 8.8.0.1009 was installed in April and worked flawlessly until I updated to 8.9.0.1023. Within 4 hours of installing 8.9 I had two major system freezes which I could only get out of by using Reset and waiting for the volume to rebuild. Each time IMSM reported different drives had failed, either port 2 or 3. Nothing else had changed on the system and I was not doing anything different or difficult on the system when it froze. I quickly decided that I had had enough of 8.9 and went back to 8.8.0.1009. Since then (several days) I have had no problems at all - the system is behaving as it did before I put on 8.9.
Annoyed about the whole day I lost recovering from these problems I complained to Intel. In summary, Intel replied that 8.9 had been fully tested, they had no other reports of problems and that there was nothing wrong with it! They then went on to advise me to update the HDD firmware, the Intel Option ROM and MB BIOS! Their reply was arrogant, some would say rude.
Needless to say, I have not done any of their suggested updates, and the system is now working just fine on 8.8.0.1009. Also, as far as I know, I already have the latest firmware and BIOS.
With others reporting exactly the same problems, there is obviously a bug in 8.9 which Intel need to acknowledge and fix. Everyone who has experienced this problem should complain to Intel, otherwise it won't get fixed and may even get carried over into 8.10.
PeterUK wrote:
What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight, or do you think it's a coincidence that two random drives failed overnight?
Is your computer set to do a virus scan or a disk defrag in the middle of the night? Perhaps the drives in question have some marginal sectors. When the drives tried to read/write data at those locations, they had trouble and went into error-recovery mode. This took longer than IMSM 8.9 was willing to wait, so the software flagged them as failed and took them out of the RAID. That does seem to be a bit much for mere coincidence, but it's still a possibility.
As a test, you could tell Windows to perform an error-check on the drives, selecting "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors". This will take a while to complete, especially with large RAID arrays. If your drives have bad sectors that aren't locked out, this should find them and fix the problem. I recommend that you back up your critical data beforehand, just to be safe.
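The two checkboxes in the Windows error-check dialog correspond to chkdsk's `/F` ("Automatically fix file system errors") and `/R` ("Scan for and attempt recovery of bad sectors", which implies `/F`) switches. The helper below is a hypothetical sketch that only builds the command line for a given drive letter; it deliberately does not run chkdsk, since that needs an elevated prompt and, on the system volume, gets scheduled for the next boot.

```python
from typing import List


def chkdsk_args(volume: str, fix_errors: bool = True, scan_bad_sectors: bool = True) -> List[str]:
    """Build the chkdsk command matching the two boxes in the Windows error-check dialog."""
    if len(volume) != 2 or not volume[0].isalpha() or volume[1] != ":":
        raise ValueError("expected a drive letter such as 'D:'")
    args = ["chkdsk", volume.upper()]
    if fix_errors:
        args.append("/F")  # "Automatically fix file system errors"
    if scan_bad_sectors:
        args.append("/R")  # "Scan for and attempt recovery of bad sectors" (implies /F)
    return args


if __name__ == "__main__":
    # Print rather than run: chkdsk /F needs exclusive access to the volume.
    print(" ".join(chkdsk_args("d:")))  # chkdsk D: /F /R
```

As with the GUI route, back up critical data first. On a large RAID volume the `/R` pass can take hours.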
Nope, nothing at night... I don't know. Maybe if it happened during the day I would think it's this Time-Limited Error Recovery thing, but I'm sure there's more to it than that in this version.
I took the drives out of RAID and ran them in SATA mode so I could see the S.M.A.R.T. data, and all 6 drives seem fine: no reallocated sectors on the 3 x ST380811AS, and only one of the three ST3320620AS had 169 reallocated sectors, but that's part of the storage array, not the OS array where the problem happened.
I ran "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors" on both arrays some months back, but I could run them again just to be sure.
In the system log, both times it happened, I got lots of this:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
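If you export the System log to a text file, a short script can count how often this message appears per device, which helps line the timeouts up with the nights a drive was dropped. This is a sketch under an assumed export format (each event's message on one line); the sample lines in the demo are made up, and only the message text comes from this thread.

```python
import re
from collections import Counter
from typing import Iterable

# Message text as quoted in this thread; the surrounding export format is an assumption.
TIMEOUT_RE = re.compile(
    r"The device, (\\Device\\\S+), did not respond within the timeout period\."
)


def count_timeouts(lines: Iterable[str]) -> Counter:
    """Count 'did not respond within the timeout period' events per device."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [  # hypothetical exported lines
        r"Error 9 The device, \Device\Ide\iaStor0, did not respond within the timeout period.",
        r"Error 9 The device, \Device\Ide\iaStor0, did not respond within the timeout period.",
        "Information 7036 The Print Spooler service entered the running state.",
    ]
    print(count_timeouts(sample))  # Counter({'\\Device\\Ide\\iaStor0': 2})
```

Note that `\Device\Ide\iaStor0` names the controller, not an individual port, so the count tells you how often the driver gave up waiting, not which drive was slow.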
PeterUK wrote:
In the system log, both times it happened, I got lots of this:
The device, \Device\Ide\iaStor0, did not respond within the timeout period.
One of the drives in your RAID array is busy doing something; it would be nice if the log recorded more detail....
Check the data cables on the drives and make sure they're fully seated. You could also try replacing the cables with known good ones to see if that helps things.
Since the arrays have been running fine for four days after going back to 8.8, it's not a cable problem, just as JefUK found.
I'm going to reinstall 8.9 and run some tests. To me it's as if the RAID driver has nothing to do when the drives are idle and randomly fails a drive.
But if you think about it, could the problem be with the manager (IAANTmon.exe)? What I might do is run the 8.9 driver with the 8.8 manager. I can't see any risk in doing this.
The problem I had was not a timeout error; there were no timeout error messages. I have had timeout errors on other systems. Sometimes they have been caused by drive problems, but I have also had them due to a faulty PSU (which took a long time to find). It does not necessarily follow that timeout errors are caused by faulty HDDs.
The only messages I got were warnings about the volume being rebuilt. The fact that my system worked perfectly on 8.6, 8.7 and 8.8, failed with 8.9, and now works perfectly with 8.8 again is, I believe, proof as conclusive as it can be that the problem lies with 8.9. For the short time I was using 8.9 I had a feeling that the system was not as responsive as it was with 8.8, but that may be imagination. 8.9 appeared to install correctly and IMSM reported all the various modules as 8.9.0.1023.
In the week since I installed v8.9 (as an upgrade over v8.8) I have had about 6 system freezes, and today one volume was marked as degraded (and its HDD labelled as having errors) for no apparent reason, except that the Windows system event log contains a few events with event ID 9: "The device, \Device\Ide\iaStor0, did not respond within the timeout period."
The freezes: when they happen, the mouse remains responsive, but the OS refuses to do anything. Not even Alt+Tab has any effect, although it is recognized and the task switcher overlay appears. During this time the HDD light flashes very faintly, about every 2 seconds, staying lit for only a few milliseconds.
Today I had another of these freezes, and after waiting a while and pressing the reset button, I was met by a message that the drive had been marked as having errors. Now I'm in the middle of rebuilding a mirror volume on it.
After it finishes rebuilding, I'll reinstall v8.8.
Quotes from the report created by the manager console:
Kit Installed: 8.9.0.1023
Kit Install History: 8.9.0.1023, Uninstall
Shell Version: 8.9.0.1023
OS Name: Microsoft Windows XP Professional
OS Version: 5.1.2600 Service Pack 3 Build 2600
System Manufacturer: ASUSTeK Computer INC.
System Model: P5B-Premium
Processor: Intel Pentium III Xeon processor ~2507 MHz
BIOS Version/Date: American Megatrends Inc. 1102 , 07/14/2008
Language: ENU
BTW, the CPU is an Intel E5200 (S-spec SLAY7); the Matrix Storage Console wrongly detects it as a Xeon in that system report. WinXP SP3 has all the updates available from Microsoft Update applied.
Array_0000
Status: Rebuilding
Hard Drive Data Cache Enabled: Yes
Size: 1192.3 GB
Free Space: 0 GB
Number of Hard Drives: 2
Hard Drive Member 1: WDC WD6401AALS-00L3B2
Hard Drive Member 2: WDC WD6401AALS-00L3B2
Number of Volumes: 2
Volume Member 1: stripe
Volume Member 2: mirror
stripe
Status: Normal
System Volume: Yes
Volume Write-Back Cache Enabled: No
RAID Level: RAID 0 (striping)
Strip Size: 64 KB
Size: 600 GB
Physical Sector Size: 512 Bytes
Logical Sector Size: 512 Bytes
Number of Hard Drives: 2
Hard Drive Member 1: WDC WD6401AALS-00L3B2
Hard Drive Member 2: WDC WD6401AALS-00L3B2
Parent Array: Array_0000
mirror
Status: Rebuilding: 52% complete
System Volume: No
Volume Write-Back Cache Enabled: No
RAID Level: RAID 1 (mirroring)
Size: 296.1 GB
Physical Sector Size: 512 Bytes
Logical Sector Size: 512 Bytes
Number of Hard Drives: 2
Hard Drive Member 1: WDC WD6401AALS-00L3B2
Hard Drive Member 2: WDC WD6401AALS-00L3B2
Parent Array: Array_0000
Hard Drive 0
Usage: Array member
Status: Normal
Device Port: 0
Device Port Location: Internal
Current Serial ATA Transfer Mode: Generation 2
Model: WDC WD6401AALS-00L3B2
Serial Number: --------------not posted here-------------------
Firmware: 01.03B01
Native Command Queuing Support: Yes
Hard Drive Data Cache Enabled: Yes
Size: 596.1 GB
Physical Sector Size: 512 Bytes
Logical Sector Size: 512 Bytes
Number of Volumes: 2
Volume Member 1: stripe
Volume Member 2: mirror
Parent Array: Array_0000
Hard Drive 1
Usage: Array member
Status: Normal
Device Port: 1
Device Port Location: Internal
Current Serial ATA Transfer Mode: Generation 2
Model: WDC WD6401AALS-00L3B2
Serial Number: --------------not posted here-------------------
Firmware: 01.03B01
Native Command Queuing Support: Yes
Hard Drive Data Cache Enabled: Yes
Size: 596.1 GB
Physical Sector Size: 512 Bytes
Logical Sector Size: 512 Bytes
Number of Volumes: 2
Volume Member 1: stripe
Volume Member 2: mirror
Parent Array: Array_0000
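When comparing reports like this one across driver versions, it can help to pull out just the status lines. The parser below is a hypothetical sketch that assumes the layout shown above: a line without ": " (such as `Array_0000`, `stripe` or `Hard Drive 1`) starts a new section, every other line is a `Field: Value` pair split on the first ": ", and fields before the first header go under a made-up `System` section.

```python
from typing import Dict, List


def parse_report(text: str) -> Dict[str, Dict[str, str]]:
    """Split a console report into {section: {field: value}} (layout assumed)."""
    sections: Dict[str, Dict[str, str]] = {"System": {}}
    current = "System"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")  # split on the first ': ' only
        if sep:
            sections[current][key] = value
        else:
            current = line  # a bare line starts a new section
            sections.setdefault(current, {})
    return sections


def sections_not_normal(report: Dict[str, Dict[str, str]]) -> List[str]:
    """Sections that have a Status field whose value is not 'Normal'."""
    return [name for name, fields in report.items()
            if fields.get("Status", "Normal") != "Normal"]
```

Fed the report above, `sections_not_normal` would flag `Array_0000` and `mirror`, both of which show a rebuilding status.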
Hi aditza, welcome to the problem.
I've done a test with the 8.9 driver and the 8.8 manager and the problem happened again, so I'm now trying the 8.8 driver with the 8.9 manager just to see if it's only the driver. If it runs fine for a week with this setup, I'll report my findings to Intel so they pull the driver off the site before many people replace good drives.
You would think they'd have someone looking over these posts from time to time, just to see how real a problem this is?
Follow-up: I installed v8.8.0.1009 again and did a full verification of both volumes (stripe and mirror). The verification took about three hours, but it seems I got away clean: no errors whatsoever, ZERO!
I'll have to wait and see if it freezes again, but since I ran v8.8 for more than a month on this machine without any problems before deciding to try v8.9, I don't think the freezes will happen again, and I believe 8.9 was their source. I'll post back in a week or two on whether I've had any more freezes.

