526 Replies. Latest reply: Sep 15, 2012 7:30 AM by paludo

Random drive fails with new Matrix Storage Manager 8.9

PeterUK Community Member

Important update.

 

The Intel Matrix Storage Manager 8.9 has been replaced by:

 

Intel Rapid Storage Technology and the current version is 9.6.

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18859&lang=eng

64-bit Intel® RST Driver Files

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18861&lang=eng

 

The other thread, “Random drive fails with new Rapid Storage Technology 9.6?”, was started after Rapid Storage Technology fixed the problem for most of us here. It is for anyone who still has the same problem with Rapid Storage Technology but has no problems with Matrix Storage Manager 8.8.

http://communities.intel.com/thread/8139?start=0&tstart=0

 

 

Matrix Storage Manager 8.8, should you need it:

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17412&lang=eng

64-bit Floppy

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17415&lang=eng

32-bit Floppy

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17413&lang=eng

 

Setup:

XP Professional SP3

Dual Intel ICH8R RAID 5

3 x ST380811AS 160GB RAID 5 OS

3 x ST3320620AS 640GB RAID 5 Storage


Problem happened on the OS array so far when using Matrix Storage Manager 8.9.0.1023.

http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProductID=2101&DwnldID=17882&strOSs=44&OSFullName=Windows* XP Professional&lang=eng


Here's what happened: I downloaded and installed the newest Matrix Storage Manager 8.9.0.1023, rebooted, then pressed Winkey + L to lock the computer overnight. The next day a drive had failed, listed as port 1 in the console. I know I should have replaced the drive, but I did a rebuild, ran a Volume Verification and Repair, and got no errors.


Thinking it was fine, I locked the computer overnight again with Winkey + L, but the next day a drive had failed again, BUT this time on port 2 in the console!


I have now gone back to Matrix Storage Manager 8.8.0.1009, rebuilt the array and run a Volume Verification and Repair with no errors. I will see if a drive fails tonight.


So could there be a problem with the new Matrix Storage Manager RAID driver, or could my OS array be about to fail?

 

The only way this will get resolved or looked at is if people report it. If you went from Matrix Storage Manager 8.8 to 8.9, had a random drive failure, and the drive has been fine for a week or more after going back to 8.8, please report this to Intel here:

http://supportmail.intel.com/scripts-emf/welcome.aspx?id=40

Line: Chipset Software

Product: Intel(R) Matrix Storage Manager

  • 1. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    So far no drive failed when I went back to 8.8.0.1009.

     

    I will give it some days then reinstall 8.9.0.1012 and see if a drive fails again.

  • 2. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    Odd, I'm sure it said 8.9.0.1012 when I downloaded it. Either way, I did a hash check and it's the same file as 8.9.0.1023, so same problem.
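
A hash check like the one mentioned above can be sketched in a few lines of Python. The post does not say which tool or hash algorithm was used, so SHA-256 here is an assumption, and any file names passed in are up to the reader.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(first: Path, second: Path) -> bool:
    """True when both files produce the same digest (assumed identical contents)."""
    return file_sha256(first) == file_sha256(second)
```

Calling `same_file_contents(Path("first_download.exe"), Path("second_download.exe"))` with two saved installers (hypothetical file names) would return `True` when the two downloads are byte-for-byte the same package.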

  • 3. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1012
    Crazy_Train Community Member

    Just a note about error recovery in hard drives. A regular desktop-class drive is configured to be very persistent when it comes to data errors; this can cause problems in a RAID configuration. There's a good article about it on Wikipedia:

     

    Time-Limited Error Recovery

     

    Modern hard drives feature an ability to recover from some read/write errors by internally remapping sectors and other forms of self test and recovery. The process for this can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. RAID controllers are designed to recognize a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from parity data. This is a long process, degrades performance, and if a second drive should fail under the resulting additional workload, it can be catastrophic.

     

    If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as "failed" by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.

     

    http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

     

    The article primarily addresses Western Digital's implementation of this feature, but it also mentions  Seagate (Error Recovery Control [ERC]) and Samsung & Hitachi (Command Completion Time Limit [CCTL]). Here's another short article that specifically addresses Seagate:

     

    What is Error Recovery Control?
    http://www.hddoctor.net/what-is-error-recovery-control/

     

    Western Digital has a utility to enable/disable TLER in their drives' firmware. However, it doesn't look like Seagate is so accommodating; their solution is to buy enterprise-class drives.

     

    It could be that recent releases of the Intel Matrix Storage Manager software are less tolerant of these error-recovery delays. Some people have reported problems with 8.8 and 8.9 that were solved by dropping back to 8.7.
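
The interaction described in this post can be illustrated with a toy model. This is only a sketch of the idea, not how the Intel driver or any drive firmware actually works; the class, the function and every number in it are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadAttempt:
    port: int
    recovery_seconds: float  # time the drive spends on internal error recovery


def drives_dropped(attempts, controller_timeout, drive_recovery_limit=None):
    """Return the ports a toy RAID controller would mark as failed.

    drive_recovery_limit models TLER/ERC/CCTL: the drive stops retrying and
    reports an error after this many seconds instead of staying unresponsive.
    """
    dropped = []
    for attempt in attempts:
        busy = attempt.recovery_seconds
        if drive_recovery_limit is not None:
            busy = min(busy, drive_recovery_limit)
        # The controller only gives up on a drive that stays silent too long.
        if busy > controller_timeout:
            dropped.append(attempt.port)
    return dropped


# Illustrative values only: port 1 hits a marginal sector and retries for 30 s.
attempts = [ReadAttempt(port=0, recovery_seconds=0.1),
            ReadAttempt(port=1, recovery_seconds=30.0)]
without_limit = drives_dropped(attempts, controller_timeout=8.0)
with_limit = drives_dropped(attempts, controller_timeout=8.0, drive_recovery_limit=7.0)
```

In this model a desktop drive with no recovery limit gets dropped, while the same drive with a limit shorter than the controller timeout stays in the array. A driver release that shortened its timeout would drop more drives without any change in the drives themselves, which is the hypothesis raised above.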

  • 4. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    Thanks for the reply


    I'm not having any problems with 8.8.0.1009 so far if that means anything.


    What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight. Or do you think it's a coincidence that two random drives failed overnight?

  • 5. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    JefUK Community Member

    Setup

    Asus P6T deluxe v1

    4x WD WD5001AALS

    Vista 64

     

    I have been using the above system for 8 months now, updating to the latest version of IMSM each time a new version was released. The last update was in April 2009, when 8.8.0.1009 was released. All versions of IMSM worked fine until I updated to 8.9.0.1023 on 18 July 2009, and within 4 hours I had two major system lockups where I had to use the reset button and go through the volume rebuild process each time. Since nothing else had changed on the system I quickly decided to go back to 8.8.0.1009, and since then I have had no problems - the system is as it was before I updated to 8.9.0.1023. I wasted a whole day on this problem.

     

    I complained to Intel and got a quick reply, but unfortunately it was not much use. In summary they said that 8.9 had been fully tested, and had passed those tests, that they had no other reports of problems, and therefore there was nothing wrong with it! They told me to update the HDD firmware, the Intel Option ROM and the MB BIOS!

     

    Because I downloaded 8.9 as soon as it appeared on the Intel website, I checked again to see if it had been withdrawn or changed. I noticed that some of the download documentation referred to 8.9.0.1012 and in other places it referred to 8.9.0.1023. There was obviously some confusion inside Intel. I contacted customer support again and asked whether 8.9.0.1023 was the correct version. They replied that 8.9.0.1023 was the correct version, that it was the version that was downloaded, and that the mention of 8.9.0.1012 was simply a mistake on the website. They also reiterated that there is nothing wrong with 8.9.0.1023.

     

    Over the years I have probably used all versions of IMSM and IAA and have only had a problem with one other version, which they acknowledged. With other users now having the same problem, Intel have obviously dropped the ball again - they need to recognize it! Everyone who has the problem should complain to Intel, and perhaps they will then acknowledge the problem and do something about it. Otherwise we may find the same issue being rolled over into 8.10.

  • 6. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    JefUK Community Member

    System:-

     

    Asus P6T Deluxe v1

    4x WD5001AALS   in RAID10

    Vista 64

     

    I built the above system 8 months ago and have used each IMSM as soon as it was posted on the Intel website. Version 8.8.0.1009 was installed in April and worked flawlessly until I updated to 8.9.0.1023. Within 4 hours of installing 8.9 I had two major system freezes which I could only get out of by using Reset and waiting for the volume to rebuild. Each time IMSM reported different drives had failed, either port 2 or 3. Nothing else had changed on the system and I was not doing anything different or difficult on the system when it froze. I quickly decided that I had had enough of 8.9 and went back to 8.8.0.1009. Since then (several days) I have had no problems at all - the system is behaving as it did before I put on 8.9.

     

    Annoyed about the whole day I lost recovering from these problems I complained to Intel. In summary, Intel replied that 8.9 had been fully tested, they had no other reports of problems and that there was nothing wrong with it! They then went on to advise me to update the HDD firmware, the Intel Option ROM and MB BIOS! Their reply was arrogant, some would say rude.

     

    Needless to say I have not done any of their suggested updates and the system is now working just fine on 8.8.0.1009. Also as far as I know I have the latest firmwares and BIOS anyway.

     

    With others reporting exactly the same problems there is obviously a bug in 8.9 which Intel need to acknowledge and fix. Everyone who has experienced this problem should complain to Intel, otherwise it won't get fixed, and may even get carried over into 8.10.

  • 7. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1012
    Crazy_Train Community Member

    PeterUK wrote:

     

    What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight. Or do you think it's a coincidence that two random drives failed overnight?

     

    Is your computer set to do a virus scan or a disk defrag in the middle of the night? Perhaps the drives in question have some marginal sectors. When the drives tried to read/write data at those locations, they had trouble and went into error-recovery mode. This took longer than IMSM 8.9 was willing to wait, so the software flagged them as failed and took them out of the RAID. That does seem to be a bit much for mere coincidence, but it's still a possibility.

     

    As a test, you could tell Windows to perform an error-check on the drives, selecting "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors". This will take a while to complete, especially with large RAID arrays. If your drives have bad sectors that aren't locked out, this should find them and fix the problem. I recommend that you back up your critical data beforehand, just to be safe.

  • 8. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    Nope, nothing runs at night... I don't know, maybe if it happened in the day then I would think it's this Time-Limited Error Recovery thing, but I'm sure it's more than just that in this version.

     

    I took the drives out of RAID and ran them in SATA mode so I could see the S.M.A.R.T. data, and all 6 drives seem fine: no Reallocated Sectors for the 3 x ST380811AS, and only one of the three ST3320620AS had 169 Reallocated Sectors, but that's part of the storage array, not the OS array where the problem happened.
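
For anyone repeating this check, the reallocated-sector count can be pulled out of a S.M.A.R.T. attribute table with a short script. The sketch below assumes the table was produced by `smartctl -A` from smartmontools (the post does not say which tool was used). The sample text is made up apart from the raw value 169 reported above.

```python
# Illustrative attribute table in the layout printed by `smartctl -A`.
# Only the raw value 169 comes from the post; the other numbers are invented.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   006    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       169
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9000
"""


def reallocated_sectors(smart_attribute_text: str):
    """Return the raw value of attribute 5, or None if it is not listed."""
    for line in smart_attribute_text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the first is the numeric ID.
        if len(fields) >= 10 and fields[0] == "5":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


count = reallocated_sectors(SAMPLE_OUTPUT)
```

A non-zero count does not by itself mean the drive is about to fail, but a drive that keeps remapping sectors is more likely to spend time in error recovery.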

     

    I did an "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors" check on both arrays some months back, but I could run it again just to be sure.

     

    In the system log when it happened both times I got lots of this:

     

    The device, \Device\Ide\iaStor0, did not respond within the timeout period.
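
To see how often this message appears, the System log can be exported to a text file and scanned. The sketch below only assumes that each exported line contains the message text as quoted above; the export format is otherwise not specified here.

```python
import re
from collections import Counter

# Matches the timeout message quoted in the post and captures the device name.
TIMEOUT_PATTERN = re.compile(
    r"The device, (?P<device>[^,\s]+), did not respond within the timeout period"
)


def count_timeouts(log_lines):
    """Count timeout messages per device name in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = TIMEOUT_PATTERN.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts
```

Passing an open text file to `count_timeouts` works because a file object yields its lines one at a time. Comparing the counts from a period on 8.8 with a period on 8.9 would give a rough measure of whether the driver version changes how often the timeout is hit.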

  • 9. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    Crazy_Train Community Member

    PeterUK wrote:

     

    In the system log when it happened both times I got lots of this:

     

    The device, \Device\Ide\iaStor0, did not respond within the timeout period.

     

    One of the drives in your RAID array is busy doing something; it would be nice if the log recorded more detail....

     

    Check the data cables on the drives and make sure they're fully seated. You could also try replacing the cables with known good ones to see if that helps things.

  • 10. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    Since the arrays have been running fine for four days after going back to 8.8, just like JefUK's did, it's not a cable problem.


    I'm going to reinstall 8.9 and run some tests. To me it's like the RAID driver has nothing to do when the drives are idle, so it randomly fails a drive.


    But if you think about it, could the problem be with the manager (IAANTmon.exe)? So what I might do is run the 8.9 driver with the 8.8 manager. I can't see any risk in doing this.

  • 11. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    JefUK Community Member

    The problem I had was not a timeout error; there were no timeout error messages. I have had timeout errors on other systems. Sometimes they have been caused by drive problems, but I have also had them due to a faulty PSU (which took a long time to find). It does not necessarily follow that timeout errors are caused by faulty HDDs.

     

    The only messages I got were warnings about the volume being rebuilt. The fact that my system has worked perfectly on 8.6, 8.7 and 8.8, failed with 8.9, and now works perfectly with 8.8 again is, I believe, proof, as conclusive as it can be, that the problem lies with 8.9. For the short time I was using 8.9 I had a feeling that the system was not as responsive as it was with 8.8 - but that may be imagination. 8.9 appeared to install correctly and IMSM reported all the various modules as 8.9.0.1023.

  • 12. Re: Random drive fails with new Matrix Storage Manager 8.9
    aditza Community Member

    in the week that passed since i installed v8.9 (as an upgrade over v8.8) i had about 6 system freezes and today 1 hdd volume marked as degraded (and its hdd labelled as with errors) for no apparent reason, except for the fact that the windows system event log contains a few events of type event id 9: "The device, \Device\Ide\iaStor0, did not respond within the timeout period."

     

    the freezes: when they happen, the mouse remains responsive, but the OS refuses to do any action, not even alt+tab has any effect (but is recognized and the task switcher overlay appears), during this time the hdd light flashes very lightly, about every 2 seconds, staying lit for only a few milliseconds

     

    today i had another of these freezes, but after waiting a while and then pressing the reset button, i  was met by the message that the drive is being marked as with errors. now i'm in the middle of rebuilding a mirror volume on it.

    After it finishes rebuilding i'll install v8.8 back.

     

    Quotes from the report created by the manager console:


    Kit Installed: 8.9.0.1023
    Kit Install History: 8.9.0.1023, Uninstall
    Shell Version: 8.9.0.1023

     

    OS Name: Microsoft Windows XP Professional
    OS Version: 5.1.2600 Service Pack 3 Build 2600

    System Manufacturer: ASUSTeK Computer INC.
    System Model: P5B-Premium
    Processor: Intel Pentium III Xeon processor ~2507 MHz
    BIOS Version/Date: American Megatrends Inc. 1102   , 07/14/2008

    Language: ENU

     

    btw the CPU is an intel E5200, s-spec: SLAY7, it is wrongly detected as a Xeon by the Matrix Storage Console in that system report. WinXPSP3 has all the updates available from Microsoft Update applied.


    Array_0000
    Status: Rebuilding
    Hard Drive Data Cache Enabled: Yes
    Size: 1192.3 GB
    Free Space: 0 GB
    Number of Hard Drives: 2
    Hard Drive Member 1: WDC WD6401AALS-00L3B2
    Hard Drive Member 2: WDC WD6401AALS-00L3B2
    Number of Volumes: 2
    Volume Member 1: stripe
    Volume Member 2: mirror

    stripe
    Status: Normal
    System Volume: Yes
    Volume Write-Back Cache Enabled: No
    RAID Level: RAID 0 (striping)
    Strip Size: 64 KB
    Size: 600 GB
    Physical Sector Size: 512 Bytes
    Logical Sector Size: 512 Bytes
    Number of Hard Drives: 2
    Hard Drive Member 1: WDC WD6401AALS-00L3B2
    Hard Drive Member 2: WDC WD6401AALS-00L3B2
    Parent Array: Array_0000

    mirror
    Status: Rebuilding: 52% complete
    System Volume: No
    Volume Write-Back Cache Enabled: No
    RAID Level: RAID 1 (mirroring)
    Size: 296.1 GB
    Physical Sector Size: 512 Bytes
    Logical Sector Size: 512 Bytes
    Number of Hard Drives: 2
    Hard Drive Member 1: WDC WD6401AALS-00L3B2
    Hard Drive Member 2: WDC WD6401AALS-00L3B2
    Parent Array: Array_0000

    Hard Drive 0
    Usage: Array member
    Status: Normal
    Device Port: 0
    Device Port Location: Internal
    Current Serial ATA Transfer Mode: Generation 2
    Model: WDC WD6401AALS-00L3B2
    Serial Number: --------------not posted here-------------------
    Firmware: 01.03B01
    Native Command Queuing Support: Yes
    Hard Drive Data Cache Enabled: Yes
    Size: 596.1 GB
    Physical Sector Size: 512 Bytes
    Logical Sector Size: 512 Bytes
    Number of Volumes: 2
    Volume Member 1: stripe
    Volume Member 2: mirror
    Parent Array: Array_0000

    Hard Drive 1
    Usage: Array member
    Status: Normal
    Device Port: 1
    Device Port Location: Internal
    Current Serial ATA Transfer Mode: Generation 2
    Model: WDC WD6401AALS-00L3B2
    Serial Number: --------------not posted here-------------------
    Firmware: 01.03B01
    Native Command Queuing Support: Yes
    Hard Drive Data Cache Enabled: Yes
    Size: 596.1 GB
    Physical Sector Size: 512 Bytes
    Logical Sector Size: 512 Bytes
    Number of Volumes: 2
    Volume Member 1: stripe
    Volume Member 2: mirror
    Parent Array: Array_0000
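
The sizes in this report can be cross-checked with a little arithmetic. The sketch below assumes the usual layout for a two-drive array carrying both volumes: the RAID 0 volume is split evenly across the members and the RAID 1 volume is stored in full on each member. The variable and function names are only illustrative.

```python
import math

DRIVE_SIZE_GB = 596.1      # "Size" of each hard drive in the report
DRIVE_COUNT = 2
STRIPE_VOLUME_GB = 600.0   # RAID 0 volume size
MIRROR_VOLUME_GB = 296.1   # RAID 1 volume size


def per_drive_usage_gb(stripe_gb, mirror_gb, drives):
    """Space each member drive contributes to a RAID 0 plus RAID 1 pair."""
    stripe_share = stripe_gb / drives  # RAID 0 splits data evenly across members
    mirror_share = mirror_gb           # RAID 1 keeps a full copy on every member
    return stripe_share + mirror_share


used = per_drive_usage_gb(STRIPE_VOLUME_GB, MIRROR_VOLUME_GB, DRIVE_COUNT)
free = DRIVE_SIZE_GB - used
array_total = DRIVE_SIZE_GB * DRIVE_COUNT
```

Each drive contributes 300 GB to the stripe and 296.1 GB to the mirror, which accounts for the full 596.1 GB and agrees with the reported "Free Space: 0 GB". The computed array total of 1192.2 GB differs from the reported 1192.3 GB by 0.1 GB, which is presumably rounding in the console's display.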

  • 13. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
    PeterUK Community Member

    Hi aditza welcome to the problem.

     

    I've done a test with the 8.9 driver and the 8.8 manager, and the problem happened again, so I'm now trying the 8.8 driver with the 8.9 manager to see if it's just the driver. If it runs fine for a week with this setup then I will report my findings to Intel and ask them to pull the driver off the site before many people replace good drives.

     

    You would think they would have someone looking over these posts from time to time, just to see how real a problem this is?

  • 14. Re: Random drive fails with new Matrix Storage Manager 8.9
    aditza Community Member

    follow-up: i installed v8.8.0.1009 again and did a full verification of both volumes (stripe and mirror), the verification took about three hours but it seems i got away clean, no errors whatsoever, ZERO!

     

    i'll have to wait and see if it freezes again, but since i've been running the v8.8 for more than a month on this machine and didn't have any problem before deciding to try v8.9, i think that they won't happen again and that 8.9 was the source of the freezes. i'll post back in a week or two if i had any more freezes or not.
