527 Replies Latest reply: Jan 1, 2015 10:45 AM by anthonyM

    Random drive fails with new Matrix Storage Manager 8.9

    PeterUK

      Important update.

       

      The Intel Matrix Storage Manager 8.9 has been replaced by:

       

      Intel Rapid Storage Technology and the current version is 9.6.

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18859&lang=eng

      64-bit Intel® RST Driver Files

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProdId=2101&DwnldID=18861&lang=eng

       

      The other thread, “Random drive fails with new Rapid Storage Technology 9.6?”, was started after Rapid Storage Technology fixed the problem for most of us here. It is for anyone who still has the same problem with Rapid Storage Technology but has no problems with Matrix Storage Manager 8.8.

      http://communities.intel.com/thread/8139?start=0&tstart=0

       

       

      Matrix Storage Manager 8.8 should you need it.

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17412&lang=eng

      64-bit Floppy

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17415&lang=eng

      32-bit Floppy

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17413&lang=eng

       

      Setup:

      XP Professional SP3

      Dual Intel ICH8R RAID 5

      3 x ST380811AS 160GB RAID 5 OS

      3 x ST3320620AS 640GB RAID 5 Storage


      Problem happened on the OS array so far when using Matrix Storage Manager 8.9.0.1023.

      http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&ProductID=2101&DwnldID=17882&strOSs=44&OSFullName=Windows* XP Professional&lang=eng


      Here's what happened: I downloaded and installed the newest Matrix Storage Manager 8.9.0.1023, rebooted, then pressed Winkey + L to lock the computer overnight. The next day a drive had failed, listed as port 1 in the console. I know I should have replaced the drive, but I did a rebuild, ran a Volume Verification and Repair, and got no errors.


      Thinking it was fine, I pressed Winkey + L to lock the computer overnight again, but the next day a drive had failed, BUT this time on port 2 in the console!


      I have now gone back to Matrix Storage Manager 8.8.0.1009, rebuilt the array, and run a Volume Verification and Repair with no errors. I will see if a drive fails tonight.


      So could there be a problem with the new Matrix Storage Manager RAID driver, or could my OS array be about to fail?

       

      The only way this will get resolved or looked at is if people report it. If you went from Matrix Storage Manager 8.8 to 8.9, had a random drive failure, and the drive has been fine for a week or more after going back to 8.8, please report this to Intel here:

      http://supportmail.intel.com/scripts-emf/welcome.aspx?id=40

      Line: Chipset Software

      Product: Intel(R) Matrix Storage Manager

        • 1. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
          PeterUK

          So far no drive failed when I went back to 8.8.0.1009.

           

          I will give it some days then reinstall 8.9.0.1012 and see if a drive fails again.

          • 2. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
            PeterUK

            Odd, I'm sure it said 8.9.0.1012 when I downloaded it. Either way, I did a hash check and it's the same file as 8.9.0.1023, so same problem.
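
For anyone wanting to repeat that kind of check, here is a minimal Python sketch of comparing two downloaded installers by hash. The function names are my own, and SHA-256 is an assumption on my part; the post does not say which hash or tool was used.

```python
import hashlib

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large installers do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_file(path_a, path_b):
    """True when both files hash to the same digest (assumed SHA-256)."""
    return file_digest(path_a) == file_digest(path_b)
```

Calling `same_file()` with the paths of the two downloads returns `True` when they are byte-for-byte the same package, which is what the hash check above found.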

            • 3. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1012
              Crazy_Train

              Just a note about error recovery in hard drives. A regular desktop-class drive is configured to be very persistent when it comes to data errors; this can cause problems in a RAID configuration. There's a good article about it on Wikipedia:

               

              Time-Limited Error Recovery

               

              Modern hard drives feature an ability to recover from some read/write errors by internally remapping sectors and other forms of self test and recovery. The process for this can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. RAID controllers are designed to recognize a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from parity data. This is a long process, degrades performance, and if a second drive should fail under the resulting additional workload, it can be catastrophic.

               

              If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as "failed" by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.

               

              http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

               

              The article primarily addresses Western Digital's implementation of this feature, but it also mentions  Seagate (Error Recovery Control [ERC]) and Samsung & Hitachi (Command Completion Time Limit [CCTL]). Here's another short article that specifically addresses Seagate:

               

              What is Error Recovery Control?
              http://www.hddoctor.net/what-is-error-recovery-control/

               

              Western Digital has a utility to enable/disable TLER in their drives' firmware. However, it doesn't look like Seagate is so accommodating; their solution is to buy enterprise-class drives.

               

              It could be that recent releases of the Intel Matrix Storage Manager software are less tolerant of these error-recovery delays. Some people have reported problems with 8.8 and 8.9 that were solved by dropping back to 8.7.
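
To make the timing argument concrete, here is a small Python sketch of the decision described above. It is a toy model, not Intel's or any drive vendor's actual logic: the function name, the timeout values, and the recovery times are all made-up numbers chosen only for illustration.

```python
def controller_verdict(recovery_seconds, controller_timeout, erc_limit=None):
    """Toy model of a RAID controller waiting on a drive's error recovery.

    recovery_seconds   -- how long the drive would keep retrying on its own
    controller_timeout -- how long the controller waits before giving up
    erc_limit          -- TLER/ERC/CCTL cap on drive-side recovery, or None
    """
    if erc_limit is not None and recovery_seconds > erc_limit:
        # With a limit set, the drive stops retrying and reports the error early.
        busy = erc_limit
        drive_reports_error = True
    else:
        busy = recovery_seconds
        drive_reports_error = False

    if busy > controller_timeout:
        # The controller never heard back in time and drops the whole drive.
        return "drive marked failed"
    if drive_reports_error:
        # The controller rebuilds just the bad sector from the other members.
        return "sector repaired from redundancy"
    return "ok"

# Hypothetical numbers: a desktop drive retrying for 30 s against an 8 s timeout.
print(controller_verdict(30, 8))               # no limit on the drive
print(controller_verdict(30, 8, erc_limit=7))  # same drive with a 7 s limit
print(controller_verdict(6, 8))                # short recovery, tolerant timeout
print(controller_verdict(6, 4))                # same recovery, stricter timeout
```

The last two calls show the hypothesis in the paragraph above: the same drive behaviour can pass or fail depending only on how long the software is willing to wait.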

              • 4. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                PeterUK

                Thanks for the reply


                I'm not having any problems with 8.8.0.1009 so far if that means anything.


                What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight. Or do you think it's a coincidence that two random drives failed overnight?

                • 5. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                  JefUK

                  Setup

                  Asus P6T deluxe v1

                  4x WD WD5001AALS

                  Vista 64

                   

                  I have been using the above system for 8 months now, updating to the latest version of IMSM each time a new version was released. The last update was in April 2009, when 8.8.0.1009 was released. All versions of IMSM worked fine until I updated to 8.9.0.1023 on 18 July 2009, and within 4 hours I had two major system lockups where I had to use the reset button and go through the volume rebuild process each time. Since nothing else had changed on the system I quickly decided to go back to 8.8.0.1009, and since then I have had no problems - the system is as it was before I updated to 8.9.0.1023. I wasted a whole day on this problem.

                   

                  I complained to Intel and got a quick reply, but unfortunately it was not much use. In summary they said that 8.9 had been fully tested, and had passed those tests, that they had no other reports of problems, and therefore there was nothing wrong with it! They told me to update the HDD firmware, the Intel Option ROM and the MB BIOS!

                   

                  Because I downloaded 8.9 as soon as it appeared on the Intel website, I checked again to see if it had been withdrawn or changed. I noticed that some of the download documentation referred to 8.9.0.1012 and in other places it referred to 8.9.0.1023. There was obviously some confusion inside Intel. I contacted customer support again and asked whether 8.9.0.1023 was the correct version. They replied that 8.9.0.1023 was the correct version, that it was the version that downloaded, and that the mention of 8.9.0.1012 was simply a mistake on the website. They also reiterated that there is nothing wrong with 8.9.0.1023.

                   

                  Over the years I have probably used all versions of IMSM and IAA and have only had a problem with one other version, which they acknowledged. With other users now having the same problem, Intel have obviously dropped the ball again - they need to recognize it! Everyone who has the problem should complain to Intel, and perhaps they will then acknowledge the problem and do something about it. Otherwise we may find the same issue being rolled over into 8.10.

                  • 6. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                    JefUK

                    System:-

                     

                    Asus P6T Deluxe v1

                    4x WD5001AALS   in RAID10

                    Vista 64

                     

                    I built the above system 8 months ago and have used each IMSM as soon as it was posted on the Intel website. Version 8.8.0.1009 was installed in April and worked flawlessly until I updated to 8.9.0.1023. Within 4 hours of installing 8.9 I had two major system freezes which I could only get out of by using Reset and waiting for the volume to rebuild. Each time IMSM reported different drives had failed, either port 2 or 3. Nothing else had changed on the system and I was not doing anything different or difficult on the system when it froze. I quickly decided that I had had enough of 8.9 and went back to 8.8.0.1009. Since then (several days) I have had no problems at all - the system is behaving as it did before I put on 8.9.

                     

                    Annoyed about the whole day I lost recovering from these problems I complained to Intel. In summary, Intel replied that 8.9 had been fully tested, they had no other reports of problems and that there was nothing wrong with it! They then went on to advise me to update the HDD firmware, the Intel Option ROM and MB BIOS! Their reply was arrogant, some would say rude.

                     

                    Needless to say I have not done any of their suggested updates and the system is now working just fine on 8.8.0.1009. Also as far as I know I have the latest firmwares and BIOS anyway.

                     

                    With others reporting exactly the same problems, there is obviously a bug in 8.9 which Intel need to acknowledge and fix. Everyone who has experienced this problem should complain to Intel, otherwise it won't get fixed, and may even get carried over into 8.10.

                    • 7. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1012
                      Crazy_Train

                      PeterUK wrote:

                       

                      What I don't get is why a drive would fail overnight (with 8.9.0.1023) when the system is idle... I could try an experiment where I give the drives something to do overnight. Or do you think it's a coincidence that two random drives failed overnight?

                       

                      Is your computer set to do a virus scan or a disk defrag in the middle of the night? Perhaps the drives in question have some marginal sectors. When the drives tried to read/write data at those locations, they had trouble and went into error-recovery mode. This took longer than IMSM 8.9 was willing to wait, so the software flagged them as failed and took them out of the RAID. That does seem to be a bit much for mere coincidence, but it's still a possibility.

                       

                      As a test, you could tell Windows to perform an error-check on the drives, selecting "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors". This will take a while to complete, especially with large RAID arrays. If your drives have bad sectors that aren't locked out, this should find them and fix the problem. I recommend that you back up your critical data beforehand, just to be safe.

                      • 8. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                        PeterUK

                        Nope, nothing runs at night... I don't know, maybe if it had happened in the day then I would think it's this Time-Limited Error Recovery thing, but I'm sure it's more than just that in this version.

                         

                        I took the drives out of RAID and ran them in SATA mode so I could see the S.M.A.R.T. data, and all 6 drives seem fine. There are no Reallocated Sectors on the 3 x ST380811AS, and only one of the three ST3320620AS has any (169 Reallocated Sectors), but that's part of the storage array, not the OS array where the problem happened.

                         

                        I did an "Automatically fix file system errors" and "Scan for and attempt recovery of bad sectors" check on both arrays some months back, but I could run it again just to be sure.

                         

                        In the system log when it happened both times I got lots of this:

                         

                        The device, \Device\Ide\iaStor0, did not respond within the timeout period.
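
If the System log is exported to a text file, a few lines of Python can count how often that message appears. This is only a sketch: the one-message-per-line export format and the `count_iastor_timeouts` name are my assumptions, not something the Matrix Storage tools produce.

```python
import re

# Matches the message quoted above for any iaStor device number.
TIMEOUT_RE = re.compile(
    r"The device, \\Device\\Ide\\iaStor\d+, did not respond within the timeout period"
)

def count_iastor_timeouts(lines):
    """Count log lines that contain the iaStor timeout message."""
    return sum(1 for line in lines if TIMEOUT_RE.search(line))

# Illustrative input: the real message twice, plus one made-up unrelated line.
sample = [
    r"The device, \Device\Ide\iaStor0, did not respond within the timeout period.",
    r"Some unrelated event text (hypothetical).",
    r"The device, \Device\Ide\iaStor0, did not respond within the timeout period.",
]
print(count_iastor_timeouts(sample))  # prints 2
```

Comparing the count per day under 8.8 and under 8.9 would show whether the timeouts really track the driver version.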

                        • 9. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                          Crazy_Train

                          PeterUK wrote:

                           

                          In the system log when it happened both times I got lots of this:

                           

                          The device, \Device\Ide\iaStor0, did not respond within the timeout period.

                           

                          One of the drives in your RAID array is busy doing something; it would be nice if the log recorded more detail....

                           

                          Check the data cables on the drives and make sure they're fully seated. You could also try replacing the cables with known good ones to see if that helps things.

                          • 10. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                            PeterUK

                            Since the arrays have been running fine for four days after going back to 8.8, just like JefUK's did, it's not a cable problem.


                            I'm going to reinstall 8.9 and run some tests. To me it's like the RAID driver has nothing to do when the drives are idle, and it randomly fails a drive.


                            But if you think about it, the problem could be with the manager (IAANTmon.exe) rather than the driver, so what I might do is run the 8.9 driver with the 8.8 manager. I can't see any risk in doing this.

                            • 11. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                              JefUK

                              The problem I had was not a timeout error; there were no timeout error messages. I have had timeout errors on other systems. Sometimes they were caused by drive problems, but I have also had them due to a faulty PSU (which took a long time to find). It does not necessarily follow that timeout errors are caused by faulty HDDs.

                               

                              The only messages I got were warnings about the volume being rebuilt. The fact that my system worked perfectly on 8.6, 8.7 and 8.8, failed with 8.9, and now works perfectly with 8.8 again is, I believe, proof, as conclusive as it can be, that the problem lies with 8.9. For the short time I was using 8.9 I had a feeling that the system was not as responsive as it was with 8.8 - but that may be imagination. 8.9 appeared to install correctly and IMSM reported all the various modules as 8.9.0.1023.

                              • 12. Re: Random drive fails with new Matrix Storage Manager 8.9
                                aditza

                                in the week that passed since i installed v8.9 (as an upgrade over v8.8) i had about 6 system freezes and today 1 hdd volume marked as degraded (and its hdd labelled as with errors) for no apparent reason, except for the fact that the windows system event log contains a few events of type event id 9: "The device, \Device\Ide\iaStor0, did not respond within the timeout period."

                                 

                                the freezes: when they happen, the mouse remains responsive, but the OS refuses to do any action, not even alt+tab has any effect (but is recognized and the task switcher overlay appears), during this time the hdd light flashes very lightly, about every 2 seconds, staying lit for only a few milliseconds

                                 

                                today i had another of these freezes, but after waiting a while and then pressing the reset button, i  was met by the message that the drive is being marked as with errors. now i'm in the middle of rebuilding a mirror volume on it.

                                After it finishes rebuilding i'll install v8.8 back.

                                 

                                Quotes from the report created by the manager console:


                                Kit Installed: 8.9.0.1023
                                Kit Install History: 8.9.0.1023, Uninstall
                                Shell Version: 8.9.0.1023

                                 

                                OS Name: Microsoft Windows XP Professional
                                OS Version: 5.1.2600 Service Pack 3 Build 2600

                                System Manufacturer: ASUSTeK Computer INC.
                                System Model: P5B-Premium
                                Processor: Intel Pentium III Xeon processor ~2507 MHz
                                BIOS Version/Date: American Megatrends Inc. 1102   , 07/14/2008

                                Language: ENU

                                 

                                btw the CPU is an intel E5200, s-spec: SLAY7, it is wrongly detected as a Xeon by the Matrix Storage Console in that system report. WinXPSP3 has all the updates available from Microsoft Update applied.


                                Array_0000
                                Status: Rebuilding
                                Hard Drive Data Cache Enabled: Yes
                                Size: 1192.3 GB
                                Free Space: 0 GB
                                Number of Hard Drives: 2
                                Hard Drive Member 1: WDC WD6401AALS-00L3B2
                                Hard Drive Member 2: WDC WD6401AALS-00L3B2
                                Number of Volumes: 2
                                Volume Member 1: stripe
                                Volume Member 2: mirror

                                stripe
                                Status: Normal
                                System Volume: Yes
                                Volume Write-Back Cache Enabled: No
                                RAID Level: RAID 0 (striping)
                                Strip Size: 64 KB
                                Size: 600 GB
                                Physical Sector Size: 512 Bytes
                                Logical Sector Size: 512 Bytes
                                Number of Hard Drives: 2
                                Hard Drive Member 1: WDC WD6401AALS-00L3B2
                                Hard Drive Member 2: WDC WD6401AALS-00L3B2
                                Parent Array: Array_0000

                                mirror
                                Status: Rebuilding: 52% complete
                                System Volume: No
                                Volume Write-Back Cache Enabled: No
                                RAID Level: RAID 1 (mirroring)
                                Size: 296.1 GB
                                Physical Sector Size: 512 Bytes
                                Logical Sector Size: 512 Bytes
                                Number of Hard Drives: 2
                                Hard Drive Member 1: WDC WD6401AALS-00L3B2
                                Hard Drive Member 2: WDC WD6401AALS-00L3B2
                                Parent Array: Array_0000

                                Hard Drive 0
                                Usage: Array member
                                Status: Normal
                                Device Port: 0
                                Device Port Location: Internal
                                Current Serial ATA Transfer Mode: Generation 2
                                Model: WDC WD6401AALS-00L3B2
                                Serial Number: --------------not posted here-------------------
                                Firmware: 01.03B01
                                Native Command Queuing Support: Yes
                                Hard Drive Data Cache Enabled: Yes
                                Size: 596.1 GB
                                Physical Sector Size: 512 Bytes
                                Logical Sector Size: 512 Bytes
                                Number of Volumes: 2
                                Volume Member 1: stripe
                                Volume Member 2: mirror
                                Parent Array: Array_0000

                                Hard Drive 1
                                Usage: Array member
                                Status: Normal
                                Device Port: 1
                                Device Port Location: Internal
                                Current Serial ATA Transfer Mode: Generation 2
                                Model: WDC WD6401AALS-00L3B2
                                Serial Number: --------------not posted here-------------------
                                Firmware: 01.03B01
                                Native Command Queuing Support: Yes
                                Hard Drive Data Cache Enabled: Yes
                                Size: 596.1 GB
                                Physical Sector Size: 512 Bytes
                                Logical Sector Size: 512 Bytes
                                Number of Volumes: 2
                                Volume Member 1: stripe
                                Volume Member 2: mirror
                                Parent Array: Array_0000
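
As a quick sanity check on the report above, the volume sizes can be reconciled with the drive sizes. The sketch below assumes the usual space accounting (RAID 0 splits the volume evenly across members, RAID 1 keeps a full copy on each member); the figures are taken from the report and the variable names are mine.

```python
import math

drive_size_gb = 596.1   # each WDC WD6401AALS, as reported
members = 2

stripe_gb = 600.0       # RAID 0 volume size from the report
mirror_gb = 296.1       # RAID 1 volume size from the report

# RAID 0 spreads the volume evenly, so each drive holds 1/members of it.
stripe_per_drive = stripe_gb / members
# RAID 1 stores a complete copy of the volume on every member.
mirror_per_drive = mirror_gb

used_per_drive = stripe_per_drive + mirror_per_drive
array_gb = drive_size_gb * members

print(f"used per drive: {used_per_drive:.1f} GB of {drive_size_gb:.1f} GB")
# The report shows 1192.3 GB; the small difference is rounding in the report.
print(f"array size: {array_gb:.1f} GB")

# Both drives are fully allocated, matching "Free Space: 0 GB".
print(math.isclose(used_per_drive, drive_size_gb, abs_tol=0.05))
```

So the two volumes together account for every gigabyte on both drives, and the layout in the report is internally consistent.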

                                • 13. Re: Random drive fails with new Matrix Storage Manager 8.9.0.1023
                                  PeterUK

                                  Hi aditza welcome to the problem.

                                   

                                  I've done a test with the 8.9 driver and the 8.8 manager, and the problem happened again, so I'm now trying the 8.8 driver with the 8.9 manager to see if it's just the driver. If it runs fine for a week with this setup, I will report my findings to Intel so they can pull the driver off the site before many people replace good drives.

                                   

                                  You would think they would have someone looking over these posts from time to time, just to see how real a problem this is?

                                  • 14. Re: Random drive fails with new Matrix Storage Manager 8.9
                                    aditza

                                    follow-up: i installed v8.8.0.1009 again and did a full verification of both volumes (stripe and mirror), the verification took about three hours but it seems i got away clean, no errors whatsoever, ZERO!

                                     

                                    i'll have to wait and see if it freezes again, but since i've been running the v8.8 for more than a month on this machine and didn't have any problem before deciding to try v8.9, i think that they won't happen again and that 8.9 was the source of the freezes. i'll post back in a week or two if i had any more freezes or not.
