2 Replies Latest reply on Sep 29, 2009 1:08 PM by Onedutch

    Diagnosing and Recovering Raid-1 Volume Verification Errors


      Hi All,


      I'm having problems with volume verification errors on a RAID-1 volume on 2 brand new 1 TB Hitachi drives attached to a (~3yr old) ICH7R controller with drivers.  I tested the drives individually before use (see below).


      Having migrated data from my old drives (a group of 250 GB Hitachi drives) and done a couple of days' work on the new ones (including some non-reproducible stuff), I then ran a "verify volume data" on the RAID-1 volume, just to be safe and, disappointingly, it returned 2 errors.  There hadn't been any crashes or "dirty" reboots since the clean check.


      This raises a few questions:

           1. I can't find any log file that indicates where exactly these 2 errors were - isn't this recorded anywhere?  (clearly the Matrix Manager must know where the error occurs, and I would have thought that logging this was a pretty basic requirement in a checking utility.  Knowing the sector numbers would help a lot in restoring the corrupt files).

           2. Does the driver/storage manager have any "intelligent" way of deciding which of the mismatching sectors to use?  I can't see how it would be able to tell which of the drives has good data and which has bad data.  Does it just make an entirely random choice?

           3. Has anyone seen similar failures (yes, I know it might be the drives themselves, but I'm skeptical about that)?
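Since the first question is about knowing which sectors mismatch, here is a minimal sketch of how the two mirror members could be compared independently of the Matrix Storage Manager. It assumes something the thread does not describe: that each member drive has been imaged (or can be opened read-only as a raw device) outside the array, and that the logical sector size is 512 bytes. The function name and parameters are illustrative only, and any RAID metadata stored on the members would also show up as differences.

```python
from typing import BinaryIO, Iterator

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def mismatched_sectors(a: BinaryIO, b: BinaryIO,
                       sector_size: int = SECTOR_SIZE,
                       sectors_per_read: int = 2048) -> Iterator[int]:
    """Yield the index of every sector whose bytes differ between a and b.

    If one stream is shorter than the other, the sectors that exist in
    only one of them are reported as mismatches too.
    """
    chunk_size = sector_size * sectors_per_read
    base = 0  # sector index at the start of the current chunk
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if not chunk_a and not chunk_b:
            break  # both streams exhausted
        if chunk_a != chunk_b:
            # Only walk sector by sector inside chunks that actually differ.
            longest = max(len(chunk_a), len(chunk_b))
            for offset in range(0, longest, sector_size):
                if (chunk_a[offset:offset + sector_size]
                        != chunk_b[offset:offset + sector_size]):
                    yield base + offset // sector_size
        base += sectors_per_read


# Usage (hypothetical image file names):
# with open("disk0.img", "rb") as d0, open("disk1.img", "rb") as d1:
#     for sector in mismatched_sectors(d0, d1):
#         print(sector)
```

Comparing in large chunks first keeps the common case (identical data) fast; the sector numbers it reports would still have to be mapped back to files by a filesystem-aware tool.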


      I'm intending to downgrade the drivers to 8.8 (after reading the threads regarding problems with 8.9) in the hope that they're more stable and won't give further errors.


      All of the "recovery" options that I have at this point seem rather unpleasant (either accept the "unknown losses" or spend a *lot* of time checksumming, or revert to the previous disk data). None of these options really appeals to me.
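For the checksumming option, a rough sketch of what the comparison might look like is below. It assumes the old drives are still readable and mounted alongside the new volume, and that SHA-256 is an acceptable checksum; the function names and the drive letters in the usage comment are hypothetical. It only reports files present in both trees whose contents differ.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(reference_root: Path, suspect_root: Path) -> list[str]:
    """List files that exist in both trees but whose digests differ."""
    reference = tree_checksums(reference_root)
    suspect = tree_checksums(suspect_root)
    common = reference.keys() & suspect.keys()
    return sorted(name for name in common if reference[name] != suspect[name])


# Usage (hypothetical drive letters for the old data and the new volume):
# for name in differing_files(Path("D:/"), Path("E:/")):
#     print(name)
```

Two caveats: files legitimately changed since the migration will also be listed, and a RAID-1 read may be served by either member, so a matching checksum only shows that the copy that happened to be read was good.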






      The pre-use tests were: reading each individual drive end-to-end with Spinrite, then creating a raid-1 volume, formatting it, wiping it (non-zero) and verifying the volume using Jetico's BCWipe (thus ensuring a write-verify cycle on each sector), then a Matrix Storage Manager "verify volume data".  The drives tested clean on all of this.  I would have liked to test further, but this took well over 24hrs of testing, and this was all the time I could afford to take.

        • 1. Re: Diagnosing and Recovering Raid-1 Volume Verification Errors

          Just an update - I downgraded to 8.8 and re-ran the verification check - 14 errors.  I had tried to cut out all disk activity on this volume from the point that I first saw the errors (the problematic volume isn't the system drive), but the downgrade reboot kicked off a process (now disabled) that updated some files, so I'm guessing (!) that those updated files must contain the new mismatching sectors.
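One way to narrow down that guess is to list the files written after the first verification run. The sketch below assumes file modification times on the volume are trustworthy and that the time of the first check is known; the function name, the drive letter and the date in the usage comment are placeholders, not values from this thread.

```python
import os
from pathlib import Path


def files_modified_since(root: Path, cutoff: float) -> list[Path]:
    """Return files under root whose modification time is at or after cutoff.

    cutoff is a POSIX timestamp (seconds since the epoch), for example
    the moment the first verification run finished.
    """
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and os.stat(p).st_mtime >= cutoff
    )


# Usage (hypothetical volume letter and placeholder date):
# from datetime import datetime
# cutoff = datetime(2009, 1, 1, 0, 0).timestamp()
# for path in files_modified_since(Path("E:/"), cutoff):
#     print(path)
```

This only identifies candidates; it cannot confirm that the listed files actually contain the mismatching sectors.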


          Just to repeat the original questions:


               1. Is there *really* no logging of where the verification errors occur?  This looks like an unforgivable oversight.

               2. How does the driver decide which of the mismatching sectors is "good" when fixing verification errors?  Is one drive designated (invisibly) as the "master"?


          Also, there were no media errors shown in any of the verification checks, but I can't see any way to access SMART data to see if either drive is reporting problems, so

               3. Do the drivers report SMART errors in any way?  (Does a lack of messages via the driver indicate that there have been no sectors re-allocated at the drive level?)





          • 2. Re: Diagnosing and Recovering Raid-1 Volume Verification Errors



            I'm running Windows 7 and also experienced 'verification errors'. I would love to have some answers to the questions GVM has. I'm running IMSM


            Regards Onedutch