5 Replies Latest reply on Jan 4, 2011 4:13 PM by

    What must I do to maintain/verify a RAID health?


      I'm running RAID 1 (mirrored) drives on an Intel DX58SO board, overseen by the latest Intel Rapid Storage Technology (IRST) program. Everything seems to be running nicely.


      The ideal goal is that IRST will automatically monitor the array and, if there's a problem, either take corrective action or advise me to replace a drive. However, documentation and controls are sparse, so I'm not sure about my protection.


      It's been suggested in other threads that IRST should report the S.M.A.R.T. results. Presumably IRST is monitoring S.M.A.R.T., and will post a "replace disk!" warning if it's bad; is that right?


      There's an option to manually initiate a "Verify" sequence, but the documentation doesn't say whether there's any reason to do this. Should I be running a monthly manual verify, or weekly? Or is it superfluous?


      I ran a "Verify", and afterwards IRST displays "Verification errors found: 10". But it says the array and component disks are normal and healthy, and there are no blocks with media errors. So it seems happy...but then, why are there any verification errors? Is this benign? Again, the documentation is no help, at least not that I've found. Nor does IRST display any history of the errors, nor details about what failed to verify or what correction was made. It seems I need to know more than IRST is reporting. Is there additional doc somewhere?
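
For readers wondering how a volume can show "verification errors" while every disk reports healthy: here is a minimal sketch of what a mirror verify could look like in principle. It assumes a verify simply reads both RAID 1 members and compares them block by block. That is a generic model, not IRST's documented behavior, and the function name and 512-byte block size are made up for illustration. The point is that both disks can read back without any media error and still disagree with each other.

```python
def count_mirror_mismatches(disk_a: bytes, disk_b: bytes, block_size: int = 512) -> int:
    """Count blocks whose contents differ between two mirror members.

    Hypothetical helper: models a generic RAID 1 verify pass, not IRST itself.
    """
    if len(disk_a) != len(disk_b):
        raise ValueError("mirror members must be the same size")
    mismatches = 0
    for offset in range(0, len(disk_a), block_size):
        # Both reads succeed (no media error); only the contents are compared.
        if disk_a[offset:offset + block_size] != disk_b[offset:offset + block_size]:
            mismatches += 1
    return mismatches


# Two 4-block "disks" that are identical except for one byte in the second block.
member_a = bytes(2048)
member_b = bytearray(member_a)
member_b[600] = 0x01
mismatched_blocks = count_mirror_mismatches(member_a, bytes(member_b))
```

In this model a mismatch says only that the two copies differ. It does not say which copy is the correct one, which may be why a tool would report a count without naming a disk.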

        • 1. Re: What must I do to maintain/verify a RAID health?

          Please, someone answer the original poster!  My 3-disk RAID 5 array seemed to be working fine, but IRST reports 1064 errors already, and the verification process is only 5% complete.  Will it tell me if the errors are associated with a particular disk?  How many errors are considered acceptable?





          • 2. Re: What must I do to maintain/verify a RAID health?

            Have you overclocked your system? If not, then either a SATA cable is bad, the PSU may be at fault, or one or more of your HDDs is faulty.

            • 3. Re: What must I do to maintain/verify a RAID health?

              Thanks - I do have the system overclocked by increasing the bus rate.  Maybe some of these errors occurred when I was testing higher rates.  I'll see if the number of errors is lower after one cycle of verification and a day or two using the system.  If it's not very small, I'll back off on the bus speed.


              Is some level of errors normal, or should I assume that only zero is really acceptable?



              • 4. Re: What must I do to maintain/verify a RAID health?

                On another RAID setup I had, I overclocked the system and got a lot of these errors every time I ran Verify, one pass after another. I then put the system back to stock speeds: the first pass had errors, and every pass after that had none.


                So should you be seeing any errors? No, you should not, unless you had a bad shutdown or a faulty disk.

                • 5. Re: What must I do to maintain/verify a RAID health?

                  Thanks again for the information.  Now I'm just going to summarize my experience for the benefit of anyone who is curious.


                  I recently replaced my motherboard with a Gigabyte GA-EP45T-USB3P, which uses the ICH10R chip.  I decided to try a RAID 5 setup for my three data drives.  I didn't "verify" the array at first - in fact I didn't know the software existed.  I spent some time experimenting with overclocking the system, and ended up with a stable system (overnight runs of both memory and prime95 tests) at 3.6 GHz (E8400 processor rated for 3 GHz).  Everything seemed fine until I did a verify.


                  I ended up running verify many times.  Here are the more interesting data points.


                  1) First verify, many thousands of errors found.  I assume this reflects errors when overclocked to even higher speeds.


                  2) I did not complete the verify process with overclocking, so I don't know how many errors there would have been at 3.6 GHz.


                  3) With everything set back to defaults (no overclocking), I was still getting 50 to 100 errors per pass.


                  4) I swapped the first SATA cable for an old one, and the error rate dropped to about 10 per pass!  Thanks for the suggestion to try that.


                  5) Swapping the other two cables out did nothing.  Still around 10 errors per pass.


                  6) I experimented some with small changes to the southbridge voltage.  Small increases seemed to help compared to the motherboard's default "auto" setting, but there were still errors.  I haven't found a clear statement of what maximum voltage is acceptable, so I didn't try the largest values the motherboard allowed.  I went as far as 1.63 volts for "ICH I/O" and 1.2 volts for "ICH Core".


                  Bottom line: it seems that I can't trust RAID 5 on this hardware, but I can't tell what's at fault.  I'm going to go back to RAID 0 and test that, or maybe I'll just use a single data drive.
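
On the earlier question of whether a verify can tie errors to a particular disk: the sketch below illustrates why a plain parity check might not be able to. It assumes standard single-parity RAID 5, where the parity block of a stripe is the XOR of its data blocks. The function names and the tiny two-byte blocks are hypothetical and for illustration only, and nothing here is taken from IRST's actual implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def stripe_syndrome(data_blocks: list[bytes], parity: bytes) -> bytes:
    """XOR of every member of the stripe; all zeros means the stripe is consistent."""
    return xor_blocks(*data_blocks, parity)


def flip_low_bit(block: bytes) -> bytes:
    """Simulate a single-bit corruption in the first byte of a block."""
    return bytes([block[0] ^ 0x01]) + block[1:]


# One stripe of a 3-disk RAID 5: two data blocks plus one parity block.
d0, d1 = b"\x0f\xf0", b"\x33\x55"
parity = xor_blocks(d0, d1)

clean = stripe_syndrome([d0, d1], parity)
bad_d0 = stripe_syndrome([flip_low_bit(d0), d1], parity)
bad_d1 = stripe_syndrome([d0, flip_low_bit(d1)], parity)
bad_parity = stripe_syndrome([d0, d1], flip_low_bit(parity))
```

In this model the same bit flip on any of the three members leaves exactly the same nonzero result, so the check shows that a stripe is inconsistent but not which disk, cable, or port corrupted it. That would be consistent with seeing an error count and still being unable to tell what is at fault.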