
How to determine which RAID 1 drive failed?

idata
Employee
3,070 Views

I've gotten a "WHEA uncorrectable error" 3 times in 24 hours. Each time, after the system collects info and reboots, RST shows Volume 1 in rebuild status. Everything I've read says to replace one of the drives, but in the RST screen both drives have green check marks on them. How do I tell which drive went bad?

Is it possible that since both drives have green checks, the controller has gone bad?

Additional info: The first rebuild finished about midnight last night. The PC worked fine until about 3 pm today, when I got the WHEA uncorrectable error again and the system started rebuilding.

Windows 10 64-bit, Intel RST version 14.6.0.1029, Asus Maximus VIII Hero Alpha motherboard, 2 WD Black 6TB drives

Thanks in advance for any ideas.

Wayne

idata
Employee
1,485 Views

Hello wdonn,

Based on your description, the drives appear to be fine, since you mentioned the green check marks; if one had failed, a red "X" should appear on it. In case this is related to the Intel® RST driver, please try updating your current version to the one mentioned below.

Intel® Rapid Storage Technology (Intel® RST) RAID Driver 15.2.0.1020:

https://downloadcenter.intel.com/download/26361/Intel-Rapid-Storage-Technology-Intel-RST-RAID-Driver?product=55005

Please let me know your results.

Regards,

Amy.

idata
Employee
1,485 Views

Thank you for the link. The link had a newer version than what was available on the Asus website.

I've encountered the "WHEA uncorrectable error" two additional times, but each time the RAID volume was not affected. I'm not sure whether that is coincidental, but it is a huge improvement. I still think this problem is highly likely to be caused by a hardware failure of some sort.

Thanks again.

idata
Employee
1,485 Views

wdonn, you're welcome.

Yes, as you mentioned, it could be hardware-related. Have you tried replacing the drivers?

Regards,

Amy.

idata
Employee
1,485 Views

Thank you for the response and suggestion.

I've looked at the 10 Windows mini-dumps and they all start with "WHEA_UNCORRECTABLE_ERROR (124) A fatal hardware error has occurred. Parameter 1 identifies the type of error". The error is always Machine Check Exception (which I think means the CPU is forcing a dump). There isn't any consistency in the process running at the time of the dump.
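For reference, Parameter 1 of bug check 0x124 can be decoded with a small lookup. This is only a minimal sketch: the value table follows Microsoft's published description of bug check 0x124 as best I know it, and the name `describe_whea_param1` is made up for illustration. A value of 0x0 corresponds to the Machine Check Exception seen in these dumps.

```python
# Parameter 1 values for bug check 0x124 (WHEA_UNCORRECTABLE_ERROR).
# Assumption: this table mirrors Microsoft's bug check 0x124 reference;
# verify against the current documentation before relying on it.
WHEA_ERROR_SOURCES = {
    0x0: "Machine check exception",
    0x1: "Corrected machine check exception",
    0x2: "Corrected platform error",
    0x3: "Nonmaskable interrupt (NMI) error",
    0x4: "Uncorrectable PCI Express error",
    0x5: "Generic hardware error",
    0x6: "Initialization error",
    0x7: "BOOT error",
    0x8: "Scalable Coherent Interface (SCI) generic error",
    0x9: "Uncorrectable Itanium-based machine check abort",
    0xA: "Corrected Itanium-based machine check",
    0xB: "Corrected Itanium-based platform error",
}


def describe_whea_param1(value: int) -> str:
    """Return a readable name for Parameter 1 of bug check 0x124.

    Hypothetical helper, not part of any Windows tool.
    """
    return WHEA_ERROR_SOURCES.get(value, f"Unknown error source (0x{value:X})")


if __name__ == "__main__":
    # Parameter 1 as read by hand from the mini-dump analysis output.
    print(describe_whea_param1(0x0))
```

A machine check exception is raised by the CPU when it detects a hardware fault, which fits the lack of any pattern in the running process.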

As best I can tell, my drivers are up-to-date and Device Manager does not indicate any issues with the drivers.

I'm going to run some hardware stress tests (memory, CPU, HDD) to see if that results in a blue screen. There is no consistency in the uptimes shown in the mini-dumps. CPU temp is never higher than 33 °C.

Thanks again.

Wayne

idata
Employee
1,485 Views

Wayne, please let me know how it goes.

Once you have those results, we can go from there.

Regards,

Amy.

idata
Employee
1,485 Views

Wayne, any update?

Let me know your results.

Regards,

Amy.

idata
Employee
1,485 Views

I ran 12 hours of memory tests using memtest and there were no errors. HD Tune Pro identified communication issues with my SSD, suggesting it might be a bad cable. As best I can tell, HD Tune will not give a health check on a drive that is part of a RAID volume.

idata
Employee
1,485 Views

Thanks for the update.

Did you change the cable?

Regards,

Amy.

idata
Employee
1,485 Views

Thank you for the reply.

I have not replaced the cable. I've read several forums that describe people who have replaced the cable and still get that status from HD Tune. I've also run Samsung Magician. It gives the same CRC error data but indicates that the status is "OK".
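One possible reading of that: the CRC error count (typically SMART attribute 199) is a lifetime counter, so a nonzero value stays visible after a cable swap, and what matters is whether it keeps rising. A minimal sketch of that check, assuming the raw counts are copied by hand from a tool such as HD Tune or Samsung Magician; the readings below are invented and the names `CrcReading` and `crc_errors_still_growing` are hypothetical:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrcReading:
    """One manually recorded raw CRC error count (hypothetical record type)."""
    label: str       # when the reading was taken, e.g. "before cable swap"
    raw_count: int   # raw value of the CRC error count attribute


def crc_errors_still_growing(readings: List[CrcReading]) -> bool:
    """Return True if the raw count rose between any two consecutive readings.

    The counter is cumulative, so a large but unchanged value only shows
    that errors happened at some point in the past.
    """
    return any(
        later.raw_count > earlier.raw_count
        for earlier, later in zip(readings, readings[1:])
    )


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    readings = [
        CrcReading("day 1", 412),
        CrcReading("day 3", 412),
        CrcReading("day 7", 412),
    ]
    if crc_errors_still_growing(readings):
        print("Count is rising: cable, port, or controller still suspect")
    else:
        print("Count is stable: the errors are historical")
```

A stable count would be consistent with a tool reporting the status as "OK" while still showing the old error data.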

I've had a couple of additional BSODs. Again the good news is that since upgrading to the latest version of Intel RST, I've not lost a drive after a BSOD.

idata
Employee
1,485 Views

You can always give it a try, if that is possible for you. And yes, it is good news that updating Intel® RST has kept the RAID volume intact after the BSODs.

Regards,

Amy.

idata
Employee
1,485 Views

I thought I would try to bring this thread to closure. After many tests, chats and emails, it was determined that my motherboard had problems. I replaced it 5 days ago and so far everything is working properly. Thank you for the timely responses and support.

Wayne

idata
Employee
1,485 Views

Wayne, thank you for sharing this with the community. I am really glad you tracked this down to the motherboard and fixed the issue.

Regards,

Amy C.
