15 Replies. Latest reply on Jan 9, 2018 6:13 AM by Drone

    Intel RSTe Issue RAID 1 SSD Drives

    Drone

      I currently have 88 servers in production. The hardware for all servers is identical. The motherboard is a SuperMicro X9DRL-iF, which is based on the Intel X79 PCH 602 Chipset. Each server has two Micron M500 SSDs in a RAID 1 configuration on the onboard RSTe RAID controller.

       

      44 of the 88 servers are running Red Hat Linux version 6.x. The other 44 are running Microsoft Windows 2008 R2. We are using version 4.3.0.1223 of the RSTe driver on the Windows servers.

       

      These servers are on separate intranets without internet access, with 4 Windows and 4 Linux servers on each intranet. These installations are at various locations around the world. Their configurations are locked down. Remote access is not an option.

       

      We are getting calls every week about failed drives in production from all locations. Most of these failed drives are discovered while rebooting the servers: the RSTe boot screen (Ctrl-I screen) shows either an error or a failure, usually on port 1.

       

      For months now I have been trying to determine the root cause of these failures, to no avail. I am unable to reproduce the issue consistently in the lab, although from time to time I do see a failure.

       

      I would question the Micron SSDs as a possible culprit, but all of the failed drives are on the Windows 2008 R2 servers only, which leads me to believe that the cause must be the RSTe driver for Windows 2008 R2. The Linux servers have never shown an error.

       

      I noticed that there is now a 4.6.0.1085 version of the RSTe driver for Windows 2008 R2. In the RSTe_Windows_DRV_v4.6.0.1085_readme.txt file, under Fixes/Updates, fixes #2 and #3 seem to pertain to my hardware platform.

       

      If these fixes do pertain to my problem, is it possible for Intel to provide me with more details on the reasons for these fixes, so that I can determine the root cause, reproduce it, and prove that this upgrade will fix it? If they do not pertain to my problem, is there anything I can check to further uncover the root cause of this problem?
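
      To make the pattern described above (failures only on the Windows servers, usually on port 1) easier to check against the field calls, the reports could be tallied by operating system and port. This is only a minimal sketch: the record layout, the site names, and the sample entries are hypothetical placeholders, not real field data, and the `tally` helper is a name made up for illustration.

```python
from collections import Counter

# Hypothetical failure reports: (site, operating system, RSTe port).
# Illustrative placeholders only; replace with the real call log.
REPORTS = [
    ("site-a", "windows", 1),
    ("site-b", "windows", 1),
    ("site-c", "windows", 0),
]


def tally(reports):
    """Count failures per operating system and per (os, port) pair."""
    by_os = Counter(os_name for _, os_name, _ in reports)
    by_os_port = Counter((os_name, port) for _, os_name, port in reports)
    return by_os, by_os_port


if __name__ == "__main__":
    by_os, by_os_port = tally(REPORTS)
    # Print one line per (os, port) combination that has at least one report.
    for (os_name, port), count in sorted(by_os_port.items()):
        print(f"{os_name} port {port}: {count}")
```

      If the real log shows the same skew toward one operating system and one port, that would support looking at the driver or the port wiring rather than the SSD model, though a tally alone does not prove the root cause.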
