20 Replies, latest reply on Feb 22, 2018 4:54 PM by Intel Corporation

    I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs

    sg@intel

      Hi,

       

      We are experiencing persistent I/O request timeouts on Linux with P3520/P4600 SSDs.  We have tried several kernels (3.10, 4.4, 4.9) and see the timeouts on all of them.  The P4600 seems more prone to them than the P3520, though we see them on the latter as well.  Both drives have the latest firmware installed and are housed in the same machine (Supermicro 5018R-WR with X10SRW-F motherboard and E5-1650 V4 CPU).  We can reproduce the timeouts by simply running mkfs -t xfs on the drive.

       

      Here is the output from isdct (version isdct-3.0.9.400-17.x86_64):

       

      - Intel SSD DC P3520 Series CVPF717100L01P2JGN -

       

      Bootloader : MB1B0105

      DevicePath : /dev/nvme0n1

      DeviceStatus : Healthy

      Firmware : MDV10271

      FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

      Index : 0

      ModelNumber : INTEL SSDPEDMX012T7

      ProductFamily : Intel SSD DC P3520 Series

      SerialNumber : CVPF717100L01P2JGN

       

      - Intel SSD DC P4600 Series BTLE736007F54P0KGN -

       

      Bootloader : 0110

      DevicePath : /dev/nvme1n1

      DeviceStatus : Healthy

      Firmware : QDV10150

      FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

      Index : 1

      ModelNumber : INTEL SSDPEDKE040T7

      ProductFamily : Intel SSD DC P4600 Series

      SerialNumber : BTLE736007F54P0KGN

       

      Here are the messages the 4.9 kernel prints when using the P4600:

       

      [  151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting

      [  151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting

      [  151.308347] nvme nvme1: I/O 570 QID 1 timeout, aborting

      [  151.313562] nvme nvme1: I/O 571 QID 1 timeout, aborting

      [  151.355465] nvme nvme1: completing aborted command with status: 0000

      [  151.411273] nvme nvme1: completing aborted command with status: 0000

      [  151.466903] nvme nvme1: completing aborted command with status: 0000

      [  151.522609] nvme nvme1: completing aborted command with status: 0000

      [  151.578226] nvme nvme1: completing aborted command with status: 0000

      ...

      [  165.395295] nvme nvme1: Abort status: 0x0

      [  165.399296] nvme nvme1: Abort status: 0x0

      [  165.403299] nvme nvme1: Abort status: 0x0

      [  165.407304] nvme nvme1: Abort status: 0x0

       

      We would appreciate your help in resolving this issue.

       

      Regards,

      Shantanu Goel
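For readers triaging similar logs, the kernel messages quoted in the post above can be tallied per controller and queue. The following is only a minimal sketch in Python: the regular expression assumes the 4.9 message format exactly as quoted, and the function name `summarize_timeouts` is hypothetical, not part of any Intel or kernel tool.

```python
import re
from collections import Counter

# Assumed format, copied from the 4.9 kernel messages quoted in this thread:
#   [  151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme\s+(?P<ctrl>nvme\d+):\s+"
    r"I/O\s+(?P<tag>\d+)\s+QID\s+(?P<qid>\d+)\s+timeout"
)


def summarize_timeouts(log_text):
    """Count timed-out I/Os per (controller, queue id). Hypothetical helper."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("ctrl"), int(match.group("qid")))] += 1
    return counts


# A few lines taken from the log in the post above.
sample = """\
[  151.297903] nvme nvme1: I/O 568 QID 1 timeout, aborting
[  151.303130] nvme nvme1: I/O 569 QID 1 timeout, aborting
[  151.355465] nvme nvme1: completing aborted command with status: 0000
[  165.395295] nvme nvme1: Abort status: 0x0
"""

for (ctrl, qid), n in summarize_timeouts(sample).items():
    print(f"{ctrl} queue {qid}: {n} timed-out I/Os")
```

Older kernels print a different message (for example "Aborting I/O 534 QID 1", as seen later in this thread), so the pattern would need adjusting for those.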

        • 1. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hello Shantanu Goel,

          Thank you for your interest in the Intel® SSD P3520 Series and the Intel® SSD P4600 Series.

          I understand that your system is experiencing persistent I/O request timeouts.

          Could you please tell me which specific Linux* OS distribution you are using, and also provide a brief description of the intended use of the SSDs?

          Additionally, in order to provide adequate assistance, please share the report generated by the Intel® System Support Utility for the Linux* Operating System (https://downloadcenter.intel.com/download/26735/).

          I’ll be waiting for your response.

          Regards,
          Andres V.

          • 2. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
            sg@intel

            Hi Andres,

             

            We are using RHEL 6 and create a filesystem on the SSD to store data.

            Please see the attached SSU output from the system.

             

            Thanks,

            Shantanu

            • 3. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hello Shantanu,
               
              Thank you for providing the requested file.
               
              We will analyze the provided data and get back to you via this community thread as soon as we have relevant information.
               
              Thank you for your patience.
               
              Regards,
              Andres V.

              • 4. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                Intel Corporation
                This message was posted on behalf of Intel Corporation

                Hello Shantanu,

                In order to further understand the issue, your system, and the troubleshooting that you have performed, could you please answer the following questions?

                • How many SSDs of each model are experiencing the timeout issue? How many SSDs do you have of each series?
                • Have you tried connecting the SSDs to different slots? Have you been able to test the drives in another motherboard or system?
                • Are you using any kind of RAID controller?

                 

                Regarding the Intel® SSD DC P4600 Series, a new firmware version will tentatively be available within the next couple of weeks as part of the latest Intel® Solid State Drive Data Center Tool version, so please keep checking the download link https://downloadcenter.intel.com/download/27248?v=t, update your firmware and test again.

                I’ll be waiting for your response.

                Regards,
                Andres V.

                • 5. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                  sg@intel

                  Hi,

                   

                  1. We have seen this issue on at least 3 P4600s and 2 P3520s.  We have a total of 8 P4600s and 4 P3520s.  We are in the process of replacing the P3520s with P4600s as the workload has proven to be more write-intensive than originally anticipated.

                   

                  2. We have seen the timeouts on both Supermicro (X10SRW-F) and Intel (S2600WT) systems which suggests the motherboard model is not a factor here.  In our test Supermicro machine, we have the 2 SSDs installed in separate PCIe slots and both exhibit timeouts which would seem to suggest that changing the PCIe slot is not likely to resolve the issue.

                   

                  3. No, we are not using any RAID controller and access the drive directly via /dev/nvme* devices.

                   

                  4. We will certainly try the new firmware once it is released.  If you can provide us a beta version to test sooner we would be happy to do so as well.

                   

                  Regards,

                  Shantanu

                  • 6. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                    Intel Corporation
                    This message was posted on behalf of Intel Corporation

                    Hello Shantanu,

                    Again, thank you for answering our questions.

                    We’ll study this new information, and as soon as I have anything relevant I’ll post it here.

                    Unfortunately, we are not able to provide a beta version of the firmware.

                    Thank you for your patience.

                    Regards,
                    Andres V.

                    • 7. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Hello Shantanu,
                       
                      I would like to inform you that version 3.0.10 of the Intel® Solid State Drive Data Center Tool is now available, and it includes firmware version QDV10190 for your Intel® SSD DC P4600.
                       
                      Could you please download the corresponding tool (https://downloadcenter.intel.com/download/27497/Intel-SSD-Data-Center-Tool?v=t), update the firmware, and test again?
                       
                      I’ll be waiting for your response.
                       
                      Regards,
                      Andres V.

                      • 8. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                        sg@intel

                        Hi,

                         

                        I am afraid the new firmware does not resolve the problem.

                         

                        # isdct version

                        - Version Information -

                        Name: Intel(R) Data Center Tool

                        Version: 3.0.10

                        Description: Interact and configure Intel SSDs.

                         

                        # isdct show -intelssd

                         

                        - Intel SSD DC P3520 Series CVPF717100L01P2JGN -

                         

                        Bootloader : MB1B0105

                        DevicePath : /dev/nvme0n1

                        DeviceStatus : Healthy

                        Firmware : MDV10271

                        FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

                        Index : 0

                        ModelNumber : INTEL SSDPEDMX012T7

                        ProductFamily : Intel SSD DC P3520 Series

                        SerialNumber : CVPF717100L01P2JGN

                         

                        - Intel SSD DC P4600 Series BTLE736007F54P0KGN -

                         

                        Bootloader : 0122

                        DevicePath : /dev/nvme1n1

                        DeviceStatus : Healthy

                        Firmware : QDV10170

                        FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

                        Index : 1

                        ModelNumber : INTEL SSDPEDKE040T7

                        ProductFamily : Intel SSD DC P4600 Series

                        SerialNumber : BTLE736007F54P0KGN

                         

                        When I run: mkfs -t xfs -f /dev/nvme1n1

                         

                        The driver still prints the following errors:

                         

                        nvme 0000:02:00.0: Aborting I/O 534 QID 1

                        nvme 0000:02:00.0: Aborting I/O 535 QID 1

                        nvme 0000:02:00.0: Aborting I/O 536 QID 1

                        nvme 0000:02:00.0: Aborting I/O 537 QID 1

                        nvme 0000:02:00.0: Aborting I/O 796 QID 1

                        nvme 0000:02:00.0: Aborting I/O 797 QID 1

                        nvme 0000:02:00.0: Aborting I/O 798 QID 1

                        nvme 0000:02:00.0: Aborting I/O 799 QID 1

                         

                        Thanks,

                        Shantanu

                        • 9. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                          Intel Corporation
                          This message was posted on behalf of Intel Corporation

                          Hello Shantanu,

                           

                          I notice from your last post that the firmware version you have currently installed on your Intel® SSD DC P4600 is QDV10170.

                           


                          The Detailed Description in the Intel® SSD Data Center Tool site (https://downloadcenter.intel.com/download/27497?v=t) states the following:

                           

                           

                          Could you please check this example and reproduce the firmware update procedure? These images are from page 67 of the Intel® Solid State Drive Data Center Tool – User Guide (https://www.intel.com/content/dam/support/us/en/documents/memory-and-storage/Intel_SSD_DCT_3_0_x_User_Guide.pdf). Please keep in mind that on Linux systems, the tool must be run with root privileges. This can be done through either sudo or su commands.

                           

                          Linux users must call the load function twice with a system shutdown and reboot in between.

                           

                          First update:


                          The user then shuts down the system and reboots.
                          In the second update, the tool shows the next update.


                          The user shuts down the system and reboots.

                           

                           

                          In case you get any error message while performing the update, please share the screenshots associated with the firmware update process.

                           

                          In a previous message you mentioned that you are using Red Hat* Enterprise Linux* 6. Is it version 6.5 or 6.6?

                           

                          I’ll be waiting for your response.

                           

                          Regards,
                          Andres V.

                          • 10. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                            sg@intel

                            Hi,

                             

                            I power-cycled the system and tried running the load again, but it still reports the drive as having the latest firmware.  When I first downloaded and ran isdct 3.0.10, it did report newer firmware and successfully updated the drive.  All commands were run as root.

                             

                            Here is the version of the tool:

                             

                            # isdct version

                            - Version Information -

                            Name: Intel(R) Data Center Tool

                            Version: 3.0.10

                            Description: Interact and configure Intel SSDs.

                             

                             

                            When I attempt to load the firmware now, this is the output I get from the tool:

                             

                            # isdct load -intelssd 1

                            WARNING! You have selected to update the drives firmware!

                            Proceed with the update? (Y|N): Y

                            Updating firmware...

                             

                            - Intel SSD DC P4600 Series BTLE736007F54P0KGN -

                             

                            Status : The selected Intel SSD contains current firmware as of this tool release.

                             

                             

                            # isdct show -intelssd 1

                             

                            - Intel SSD DC P4600 Series BTLE736007F54P0KGN -

                             

                            Bootloader : 0122

                            DevicePath : /dev/nvme1n1

                            DeviceStatus : Healthy

                            Firmware : QDV10170

                            FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

                            Index : 1

                            ModelNumber : INTEL SSDPEDKE040T7

                            ProductFamily : Intel SSD DC P4600 Series

                            SerialNumber : BTLE736007F54P0KGN

                             

                             

                            The version of RHEL is 6.9

                             

                            Thanks,

                            Shantanu
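The version mismatch discussed here (the tool reports QDV10170 as current, while reply 7 names QDV10190) can be checked mechanically across several drives. What follows is a rough sketch in Python that parses the "Key : Value" layout of the `isdct show -intelssd` output as quoted in this thread. The function name `parse_isdct_show` and the `EXPECTED` constant are hypothetical, and the layout is assumed from the pasted output rather than from any documented format.

```python
def parse_isdct_show(text):
    """Parse 'Key : Value' lines into one dict per drive.

    Assumes each drive section starts with a header line of the form
    '- Intel SSD ... -', as in the output quoted in this thread.
    """
    drives = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("- ") and line.endswith(" -"):
            # New drive section: keep the header text without the dashes.
            current = {"Header": line[2:-2]}
            drives.append(current)
        elif " : " in line and current is not None:
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
    return drives


# Excerpt of the output pasted in the post above.
sample = """\
- Intel SSD DC P4600 Series BTLE736007F54P0KGN -

Bootloader : 0122
DevicePath : /dev/nvme1n1
Firmware : QDV10170
FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.
"""

# Assumption: the target version is the one named in reply 7 of this thread.
EXPECTED = "QDV10190"

for drive in parse_isdct_show(sample):
    firmware = drive.get("Firmware")
    if firmware != EXPECTED:
        print(f"{drive['Header']}: has {firmware}, expected {EXPECTED}")
```

Feeding it the full output for every drive in a machine would list each one whose installed firmware differs from the expected version.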

                            • 11. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                              Intel Corporation
                              This message was posted on behalf of Intel Corporation

                              Hello Shantanu,

                              There seems to be a software compatibility issue that may be causing this, because as you can see in the following image, the Intel® SSD Data Center Tool is supported for the following operating systems, and RHEL 6.9 is not one of those:

                               

                              Do you have access to a PC with any of the listed operating systems? Could you please try again to install the latest firmware using the official tool?

                              It’s important for us to find out if version QDV10190 solves the issue you are experiencing.

                              I’ll be waiting for your response.

                              Regards,
                              Andres V.

                              • 12. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                                sg@intel

                                Hi,

                                 

                                RHEL 6.6 is very old (released in 2014) and we have long since upgraded our systems to 6.9, so I am unable to test on that release. I am surprised your tool releases have not kept up with vendor OS releases.  Both isdct versions 3.0.9 and 3.0.10 did update the firmware to a newer release without complaint, so it is not clear what the nature of the incompatibility is here, since the tool itself does not print any message indicating as much.

                                 

                                Shantanu

                                • 13. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                                  Intel Corporation
                                  This message was posted on behalf of Intel Corporation

                                  Hello Shantanu,

                                   

                                  Thank you for your feedback.

                                   

                                  Regarding your comment:

                                  "Both isdct versions 3.0.9 and 3.0.10 did update the firmware to a newer release without complaint so it is not clear what the nature of the incompatibility is here since the tool itself does not print message indicating as such."

                                  Are you referring to an update to firmware version QDV10170 or to firmware version QDV10190? Have you been able to update the SSDs that do not show the persistent I/O request timeouts? Do you have any Intel® SSD DC P4600 with firmware version QDV10190?

                                  Regards,
                                  Andres V.

                                  • 14. Re: I/O request timeouts on Linux with Intel P3520/P4600 NVMe PCIe SSDs
                                    sg@intel

                                    Hi,

                                     

                                    I was referring to the fact that on the test machine we initially used isdct 3.0.9 to upgrade the P4600 firmware version from QDV10130 to QDV10150 and isdct 3.0.10 subsequently from QDV10150 to QDV10170.  As I posted in the output above isdct 3.0.10 shows QDV10170 as the latest revision of the firmware available and states that the drive already has that revision installed on it.  It does not report QDV10190 as being available.  Could this be a discrepancy in the firmware revision between the documentation and the tool itself?

                                     

                                    The P4600s we tried deploying in production have firmware QDV10130 and they all exhibit the timeouts, so until this issue is resolved these drives are unusable for us.  We have had great success with your SATA SSDs (S3700, S3600, S3610, S3520) on various versions of the OS and Linux kernels, which is why we purchased their NVMe counterparts.  As of now, however, the experience with them has been a disappointing one, so we would really appreciate help in resolving the issue.

                                     

                                    Thanks,

                                    Shantanu
