8 Replies Latest reply on Jan 23, 2017 5:48 AM by Intel Corporation

    My Intel SSD 750 (400GB) suddenly only does 2MB/s writes, and the kernel reports aborted commands

    Tomas_V

      About 6 months ago I bought an Intel SSD 750 400GB, and I have been using it for various database-related benchmarking tasks. It worked fine until this week, when the kernel suddenly started reporting command timeouts and aborted commands:

       

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 0 QID 12 timeout, aborting

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 1 QID 12 timeout, aborting

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 2 QID 12 timeout, aborting

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 3 QID 12 timeout, aborting

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: Abort status: 0x0

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      ...

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 196 QID 12 timeout, aborting

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 212 QID 12 timeout, aborting

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 273 QID 12 timeout, aborting

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 275 QID 12 timeout, aborting

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:10:33 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      ...

      Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:16:59 bench2 kernel: nvme nvme0: completing aborted command with status: 0000

      Jan 12 13:17:00 bench2 kernel: nvme nvme0: completing aborted command with status: fffffffc

      Jan 12 13:17:00 bench2 kernel: blk_update_request: I/O error, dev nvme0n1, sector 422162944

      Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770079, lost async page write

      Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770080, lost async page write

      Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770081, lost async page write

      Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770082, lost async page write

      Jan 12 13:17:00 bench2 kernel: Buffer I/O error on dev nvme0n1p1, logical block 52770083, lost async page write
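To get an overview of a log like the one above, a small script can count timeouts per queue and tally the completion statuses. This is only a minimal sketch: the regular expressions are written against the lines quoted here and are an assumption about the message format in general, and the embedded `LOG` sample is a handful of lines copied from this post rather than the full log. The status is printed in hex by the kernel, so `fffffffc` is decoded as a signed 32-bit value, which gives -4.

```python
import re
from collections import Counter

# A few lines copied from the log above; a real run would read the
# full kernel log instead (how you obtain it is up to you).
LOG = """\
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 0 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: I/O 1 QID 12 timeout, aborting
Jan 12 13:10:27 bench2 kernel: nvme nvme0: completing aborted command with status: 0000
Jan 12 13:10:27 bench2 kernel: nvme nvme0: Abort status: 0x0
Jan 12 13:10:33 bench2 kernel: nvme nvme0: I/O 196 QID 12 timeout, aborting
Jan 12 13:17:00 bench2 kernel: nvme nvme0: completing aborted command with status: fffffffc
Jan 12 13:17:00 bench2 kernel: blk_update_request: I/O error, dev nvme0n1, sector 422162944
"""

# Patterns assumed from the quoted lines only.
TIMEOUT_RE = re.compile(r"nvme (\w+): I/O (\d+) QID (\d+) timeout, aborting")
STATUS_RE = re.compile(r"completing aborted command with status: ([0-9a-fA-F]+)")


def to_signed32(hex_text):
    """Interpret a hex string as a signed 32-bit integer."""
    value = int(hex_text, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def summarize(log_text):
    """Return (timeouts per (device, queue id), completions per signed status)."""
    timeouts = Counter()
    statuses = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[(match.group(1), int(match.group(3)))] += 1
            continue
        match = STATUS_RE.search(line)
        if match:
            statuses[to_signed32(match.group(1))] += 1
    return timeouts, statuses


timeouts, statuses = summarize(LOG)
for (dev, qid), count in sorted(timeouts.items()):
    print(f"{dev} queue {qid}: {count} timed-out commands")
for status, count in sorted(statuses.items()):
    print(f"status {status}: {count} completions")
```

On Linux, errno 4 is EINTR, but whether the driver intends -4 that way depends on the kernel version, which I have not verified.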

       

      I regularly test new kernels / distributions, so at first I thought it was a bug in one of them, but after a lot of experiments I doubt that - I can reproduce the same issue even with older kernels that I previously used without any problems.

       

      Interestingly enough, this only affects writes - reads work just fine (easily >2GB/s in a sequential workload), but writes reach only about 2MB/s. It is not a filesystem issue either - this happens even with a simple dd writing to /dev/nvme0n1 directly.
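For comparison with the dd test described above, here is a minimal sketch of a sequential write measurement. It deliberately writes to a scratch file instead of the raw device, because writing to /dev/nvme0n1 directly destroys whatever is on it. The function name `measure_write_mb_s`, the 64 MiB size and the 1 MiB block size are arbitrary choices for illustration, and the temporary directory used in the demo may not even live on the SSD (point the path at a file on the drive to test it).

```python
import os
import tempfile
import time


def measure_write_mb_s(path, total_bytes=64 * 1024 * 1024, block_size=1024 * 1024):
    """Write total_bytes sequentially to path and return throughput in MB/s."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while written < total_bytes:
            # os.write may write fewer bytes than requested, so count what it returns.
            written += os.write(fd, block[: total_bytes - written])
        # Flush to the device so the page cache does not inflate the result.
        os.fsync(fd)
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        rate = measure_write_mb_s(os.path.join(scratch, "write_test.bin"))
        print(f"sequential write: {rate:.1f} MB/s")
```

Including the `fsync` in the timed region matters here: without it, a short run can report the speed of RAM rather than the speed of the drive.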

       

      I've tried to install the newest firmware using the isdct tool (v3.0.0), and `isdct show` now reports this:

       

      [root@bench2 ~]# isdct show -a -intelssd 0

       

      - Intel SSD 750 Series CVCQ55020067400AGN -

       

      AggregationThreshold : 0

      AggregationTime : 0

      ArbitrationBurst : 0

      Bootloader : 8B1B0131

      CoalescingDisable : 1

      DevicePath : /dev/nvme0n1

      DeviceStatus : Healthy

      EnduranceAnalyzer : 0.30 years

      ErrorString :

      Firmware : 8EV10174

      FirmwareUpdateAvailable : The selected Intel SSD contains current firmware as of this tool release.

      HighPriorityWeightArbitration : 0

      IOCompletionQueuesRequested : 30

      IOSubmissionQueuesRequested : 30

      Index : 0

      Intel : True

      IntelGen3SATA : False

      IntelNVMe : True

      InterruptVector : 0

      LatencyTrackingEnabled : False

      LowPriorityWeightArbitration : 0

      MediumPriorityWeightArbitration : 0

      ModelNumber : INTEL SSDPEDMW400G4

      NVME_1_0_Supported : True

      NVME_1_2_Supported : False

      NVMeControllerID : 0

      NVMePowerState : 0

      NamespaceId : 4294967295

      NativeMaxLBA : 781422767

      NumErrorLogPageEntries : 63

      OEM : Generic

      PCILinkGenSpeed : 3

      PCILinkWidth : 4

      PowerGovernorMode : 0 25W

      Product : CarmelRidge

      ProductFamily : Intel SSD 750 Series

      ProductProtocol : NVME

      SMARTHealthCriticalWarningsConfiguration : 0

      SMBusAddress : 106

      SectorSize : 512

      SerialNumber : CVCQ55020067400AGN

      TCGSupported : False

      TempThreshold : 85

      TimeLimitedErrorRecovery : 0

      TrimSupported : True

      VolatileWriteCacheEnabled : False

      WriteAtomicityDisableNormal : 0

       

      And sensors:

       

      [root@bench2 ~]# isdct show -sensor -intelssd 0

       

      - Intel SSD 750 Series CVCQ55020067400AGN -

       

      AvailableSpare : 100

      AverageNandEraseCycles : 3148

      CrcErrorCount : 0

      DeviceStatus : Healthy

      EndToEndErrorDetectionCount : 0

      EnduranceAnalyzer : 0.30

      EraseFailCount : 0

      ErrorInfoLogEntries : 0x0B

      HighestLifetimeTemperature : 46

      LowestLifetimeTemperature : 18

      MaxNandEraseCycles : 3179

      MediaErrors : 0x00

      MinNandEraseCycles : 3118

      PercentageUsed : 105

      PowerCycles : 0x038

      PowerOnHours : 0x0A57

      ProgramFailCount : 0

      SpecifiedPCBMaxOperatingTemp : 85

      SpecifiedPCBMinOperatingTemp : 0

      Temperature : 33 Celsius

      ThermalThrottleCount : 0

      ThermalThrottleStatus : 0

      UnsafeShutdowns : 0x031
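Several of the counters above are printed in hex, which makes them easy to misread. The sketch below parses `Key : Value` lines into a dictionary and converts hex and decimal values to numbers. It assumes the output format is exactly as pasted in this post; `SENSOR_OUTPUT` is a subset of those lines, and `parse_isdct` and `parse_value` are hypothetical helper names, not part of isdct.

```python
# Subset of the sensor output pasted above.
SENSOR_OUTPUT = """\
AvailableSpare : 100
AverageNandEraseCycles : 3148
EnduranceAnalyzer : 0.30
ErrorInfoLogEntries : 0x0B
MediaErrors : 0x00
PercentageUsed : 105
PowerCycles : 0x038
PowerOnHours : 0x0A57
Temperature : 33 Celsius
UnsafeShutdowns : 0x031
"""


def parse_value(text):
    """Convert 0x-prefixed hex, decimal int, or float; otherwise keep the string."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def parse_isdct(output):
    """Parse 'Key : Value' lines; lines without that separator are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = parse_value(value)
    return fields


fields = parse_isdct(SENSOR_OUTPUT)
for name in ("PowerOnHours", "PowerCycles", "UnsafeShutdowns",
             "ErrorInfoLogEntries", "PercentageUsed"):
    print(f"{name}: {fields[name]}")
```

With the values above, PowerOnHours 0x0A57 is 2647 in decimal, PowerCycles 0x038 is 56, and UnsafeShutdowns 0x031 is 49.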

       

      So according to this, the drive seems to be healthy, with no errors reported.

       

      Any ideas what this might be?