45 Replies Latest reply on Mar 15, 2016 8:50 PM by weafon

    S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset

    grifferz

      Hi,

       

      I have a new Linux machine with two DC S3610 1.6TB SSDs. It runs Debian jessie, so kernel 3.16. About a month after installation, these errors started appearing:

       

      Jul 30 16:30:59 snaps kernel: [186914.249429] ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen

      Jul 30 16:30:59 snaps kernel: [186914.250465] ata1.00: failed command: WRITE FPDMA QUEUED

      Jul 30 16:30:59 snaps kernel: [186914.251505] ata1.00: cmd 61/08:00:39:db:8e/00:00:09:00:00/40 tag 0 ncq 4096 out

      Jul 30 16:30:59 snaps kernel: [186914.251505]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

      Jul 30 16:30:59 snaps kernel: [186914.253613] ata1.00: status: { DRDY }

      Jul 30 16:30:59 snaps kernel: [186914.254781] ata1.00: failed command: WRITE FPDMA QUEUED

      Jul 30 16:30:59 snaps kernel: [186914.255810] ata1.00: cmd 61/08:08:71:fc:4e/00:00:66:00:00/40 tag 1 ncq 4096 out

      Jul 30 16:30:59 snaps kernel: [186914.255810]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

      Jul 30 16:30:59 snaps kernel: [186914.257940] ata1.00: status: { DRDY }

      Jul 30 16:30:59 snaps kernel: [186914.259086] ata1: hard resetting link

      Jul 30 16:31:00 snaps kernel: [186914.577366] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

      Jul 30 16:31:00 snaps kernel: [186914.578307] ata1.00: configured for UDMA/133

      Jul 30 16:31:00 snaps kernel: [186914.578310] ata1.00: device reported invalid CHS sector 0

      Jul 30 16:31:00 snaps kernel: [186914.578311] ata1.00: device reported invalid CHS sector 0

      Jul 30 16:31:00 snaps kernel: [186914.578316] ata1: EH complete

       

      The error is always the same, and the only thing on ata1.00 is one of the SSDs. I switched the two SSDs around and the problem followed the same SSD.
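For readers who want to see which requests timed out, the `cmd` lines in the log above can be decoded. The following is a minimal Python sketch, not an official tool. It assumes the libata field order `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and that for NCQ commands the sector count is carried in the feature fields and the tag in the upper five bits of `nsect`. The function name `decode_ncq_cmd` is made up for this example.

```python
import re

# Assumed layout of the libata "cmd" line (all fields are hex bytes):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
CMD_RE = re.compile(
    r"cmd (?P<command>[0-9a-f]{2})/"
    r"(?P<feature>[0-9a-f]{2}):(?P<nsect>[0-9a-f]{2}):"
    r"(?P<lbal>[0-9a-f]{2}):(?P<lbam>[0-9a-f]{2}):(?P<lbah>[0-9a-f]{2})/"
    r"(?P<hob_feature>[0-9a-f]{2}):(?P<hob_nsect>[0-9a-f]{2}):"
    r"(?P<hob_lbal>[0-9a-f]{2}):(?P<hob_lbam>[0-9a-f]{2}):(?P<hob_lbah>[0-9a-f]{2})/"
    r"(?P<device>[0-9a-f]{2})"
)

# ATA opcodes for the two NCQ data commands named in the thread title.
NCQ_OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_cmd(line):
    """Return name, tag, sector count and start LBA for an NCQ cmd line, else None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    f = {key: int(val, 16) for key, val in m.groupdict().items()}
    if f["command"] not in NCQ_OPCODES:
        return None
    # For NCQ the sector count sits in the feature registers; 0 means 65536.
    sectors = ((f["hob_feature"] << 8) | f["feature"]) or 65536
    # 48-bit LBA: the three "hob" bytes are the high half.
    lba = (
        (f["hob_lbah"] << 40) | (f["hob_lbam"] << 32) | (f["hob_lbal"] << 24)
        | (f["lbah"] << 16) | (f["lbam"] << 8) | f["lbal"]
    )
    return {
        "name": NCQ_OPCODES[f["command"]],
        "tag": f["nsect"] >> 3,  # NCQ tag lives in bits 7:3 of nsect
        "sectors": sectors,
        "lba": lba,
    }


if __name__ == "__main__":
    sample = "ata1.00: cmd 61/08:00:39:db:8e/00:00:09:00:00/40 tag 0 ncq 4096 out"
    print(decode_ncq_cmd(sample))
```

Under these assumptions the decoded tag and byte count (sectors times 512) can be cross-checked against the `tag N ncq BYTES` text that the kernel prints on the same line.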

       

      I can't force the error to happen on demand; it just seems to happen every other day or so, though not at the same time of day. All IO is held up briefly while the link is reset. The drive passes a SMART long self-test.

       

      So is this drive faulty? If not, what can I try to fix this? If so, is there an easy way to prove it for RMA purposes?

       

      Jul 27 05:59:30 snaps kernel: [   33.054376] ata1.00: ATA-9: INTEL SSDSC2BX016T4, G2010110, max UDMA/133

      Jul 27 05:59:30 snaps kernel: [   33.054474] ata1.00: 3125627568 sectors, multi 1: LBA48 NCQ (depth 31/32)

      Jul 27 05:59:30 snaps kernel: [   33.054567] ata2.00: ATA-9: INTEL SSDSC2BX016T4, G2010110, max UDMA/133

      Jul 27 05:59:30 snaps kernel: [   33.054657] ata2.00: 3125627568 sectors, multi 1: LBA48 NCQ (depth 31/32)

       

      $ sudo smartctl -i /dev/sda

      smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)

      Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

       

      === START OF INFORMATION SECTION ===

      Device Model:     INTEL SSDSC2BX016T4

      Serial Number:    BTHC511604V41P6PGN

      LU WWN Device Id: 5 5cd2e4 04b7b1bfa

      Firmware Version: G2010110

      User Capacity:    1,600,321,314,816 bytes [1.60 TB]

      Sector Sizes:     512 bytes logical, 4096 bytes physical

      Rotation Rate:    Solid State Device

      Form Factor:      2.5 inches

      Device is:        Not in smartctl database [for details use: -P showall]

      ATA Version is:   ACS-2 T13/2015-D revision 3

      SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)

      Local Time is:    Fri Jul 31 11:04:09 2015 UTC

      SMART support is: Available - device has SMART capability.

      SMART support is: Enabled

       

      $ sudo smartctl -i /dev/sdb

      smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)

      Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

                                                     

      === START OF INFORMATION SECTION ===           

      Device Model:     INTEL SSDSC2BX016T4          

      Serial Number:    BTHC511604SD1P6PGN           

      LU WWN Device Id: 5 5cd2e4 04b7b1ba2           

      Firmware Version: G2010110                     

      User Capacity:    1,600,321,314,816 bytes [1.60 TB]

      Sector Sizes:     512 bytes logical, 4096 bytes physical

      Rotation Rate:    Solid State Device           

      Form Factor:      2.5 inches                   

      Device is:        Not in smartctl database [for details use: -P showall]

      ATA Version is:   ACS-2 T13/2015-D revision 3  

      SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)

      Local Time is:    Fri Jul 31 11:04:35 2015 UTC 

      SMART support is: Available - device has SMART capability.

      SMART support is: Enabled

       

      Message was edited by: Andy Smith. Now seeing the same problems with the other SSD, so this is not restricted to a single drive.

        • 1. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
          aleki_intel

          Hello grifferz,

           

          We are going to check on this and will provide you a reply as soon as possible.

          • 2. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
            jonathan_intel

            Hello grifferz,

             

            Please make sure the BIOS of your system is up-to-date, and that you are using the drivers recommended by the system manufacturer.

             

            If the issue persists, please let us know the following:

             

            - Smart Attributes output (smartctl -A)

             

            - PC make and model

            - Motherboard model

            - BIOS version

            - Type of Storage controller where the drive is plugged into.

            • 3. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
              grifferz

              Hi Jonathan,

               

              > Please make sure the BIOS of your system is up-to-date

               

              Yes, it is the latest BIOS.

               

              > and that you are using the drivers recommended by the system manufacturer.

               

              Well, this is a Debian Linux 8.0 system, with the latest kernel package, so I don't think there are any other recommended drivers.

               

              > Smart Attributes output (smartctl -A)

               

              $ sudo smartctl -A /dev/sda

              smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)

              Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

               

              === START OF READ SMART DATA SECTION ===

              SMART Attributes Data Structure revision number: 1

              Vendor Specific SMART Attributes with Thresholds:

              ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

                5 Reallocated_Sector_Ct   0x0032   099   099   000    Old_age   Always       -       0

                9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       630

               12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17

              170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0

              171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

              172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

              174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2

              175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       5164180714

              183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0

              184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0

              187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

              190 Airflow_Temperature_Cel 0x0022   076   071   000    Old_age   Always       -       24 (Min/Max 24/30)

              192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       2

              194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       24

              197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

              199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0

              225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       80726

              226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       20

              227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       62

              228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37677

              232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0

              233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0

              234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

              241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       80726

              242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       131822
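When comparing several drives, it can help to read the attribute table programmatically. Below is a minimal Python sketch that parses `smartctl -A` rows like the ones above and lists attributes whose normalized VALUE is at or below a non-zero THRESH. The helper names `parse_smart_rows` and `failing_now` are invented for this example, and the column layout is assumed to match the output shown in this post. Because smartctl reports the drive as not being in its database, the attribute names are generic defaults and may not reflect what Intel actually stores in each ID.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, then a raw value that starts with an integer.
ROW_RE = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+(?P<flag>0x[0-9a-fA-F]{4})\s+"
    r"(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\d+)\s+"
    r"(?P<type>\S+)\s+(?P<updated>\S+)\s+(?P<when_failed>\S+)\s+(?P<raw>\d+)"
)


def parse_smart_rows(text):
    """Parse attribute rows from `smartctl -A` text; non-matching lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append({
                "id": int(m["id"]),
                "name": m["name"],
                "value": int(m["value"]),
                "thresh": int(m["thresh"]),
                "raw": int(m["raw"]),  # leading integer only, e.g. 24 from "24 (Min/Max 24/30)"
            })
    return rows


def failing_now(rows):
    """Attributes whose normalized value is at or below a non-zero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    sample = """\
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
"""
    rows = parse_smart_rows(sample)
    print(rows)
    print("failing:", failing_now(rows))
```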

               

              > PC make and model

               

              A Supermicro server

               

              > Motherboard model

               

              Supermicro X10SDV-F

               

              > BIOS version

               

              AMI BIOS R 1.0a

               

              > Type of Storage controller where the drive is plugged into.

               

              Directly into motherboard SATA.

              • 4. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
                andreykorolyov

                Same issue for me, but with brand new S3710s. Seemingly all our samples are 'defective' and tend to reset the bus once or twice per day under a very moderate workload. S3700 and S3500 drives previously worked flawlessly in the same place (SATA port, M/B revision and BIOS #). I had to ask both SuperMicro and Intel support privately for possible actions, though most likely the issue is specific to a 22nm SSD generation.

                 

                Edit: I would be very grateful for RMA hints as well, possibly with direct communication with a retailer involved. The risk of using these devices is too high right now. We'd prefer to return the entire batch as defective and replace it with the well-known S3700, then start a detailed investigation on selected samples after that.

                • 5. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
                  jonathan_intel

                  Hello andreykorolyov,

                   

                  If the new SSDs are not working as expected in your system and you would like to exchange them, we would advise you to contact the place of purchase, especially if you obtained them as samples or for testing purposes.

                   

                  Please note that for warranty issues, you should contact support to engage a support agent in your nearest support center.

                  • 6. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
                    andreykorolyov

                    Thank you Jonathan, I will contact the SC tomorrow.

                     

                    Would the Intel engineering team be interested in further investigation of the issue? I can easily reproduce the problem with a 20-minute FIO test run on any SSD from the set we bought. The firmware updater says that the running version is the latest, so I suppose the problem is bound to the specific SSD hardware. Again, the issue affects at least ten disks from our batch and I strongly believe that the rest are affected as well, so I'd like to help fix this issue instead of only giving the drives back. For now it looks like both C602 and C220 chipsets are affected, and I can confirm that both SATA and SAS downlinks exhibit the issue on C602.

                    • 7. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                      grifferz

                      I've since seen the same problems with the other drive in the pair, so I now find it hard to believe that this is a single faulty drive. I've edited the post title to reflect this.

                       

                      I do not now know how to proceed. I need to know if the problem is a bug in the Linux kernel, in the SATA chipset or in the drives themselves.

                       

                      It seems I can make the problem go away by disabling NCQ, but this reduces the performance of the drive to around 25% of max IOPS so is not a long term solution.

                       

                      This server has an Intel C220 SATA chipset:

                       

                      00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05) (prog-if 01 [AHCI 1.0])

                              Subsystem: Super Micro Computer Inc Device 086d

                              Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

                              Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                              Latency: 0

                              Interrupt: pin A routed to IRQ 164

                              Region 0: I/O ports at f070 [size=8]

                              Region 1: I/O ports at f060 [size=4]

                              Region 2: I/O ports at f050 [size=8]

                              Region 3: I/O ports at f040 [size=4]

                              Region 4: I/O ports at f020 [size=32]

                              Region 5: Memory at fb312000 (32-bit, non-prefetchable) [size=2K]

                              Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-

                                      Address: fee002b8  Data: 0000

                              Capabilities: [70] Power Management version 3

                                      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)

                                      Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

                              Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004

                              Kernel driver in use: ahci

                      00: 86 80 02 8c 07 04 b0 02 05 01 06 01 00 00 00 00

                      10: 71 f0 00 00 61 f0 00 00 51 f0 00 00 41 f0 00 00

                      20: 21 f0 00 00 00 20 31 fb 00 00 00 00 d9 15 6d 08

                      30: 00 00 00 00 80 00 00 00 00 00 00 00 0b 01 00 00

                       

                      Are you aware of any problems with C220 chipset and S3610 drives?

                      • 8. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                        jonathan_intel

                        We are very interested in this issue and we'll need to do more research about it. We will contact you via Private Message individually with further details and to request any additional information.

                        • 9. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                          dchepishev

                          Hi guys,

                           

                          Please post here when you have some progress on the subject.

                           

                          I am having a similar problem with S3710 800G drives, connected to an LSI MegaRAID SAS 9271-4i via a Supermicro expander backplane. The problems appear at almost zero load.

                           

                          I have 4x800G S3710 in RAID10 array.

                          On two of the ports I was getting errors like this (errors are from LSI storage manager):

                           

                          Aug  2 05:07:06 h19 MR_MONITOR[3772]: <MRMON268> Controller ID:  0  PD Reset:   PD  #012=   -:-:3,   Critical  #012=   3,   Path   =#012 0x5003048000F3BE0F#012Event ID:268
                          Aug  2 05:07:07 h19 MR_MONITOR[3772]: <MRMON267> Controller ID:  0  Command timeout on PD:   PD  #012=   -:-:3No addtional sense information,   CDB   =0x48 0xd0 0xc0 0x00 0x00 0x00 0x00 0x00 0x08 0x00,   Sense   =   ,   Path   =#012 0x5003048000F3BE0F#012Event ID:267
                          Aug  2 05:07:07 h19 MR_MONITOR[3772]: <MRMON267> Controller ID:  0  Command timeout on PD:   PD  #012=   -:-:3No addtional sense information,   CDB   =0x58 0xd0 0xc0 0x00 0x00 0x00 0x00 0x00 0x08 0x00,   Sense   =   ,   Path   =#012 0x5003048000F3BE0F#012Event ID:267
                          Aug  2 05:07:07 h19 MR_MONITOR[3772]: <MRMON113> Controller ID:  0   Unexpected sense:   PD  #012=   -:-:3Power on, reset, or bus device reset occurred,   CDB   =0x2a 0x00 0x00 0xc0 0xd0 0x58 0x00 0x00 0x08 0x00,   Sense   =0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00
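The last log line above carries a raw SCSI CDB and sense buffer, which can be decoded by hand. The following Python sketch is illustrative only. It assumes the sense buffer is in SCSI fixed format (response code 0x70 or 0x71, sense key in the low nibble of byte 2, ASC and ASCQ in bytes 12 and 13) and that opcode 0x2a is WRITE(10) with a big-endian LBA in bytes 2 to 5 and a transfer length in bytes 7 to 8. The function names are made up for this example.

```python
def parse_hex_bytes(text):
    """Turn '0x70 0x00 0x06' style text into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def decode_fixed_sense(sense):
    """Extract sense key, ASC and ASCQ from fixed-format sense data."""
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "sense_key": sense[2] & 0x0F,
        "asc": sense[12],
        "ascq": sense[13],
    }


def decode_write10(cdb):
    """Extract the start LBA and block count from a WRITE(10) CDB."""
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


if __name__ == "__main__":
    # Byte strings copied from the MRMON113 line above.
    cdb = parse_hex_bytes("0x2a 0x00 0x00 0xc0 0xd0 0x58 0x00 0x00 0x08 0x00")
    sense = parse_hex_bytes(
        "0x70 0x00 0x06 0x00 0x00 0x00 0x00 0x0a 0x00 0x00 "
        "0x00 0x00 0x29 0x00 0x00 0x00 0x00 0x00"
    )
    print(decode_write10(cdb))
    print(decode_fixed_sense(sense))
```

Under these assumptions the buffer decodes to sense key 0x6 (UNIT ATTENTION) with ASC/ASCQ 0x29/0x00, which agrees with the "Power on, reset, or bus device reset occurred" text that the controller printed.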

                           

                          We contacted our vendor and they recommended flashing the firmware of the SSDs. However, just to make sure that everything is OK with the backplane, we swapped the bays of all four disks (port0 with port2, port1 with port3), and the problem has somehow disappeared, at least for the last 3-4 days.

                          • 10. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                            jonathan_intel

                            Hello dchepishev,

                             

                            We are looking into this issue and an update will be provided once we have more information.

                            Please keep us informed in case the issue reappears.

                            • 11. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                              dne

                              We're seeing this same issue on 5 identical servers with Supermicro X10SLM+-LN4F motherboards in Supermicro 813MT-350CB 1U chassis, with one S3610 SSD in each machine, connected via the hot-swap backplane on the chassis to the onboard Intel C224 chipset 6 Gbps SATA ports.

                               

                              One of the 5 machines has one additional spare S3610 - this has not shown any failures/resets, but there's no I/O being performed on it.

                               

                              These S3610 SSDs were installed in the beginning of July, and the first command failure/bus reset occurred within a couple of days. It doesn't occur every day, and at the most a couple of times per day (currently we only have logs for 4-5 weeks back). I/O load is not high.

                               

                              The machines also have DC S3500 series SSDs, which have been working flawlessly for the last year.

                               

                              OS: Debian 7 (Wheezy), 64-bit

                              BIOS: AMI BIOS, version 1.1a.

                               

                              There is currently no SMART status monitoring running on these machines.

                               

                              Excerpt from Linux kernel log output:

                               

                              Aug 14 11:07:06 hotel kernel: [3273761.737966] ata2.00: exception Emask 0x0 SAct 0x30000000 SErr 0x0 action 0x6 frozen

                              Aug 14 11:07:06 hotel kernel: [3273761.738054] ata2.00: failed command: WRITE FPDMA QUEUED

                              Aug 14 11:07:06 hotel kernel: [3273761.738103] ata2.00: cmd 61/10:e0:c0:70:05/00:00:10:00:00/40 tag 28 ncq 8192 out

                              Aug 14 11:07:06 hotel kernel: [3273761.738105]          res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

                              Aug 14 11:07:06 hotel kernel: [3273761.738238] ata2.00: status: { DRDY }

                              Aug 14 11:07:06 hotel kernel: [3273761.738281] ata2.00: failed command: WRITE FPDMA QUEUED

                              Aug 14 11:07:06 hotel kernel: [3273761.738334] ata2.00: cmd 61/10:e8:c0:70:25/00:00:13:00:00/40 tag 29 ncq 8192 out

                              Aug 14 11:07:06 hotel kernel: [3273761.738336]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

                              Aug 14 11:07:06 hotel kernel: [3273761.738467] ata2.00: status: { DRDY }

                              Aug 14 11:07:06 hotel kernel: [3273761.738512] ata2: hard resetting link

                              Aug 14 11:07:07 hotel kernel: [3273762.057688] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)

                              Aug 14 11:07:07 hotel kernel: [3273762.058814] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

                              Aug 14 11:07:07 hotel kernel: [3273762.058825] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

                              Aug 14 11:07:07 hotel kernel: [3273762.058833] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

                              Aug 14 11:07:07 hotel kernel: [3273762.060141] ata2.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded

                              Aug 14 11:07:07 hotel kernel: [3273762.060149] ata2.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out

                              Aug 14 11:07:07 hotel kernel: [3273762.060155] ata2.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out

                              Aug 14 11:07:07 hotel kernel: [3273762.060500] ata2.00: configured for UDMA/133

                              Aug 14 11:07:07 hotel kernel: [3273762.060510] ata2.00: device reported invalid CHS sector 0

                              Aug 14 11:07:07 hotel kernel: [3273762.060515] ata2.00: device reported invalid CHS sector 0

                              Aug 14 11:07:07 hotel kernel: [3273762.060528] ata2: EH complete
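Since several posters describe the frequency as "once or twice per day", a small script can tally link resets per port and day from a saved kernel log. The sketch below is a hedged Python example, not a tested monitoring tool. It assumes the syslog line layout shown in this thread, and it treats the `SAct` value as a bitmask of NCQ tags that were outstanding when the port was frozen. The names `count_resets` and `sact_tags` are invented for this example.

```python
import re
from collections import Counter

# Matches e.g. "Aug 14 11:07:06 hotel kernel: [3273761.738512] ata2: hard resetting link"
RESET_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+) \S+ \S+ kernel: "
    r"\[\s*[\d.]+\] (?P<port>ata\d+): hard resetting link"
)
SACT_RE = re.compile(r"SAct (0x[0-9a-f]+)")


def count_resets(lines):
    """Count 'hard resetting link' events keyed by (date, port)."""
    counts = Counter()
    for line in lines:
        m = RESET_RE.search(line.strip())
        if m:
            counts[(f"{m['month']} {int(m['day'])}", m["port"])] += 1
    return counts


def sact_tags(line):
    """Return the NCQ tag numbers set in the SAct bitmask of an exception line."""
    m = SACT_RE.search(line)
    if m is None:
        return []
    mask = int(m.group(1), 16)
    return [tag for tag in range(32) if (mask >> tag) & 1]


if __name__ == "__main__":
    log = [
        "Aug 14 11:07:06 hotel kernel: [3273761.737966] ata2.00: exception Emask 0x0 "
        "SAct 0x30000000 SErr 0x0 action 0x6 frozen",
        "Aug 14 11:07:06 hotel kernel: [3273761.738512] ata2: hard resetting link",
    ]
    print(count_resets(log))
    print(sact_tags(log[0]))
```

With the excerpt above, the tags decoded from `SAct 0x30000000` should line up with the `tag 28` and `tag 29` commands reported as failed.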

                               

                              SMART info & attributes (from one of the machines):

                               

                              root@hotel:~# smartctl -iA /dev/sdb

                              smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)

                              Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

                               

                              === START OF INFORMATION SECTION ===

                              Device Model:     INTEL SSDSC2BX400G4

                              Serial Number:    BTHC514101W7400VGN

                              LU WWN Device Id: 5 5cd2e4 04b7ca92d

                              Firmware Version: G2010110

                              User Capacity:    400,088,457,216 bytes [400 GB]

                              Sector Sizes:     512 bytes logical, 4096 bytes physical

                              Device is:        Not in smartctl database [for details use: -P showall]

                              ATA Version is:   8

                              ATA Standard is:  ACS-2 revision 3

                              Local Time is:    Tue Aug 18 15:28:28 2015 UTC

                              SMART support is: Available - device has SMART capability.

                              SMART support is: Enabled

                               

                              === START OF READ SMART DATA SECTION ===

                              SMART Attributes Data Structure revision number: 1

                              Vendor Specific SMART Attributes with Thresholds:

                              ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

                                5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0

                                9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1028

                               12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2

                              170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0

                              171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

                              172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

                              174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1

                              175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       21563578034

                              183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0

                              184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0

                              187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0

                              190 Airflow_Temperature_Cel 0x0022   079   077   000    Old_age   Always       -       21 (Min/Max 20/24)

                              192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1

                              194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       21

                              197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0

                              199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0

                              225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3207

                              226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       0

                              227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       13

                              228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       61697

                              232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0

                              233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0

                              234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0

                              241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       3207

                              242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       506

                               

                               

                              PCI info for SATA controller (from lspci):

                               

                              00:1f.2 SATA controller: Intel Corporation Lynx Point 6-port SATA Controller 1 [AHCI mode] (rev 05) (prog-if 01 [AHCI 1.0])

                                  Subsystem: Super Micro Computer Inc Device 0806

                                  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

                                  Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                                  Latency: 0

                                  Interrupt: pin B routed to IRQ 51

                                  Region 0: I/O ports at f050 [size=8]

                                  Region 1: I/O ports at f040 [size=4]

                                  Region 2: I/O ports at f030 [size=8]

                                  Region 3: I/O ports at f020 [size=4]

                                  Region 4: I/O ports at f000 [size=32]

                                  Region 5: Memory at f7512000 (32-bit, non-prefetchable) [size=2K]

                                  Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-

                                  Address: fee003b8  Data: 0000

                                  Capabilities: [70] Power Management version 3

                                  Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)

                                  Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-

                                  Capabilities: [a8] SATA HBA v1.0 BAR4 Offset=00000004

                                  Kernel driver in use: ahci

                              • 12. Re: S3610 SSDs have failed "READ/WRITE FPDMA QUEUED" ATA commands, frozen, then link reset
                                andreykorolyov

                                In the meantime you can disable NCQ via libata: libata.force=X:noncq for the specific link. Reducing the queue depth to 1 was not helpful for me; you probably need to completely eliminate the possibility of issuing NCQ tags. Hopefully the new firmware with the fix will be released publicly this week and this bad hack can be thrown out.
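To make the `X` in that parameter concrete, here is a small Python sketch that builds the kernel command-line string from the `ataN.MM` names seen in the logs. It assumes the `libata.force` syntax of a comma-separated list of `[ID:]VAL` entries, where the ID is `PORT[.DEVICE]` and matches the ATA ID printed by libata. The helper name `libata_force_noncq` is invented for this example, and the resulting string still has to be added to the boot loader configuration by hand.

```python
import re

# Accepts names as printed in the kernel log, e.g. "ata1" or "ata1.00".
ATA_ID_RE = re.compile(r"^ata(\d+)(?:\.(\d+))?$")


def libata_force_noncq(ata_names):
    """Build a libata.force= kernel parameter that disables NCQ on the given links."""
    entries = []
    for name in ata_names:
        m = ATA_ID_RE.match(name)
        if m is None:
            raise ValueError(f"unrecognized ATA ID: {name!r}")
        port, device = m.groups()
        ident = port if device is None else f"{port}.{device}"
        entries.append(f"{ident}:noncq")
    return "libata.force=" + ",".join(entries)


if __name__ == "__main__":
    # Example: the two affected devices from the first post.
    print(libata_force_noncq(["ata1.00", "ata2.00"]))
```

Naming each port explicitly keeps NCQ enabled for any other drives on the same controller, such as the S3500s mentioned earlier in the thread.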

                                • 13. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
                                  grifferz

                                  Hi Andrey,

                                   

                                  Have you had any indication that there is a fix for this in forthcoming firmware then?

                                   

                                  So far I've had no response to asking for updates on this and I need to make a decision as to whether I'm going to wait or return for refund and buy something else.

                                  • 14. Re: One S3610 SSD has failed "WRITE FPDMA QUEUED" ATA commands, how to resolve or prove faulty?
                                    andreykorolyov

                                    Hi Andy,

                                     

                                    In a phone conversation a week ago, a support engineer gave the end of the current week as the approximate date of the firmware release, though that could obviously be delayed a little. The corresponding ticket is still open, as I asked to hold it until complete resolution. I am relatively fine with the "workaround" for now because our hot caches are never likely to generate more than 2K IOPS per caching device, so I changed my mind and will use the buggy devices as is without issuing an RMA. Within a couple of months the tiering scheme in our datacenter is subject to change, and a single-queued SSD will no longer be an option for tier-1 "iops-dampeners". Please share your RMA experience if you decide not to wait for a FW release.
