3 Replies Latest reply on Feb 11, 2015 11:23 AM by jonathan_intel

    Intel DC S3500 SMART Attributes if TBW exceeded

    gschoenberger

      Hi Intel Team,

       

      I am currently dealing with an Intel DC S3500, 240GB, to which ~213TB have been written.

      According to the specification TBW is 140TB, so the SSD has tolerated much more data than specified.

      I have taken a look at the SMART attributes to check whether smartd should have notified me:

      ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
        9 Power_On_Hours          -O--CK   100   100   000    -    3379
      170 Available_Reservd_Space PO--CK   100   100   010    -    0
      233 Media_Wearout_Indicator -O--CK   001   001   000    -    0
      241 Host_Writes_32MiB       -O--CK   100   100   000    -    7010161
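
      As a rough cross-check, the raw value of attribute 241 can be converted into a write volume. The following is a minimal sketch, assuming that each raw count stands for one 32 MiB unit (as the attribute name suggests) and that the 140 TBW rating is given in decimal terabytes; the constant and function names are made up for this illustration.

```python
# Convert the Host_Writes_32MiB raw value from the table into a write volume.
# Assumption: each raw count is one 32 MiB unit, as the attribute name suggests.
RAW_HOST_WRITES = 7010161         # raw value of attribute 241 in the table
UNIT_BYTES = 32 * 1024**2         # 32 MiB in bytes
RATED_TBW_BYTES = 140 * 10**12    # 140 TB; assumes the rating uses decimal TB


def host_writes_bytes(raw_count: int, unit_bytes: int = UNIT_BYTES) -> int:
    """Return the number of bytes written for a given raw counter value."""
    return raw_count * unit_bytes


written = host_writes_bytes(RAW_HOST_WRITES)
written_tib = written / 1024**4   # binary terabytes (TiB)
written_tb = written / 10**12     # decimal terabytes (TB)
ratio = written / RATED_TBW_BYTES

print(f"{written_tib:.1f} TiB = {written_tb:.1f} TB, "
      f"{ratio:.2f}x the rated 140 TB")
```

      Under these assumptions the counter works out to about 213.9 TiB (roughly 235 TB), which suggests that the ~213TB figure above is in binary units.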

       

      Unfortunately, the media wearout indicator, which is at 1%, has no threshold and no pre-fail flag set (setting a threshold might be a worthwhile firmware improvement). What puzzles me is that the available reserved space is still at 100%. I thought it would also go down as the SSD wears out.

      Just to note, the SSD is over-provisioned to 100GB.

       

      My questions:

      * Is over-provisioning the reason that the available reserved space did not decrease? My understanding was that this SMART attribute goes down along with media wearout as more and more data is written to the SSD.

      * Until now I have used the SMART attributes to monitor the SSDs. Is there any other way to monitor SSDs and get notified when an SSD's remaining lifetime is running short? (I know that there is the Intel toolbox, but I would need it mainly on Linux and, in some cases, together with RAID controllers.)


      Thanks a lot for your help,

      all the best Georg

        • 1. Re: Intel DC S3500 SMART Attributes if TBW exceeded
          jonathan_intel

          Hello Gschoenberger,

           

          The Endurance Rating (140 TBW) is calculated using global testing standards; however, Intel® SSDs are expected to exceed those values in almost all cases.

           

          The Media Wearout Indicator is a more realistic indicator of the wear of the chips. It declines linearly from 100 to 1 depending on the number of cycles the NAND media has undergone. If the normalized value reaches 1, the average erase cycle count has reached the maximum rated cycles. Even then, it is likely that significant additional wear can be put on the drive, as is the case here.

           

          The SMART counters for the previous attributes are based on writes and expected endurance; however, if the drive is healthy and free of errors, it should work well beyond those thresholds.

           

          The Available Reserved Space reports the number of reserved blocks remaining; this is related to over provisioning. This value will decrease if the reserved space is used. In this case, the SSD still shows 100 percent availability of the reserved space.

           

          Some SMART attributes are direct indicators of possible issues in the SSD components or functionality. These vary for each SSD series. For the Intel® SSD DC S3500, we advise contacting the Support Center if you notice a continuous increase in any of the following attributes: Re-allocated Sector Count, Program Fail Count, Erase Fail Count, Power Loss Protection Count, End-to-End Error Detection Count, Uncorrectable Error Count, CRC Error Count.
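
          Watching for a continuous increase in these counters could be sketched as a comparison of two snapshots of their raw values. This is a minimal illustration only: the attribute names are taken from the list above and are not necessarily the names a given SMART tool prints, the function name is hypothetical, and the sample values are invented.

```python
# Labels follow the list in the reply above; a real SMART tool may print
# different names, so treat these strings as placeholders for this sketch.
WATCHED = (
    "Re-allocated Sector Count",
    "Program Fail Count",
    "Erase Fail Count",
    "Power Loss Protection Count",
    "End-to-End Error Detection Count",
    "Uncorrectable Error Count",
    "CRC Error Count",
)


def increased_counters(previous: dict, current: dict, watched=WATCHED) -> dict:
    """Return {name: (old, new)} for watched counters whose raw value grew."""
    changes = {}
    for name in watched:
        if name in previous and name in current:
            if current[name] > previous[name]:
                changes[name] = (previous[name], current[name])
    return changes


# Invented sample snapshots, e.g. collected one day apart.
yesterday = {"Re-allocated Sector Count": 0, "CRC Error Count": 2}
today = {"Re-allocated Sector Count": 3, "CRC Error Count": 2}

print(increased_counters(yesterday, today))
```

          A counter that keeps showing up in the result across several snapshots would match the "continuous increase" case described above.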

          • 2. Re: Intel DC S3500 SMART Attributes if TBW exceeded
            gschoenberger

            THX for the quick reply and the explanation of the SMART attributes!

            One question is still left:

            * Is there a way/program to monitor the SMART attributes and get notifications on errors/wear out?

             

            To give you an example, I have written a Nagios/Icinga plugin to monitor SMART attributes:

            * https://git.thomas-krenn.com/dev/?p=check_smart_attributes.git;a=summary

            * https://www.thomas-krenn.com/en/wiki/SMART_Attributes_Monitoring_Plugin

            The plugin uses its own JSON database to interpret the SMART attributes and returns WARNING/CRITICAL if the values of some attributes reach a certain threshold. I have used your SSD specifications to interpret the SMART attributes correctly.

            Obviously this is a custom solution for Nagios/Icinga. Is there a standalone daemon/program to monitor Intel SSDs?

             

            All the best, Georg

            • 3. Re: Intel DC S3500 SMART Attributes if TBW exceeded
              jonathan_intel

              We have two software applications that can be used to monitor and manage the Intel® SSD DC S3500. These are available from the Intel® Download Center for the Intel® SSD DC S3500.

               

              - Intel® Solid-State Drive Data Center Tool: a CLI application available for Windows* and Linux*. It is a drive management tool for the Intel® Solid-State Drive Data Center Family of products and works with both Intel SATA and Intel PCIe* drives.

               

              - Intel® Solid-State Drive Toolbox: a Windows* application mainly used for mainstream consumer SSDs, which also works with the Intel® SSD DC S3500 and S3700 Series.

               

              These tools are not designed to run or notify automatically when a threshold is reached or exceeded. You would have to monitor the SMART attributes of the Solid-State Drives manually, or use third-party software to integrate them into your system routines.

               

              The report generated by these tools indicates whether any of the attributes is abnormal and requires attention. When this happens, you should contact Intel Customer Support to determine whether further action is required.
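
              The manual or third-party monitoring mentioned above could be sketched as a small script that parses an attribute table and applies site-specific limits to the normalized VALUE column, since the drive itself sets no threshold for media wearout. This is only a sketch under assumptions: the sample text reuses the table from the first post, it is assumed to come from something like `smartctl -A -f brief /dev/sdX`, and the function names and limits are invented for illustration.

```python
import re

# Sample text in the brief layout quoted in the first post (assumed source:
# something like `smartctl -A -f brief /dev/sdX`).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   100   100   000    -    3379
170 Available_Reservd_Space PO--CK   100   100   010    -    0
233 Media_Wearout_Indicator -O--CK   001   001   000    -    0
241 Host_Writes_32MiB       -O--CK   100   100   000    -    7010161
"""

# id, name, flags, value, worst, thresh, fail, leading digits of the raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)


def parse_brief(text: str) -> dict:
    """Parse attribute rows into {name: {id, value, thresh, raw}}."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # the header line does not start with digits and is skipped
            attrs[match.group(2)] = {
                "id": int(match.group(1)),
                "value": int(match.group(4)),
                "thresh": int(match.group(6)),
                "raw": int(match.group(8)),
            }
    return attrs


# Hypothetical site-specific limits on the normalized VALUE column.
LIMITS = {"Media_Wearout_Indicator": 10, "Available_Reservd_Space": 20}


def alerts(attrs: dict, limits=LIMITS) -> list:
    """Return the names of attributes whose normalized value is at or below its limit."""
    return sorted(
        name for name, limit in limits.items()
        if name in attrs and attrs[name]["value"] <= limit
    )


print(alerts(parse_brief(SAMPLE)))
```

              A script like this could be run periodically and its result passed to whatever notification mechanism is already in place; the limits would need to be chosen per drive model.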