
SSD NAND Endurance

redux Community Member

Can this be true?

 

This site experimented to see how much data could be written to an X25-V before the NAND expired. They wrote to the device 24/7. (It seems they paused once every 12 hours to run TRIM from the toolbox.)

 

Translation makes it hard to understand exactly how they ran the experiment, but they state:

 

A class (Sequential?):

  • 10~100KB: 80% (system thumbnails and the like)
  • 100~500KB: 10% (JPG images and the like)
  • 1~5MB: 5% (large pictures, MP3s and the like)
  • 5~45MB: 5% (video clips and the like)

B class (Random?):

  • 1~100 bytes: 100% (system logs and the like)

In total they were able to achieve 0.73PB in 6,185 hours! That is a phenomenal amount of writes, which appears to be way over the spec of 20GB of host writes per day for a minimum of 5 years.

 

Here is another one 0.86PB in 7,224 hours!

 

Does that come down to the workload, or are Intel's specs extremely conservative?
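
For a rough sense of scale, here's a back-of-envelope comparison of that 20GB/day spec against the reported total (a sketch only; the figures come from the post above, not from Intel):

# Back-of-envelope: rated write allowance vs. what the test reportedly achieved.
spec_gb_per_day = 20                      # 20GB of host writes per day
spec_years = 5                            # for a minimum of 5 years
spec_total_tb = spec_gb_per_day * 365 * spec_years / 1000   # ~36.5 TB

achieved_tb = 0.73 * 1000                 # 0.73PB reported by the first site

print(f"Spec total over 5 years: {spec_total_tb:.1f} TB")
print(f"Reported total:          {achieved_tb:.0f} TB")
print(f"Roughly {achieved_tb / spec_total_tb:.0f}x the specced amount")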

  • 1. Re: SSD NAND Endurance
    DuckieHo Community Member

    From what I heard, the rated P/E cycles are conservative.....

     

    Google Translate does a pretty choppy job translating.  Let me ask Intel if they are willing to donate one for testing?

  • 2. Re: SSD NAND Endurance
    SSDaddict Community Member

    redux wrote:

     

    Can this be true?

     

    This site experimented to see how much data could be written to an X25-V before the NAND expired. They wrote to the device 24/7. (It seems they paused once every 12 hours to run TRIM from the toolbox.)

     

    Translation makes it hard to understand exactly how they ran the experiment, but they state:

    ...

    In total they were able to achieve 0.73PB in 6,185 hours! That is a phenomenal amount of writes, which appears to be way over the spec of 20GB of host writes per day for a minimum of 5 years.

     

    Here is another one 0.86PB in 7,224 hours!

     

    Does that come down to the workload, or are Intel's specs extremely conservative?

    If anyone wonders, it's the same SSD, same serial#.

     

    I'm about to start such a test using a rebranded X25-V, pending a fellow member's decision at XS on whether to tag along. If he joins, I'll do sequential writing only; if not, I'll make it a mixed test.

     

    Although I have high hopes of reproducing the result, we should all acknowledge that SSDs aren't created equal; a bit like with overclocking, some perform great and some don't.

  • 3. Re: SSD NAND Endurance
    redux Community Member

    Great!

     

    A mix of random and sequential would be more representative of real life. 4K, 8K, 32K, 64K, and 1MB are the most common write sizes I have observed, but I'm sure Intel have a wealth of information on typical usage patterns that you might find more representative.

     

    Would you be able to post weekly updates on how the experiment progresses? Maybe run two tests in parallel with different SSDs from different batches?
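
    If it helps, here's a minimal sketch of what a mixed-size write loop along those lines might look like (the target path, sizes, and ratios are just illustrative guesses, not the actual test code anyone is running):

    import os
    import random

    # Illustrative mix loosely based on the sizes mentioned above (not the real test).
    SIZES_AND_WEIGHTS = [
        (4 * 1024, 40),        # 4K
        (8 * 1024, 20),        # 8K
        (32 * 1024, 15),       # 32K
        (64 * 1024, 15),       # 64K
        (1024 * 1024, 10),     # 1MB
    ]
    TARGET_DIR = r"E:\endurance_test"   # hypothetical mount point of the SSD under test

    os.makedirs(TARGET_DIR, exist_ok=True)
    sizes = [s for s, _ in SIZES_AND_WEIGHTS]
    weights = [w for _, w in SIZES_AND_WEIGHTS]

    total_written = 0
    for i in range(10000):                        # one batch; loop/delete as needed
        size = random.choices(sizes, weights)[0]  # pick a size according to the mix
        path = os.path.join(TARGET_DIR, f"file_{i:05d}.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(size))             # incompressible data
        total_written += size

    print(f"Wrote {total_written / 1024**2:.1f} MiB in this batch")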

  • 4. Re: SSD NAND Endurance
    koitsu Community Member

    The first link (Mandarin) indicates that after their test, they had a total of 0x5D (93) reallocated blocks.  The second link (Japanese) indicates that after their test, they had a total of 0xAA (170) reallocated blocks.

     

    I wish I could get some clarification from actual Intel engineers on whether SMART attribute 0x05 on the X25-V, X25-M, 320-series, and 510-series represents actual 512-byte LBA reallocations/remappings (like on a mechanical HDD), or whether it represents reallocated/remapped NAND flash erase blocks (I'd need to know the block size; usually 256KByte or 512KByte).  I'd really love it if one of the Intel engineers chimed in here with something concrete.  It would benefit the smartmontools project as well.

     

    Regardless, this proves that wear levelling does in fact work quite well -- but what also matters is how much data they're writing to the drive.  Wear levelling becomes less and less effective the fewer free (unused or erased -- yes, there is a difference) NAND flash pages there are.  E.g. a drive whose filesystem is 95% full is going to perform horribly given the lack of free pages to apply wear levelling to.

     

    I would have been more impressed had these guys not used TRIM at all.  Many present-day OSes (including major server players: Solaris, OpenSolaris, and FreeBSD) do not implement TRIM on some filesystem layers (such as ZFS), and in other cases (UFS on FreeBSD) every single LBA is TRIM'd during a delete operation (which is inefficient, we know).  So I'd be truly impressed if these numbers came from systems not using TRIM at all.

     

    Still a neat couple of sites, but like I said.....  :-)

  • 5. Re: SSD NAND Endurance
    redux Community Member

    XS is running an experiment now. X25-V 40GB vs 320 Series 40GB (34nm vs 25nm)

     

    http://www.xtremesystems.org/forums/showthread.php?t=271063

  • 6. Re: SSD NAND Endurance
    koitsu Community Member

    Re: XS: Something is up.  I'm not sure I can trust his results based on that.  Here's why:

     

    SMART attributes 0xe2 (226; Workload Media Wear Indicator), 0xe3 (227; Workload Host Reads Percentage), and 0xe4 (228; Workload Minutes) are all a raw value of 0xffff (65535).  This doesn't mean anything to most people, but it does mean something to those of us familiar with the X25 series of drives and the 320 series: there's a special SMART test vendor code (0x40) you can submit to the X25 and 320 series which will clear some (but not all) SMART stats.  Specifically, it resets the 3 above attributes to value 0xffff.  After a few minutes of the drive having these attributes reset, they start incrementing/behaving normally again.  The X25-V behaves identically in this regard to the X25-M, the X25-E, and the 320 series.

     

    In every screen shot that fellow has posted so far, those attributes remain at 0xffff.

     

    I don't know if something seriously wonky is going on because he's using a Kingston drive (firmware may differ?  Unsure; I do see he's running the 02HD firmware rather than the 02M3 firmware), if he's intentionally clearing them every time, or if he's editing the screenshots.  I really don't know.  If someone else has a Kingston model with the exact same firmware and can confirm the drive always keeps those 3 attributes at value 0xffff, that would be enough for me to believe I'm wrong.

     

    It would really help if he wouldn't use silly SMART monitoring software like that CrystalDiskInfo crap and instead used smartmontools.  I guess if he's running Vista or Windows 7 he doesn't have much of a choice though.

     

    EDIT: Ah, I see he's keeping 12GB of free space (presumably at the end of the SSD) for wear levelling.  Okay, that's at least reasonable, and I'm less inclined to refuse acknowledgement of his tests.  I'd still like some validation of the Kingston drive behaviour with the above 3 attributes though.  Here's an example from a 320 series drive that's been used normally (nothing stressful):

     

     

    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       5
    227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       70
    228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       24638
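
    For anyone wanting to spot-check those three workload attributes on their own drive, here's a rough sketch (assumes smartmontools is installed and on the PATH; the device name is only an example):

    import subprocess

    # Read the vendor attribute table and flag 226-228 if they read 65535 (0xFFFF),
    # i.e. if they look like they were recently reset. Device name is illustrative.
    out = subprocess.run(["smartctl", "-A", "/dev/sda"],
                         capture_output=True, text=True).stdout

    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] in ("226", "227", "228"):
            raw = fields[-1]
            note = "looks recently reset (0xFFFF)" if raw == "65535" else "normal"
            print(f"Attr {fields[0]} ({fields[1]}): raw={raw} -> {note}")
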
  • 7. Re: SSD NAND Endurance
    redux Community Member

    The experiment is being run by two different people using the exact same test file. Win 7 is being used so that the drive can benefit from TRIM.

     

    I very much doubt Anvil is clearing SMART info, and he certainly would not be editing screenshots to skew results.  Both XS members running this experiment are above reproach in this regard.

     

    I agree it would have been better to check the workload for an hour using SMART tools, calculate the projected wear rate from that, and then periodically monitor progress to see how accurate the SMART info was, but I guess the objective of the test was simply to find out how long the NAND lasts.

     

    It would be interesting to know how the SMART wear-out values are calculated: from the actual condition of the NAND, or from a predetermined P/E value? If it's the latter, SMART is not that helpful in the context of seeing how long the NAND will last.

     

    EDIT:

     

    It would also be interesting to know how P/E values are specified. Is it the minimum? The average?

  • 8. Re: SSD NAND Endurance
    SSDaddict Community Member

    koitsu wrote:

     

     

    In every screen shot that fellow has posted so far, those attributes remain at 0xffff.

     

    I don't know if something seriously wonky is going on because he's using a Kingston drive (firmware may differ?  Unsure; I do see he's running the 02HD firmware rather than the 02M3 firmware), if he's intentionally clearing them every time, or if he's editing the screenshots.  I really don't know.  If someone else has a Kingston model with the exact same firmware and can confirm the drive always keeps those 3 attributes at value 0xffff, that would be enough for me to believe I'm wrong.

     

    It would really help if he wouldn't use silly SMART monitoring software like that CrystalDiskInfo crap and instead used smartmontools.  I guess if he's running Vista or Windows 7 he doesn't have much of a choice though.

     

     

    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
    226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       5
    227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       70
    228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       24638

     

    Hi, I'm Anvil at the XS forum.

     

    I noticed the 0xFFFF values but didn't really care; the important thing here is host writes, and that has been reported correctly.

     

    The Kingston drive did not support TRIM initially. Well, let's not get into that on the Intel forum; ask DuckieHo if you need more info.

     

    I downloaded smartmontools from SourceForge and ran it with the "-a SDA" parameters, and now it has actually cleared/reset those attributes, so here we go.

     

    SMART Attributes Data Structure revision number: 5

    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      3 Spin_Up_Time            0x0020   100   100   000    Old_age   Offline      -       0
      4 Start_Stop_Count        0x0030   100   100   000    Old_age   Offline      -       0
      5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       2
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       174
    12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       93
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       33
    225 Load_Cycle_Count        0x0030   200   200   000    Old_age   Offline      -       148963
    226 Load-in_Time            0x0032   100   100   000    Old_age   Always       -       9856
    227 Torq-amp_Count          0x0032   100   100   000    Old_age   Always       -       0
    228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1010375343
    232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
    233 Media_Wearout_Indicator 0x0032   096   096   000    Old_age   Always       -       0
    184 End-to-End_Error        0x0033   100   100   099    Pre-fail  Always       -       0
    SMART Error Log Version: 1
    No Errors Logged

     

    I'll check and post those attributes later at XS.

  • 9. Re: SSD NAND Endurance
    koitsu Community Member

    1) smartmontools *absolutely* did not change anything or reset anything.  The SMART monitoring utility you've been using, simply put, is buggy or broken.  OSes can sometimes cache SMART attribute data -- yes, you read that correctly -- and smartmontools deals with that situation properly.  Guaranteed CrystalDiskInfo is broken in this regard, and as such please don't use it for SMART attribute monitoring.

     

    2) I believe you meant to run "smartctl -a /dev/sda" (smartctl lets you get away with removal of the /dev/ part, and I'm a little surprised it let you use capitals for the device name).  This command, again, does not do any modification operations to a drive.

     

    3)  Please make sure you're using the latest development build of  smartmontools, not the stable build.  There are attribute decoding improvements for Intel SSDs -- specifically the Intel models -- in the development builds:

     

    http://smartmontools-win32.dyndns.org/smartmontools/

     

    Furthermore -- and this is important for the  320-series drive you plan on testing -- please see this bug report (opened by me) and use the drivedb-add.h file the developer provided (you can place it in the same directory as your smartctl.exe binary).

     

    http://sourceforge.net/apps/trac/smartmontools/ticket/168

     

    You'll know if it's working if you see the string "Device is:        In smartctl database [for details use: -P show]" in the output near the top.

     

    Good luck.

  • 10. Re: SSD NAND Endurance
    NandFlashGuy Community Member

    There are a couple of points to keep in mind in these analyses:

     

    1.  Obviously these experiments are emphasizing raw endurance.  Part of the endurance spec limitation relates to the data retention, which isn't being measured in these experiments.

     

    2.  Believe it or not, the measurement method is likely undercounting the raw endurance of the Nand.  In these experiments, the users are basically cycling the SSD (and Nand) as fast as possible.  However, the faster the Nand cycles, the less time it has to "anneal".

     

    Here is a link to a paper discussing cycling and distributed cycling:

    www.usenix.org/event/hotstorage10/tech/full_papers/Mohan.pdf

  • 11. Re: SSD NAND Endurance
    redux Community Member

    Thank you for the link, which is very interesting.

     

    There is of course a wide range of variables in how wear can be induced.  Our experiment has tried as far as possible to replicate a realistic workload, with a reasonable amount of free space and a reasonable amount of static data.

     

    I appreciate that accelerated testing introduces additional wear that would not otherwise occur.  Looking at the white paper, that impact might be significant. That said, the drives are performing admirably, so hopefully this helps dispel concerns about SSD durability.

     

    Based on the 20GB-a-day write "allowance" over 5 years, these drives are experiencing ~150 days of wear in one day!
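
    The arithmetic behind that figure is simply daily host writes divided by the 20GB/day allowance (a sketch; the ~3TB/day is implied by the 150x figure rather than a measured number):

    # ~150 "allowance days" consumed per calendar day of testing.
    allowance_gb_per_day = 20
    test_gb_per_day = 3000      # assumed ~3TB/day of host writes, implied by the 150x claim
    print(f"~{test_gb_per_day / allowance_gb_per_day:.0f} days of rated wear per day")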

     

    Already the 320 has exceeded 35TB and the wear-out indicator is still at 82%.  The X25-V is at ~30TB and the wear-out indicator is at 83%. So far it would seem that the 25nm NAND (320) is holding up very well compared to the X25-V (34nm), which would suggest that it's not just NAND P/E cycles that are being assessed.

     

    If I may, could I please ask a few questions?

     

    • Does static data have to be refreshed periodically? (For data retention not wear levelling)
    • If so how often?
    • Does the 320 enable static wear levelling?
    • On what basis are the PE cycles specified? (i.e. lowest or average)

     

    Thank you.

  • 12. Re: SSD NAND Endurance
    NandFlashGuy Community Member
    Already the 320 has exceeded 35TB and the wear-out indicator is still at 82%.  The X25-V is at ~30TB and the wear-out indicator is at 83%. So far it would seem that the 25nm NAND (320) is holding up very well compared to the X25-V (34nm), which would suggest that it's not just NAND P/E cycles that are being assessed.

     

    Recall that the number of cycles that the Nand will see is a function of the host cycles multiplied by the "Write Amplification" factor.  Think of this term as the efficiency of the drive in defragging/garbage collecting.  Assuming that both drives have the exact same workload and that the Flash has the same endurance, the difference in the Wear Out Indicator reflects the difference in Write Amplification.
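
    Put as a rough formula, that relationship might be sketched like this (the capacity, write amplification, and example numbers below are placeholders, not Intel's figures):

    def avg_pe_cycles_consumed(host_tb_written, write_amplification, capacity_gb):
        """NAND writes = host writes x write amplification, averaged over the capacity."""
        nand_gb_written = host_tb_written * 1000 * write_amplification
        return nand_gb_written / capacity_gb

    # Placeholder numbers for illustration only (40GB drive, WA of 1.2, 35TB of host writes).
    print(f"~{avg_pe_cycles_consumed(35, 1.2, 40):.0f} average P/E cycles consumed")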

     

    • Does static data have to be refreshed periodically? (For data retention not wear levelling)
    • If so how often?

     

    That would depend on the data retention capabilities of the component Nand chips.  If the component data retention is sufficient, then it would seem unnecessary to periodically refresh static data.

     

    However, keep in mind that wear leveling may end up refreshing the static data automatically.  If the "hot" data gets cycled very fast, then the SSD may move the static data to a high cycle count block on the Nand and use the low cycle block previously holding the static data for additional host writes.

     

    For more details on SSD endurance qualification, you can refer to the recently published JEDEC specs:

    http://www.jedec.org/standards-documents/results/jesd218%20taxonomy%3A2506

     

    On what basis are the PE cycles specified? (i.e. lowest or average)

     

    Lowest.  The details of the qualification standards for the Nand component can be found in the JEDEC JESD47 spec:

    http://www.jedec.org/sites/default/files/docs/JESD47H-01.pdf

  • 13. Re: SSD NAND Endurance
    redux Community Member

    Thank you for your kind reply. Much appreciated.

  • 14. Re: SSD NAND Endurance
    mralpha Community Member

    Do you not need to refresh static data every once in a while because of read disturb?
