11 Replies Latest reply on Jan 30, 2017 3:32 PM by Intel Corporation

    Serious performance regression with DC P3700 2TB AIC drive

    BreakStuff

      Hi,

       

      we've got a couple of servers, each with one of the DC P3700 2TB AIC drives, and we're seeing a serious performance regression after a couple of hours.

       

      Initially, before running the application tests, we quickly checked I/O performance using `dd`, writing 100 8GB files at a consistent rate of 2GB/s. After that, another `dd` run read these 100 files with direct I/O at a consistent rate of 1.1GB/s. The file system is XFS, but we also tested ext4. (We're aware that this is not a solid benchmark, but it's good enough to get some indication that the drive delivers consistent write and read throughput.)

      The actual test is an application that usually writes at just 100MB/s with no reads, periodically peaking at 600MB/s writes and 150MB/s reads - everything is sequential I/O. This works for a couple of hours, but then I/O performance degrades to a few MB/s. Even after the application has been stopped, the same `dd` tests show that write throughput has degraded to maybe 200MB/s and reads to 100MB/s.

      We would have expected the drive to degrade somewhat over time, but not down to 200/100 MB/s.

       

      ext4 generally performed worse than XFS, but the overall behaviour (throughput regression) is the same on all machines.

       

      We can also reproduce kernel panics in combination with isdct. One way to trigger one is to issue `isdct delete -intelssd`; the command completes, but the kernel panics shortly afterwards.

       

      Do you have any idea what may cause this behaviour and how to fix it?

        • 1. Re: Serious performance regression with DC P3700 2TB AIC drive
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hello BreakStuff,

           

          We understand you have a couple of Intel® SSD DC P3700 Series drives that are experiencing performance drops after being used for a short period of time.

           

          Before we can provide reasons and suggestions, we would like to have some more information:

           

          1. What OS are you using with these drives?
          2. These are NVMe* SSDs; which driver are you using?
          3. Which firmware version are the drives currently running?
          4. What are the make and model of your servers?
          5. What program are you using to perform the benchmarking tests?
          6. Would it be possible for you to attach the SMART details from at least one of these drives? (Using the advanced editor at the top right when replying will allow you to attach files to your post.)

           

          We look forward to hearing back from you.

           

          Best regards,
          Carlos A.

          • 2. Re: Serious performance regression with DC P3700 2TB AIC drive
            BreakStuff

            Hi,

             

            1: SLES 12 SP1 (3.12.59-60.45)

            2: standard kernel driver (nvme.ko)

            3: 8DV1LP11 and 8DV1LP10 (with 8DV1LP11 interrupt coalescing is disabled, with 8DV1LP10 it's not)

            4: Lenovo x3550 M5

            5: `dd` and `hdparm`. We also ran some fio and bonnie++ tests before, but these looked ok (they didn't run for many hours though).

            6: (see attachment smart-details.txt)

             

            The benchmarking itself was a Cassandra stress test with a mixed read/write workload. But the reads never actually hit the disk, since the files were cached in memory.

            The (constant) base workload against the SSD of ~100MB/s is just sequentially writing the commit log. The irregular additional workload on top of that is compaction (i.e. sequential read + write).

            After approx. 5 hours, the disk could no longer provide more than 200MB/s for writes and 100MB/s for reads - measured with `dd`.
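
            For what it's worth, a longer, time-based fio job roughly approximating our base workload might look like the sketch below (directory, block sizes, rates and runtime are illustrative, not our exact test):

```
; sketch of a fio job file approximating the workload described above
; (directory, sizes, rates and runtime are illustrative)
[global]
ioengine=libaio
direct=1
directory=/nvme-disk/scratch
size=8g
time_based
runtime=600

[commitlog]
; steady ~100MB/s sequential write, like the commit log
rw=write
bs=128k
rate=100m

[compaction]
; mixed sequential read+write on top, like compaction
rw=rw
bs=128k
```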

            • 3. Re: Serious performance regression with DC P3700 2TB AIC drive
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hello BreakStuff,

              In this case, we can start by recommending that you install and use the Intel® SSD DC NVMe* drivers instead of the in-box version provided with your OS.

              You will need to install driver version 1.7.0.1002, as the latest release did not include an updated driver for Linux*.

              - Intel® SSD Data Center Family for NVMe Drivers.
              - NVMe* Driver Installation Guide for Linux*.

              Aside from that, based on your firmware version, we can tell that your SSDs are Lenovo* OEM drives. Because of this, our firmware versions will not apply; you would need to check with your OEM to find out whether you're on the latest firmware or whether there are any new releases.

              When it comes to SSD benchmarking, our recommended tool for Linux* is FIO (which I see that you've used). We also recommend using FIO Visualizer if you'd like a GUI.

              - How to benchmark SSDs with FIO Visualizer.

              From your SMART details, we were unable to find any red flags. Your drive's overall health appears to be ok, as expected.

              Note: Any links provided for third party tools or sites are offered for your convenience and should not be viewed as an endorsement by Intel® of the content, products, or services offered there. We do not offer support for any third party tool mentioned here.

              Please let us know if this helps.

              Best regards,
              Carlos A.

              • 4. Re: Serious performance regression with DC P3700 2TB AIC drive
                BreakStuff

                Hi Carlos,

                 

                thanks for your quick reply!

                I'll ask the ops guys to install the Intel NVMe driver.

                We already looked at the SMART details and nothing jumped out to us.

                Let's see how it works with the NVMe driver.

                 

                BTW: Is it sufficient to over provision the drive by just using less capacity (e.g just use for example 1.5T instead of 2T) or is it better to adjust the Max-LBA setting?
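
                For context, if we went the Max-LBA route, I'd compute the new value roughly like this (assuming 512-byte sectors and a nominal 2TB = 2,000,000,000,000 bytes; the isdct invocation at the end is just from memory, so treat it as hypothetical):

```shell
# Illustrative arithmetic: leave ~20% of a nominal 2TB drive
# unaddressable by lowering Max LBA (512-byte sectors assumed).
TOTAL_BYTES=$((2000 * 1000 * 1000 * 1000))
SECTOR=512
USABLE_BYTES=$((TOTAL_BYTES * 80 / 100))   # keep 80%, spare 20%
MAX_LBA=$((USABLE_BYTES / SECTOR))
echo "$MAX_LBA"                            # prints 3125000000
# hypothetical application with isdct (drive index 0 is illustrative):
#   isdct set -intelssd 0 MaximumLBA=$MAX_LBA
```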

                 

                Do you have any idea why interrupt coalescing is enabled in the one firmware revision and not in the other?

                 

                Generally speaking, can I expect the SSD to achieve the specified performance numbers in steady state?

                • 5. Re: Serious performance regression with DC P3700 2TB AIC drive
                  Intel Corporation
                  This message was posted on behalf of Intel Corporation

                  BreakStuff,

                  You're correct, we do expect our drives to perform as advertised, although in some real-life applications it's normal for drives to deliver around 80% of the advertised numbers.

                  The reason for this is that when we test our SSDs in the lab, this is done on clean drives, under ideal conditions.

                  In the real world, the SSD may already contain data, or the test may not be able to use the full bandwidth because background tasks or OS services are also using the drive while you measure its performance. These and many other factors can affect how reliable your benchmark results turn out.

                  As far as over-provisioning goes, I wouldn't worry too much about this. Our data center drives already come with a 20% over-provisioned area to be used automatically as NAND blocks fail. This can be monitored under the "E9" SMART attribute (the "Media Wearout Indicator"). Because of this, manually under-using the drive or adjusting the Host Protected Area is not usually necessary.

                  Unfortunately, I don't have much of an answer as to why one firmware has interrupt coalescing enabled and the other doesn't. OEM firmware versions are modified by the OEM (Lenovo*, in your case) to address specific issues and often have completely different release cadences than the ones we publish. They may add or disable features depending on what they believe works best with their systems.

                  Best regards,
                  Carlos A.

                  • 6. Re: Serious performance regression with DC P3700 2TB AIC drive
                    BreakStuff

                    Hi Carlos,

                     

                    I cannot find any special Intel NVMe driver in the downloads. The ZIP files only contain Windows or VMware drivers, and the PDF refers to the standard Linux kernel sources, which is probably exactly the driver we currently use.

                    • 7. Re: Serious performance regression with DC P3700 2TB AIC drive
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Hello BreakStuff,

                      It's normal for SSD performance to decrease to some extent as the drive fills up, but this should only become noticeable once the drive is reaching full capacity.

                      If you'd like, you may review the evaluation guide for the P3700 that we created for Fujitsu*:

                      http://manuals.ts.fujitsu.com/file/12176/fujitsu_intel-ssd-dc-pcie-eg-en.pdf

                      NOTE: Any links provided for third party sites are offered for your convenience and should not be viewed as an endorsement by Intel® of the content, products, or services offered there.

                      Additionally, you may like to review the following document. There's a section on how to bypass the buffer (aka using Direct IO) in Linux*, which may help with your performance issues:

                      Intel® Solid-State Drives in Server Storage Applications. Section 3.2, page 16.

                      We hope this information helps.

                      Best regards,
                      Carlos A.

                      • 8. Re: Serious performance regression with DC P3700 2TB AIC drive
                        BreakStuff

                        Yes - that document definitely helps. Thanks for the link!

                         

                        I still have no idea what happens in that stack, though. At least the kernel panic after an `isdct delete ...` does not happen with recent Linux kernels, only with 3.12.59 in SLES 12 SP1.

                        Another thing (which we could not cross-check with a newer kernel yet) is that writes (and fstrim) seem to be prioritized over reads - e.g. with two `dd`s, one writing and one reading, the writing `dd` can write at full speed while the reading one only manages a few MB/s (neither is maxing out IOPS, though).

                         

                        Thanks for the useful help so far!

                        I might keep you busy

                        • 9. Re: Serious performance regression with DC P3700 2TB AIC drive
                          Intel Corporation
                          This message was posted on behalf of Intel Corporation

                          Hello BreakStuff,

                          It's very difficult for copy commands to be a fair benchmarking tool. Generally speaking, solid state drives have much faster read speeds than write speeds. If a file is copied or moved within the SSD, the write operation will be optimized. If a file is moved from one disk to another, then you will only be measuring the slower of the two drives.

                          However, I'm not familiar enough with how the `dd` command is actually executed, so I cannot say whether it's a good benchmarking reference or not.

                          Speaking of fstrim, keep in mind that our data center drives perform garbage collection automatically at the firmware level. If additional trimming is necessary, we recommend scheduling the command so it isn't executed at a time that may affect your drive's performance.
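
                          For example, one common way to queue trims for a quiet window is a cron entry along these lines (mount point and schedule are illustrative; adjust to your setup):

```
# /etc/cron.d/fstrim - run a verbose trim early Sunday morning
# (mount point, path and schedule are illustrative)
0 3 * * 0  root  /usr/sbin/fstrim -v /nvme-disk
```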

                          As for the kernel panics, I'm not sure what exactly could be the cause. We've been looking into this, but were unable to reproduce the issue. It might be a good idea to contact SUSE* support or post to their forums on this subject.

                          Best regards,
                          Carlos A.

                          • 10. Re: Serious performance regression with DC P3700 2TB AIC drive
                            BreakStuff

                            Hi Carlos,

                             

                            we just used

                            for i in `seq 1 100`; do time sh -c "dd if=/dev/zero of=/nvme-disk/scratch/dd-${i} bs=32k count=262144" ; done

                            for i in `seq 1 100`; do time sh -c "dd of=/dev/null if=/nvme-disk/scratch/dd-${i} bs=32k iflag=direct count=262144" ; done

                            as a "quick" test to test basic functionality and get an idea of sequential write & read throughput (despite the non-optimal block-sizes).

                             

                            The kernel panic (with isdct delete) is reproducible on SLES 12 SP1. That kernel version gets into trouble when the drive is reset and no partition table is found afterwards. This does not happen on recent kernel versions (at least not with 4.9; not tested with 4.4 yet).

                             

                            From our experience, it looks like the Linux NVMe driver/firmware prefers writes over trims/GC, and trims/GC over reads. So if you execute both for-loops above concurrently, write throughput stays at (nearly) 2.0GB/s, but read throughput is just a few MB/s. The same seems to be true when executing an `fstrim` alongside the reading dd-loop. Is this observation correct? Is there a way to get a fair prioritization of writes and reads (i.e. make the driver/firmware not prefer writes over reads)?
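
                            In case it helps: one workaround we've been toying with (our own idea, not anything from the driver documentation, and it needs root plus cgroup v1) is to cap write bandwidth with the blkio controller so reads aren't starved - roughly:

```shell
# Sketch (cgroup v1, root required): cap write bandwidth on the NVMe
# device so concurrent reads aren't starved. Device path, group name
# and the 1GB/s limit are illustrative.
DEV=/dev/nvme0n1
MAJMIN=$(lsblk -dno MAJ:MIN "$DEV" | tr -d ' ')
mkdir -p /sys/fs/cgroup/blkio/writers
echo "$MAJMIN 1000000000" > /sys/fs/cgroup/blkio/writers/blkio.throttle.write_bps_device
echo $$ > /sys/fs/cgroup/blkio/writers/cgroup.procs   # current shell joins the group
# subsequent writes from this shell (e.g. the writing dd-loop) are capped
```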

                            The current performance is fine for us as we won't push the drive to its limits in production - but if there's something to optimize, it would be great.

                             

                            Thanks for the tip regarding fstrim (i.e. not necessary to schedule that "manually")!

                            • 11. Re: Serious performance regression with DC P3700 2TB AIC drive
                              Intel Corporation
                              This message was posted on behalf of Intel Corporation

                              Hello BreakStuff,

                              These commands are ok as a quick test, or even as a well-being check. Just don't trust the results too much for performance benchmarking; for that, we recommend a dedicated benchmarking tool such as FIO*.

                              Unfortunately this is all a bit outside of our support scope, since your drives don't use our firmware and the Linux* NVMe* driver is not made by us either.

                              Our recommendation in this case would be to contact your Linux* Support Community, or your computer manufacturer for more details regarding the drive's firmware.

                              As far as I'm aware, there's no command to make your drive favour reads over writes, but then again, this is not our normal area of support.

                              Best regards,
                              Carlos A.