
IT Peer Network

12 Posts authored by: ChristianBlack

Index of Chris's Blogs!


Here's an index of my blogs; I'll update it as I add and change content!

Don't forget to check out my buddy Frank's blogs, too. More great content!

Thanks for reading!


- Chris


Christian Black is a Datacenter Solutions Architect covering the HPC and Big Data space within Intel’s Non-Volatile Memory Solutions Group. He comes from a 23-year career in Enterprise IT.

Follow Chris on Twitter at @RekhunSSDs.


"Turn and Face the Strange... IT Changes"

Opinion: Talks about the migration of classic Enterprise IT services to cloud providers and why that change is an opportunity rather than a threat.

Turn and face the strange... IT Changes :-)


"IO/CPU Efficiency in Non-Volatile Memory Express (NVMe)"

Lab Results: Explains the IO efficiency of the new NVMe protocol and Intel's new PCIe Family of Data Center SSDs.

Intel SSD P3700 Series - NVMe Efficiency


"Throw out your Hard Disks!"

Updated Blog: A rewrite of one of my first blogs with new data; things have changed quite a bit in six years!

SSD: Throw out your hard disks!


"20 Questions on SSD: Temp, Tier & Cache"

Years' worth of field POCs: Explains how to make efficient use of an SSD's concentrated ability to generate IOPS.

20 Questions on SSD #5: Temp, Tier, and Cache + Intel SSD


"20 Questions on SSD: Consistent Performance"

Feature Explanation: Explains why consistent performance in a data center SSD matters to the user.

20 Questions on SSD #4: How consistent is the performance of your SSD?


"20 Questions on SSD: Power Loss Protection"

Feature Explanation: Explains why power loss protection in a data center SSD matters to the user.

20 Questions on SSD: #3 - Is your SSD protected from unplanned power loss?


"20 Questions on SSD: Which SSD is Right for Your Workload"

Feature Explanation: Explains endurance, how overprovisioning and workload affect an SSD's endurance, and why this matters to the user.

20 Questions on SSD: Which SSD is right for your workload?


"20 Questions on SSD: Is your SSD Qualified for Production"

Feature Explanation: Explains why OEM qualification in a data center SSD matters to the user.

20 Questions on SSD:  #1 - Is your SSD qualified for production?


"Why choose a Data Center over a Client SSD for your Enterprise Workload"

Feature Explanation: Explains the difference between endurance in data center class and client class SSDs and why this matters to the user.

Why data center solid state drives (SSDs)!

David Bowie’s 1971 song ‘Changes’ includes the phrase, “Turn and face the strange… Ch-ch-changes”… one of my wife’s favorite recollections of my step-daughter at age 4 or 5, belting out Bowie the way only a young heart can. Obviously not one of my normal blogs... I wanted to take some time to talk about changes in Enterprise IT, places we’ve come from, and the good things about places where we’re going as an industry.


I’ve been thinking about changes in IT for some time, about how life in this career began 20+ years ago taking care of PCs, then x86 servers, then datacenters, and finally IT research and pathfinding, leading to my position here in an Intel product group shepherding SSD datacenter use cases. The industry has transformed a once standalone computer that “couldn’t ever need more than 640k of RAM” into an architecture that runs everything from supercomputers, to ATMs, to cars, to the services we get from the cloud. Cloud services are what I’d like to focus on.


A few years back I wrote a blog called ‘Cloud Compute and the Psychology of Mine’ where I asserted that, as consumers of datacenter space, we were going to have to get used to sharing in a world that was increasingly moving toward virtualization. In short, over the last 5 years we got used to sharing. I’m going to assert something a bit more disruptive in this blog: most of the things we consider classic IT will move to cloud services in the next 5-10 years. All of the legacy services… email, IM, telephony, conferencing, video, collaboration, LAN provisioning, and even PC support… will likely move to an outsourced cloud service or service provider.


Hold on one second… before you run screaming like Chicken Little because “all of the people in IT are going to lose their jobs,” hear me out. This transformation is a good thing, in fact, a really good thing! Why, pray tell? Back to Bowie, “Turn and face the strange… Ch-ch-Changes… Just gonna have to be a different man.” For the last 10 years most enterprise IT shops have faced downward pressure on both budget and personnel. At the same time, our day-to-day work has been viewed more and more as a ‘utility’ than anything else… just a necessary cost center. Anyone familiar with the local power company knows that unless the lights are out or the bill is due, nobody even thinks about them. It’s been this way in IT for a while now, and cloud services are a way out of the ‘utility zone’!


Entertain this thought for a moment, what if IT unloaded all of the services they could to a ‘mature and secure’ service provider? There would certainly be a few jobs lost, but there would also be openings and opportunity at the service providers. Imagine the value a veteran Microsoft Exchange Engineer could bring to a company whose entire reason for being was providing email to Enterprise IT. This same ‘silver lining’ and possibility thinking applies to most legacy IT services that could transition to a cloud provider.


Beyond that, imagine the services that Enterprise IT would keep in-house after a transition… absolutely business-critical systems, secure enclaves with sensitive IP, high-value unique services that no one can provide except in-house IT. All the exciting stuff! Suddenly, IT is a partner providing business value, not just a cost center or a utility. One of the things I personally found strange in IT was that I was always so abstracted from the business that I felt disconnected from it. Here’s a prospect for IT to be viscerally connected to a company’s output once again.


We may be apprehensive about change, but this is one of the biggest opportunities for transformation to arrive in IT in decades. “Turn and face the strange…” Embrace the next changes in IT, elevate your value, be different, and be better for both you and your business!


- Chris




Read more of Chris's SSD blogs

Intel SSD P3700 Series - NVMe Efficiency


Intel recently announced our SSD Data Center Family for PCIe products; you can find the main page for the new PCIe drives here. Needless to say, things in the SSD world are exciting, and one of the best things about these new SSDs is the NVMe (Non-Volatile Memory Express) protocol. For this blog, we’ll focus specifically on the number of CPU cycles it takes to generate an IO and the efficiency of the NVMe protocol. You can learn more about NVMe on the NVMe website, which has some great slide decks; the graphic below is from a presentation on NVMe by one of Intel’s principal engineers.


In a nutshell, this graphic (below) outlines the latency in microseconds spent at each stage of an IO request, divided into media (drive latency), controller (HBA), and software (protocol). The great part about NVMe is that it’s extremely thin at about 6 microseconds. In addition, NVMe removes controller latency completely and uses less CPU to drive IO operations. Fewer CPU cycles per IO opens up a realm of new possibilities.


We decided we should test this out in the lab, so we grabbed a dual-socket Intel Xeon E5-2690 v2 server, a 12Gb and a 6Gb SAS controller, and 64GB of RAM. We paired these SAS controllers and the onboard SATA controller with a couple of 400GB SAS and SATA SSDs, then added a brand new 400GB Intel SSD DC P3700 Series drive to the mix. Using vanilla CentOS 6.5 plus current updates, we then ran some FIO workloads to evaluate how each drive handled the same workload in identical hardware configurations. FIO is available in most Linux distros, and our configuration consisted of eight workers with a queue depth of four, a block size of 4k, and a random read pattern across the entire span of the SSDs under test. These tests were run on un-partitioned, unformatted devices specifically to look at CPU utilization without a file system in play. We then prepped the drives by overwriting them four times with a sequential 4k workload and random data. This makes sure that we’ve actually overwritten the SSD a couple of times, including the ‘spare’ area or ‘overprovisioning’ (you can read more about overprovisioning in my blog on endurance). In addition to all this preparation, to prevent frequency changes in the CPU we turned off the ‘Turbo Boost’, ‘Power Management’, and ‘Hyperthreading’ options in the BIOS of the server.


Here’s the FIO syntax we used:


fio --ioengine=libaio --description=100Read100Random --iodepth=4 --rw=randread \
    --blocksize=4096 --size=100% --runtime=600 --time_based --numjobs=1 --norandommap \
    --name=/dev/nvme0n1 --name=/dev/nvme0n1 --name=/dev/nvme0n1 --name=/dev/nvme0n1 \
    --name=/dev/nvme0n1 --name=/dev/nvme0n1 --name=/dev/nvme0n1 --name=/dev/nvme0n1 \
    2>&1 | tee -a NVMeONpciE.log


Those familiar with IO testing will immediately notice that this workload is not strenuous, and we designed it that way on purpose. We’re not pushing queue depth, and only eight worker threads on a machine with 20 physical cores certainly won’t max out the box. Our objective was to put a relatively mild workload on the server so we could isolate and really look at IO efficiency. Below are our results.


When we looked at relative efficiency (below), using the 12Gb SAS controller and 12Gb SAS SSD as the baseline of 1.0 (largest CPU/IOP), we observed that other SAS and SATA combinations are roughly equivalent but the Intel PCIe/NVMe drive was 2.3 times more efficient than the other connections/protocols. This means the Intel SSD DC P3700 Series we used in our testing scenario generates 2.3 times more IOPS per percent CPU utilization!
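
As a sanity check on what “2.3 times more efficient” means, here’s a small sketch with made-up IOPS and CPU numbers (not our actual lab data): efficiency is simply IOPS per percent CPU utilization, normalized to the 12Gb SAS baseline.

```python
# Hypothetical numbers for illustration only, not the lab measurements.
# Efficiency = IOPS per percent CPU utilization, normalized so the
# 12Gb SAS controller + SSD combination is the 1.0 baseline.

results = {
    "12Gb SAS":  {"iops": 100_000, "cpu_pct": 10.0},  # baseline
    "6Gb SATA":  {"iops":  95_000, "cpu_pct":  9.5},
    "NVMe/PCIe": {"iops": 230_000, "cpu_pct": 10.0},
}

baseline = results["12Gb SAS"]["iops"] / results["12Gb SAS"]["cpu_pct"]
for name, r in results.items():
    relative = (r["iops"] / r["cpu_pct"]) / baseline
    print(f"{name}: {relative:.1f}x relative efficiency")
```

With these toy numbers the NVMe/PCIe line works out to the 2.3x figure from our testing; plug in your own measured IOPS and CPU utilization to compare your drives the same way.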


During testing we also looked at latency (below). Using the 6Gb SAS controller and 6Gb SAS SSD as the baseline of 1.0 (highest average latency), we observed that the other SAS and SATA combinations vary up to 20% lower (better) than the baseline. We also noticed that the Intel SSD DC P3700 Series drive delivered almost half the average latency of the 6Gb SAS SSD, at just over 300µs (microseconds!), with our random workload spanning the entire logical drive!


So why does this matter? For the IT professional, this really opens some doors, especially in the caching and IO tiering space. You could easily implement one of the Temp, Tier, or Cache methodologies I talk about in this blog without concern about overburdening your CPU. With NVMe and PCIe, you get high bandwidth, extremely low IO latencies, and much more efficient CPU utilization per IO. In a nutshell, this means we can keep doing what Enterprise IT does best… even more, with less and less. So… where could one of these Intel SSDs accelerate an application and benefit your organization?


- Chris




Read more of Chris's SSD blogs

Throw out your hard disks: Revisited!


My original blog from October of 2008 has been getting a bunch of traffic lately and is linked below, but frankly... things have changed quite a bit in six years, so here's an update.


If I recall correctly, the Intel X25-E 32GB SSD was about $600 from any of my favorite Internet retailers, at almost $20/GB... WOW! I did a quick Internet search this morning (April '14) and came up with $1.00/GB for the Intel SSD DC S3500 Series standard endurance drive and $2.25/GB for the Intel SSD DC S3700 Series high endurance drive. So today I can get a 480GB Intel SSD DC S3500 Series for over $100 less than that X25-E cost me six years ago... That's a 15x larger drive at an 18x smaller cost per GB, not to mention the vast difference in performance! In addition, the SSDs are now certified by most of the major OEMs, software tools like the Intel SSD Toolbox have matured over the years, and we now have 6Gbps SATA3 speeds instead of piddly 1.5Gbps SATA, plus the consistent performance of Intel's 4th generation SSD. Check out this blog at the Tech Report on Intel SSD reliability.
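
For anyone who wants to check my math, here it is using the prices quoted above (street prices, so your mileage may vary; the "18x" is 18.75 rounded down):

```python
# Price comparison from the post: 2008-era Intel X25-E vs. an April 2014
# Intel SSD DC S3500 Series drive, using the figures quoted above.

x25e_price_usd = 600.0   # 32GB X25-E, circa 2008
x25e_capacity_gb = 32

s3500_usd_per_gb = 1.00  # April '14 street price
s3500_capacity_gb = 480

s3500_price_usd = s3500_usd_per_gb * s3500_capacity_gb  # $480, over $100 less
capacity_ratio = s3500_capacity_gb / x25e_capacity_gb   # 15x the capacity
cost_per_gb_ratio = (x25e_price_usd / x25e_capacity_gb) / s3500_usd_per_gb

print(f"${s3500_price_usd:.0f} buys {capacity_ratio:.0f}x the capacity "
      f"at {cost_per_gb_ratio:.2f}x lower cost per GB")
```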


If you read my blog from a few weeks ago, the Temp, Tier, and Cache use cases all make sense for SSD. But at $1.00/GB... the core Engineer in me starts to think about possibilities. I barely touched on this in that prior blog, but... what if I replaced the RAID 1 set of boot/swap disks in my typical server with SSDs? I certainly don't need 300/600GB for an OS, either Windows or Linux, so let's replace those SAS drives with, say, 240GB Intel SSD DC S3500 Series drives. Run the numbers and you'll see it doesn't significantly change the overall cost of your average server. Even with the large MTBF (mean time between failures) and low AFR (annualized failure rate) you can check in the spec sheets, I'm still going to put these in a RAID set. Then I'll split this up and give my OS 100GB and 'something else' the other ~140GB.


So what am I going to do with that other 140GB? I can read the SMART E9 (media wear indicator) stat through most of my OEMs' RAID controllers, so I can monitor wear on the SSD, and since I bought the drive from my OEM I know it's guaranteed and covered by warranty. So what are the possibilities? If I'm in a Windows world, I'm engineering to swap to the page file as little as possible; maybe I could afford a little swapping now. In the Linux world, maybe I could increase my 'swappiness' a bit. Maybe I want to move some TempDB files to that 140GB, how about some specific web content, or perhaps I could point some of MS SharePoint's caching mechanisms at that 140GB of fast storage. If I'm running VMware, I now have a local disk to use for vFRC (VMware Flash Read Cache), or in OpenStack maybe I use this space for my local base images.


There are tons of possibilities; time to let that core Engineer out to play! Now... with 4TB SATA drives running 4 cents/GB or less, I'm probably not going to throw out all my hard disks. But I think there are compelling reasons to start replacing those boot/swap disks with boot/swap SSDs to explore some possibilities. Solid-state storage has crossed over into the mainstream; time to start becoming as familiar with these devices as we are with their spinning predecessors.


- Chris






#ITCenter #SSD

20 Questions on SSD: Things you’ll want to ask


Question #5: How can I use Temp, Tier, and Cache + Intel SSD?



So it’s been a few weeks since my last blog, and I want to switch gears away from talking about the features of the Intel Data Center Family of SSDs. We’ve gone over features like production qualification, endurance requirements, power-loss protection, and finally consistent performance. What I want to talk about in this blog is solutions. Specifically, today we have a number of options to leverage SSDs without going to the extreme of replacing your hard drives (HDDs) 1-for-1 with SSDs for any particular application.

The solutions team I work in at Intel has spent the last year exploring the benefits of our SSDs in a wide array of environments, from Big Data to virtualization with many stops in-between. The interesting commonality in all of these explorations is that in many cases, the best ROI and TCO benefits come from using the SSD as temp, tier, or cache! There are a few use cases where 1-for-1 replacement of traditional HDDs pays off, but for the most part… repeat after me: temp, tier, and cache... think lions, and tigers, and bears, and a zoo!


In the ‘temp’ space (wow, that pun was even a stretch for me), we’ve seen goodness in Hadoop with jobs that produce intermediate data by changing “mapred.local.dir” over to an SSD, and with relational databases by moving ‘TempDB’ to a local SSD. In tiering with virtualization, we demonstrated a number of solutions where we built a low-cost, 100% SSD, software-based SAN and moved VMs into these high-performance NFS or iSCSI datastores. Using SSDs as buffer/journal/cache, we’ve looked at software-based scale-out storage solutions such as VMware’s VSAN, PernixData, Microsoft Storage Spaces, and open source options such as Ceph. In the pure cache space, we’ve also looked at several different caching software packages, including Intel CAS (Cache Acceleration Software), and how these packages can benefit Enterprise IT workloads.


So, there’s plenty of opportunity to leverage SSDs to accelerate your enterprise workloads. The question is, “will your workload benefit enough to yield ROI or a TCO improvement?” Which brings me to my second point, your workload really matters! In my IT past, I often handed off storage workloads to the SAN (Storage Area Network) team. When application performance issues arose; we looked at storage, determined if that was the cause, and requested more IOPS or throughput from the SAN. In contrast, most of the temp, tier, or cache implementations studied by this team focus on locally attached storage or DAS (direct attached storage). This being the case, the application engineer and the systems engineer must work hand-in-hand to look at the workload, details of the IO, and in the case of our testing… determine how to best employ SSDs as temp, tier, cache, or as a final option, complete conversion to SSDs (muahahaha)!


Let me illustrate with a brief walkthrough. Almost two years ago now, we published this paper on accelerating database workloads with Intel SSDs. In that paper we did a 1-for-1 replacement of local 15k disk drives with SSDs. If we were to repeat this exercise today, we’d need to investigate how much ‘TempDB’ was used, assess whether or not a solution like Intel CAS could help, evaluate features like ‘Buffer Pool Extension’ in Microsoft SQL Server 2014, and then finally look at wholesale replacement of the 15k data drives with SSDs based on both the IOPS and capacity requirements of the application.


The point here is that there are now many more opportunities to realize the benefits inherent in SSDs while keeping capital spend to a minimum. A brief query on one of my favorite retail sites this month (April '14) tells me that a 240GB Intel SSD DC S3500 runs roughly the same price as one of the major OEMs' certified 10k SAS drives at 300GB, about $250.00. So in $/GB there’s still a 1.25x difference in cost.


I’m looking forward to the day when SSDs are the default first choice in local storage, and they’re getting close. In fact, for uses like boot/swap the price is close enough to warrant having a RAID 1 SSD solution in-box and an extra ace up the engineer’s sleeve. Until then, there are a lot of things I can do with temp, tier, and cache. The question for the reader is: can you use one of these methods to help your applications and data center run faster while providing a good ROI or a decrease in TCO? Could you do more with less and increase efficiency by leveraging SSDs?

- Chris




#ITCenter #SSD

20 Questions on SSD: Things you’ll want to ask.

Question #4: How consistent is the performance of your SSD?

In the last few blogs of our 20 questions on SSD series, we looked at OEM qualification, endurance, and power loss protection. In this blog, we’ll look at a question that’s critical to performance in RAID sets and planning in your data center, “How consistent is the performance of your SSD?”

We’ve talked about SSDs in the past with analogies using cassette tapes, Peter Frampton, and my vintage Honda Civic… This time, we’ll use the 1981 Yamaha XJ550 Seca I started riding in the late 80s. Let’s just say it was well worn (50k+ miles) when I bought it for a mere $250 in 1988. It was a sweet ride and great on gas, which was exciting as I was completely broke at the time. The point here is it was a great bike... although it did have a small problem with the starter. Sometimes the starter worked, sometimes it didn’t, sometimes my 119-pound self just had to push start it, and most of the time I parked at the top of a hill just in case.

I never knew whether or not the starter would behave, which was terribly frustrating. I could have used a little consistency... which brings me to my point about SSDs. Planning in the Enterprise data center requires components that perform consistently, and SSDs are no exception.

So let’s start with a couple of graphs from some testing we did internally at our lab in the NVM (Non-Volatile Memory) Solutions Group here at Intel. The 1st graph on the left (in blue) is an Intel SSD DC S3500 Series drive performing a mixed 70/30 read/write workload with 4k blocks across the whole span of the drive at a queue depth of 4. We used Iometer and a recent Intel Xeon E5-2600 server to create this load on the SSD. You’ll notice that the performance over time averages between 20k and 22k IOPS consistently for the entire duration of the test series. Next, let’s look at a really good drive of the same capacity as the Intel SSD DC S3500 Series from a top manufacturer, on the right (in red).


This graph is the result of running the exact same workload in the exact same configuration as the Intel SSD. You’ll notice immediately that if you average out the IOPS of both drives, the other SSD is a bit lower... but pretty close. You’ll also notice that the other SSD occasionally peaks at about 2x the IOPS, but also dips to only 70% of the IOPS we see from the Intel SSD. We’ve found this type of inconsistent performance behavior in many of the SSDs we’ve evaluated internally from other manufacturers.

Now take a look at the RAID 0 scaling tests we did with the same drives. We used RAID 0 for demonstration purposes as it requires the least amount of overhead from the RAID controller. Perfect linear scaling is shown in gray, good scaling from a consistent SSD is shown in blue, and poor scaling from an inconsistent SSD is shown in red. What we found here is that the worst element in a RAID set dictates the overall performance of the array. In other words, you really can't duplicate Shakespeare's work by running a tornado through a pile of random shredded documents.
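
The "worst element dictates the overall performance" behavior can be sketched with a toy model (hypothetical numbers, not our measurements): assume each stripe in a RAID 0 set completes at the pace of its slowest member, so the array delivers roughly N times the minimum per-drive IOPS.

```python
# Toy slowest-member model of RAID 0 scaling: stripes complete at the pace
# of the laggard, so array IOPS ~= N * min(per-drive IOPS). Illustrative
# numbers only; real controllers add their own overhead.

def raid0_iops(per_drive_iops):
    """Estimated array IOPS for a stripe set under the slowest-member model."""
    return len(per_drive_iops) * min(per_drive_iops)

consistent_set   = [21_000, 21_000, 21_000, 21_000]  # steady drives
inconsistent_set = [40_000, 15_000, 21_000, 15_000]  # peaks high, dips low

print(raid0_iops(consistent_set))    # 84000
print(raid0_iops(inconsistent_set))  # 60000: dragged down by the dips
```

Notice that even though the inconsistent set averages higher IOPS per drive at its peaks, the array lands well below the consistent set.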

Now let’s frame this in terms of the Enterprise data center. Say I’m planning a new SQL database (DB) deployment and I’m putting SSDs in a RAID 5 for my main database files and a RAID 10 for my log files. The question is, “Do I want to plan around a drive that delivers the same consistent performance regularly?” I know that an Intel Data Center SSD is going to cost a little more than the competition. However, my alternative is to deal with inconsistent performance and the potential for application problems that would be difficult to troubleshoot and impossible to reproduce. It's akin to asking whether or not you want a guitar amplifier that “goes to 11”… as long as you’re OK with a volume knob that sticks at “7” on an unpredictable basis. And... unlike the erratic starter in that XJ550 Seca, I can’t exactly park my database at the top of a hill every time and give it the old run, jump, & bump. I know which SSD I’d prefer for my production environment.

Consistent performance, much like OEM qualification, endurance, and power loss protection, is critical to data center deployments and comes built-in with every member of the Intel Data Center Family of SSDs. I’ll be back in a few weeks to talk about more SSD goodness in the 5th installment of our 20 questions blog, see you then…


- Chris






#ITCenter #SSD

20 Questions on SSD: Things you’ll want to ask.


Question #3 - Is your SSD protected from unplanned power loss?

In the 1st and 2nd blogs of our 20 questions on SSD series, we looked at production/OEM qualification and write endurance. In this blog, we’ll look at a more sinister question, “Is your SSD protected from unplanned power loss?”

The Enterprise IT reader’s first thought here might be, “I don’t lose power in my data center, and if I do… I’m protected with uninterruptable power supplies (UPS) and a backup generator to boot… not an issue, right?” With that in mind, let’s dive down and get a little more granular with a couple of scenarios.

I’m building a server in the data center, the install hangs on some strange memory error, and I need to pull the power cord or hard-cycle the box. I’m replacing a failed drive in a RAID set and accidentally pull the wrong one (I personally disavow any knowledge of having done something like this in a production environment at any time in the past, but it seems possible ;-) ). I’m cycling through old drives at a lab workbench to see which ones to re-use and which ones to run a DOD secure wipe on before sending them to e-waste/recovery. Every one of these scenarios, and many others, carries with it the risk associated with ‘surprise removal’ or an unsafe/unplanned power down.

The question is, is your SSD up to the challenge? What if the SSD was doing some housekeeping internally and moving/writing data to free up spare area? What if the OS received an acknowledgement for a pending write while that data was still in the DRAM buffer of the drive? What if your SSD can’t recover? The answer to this last question could range from no problem at all, to undetectable data corruption, partition corruption, or possibly a device that needs to be re-partitioned and re-formatted to be useful. In the immortal words of William S. Preston Esquire & Theodore Logan (Bill & Ted), “Bummer Dude!”

Intel’s response to this issue in our Data Center Family of SSDs is a feature called PLI, or Power Loss Imminent! You can watch a short video explanation of PLI out on YouTube, but in a nutshell: PLI watches the voltages supplied to the SSD continuously, and if that voltage decreases significantly over a short time period, the drive halts IO both internally and externally. It then switches over to the internal super capacitors and makes sure that any in-flight writes are committed to the storage media (NAND). Think of PLI as a little spare gas tank or battery for your car. You’re almost home and only have a half mile to go when, for some unknown reason, your gas tank goes from half-full to nothing... PLI steps in and makes sure you get all the way home, all in one piece! PLI is an absolutely necessary feature for datacenters where unplanned removal or surprise power outages might occur and you’d like to keep your data intact.

In conclusion, I’ll leave you with a quote from Iclk, from one of his articles, where he came to this conclusion after testing a number of widely available SSDs: “if you care about data even when power could be unreliable, only buy Intel SSDs.”

OEM qualification, endurance, and power losses seem to be covered with the Intel Data Center Family of SSDs. I’ll be back in a few weeks to talk about more SSD goodness in the 4th installment of our 20 questions blog, see you then…



Question #2: Which SSD is right for your workload?


In the 1st blog of our 20 questions on SSD series, we looked at whether or not your SSD was qualified by your OEM, opening the gateway to the Enterprise IT data center. This 2nd blog addresses the question, “Which SSD is ‘right’ for your workload?” and will help you select the data center solid-state drive with the 'right' write characteristics.

In later blogs, we’ll also address other features in Intel Data Center SSDs like power loss protection. For this blog, let’s look directly at the elephant in the room and talk about write endurance.

To get started, let’s define the term and then explore how IO workload affects an SSD’s endurance. Simply put, endurance is the ability to sustain write activity in a storage medium that has known write limitations. You can think of this in many ways, but I personally like the good old cassette tape analogy. There is a limit to how many times you can play a tape before Peter Frampton just doesn’t sound like himself anymore; the mechanical parts wear and stretch the tape in the process of both playback and recording. Akin to this, the flash memory (NAND) media that makes up an SSD also wears out on an atomic level as you read and write to it. In the Intel SSD spec sheets, we rate a drive in DWPD, or drive writes per day, over a 5-year time period. For our high endurance Intel SSD DC S3700 Series we get 10 DWPD, and for our standard endurance Intel SSD DC S3500 Series we get 0.3 DWPD. But there’s more to endurance than a simple multiplication of DWPD * capacity = daily writes. Hey, what does DWPD have to do with a worn out Frampton tape still in my vintage Honda Civic?
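
To put DWPD in concrete terms, here’s the napkin math for a hypothetical 400GB drive at the two DWPD ratings quoted above (the spec-sheet TBW figures are the real authority; this is just the simple multiplication):

```python
# Napkin math: total write budget implied by a DWPD rating over the
# 5-year rating period, for a hypothetical 400GB drive.

def lifetime_writes_tb(dwpd, capacity_gb, years=5):
    """Total terabytes the host can write over the rating period."""
    return dwpd * capacity_gb * 365 * years / 1000.0

high_endurance_tb = lifetime_writes_tb(10, 400)   # S3700-class: 10 DWPD
std_endurance_tb = lifetime_writes_tb(0.3, 400)   # S3500-class: 0.3 DWPD

print(f"10 DWPD  -> {high_endurance_tb:,.0f} TB written")  # 7,300 TB (~7.3 PB)
print(f"0.3 DWPD -> {std_endurance_tb:,.0f} TB written")   # 219 TB
```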

Here’s the scoop. NAND is a tricky medium and requires some serious electrical gymnastics to make it function in an Enterprise IT role. When the drive is reading, writing, leveling wear, clearing space, and doing other housekeeping operations, it produces something we call Write Amplification (Write Amp). At the most basic level, Write Amp is the ratio of NAND writes to host writes; in other words, the number of internal write operations the drive needed to perform to satisfy each write the host system made to the device. When we measure endurance in the Intel Data Center family of products, we do so at a Write Amp of about 3.0. This Write Amp corresponds roughly to a 4k block size, 100% random write workload across the entire size/span of the drive.

The reader is thinking at this point… AAACH, TOO DEEP! But hang in there: your workload matters because, as a general rule, anything that decreases Write Amp increases the endurance of the SSD. Three major things can decrease Write Amp: an increase in block size beyond 4k, a decrease in randomness from 100%, and use of less than 100% of the capacity of the drive. Conversely, shrinking the block size (since we can’t get any more random than 100%) will increase Write Amp because more housekeeping is required for each operation. Once you get this, it’s more fun than Schrödinger’s cat!
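
As a first-order illustration of why your workload matters (a simplifying assumption on my part: host-visible endurance scales inversely with Write Amp, which glosses over plenty of real-world detail):

```python
# First-order model: the NAND can absorb a fixed budget of internal writes,
# so if your workload produces a lower Write Amp than the ~3.0 used for the
# rating, the host can write proportionally more per day. A simplification
# for illustration, not a spec-sheet guarantee.

def effective_dwpd(rated_dwpd, actual_write_amp, rated_write_amp=3.0):
    return rated_dwpd * rated_write_amp / actual_write_amp

rated = 0.3  # standard-endurance rating, measured at Write Amp ~3.0
for wa in (3.0, 2.0, 1.5):
    print(f"Write Amp {wa}: ~{effective_dwpd(rated, wa):.2f} DWPD")
```

Halve the Write Amp and, under this model, you roughly double the drive writes per day your workload can sustain.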

So there it is: your workload has a direct effect on the endurance of your Intel SSD. Intel’s SSDs are tested for that worst case of a 4k, 100% span, 100% random workload, and your workload may be less intense, which will allow the drive to function longer! It’s as if playing that Frampton cassette in its entirety (sequentially), or just playing only the 1st song, made the whole tape last longer. With this info under your belt you can measure any particular workload with perfmon, iostat, or your favorite performance tool, then compare those stats to the benchmark workload to get a good idea of which drive to purchase and at what capacity. The Intel SSD DC S3700 Series is better for high endurance needs, while the Intel SSD DC S3500 Series is better for more standard endurance workloads.

This leads us into the additional features I breezed by in the beginning of this blog. We’ll look at some of these in the 3rd blog in this series a few weeks from now, see you then…


Christian Black is a Datacenter Solutions Architect covering the HPC and Big Data space within Intel’s Non-Volatile Memory Solutions Group. He comes from a 23 year career in Enterprise IT and you can follow his travels on Twitter at @RekhunSSDs.


#ITCenter #SSD

20 Questions on SSD: Things you’ll want to ask.


Question #1: Is your SSD qualified for production?




This is the 1st in a series of blogs where we’ll explore a number of questions we regularly receive and discuss with enterprise customers when selecting an SSD for their data centers. These go beyond that first and most obvious “how much does it cost” question. Often, these secondary questions and details play a larger role in selecting the proper SSD than the $/GB or $/IOP (IO Operation) cost that begins many of these discussions. In my Enterprise IT experience, I always depended on the production support, qualification, and service from one or more OEMs. Only in the rarest conditions did anything ever land in production without full support and a 4-hour call-to-repair service agreement. The good news here is the Intel Data Center Family of SSDs are qualified by many of the major OEMs for use in the data center. As an example, the 2.5” form factor Intel SSD DC S3700 Series 400GB drive is available as an Intel branded drive with SKU (stock keeping unit) SSDSC2BA400G3; from Dell the SKU is 6XJ05, from HP the SKU is MK0400GCTZA, and from IBM the SKU is 41Y8336. Many other OEMs have certified the Intel Data Center SSDs and use the Intel generic SKU in place of a custom number.


So what does this say for the Intel Data Center SSDs? It says that those OEMs have acknowledged the quality of the devices, run them through exhaustive qualification and stress testing, and are willing to support them in the various configurations you might deploy in production. These same statements are not true of all SSDs in the marketplace today. If you’re deploying into a production environment, make sure your OEM supports the device because unsupported configurations can make it difficult to root-cause problems when they arise.


SSD to Data Center roadblock number one… GONE! The door to the Enterprise Data Center is opened by qualification, but we still have a lot more questions to ask. Enterprise IT is a complex space, and we’ll need more than just the ability to deploy trusted hardware with production support; we need the ‘why’ and ‘to what benefit’ questions answered too. Next in the upcoming series, we’ll talk about more details, starting with workload characterization and how knowing the IO profile of an application helps select the right SSD for the job. See you then…


Christian Black is a Datacenter Solutions Architect covering the HPC and Big Data space within Intel’s Non-Volatile Memory Solutions Group. He comes from a 23 year career in Enterprise IT and you can follow his travels on Twitter at @RekhunSSDs


#ITCenter #SSD

Why choose a data center class SSD...


In my recent travels, I’ve had a number of questions from end users about the differences between the data center series and the client series of Intel SSDs. Folks have expressed the desire to use the less expensive client drive, specifically the Intel SSD 530 Series, in place of the Intel SSD DC S3500 Series data center drive in a data center workload. What I’d like to convey is the reasoning for using a data center product, so we’ll do a little walkthrough in this blog. For this “why” exercise, we’ll use the specification sheets listed below and some estimated web pricing for the 80GB versions of these drives as a reference. We’re using the smallest common size of drive to highlight the differences between the two classes of device. At this lowest capacity point the GB/Day in write endurance is similar, which is normally what spawns this discussion.


Here are the specification sheets:

Intel SSD DC S3500 Series product specification:
Intel SSD 530 Series product specification:


For those new to the SSD space, write endurance is how long the device will last, typically specified in GB/Day or TBW (terabytes written). Here are the endurance ratings from the specification sheets, and some roughly estimated web pricing.


Endurance rating 80GB Intel SSD DC S3500 Series: 45TBW for 80GB Drive – Estimated web pricing $120.00 (December ’13)
Endurance rating 80GB Intel SSD 530 Series: 20GB of host writes per day – Estimated web pricing $80.00 (December ’13)


Looking at the spec sheets closely, you’ll notice that the TBW (terabytes written) increases with the capacity of the Intel SSD DC S3500 Series, whereas every 530 Series drive’s write capacity is fixed at 20GB/day. The write capacity of the client drive does not scale as the capacity of the drive increases.


Let’s start with endurance...

Evaluating the specification sheets, both drives are guaranteed for 5 years. For the 80GB Intel 530 Series drive, this guarantee is at 20GB of host writes per day. For the 80GB Intel DC S3500 Series drive, the specification lists 45TBW (terabytes written). Doing some quick base-ten math on the Intel SSD DC S3500 Series: 45TBW * 1000 gets us gigabytes; divide by 365, then by the 5 guaranteed years, and we get almost 25GB of host writes per day for the 80GB Intel DC S3500 Series drive. Then we cruise out to the web, pick a retailer or two, and come up with a roughly estimated holiday ’13 price for the 80GB Intel SSD DC S3500 Series at about $120.00 and the 80GB Intel SSD 530 Series at about $80.00.
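The same arithmetic, spelled out (numbers straight from the spec sheet comparison above):

```python
# Base-ten math from the 80GB Intel SSD DC S3500 Series spec sheet
tbw = 45                         # endurance rating, terabytes written
gb_total = tbw * 1000            # terabytes -> gigabytes (base ten)
gb_per_day = gb_total / 365 / 5  # spread over the 5-year guarantee
print(round(gb_per_day, 1))      # 24.7, i.e. almost 25GB of host writes per day
```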

At first glance… wait, 5 extra GB a day for half again the cost? The typical cost-conscious evaluator might exclaim, “NO WAY!” and simply purchase the 530 Series instead. This raises the “why” question: for data center workloads we’ll need to go a little deeper, with a discussion of endurance, end-to-end data checking, power-loss circuitry, the Intel drive controller, and performance consistency.


GB-Day-Writes.pngEndurance testing:

When we endurance test our SSDs, client SSDs are tested with a client workload and data center SSDs are tested with a data center workload. These workloads differ in the way they perform IO to the drives, and the data center workload is much more strenuous on the drive than the client workload. Put this in context by thinking of SSD endurance as a 26 mile marathon. The client marathon is a relatively flat course with a few small hills; the data center course runs through mountains and canyons the entire way. Each of these races takes a different type of athlete, both trained, both skilled, and both experts, but comparing the two by finish time does an injustice to both. The same is true of SSDs specifically designed for the client or for the data center; they are two completely different types of athletes! The SSD you select for your data center needs to be ready for the rigors of the mountain paths.


End-to-end data checking:

Another reason to select an Intel SSD DC S3500 Series drive for the data center would be end-to-end (E2E) data checking and correction. Most drives in the Enterprise IT data center reside behind a RAID controller and are qualified by the major server OEMs (Original Equipment Manufacturers) for production support and warranty. One of the major milestones for product qualification/validation in this space is the E2E correction feature. With E2E, data is internally checked, validated, and corrected if need be at every step of the way: from the point data enters the drive for a write until it exits the drive for a read. Sticking with the running metaphor, client drives don’t require this type of strenuous error correction, while the Data Center Series of Intel SSDs meets this requirement. Now we have a mountain marathon runner who can do error correction on the trail.


PLI – Power Loss Imminent:

Looking for another reason to select a Data Center Series Intel SSD? PLI is a special feature that monitors the power supplied to the SSD, automatically stops both external and internal IO if it senses a power outage, then makes sure that all in-flight data is written to the storage media (NAND) using the energy stored in a special super-capacitor. This feature guarantees that, in the event of a power outage, anything an operating system (OS) received an acknowledgement for is physically written to the drive. This is especially important in data center workloads and adds another feature to our mountain runner… the ability to stop the race and save in-flight data.


The Intel Data Center drive controller:

Some people ask about the differences in the drive controller. In the Intel SSD 530 Series, we use a client controller that leverages an internal compression engine to help accelerate client workloads. In our Data Center Series of products, a purpose-built Intel controller does not use hardware compression. Our experience in the data center shows us that many data center workloads are either compressed by the application or are already in a compressed format when stored on drives. Again, the Data Center Series family of products was built from the ground up for a data center workload. With this “built for extremes” controller at the runner’s disposal, he can run both the canyons and the flats.


Last but not least, performance consistency:

There’s a difference in the expected duty cycle that client drives and the Intel Data Center Series family of drives are built for. The data center drive is built to run 100% of the time; its housekeeping activities occur as the drive is operating, and this activity does not cause changes in the observed performance of the device. With that in mind, the Intel SSD DC S3500 Series drive includes a “Quality of Service” metric in the specification outlining the maximum observed latency for 99.9999% of IO operations. In other words, our runner turns in the same marathon time every time, no matter the running conditions.
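For readers who want to see what a tail-latency percentile means in practice, here’s an illustrative sketch. The sample data is simulated, and this is not how Intel measures the spec:

```python
import random

def qos_latency(samples_us, pct=99.9999):
    """Latency below which pct% of sampled IOs complete; this is what a
    spec line like 'max latency at 99.9999%' is expressing."""
    ordered = sorted(samples_us)
    idx = min(len(ordered) - 1, int(len(ordered) * pct / 100))
    return ordered[idx]

# Simulated IO completion times in microseconds, for illustration only
random.seed(0)
samples = [random.uniform(50, 500) for _ in range(100_000)]
print(qos_latency(samples))          # the six-nines tail latency
print(qos_latency(samples, pct=99))  # the more familiar 99th percentile
```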


Wrapping this up, if we look at the Intel Data Center Family of SSDs… We have an athlete who’s trained for the rigors of the data center, performs end-to-end data checking, can handle both compressible and incompressible workloads, looks for imminent power outages, and maintains consistent performance under load. In a nutshell, that’s the “why” behind data center class SSDs. The question for the reader now becomes, “Which runner do you want on your data center team?”

- Chris


Christian Black is a Datacenter Solutions Architect in Intel’s Non-Volatile Memory Solutions Group. He comes from a 23 year career in Enterprise IT and you can follow his travels on Twitter at @RekhunSSDs.

24 months of Intel SSDs…. What we’ve learned about MLC in the enterprise…


The Enterprise Integration Center (EIC) private cloud lab (a joint Intel IT and Intel Architecture Group program) has been working with Intel SSDs (solid state disks) for the last two years in a number of configurations ranging from individual boot/swap volumes for servers to ultra performance iSCSI software based mini-SANs. So, what have we learned about performance, tuning, and use cases?


There are plenty of industry resources and comparisons available at any number of trusted review sites, but most of these revolve around client usage and not server/datacenter uses. From my contact with industry, most engineers seem to think that using an SSD in the datacenter requires an SLC NAND device (Single Level Cell - Intel X25-E product) due to endurance requirements. For those new to NAND characteristics, endurance (usable lifetime) is determined by writes to the NAND device, as block-erase cycles stress and degrade the ability of the flash cells to be read back. Basically, SLC devices last through more block-erase cycles than their less expensive and larger capacity MLC cousins (Multi Level Cell - Intel X25-M product). The assumption that ‘only SLC will do’ for the enterprise raises the $/GB cost flag and mires discussion. Endurance is the number one “but those won’t work for my use-case” argument.


The EIC cloud lab has some good news here, lower cost MLC or consumer grade devices can do just as well, especially in RAID arrays. To get the best out of these MLC devices though, we have to employ a few techniques that allow the drive and its components to function more efficiently. These techniques manipulate the three vectors in MLC… space, speed, and endurance by altering the useable size of the disk.


Assume I have a 160 GB X25-M MLC drive; this device is spec’ed at 250MB/s read and 100MB/s write (sequential) and has a lifetime of around 4-5 years in a ‘consumer’ use case (laptop/desktop). So if I were to use this same device as a repository for a database transaction log (lots of writes), the lifetime would shorten significantly (maybe to as little as a year). There are specific formulas to determine endurance & speed, some of which are unavailable to the public, but Principal Engineer Tony Roug wraps up the case for MLC in the enterprise quite well in this presentation from Fall 2010 Storage Networking World.


Back to trade-offs (space, speed, and endurance): my 160GB MLC drive won’t work for my database transaction log because the workload is too write intensive. What I can do about this is take the 160GB drive and modify it to use only 75% (120GB) of the available capacity. Reducing the ‘user’ available space gives the wear-leveling algorithm in the drive more working room and increases both the speed (write speed only, as reads are unaffected) and the endurance of the drive, but it also increases the $/GB as you have less available space.


With the ‘user’ space reduced to 120GB (over-provisioned is the official term), that same 160GB drive is now capable of 250MB/s read and 125MB/s write (sequential) and has a lifetime of 8-10 years in the ‘consumer’ use case. Not terribly appealing to the average end user who just spent $350 on an SSD, as they lose 25% of the capacity, but in the performance and enterprise space this is huge. Once modified, my ‘consumer grade’ MLC drive gets roughly 75-80% of the speed & endurance of the X25-E SLC drive with 4x the space at about the same ‘unit cost’ per drive. Since the drive is 4x larger than SLC, will likely last as long as a standard hard disk once over-provisioned, has great throughput at 125-250MB/s, and can reach 100-400x the IO operations of a standard hard drive, we can now begin the discussion around which particular enterprise applications benefit from Intel MLC SSDs.
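The over-provisioning itself is just capacity arithmetic; a quick sketch:

```python
def over_provision_pct(raw_gb, usable_gb):
    """Share of the raw NAND reserved as spare area for wear leveling."""
    return (raw_gb - usable_gb) / raw_gb * 100

# The 160GB X25-M restricted to 120GB of 'user' space:
print(over_provision_pct(160, 120))  # 25.0, the 25% knocked off the top
```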


For the enterprise, once we overcome the endurance hurdle, the value discussion can begin. For the performance enthusiast at home, this same technique allows a boost in disk write throughput, higher benchmark scores, and of course more FPS (frames per second) in whatever game they are thoroughly stressing their over-clocked water-cooled super-system with at the moment.


BKMs (Best Known Methods) for enterprise and use-case evaluation… AKA: The technical bits…


  • Get to know the IO characterization (reads/writes) of the target application & use case
  • Baseline the application before any SSD upgrades with standard disks, collecting throughput and utilization metrics
  • Knock a maximum of 25% off the top of any MLC drive you’re using in the datacenter
    • More than 25% has diminishing value
    • Use either an LBA tool, RAID controller, or partitioning tool after a fresh low level format
    • That % can be smaller based on the write intensity of the target application - fewer writes = a smaller % off the top, on a case by case basis
  • SAS/SATA RAID controller settings
    • Activate on-drive cache – OK to do in SSD
    • Stripe size of 256k if possible, to match the erase-block size of the drive
    • Read/write on-controller DRAM cache should be on and battery backed
  • Make sure any drive to controller channel relationship in SAS controllers stays at 1:1
    • Avoids reducing drive speed from 3.0 Gbps to 1.5 Gbps
  • Avoid using SATA drives behind SAS expanders
    • Again… avoids reducing drive speed from 3.0 Gbps to 1.5 Gbps
  • SSDs are 5v devices, make sure the 5v rail in the power supplies has a high enough rating to handle the power-on of X number of SSDs
    • Only necessary if you’re putting 16+ drives in any particular chassis
  • Baseline the application after SSD upgrade to determine performance increase collecting throughput and utilization metrics
    • Look for higher IOPS and application throughput but also be looking for higher CPU utilization numbers now that you have eliminated the disk bottleneck from your system
    • There will likely be a new bottleneck in other components such as network, memory, etc… look for that as a target for your next improvement
  • Last but not least, when testing an application you’ll need to ‘season’ your SSDs for a while before you see completely consistent results
    • For benchmarks, fill the drive completely twice and then run the target test 2-3 times before taking final measurements
    • For applications, run the app for a few days to a week before taking final performance measurements
    • Remember, a freshly low level formatted SSD doesn’t have to perform a block-erase cycle before writing to disk


Well, that’s it in a fairly large nutshell… We see using MLC disks in enterprise use cases as something that is growing now that the underlying techniques for increasing endurance are better understood. In addition, as Intel’s product lines and individual device capacities expand… so can enterprise use cases of these amazing solid-state disks. The question left to answer is, “In your datacenter, are there applications and end-users you can accelerate using lower cost MLC based Intel SSDs?”


- Chris

Cloud Computing & the Psychology of Mine

Legacy Thinking in the Evolving Datacenter

The 1957 Warner Brothers* cartoon “Ali Baba Bunny” shows a scene where an elated Daffy Duck bounds about a pile of riches and gold in Ali Baba’s cave exclaiming, “Mine, Mine.. It’s all Mine!” Daffy Duck, cartoons, Ali Baba… what do these have to do with the evolving datacenter and cloud computing?

The answer to this question is ‘everything’! Albeit exaggerated, Daffy’s exclamation is not far from the thinking of the typical application owner in today’s datacenter. The operating system (OS), application, servers, network connections, support, and perhaps racks are all the stovepipe property of the application owner. “Mine, Mine… It’s all Mine!” For most IT workloads, a singularly purposed stack of servers, 50-70% over-provisioned for peak load, and conservatively sized at 2-4x capacity for growth over time. The result of this practice is an entire datacenter running at 10-15% utilization in case of unforeseen load spikes or faster than expected application adoption. Given a server consumes 65% of its power budget when running at 0% utilization, the problem of waste is self-evident.

Enter server virtualization, the modern Hypervisor or VMM, and the eventual ubiquity of cloud computing. Although variations in features exist between VMware*, Microsoft*, Xen*, and other flavors of virtualization, all achieve abstraction of the guest OS and application stack from the underlying hardware and workload portability.

This workload portability, combined with abysmal utilization rates, allows consolidation of multiple OS-App stacks onto single physical servers, and the division of ever larger resources such as the 4-socket Intel Xeon 7500 series platform, which surpasses the compute capacity of mid-90s supercomputers. Virtualization is a tool that helps reclaim datacenter space, reduce costs, and simplify the provisioning and re-provisioning of OS-App stacks. However, much like a hammer, virtualization requires a functioning intelligence to wield, and could result in more management overhead if one refuses to break the paradigm of ‘mine’...

A portion of this intelligence lies with the application owner. In the past, the application owner had to sequester dedicated resources and over-provision to ensure availability and accountability. Although this thinking is still true to a degree, current infrastructure is much more fungible than the static compute resources of 10 or even 5 years ago. The last eight months working on the Datacenter 2.0 project, a joint Intel IT and Intel Architecture Group (IAG) effort, brought this thinking to the forefront as every Proof of Concept (PoC) owner repeatedly asked for dedicated resources within the project’s experimental ‘mini-cloud’. Time and time again, end users asked for isolated and dedicated servers, network, and storage, demonstrating a fundamental distrust of the cloud’s ability to meet their expectations. Interestingly, most of the PoC owners cited performance as the leading reason for dedicated resource requests, yet were unable to articulate specific requirements such as network bandwidth consumption, memory usage, or disk IO operations.

The author initially shared this skepticism, as virtualization and ‘the cloud’ have some as-yet immature features. For broad adoption, the cloud compute model must demonstrate both the ability to secure & isolate workloads and the ability to actively respond to demands across all four resource vectors: compute, memory, disk I/O, and network I/O. Current solutions easily respond to memory and compute utilization; however, most hypervisors are blind to disk and network bottlenecks. In addition, current operating systems lack the mechanisms for on-the-fly increases or decreases in the number of CPUs and amount of memory available to the OS. Once the active measurement, response, trend analysis, security, and OS flexibility issues are resolved, virtualization and cloud compute are poised to revolutionize the way IT deploys applications. However, this is the easy piece, as it is purely technical and one of inevitable technology maturation.

The more difficult piece of this puzzle is the change in thinking and paradigm shift that the end users and application owners must make. This change in thinking happens when the question asked becomes, “is my application available” instead of, “is the server up?” and when application owners think in terms of meeting service level agreements and application response time requirements instead of application uptime. After much testing and demonstration, end users will eventually become comfortable with the idea that the cloud can adapt to the needs of their workload regardless the demand vector.

Although not a panacea, cloud computing promises flexibility, efficiency, demand-based resourcing, and an ability to observe and manage the resources consumed by application workloads like never before. As this compute model matures, our responsibility as engineers and architects is to foster credibility, deploy reliable solutions, and push the industry to mature those underdeveloped security and demand-vector response features.

Christian D. Black, MCSE

Technologist/Systems Engineer

Intel IT – Strategy, Architecture, & Innovation
