
The Data Stack


Are you ready? Today Microsoft released Technical Preview 5 (TP5) of Windows Server 2016 with Storage Spaces Direct.  We at Intel have been working with Microsoft on configurations to help OEMs, ODMs, and systems integrators bring solutions to market quickly.

 

The hyper-converged architecture of Storage Spaces Direct (S2D) is a giant leap forward for many key IT business applications. There are several enhancements in TP5, discussed in Claus Joergensen's blog post, Storage Spaces Direct Technical Preview 5, which, along with the use of solid state drives (SSDs) in S2D, have caught the attention of the enterprise IT community.

 

Many IT professionals see the promise of hyper-converged infrastructure and are rethinking how it can assist them from a compute as well as a storage perspective. We are excited to share our joint work with Microsoft in helping you prepare for TP5 evaluation, and to pass along what we have learned working on S2D about taking advantage of Microsoft and Intel technologies for your target workloads. These configurations include Intel® Xeon® processor-based servers and Intel® Solid State Drives (SSDs), providing a range of options for performance, reliability, and agile enterprise solutions.

 

We collaborated with Microsoft to develop three configurations that span a range of needs from the most latency/IOP sensitive business processing applications to capacity hungry data warehousing. We have been testing some of these configurations already and will be testing all three configurations with TP5 using hardware from different OEMs.  We plan to share the results in upcoming blogs as soon as the data is available.

 

1. IOPS Optimized


All Flash NVMe SSD configuration for IOPS- and latency-sensitive business processing applications that demand the best quality of service (QoS).

 

  • Server : 1U 1Node or 2U 1Node
  • CPU: High core count processor, such as Intel® Xeon® processor E5-2699 v4 with 22 cores
  • DRAM: DDR4 - 16GBx24=384 GB (Min); 32GBx24=768GB (Max)
  • Cache Storage: Low-latency, high-endurance SSD, such as 2x Intel® SSD DC P3700: 800GBx2=1.6TB
  • Capacity Storage: 6-8x Intel® SSD DC P3520/DC P3500: 2TBx6-8=12-16TB
  • NIC: 2x40GbE RDMA NIC (iWARP preferred)
  • Switch:  40GbE switch

[Figure: Configuration 1]

2. Throughput/Capacity Optimized


All Flash configuration with an NVMe cache tier and high-capacity SATA SSDs, blending high performance and capacity for decision support and general virtualization.

 

  • Server : 2U 1Node
  • CPU: Relatively high core count processor, such as Intel® Xeon® processor E5-2695 v4 with 18 cores
  • DRAM: DDR4 -16GBx24=384 GB (Min); 32GBx24=768GB (Max)
  • Cache Storage: Low-latency, high-endurance SSD, such as 4x Intel® SSD DC P3700: 800GBx4=3.2TB
  • Capacity Storage: 20x SATA Intel® SSD DC S3610: 1.6TBx20=32TB
  • NIC: 2x 40Gb RDMA NIC (iWARP Preferred)
  • Switch:  40 GbE switch

[Figure: Configuration 2]

 

3. Capacity Optimized


Hybrid configuration that optimizes $/GB by using an NVMe SSD cache plus HDDs for high-capacity data storage, suitable for data warehousing, Exchange, or SharePoint.

 

  • Server : 2U 1Node
  • CPU: medium core count processor, such as Intel® Xeon® processor E5-2650 v4 with 12 cores
  • DRAM: DDR4 - 16GBx16=256 GB
  • Cache Storage: Low-latency, high-endurance SSD, such as 2x Intel® SSD DC P3700: 1.6TBx2=3.2TB
  • Capacity Storage: 8x HDD 3.5”: 6TBx8=48TB
  • NIC: 2x10Gb RDMA NIC (iWARP Preferred)
  • Switch:  10 GbE switch

[Figure: Configuration 3]
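For readers who want a side-by-side view, here is a small Python sketch that simply encodes the three bills of materials above and derives the raw cache and capacity totals. It is illustrative only (the IOPS-optimized option is shown at its 8-drive upper bound); adjust the counts and sizes for your own configuration.

```python
# Illustrative summary of the three S2D configurations described above.
# Counts and sizes mirror the bullet lists; this is not a sizing tool.

configs = {
    "IOPS optimized": {
        "cache_ssd":    {"model": "Intel SSD DC P3700", "count": 2, "size_gb": 800},
        "capacity_ssd": {"model": "Intel SSD DC P3520/P3500", "count": 8, "size_gb": 2000},
        "dram_gb": 384,
    },
    "Throughput/capacity optimized": {
        "cache_ssd":    {"model": "Intel SSD DC P3700", "count": 4, "size_gb": 800},
        "capacity_ssd": {"model": "Intel SSD DC S3610 (SATA)", "count": 20, "size_gb": 1600},
        "dram_gb": 384,
    },
    "Capacity optimized": {
        "cache_ssd":    {"model": "Intel SSD DC P3700", "count": 2, "size_gb": 1600},
        "capacity_ssd": {"model": "3.5\" HDD", "count": 8, "size_gb": 6000},
        "dram_gb": 256,
    },
}

for name, cfg in configs.items():
    cache_tb = cfg["cache_ssd"]["count"] * cfg["cache_ssd"]["size_gb"] / 1000
    cap_tb = cfg["capacity_ssd"]["count"] * cfg["capacity_ssd"]["size_gb"] / 1000
    print(f"{name}: cache {cache_tb:.1f} TB, capacity {cap_tb:.1f} TB, DRAM {cfg['dram_gb']} GB")
```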

 

To maintain a reliable storage system, we selected SSD technology with the best blend of read and write performance, drive reliability, and endurance. NVMe provides the lowest latency and highly consistent performance, with the 10 drive writes per day (DWPD) endurance necessary for the performance-critical cache tier. NVMe devices are also more CPU-efficient than their SATA counterparts. We selected the Intel® SSD DC P3700 NVMe drive for the cache tier of all configurations.
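To put that endurance rating in perspective, a quick back-of-the-envelope calculation is shown below; the 800GB capacity and five-year service window are assumptions for illustration, not a product specification.

```python
# Back-of-the-envelope endurance estimate for a cache SSD.
# Assumptions (illustrative): 800 GB drive, 10 DWPD rating, 5-year service window.
capacity_gb = 800
dwpd = 10
service_years = 5

writes_per_day_gb = capacity_gb * dwpd                       # full-drive writes per day
lifetime_writes_pb = writes_per_day_gb * 365 * service_years / 1_000_000

print(f"Sustained writes supported: {writes_per_day_gb / 1000:.1f} TB/day")
print(f"Total bytes written over {service_years} years: ~{lifetime_writes_pb:.1f} PB")
```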

 

Standard to mid endurance SSDs can be used in the capacity tier behind the high endurance cache drives. The choice between NVMe and SATA for capacity storage will depend on the performance and latency sensitivity of the applications and the platform capacity needed. Consistent performance is an important attribute for supporting all enterprise applications and larger numbers of users and virtual machines in Hyper-V virtualized environments.  We selected the Intel SSD DC P3520/DC P3500 NVMe and DC S3610 SATA SSD for capacity storage in the all flash configurations.

 

Not all “off the shelf” SSDs should be used in a S2D configuration. The Intel SSD Data Center Family is recommended because it provides a data integrity mechanism to protect against undetectable errors while maintaining superior levels of measured annual failure rate, which contributes to the high reliability of the S2D configurations.

 

Whether you are a DBA, developer, or storage architect, you can get up and running quickly with one of these recommended Windows Server 2016 TP5 configurations. Watch for our follow-on blogs sharing the test data as it becomes available.

A Brief History of Software-Defined Storage


Software-Defined Storage (SDS) has become the new “It Girl” of IT, as storage technology increasingly takes center stage in the modern datacenter. That’s not difficult to understand, as SDS brings tremendous advantages in terms of flexibility, performance, reliability and cost-savings.

What might not be as easy for the new storage buyer to understand is “What IS SDS exactly?” Typically the answer is some reference to a particular software or appliance vendor, as though the term SDS is synonymous with a specific product or device. That’s savvy marketing, as companies would very much like you to think of their brand as the “Kleenex” or “Band-Aid” of the SDS world. What often gets missed in the process is any genuine explanation or understanding of SDS itself.

So, let’s correct that. I thought it would be useful to jump in a time machine back to the days of the first personal computers. Storage in those days was certainly not “Software-Defined”.  It was typically either a cassette tape recorder (with cassettes), or (if you were one of the cool kids) a floppy drive of some kind with the associated disks. Storage was “defined” by the hardware and physical media.

 

  




While the invention of the hard drive actually predates the floppy disk by more than a decade, the first commercially viable consumer drives did not become popular until the adoption of the SCSI standard in the mid-1980s. (I purchased a SCSI drive for my own personal computer - a whopping 20MB - around that time for $795.00 … my how times have changed!) That's where it started to get interesting. Someone realized along the way that, when you have "huge" amounts of storage, you can divide that up into separate partitions. Operating systems gained the ability to create these partitions. So, my 20MB hard drive became three "drives": OS, PROGRAMS & DATA. It's here where we see the first glimmerings of what would become Software-Defined Storage. All of a sudden, a C:, D: & E: drive did not literally have to refer to separate physical drives or media. Those three "drives" could be "defined" by the OS as residing on one, two, or three physical devices.

 

So, we could at that point divide (or partition) a single media device into multiple drives. The next step was to make it possible to take multiple devices and make them appear as one resource. This was driven by the observation that hard drive capacity was increasing, but performance was not. The idea of using a "Redundant Array of Inexpensive Disks" (RAID) solved that performance problem (for the time being), but it was quickly realized that this came at the cost of lower reliability. Mirroring (RAID-1) and parity (RAID-5) approaches solved that issue, and now RAID is a ubiquitous part of almost all current data center storage designs.
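The parity idea is easy to see in miniature: a RAID-5 style layout stores an XOR of the data blocks in each stripe, so any single lost block can be rebuilt from the survivors. Here is a toy Python sketch of that reconstruction (purely illustrative, not how a real controller is implemented):

```python
# Toy illustration of RAID-5 style parity: XOR of the data blocks lets
# any single missing block be reconstructed from the remaining ones.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]        # blocks on three data drives
parity = xor_blocks(data)                 # stored on a fourth drive

# Simulate losing drive 1 and rebuilding its block from parity plus survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("Rebuilt block:", rebuilt)
```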

For our purposes however, the important bit is how that changed the way storage was defined. With RAID, one could now take 2 or more drives and make them appear to the OS as one large drive, or some number of smaller drives. Storage was (and is) software-defined - at the level of the individual server.



  

 

While that might be technically correct, we still have a way to go before we get to what is currently considered SDS. It gets interesting when we take the general concept of RAID - using multiple resources as a single entity - and apply it to servers. This creates various kinds of "clusters" designed to improve performance, reliability, or both. This is typical of something like Microsoft's "Distributed File System".

One problem encountered at this level is that shared storage resources cannot always truly act like a physical drive. It's often the case that you cannot use these shared file stores with certain applications, as they require a full implementation of a command protocol like SCSI or SATA. That's where technology like iSCSI comes into play. It allows a complete storage command set (SCSI, as you might guess) to communicate over a network link. Now it becomes possible to have truly virtualized drives, not simply shared file storage.

And that’s the level at which we get something that can truly be called “Software-Defined Storage”. All of these various technologies form a set of building blocks which allow a flexible pool of storage, spanning several servers. That storage can be divided-up (defined) as needed, expanded or contracted to meet business needs, and it works just like a local drive on the client systems which access that storage. That is the essence of “Software-Defined Storage”.

Of course that’s still a fairly primitive and basic implementation. Modern SDS configurations offer so much more. That will be the subject of the next post in this series.

By David Brown, Intel and Kenny Johnston, Rackspace

 

OpenStack is the world’s leading open source cloud operating  system. It’s been adopted by many of the world’s most prominent cloud service  providers and a growing list of global enterprises. Now the task at hand for  the OpenStack community is to address barriers to the widespread adoption of  OpenStack in the broad realm of enterprise environments and ensure the platform  is ready for the workloads of tomorrow.

 

In a word, that is the mission of the OpenStack Innovation  Center (OSIC). Launched in 2015 by Intel and Rackspace, the center is bringing  together teams of engineers to accelerate the evolution of the OpenStack  platform. Key areas of focus include improving manageability, reliability and  resilience, scalability, high availability, security and compliance, and  simplicity. The objective is to make OpenStack easy to install and deploy, with  all of the features of an enterprise-class computing platform.

 

To drive toward those goals, the center has launched an  OpenStack developer training program, assembled one of the world’s largest  joint engineering teams focused on upstream contributions to OpenStack, and  deployed the world’s largest OpenStack developer cloud.

 

While the training program is helping grow the OpenStack community,  the joint engineering team is following an open roadmap that is guiding their  development of new features in the OpenStack platform. This work is focused on  key platform challenges. To date, the team’s accomplishments include a long  list of enhancements to the building blocks for enterprise-ready OpenStack  environments, including Keystone, Tempest, Neutron, Swift, Ceilometer, Cinder,  Horizon, Nova, and Rally. This work includes rolling upgrades through support  for versioned objects and online schema migration; improvements in live migration  to counter service failures; scalability improvements through work on network  topology and IP capacity awareness; and early work to support multi-factor  authentication through one-time password support in Keystone. In addition, the  team is focused on testing each service within OpenStack to determine its  breaking point, including telemetry, instance provisioning of Nova APIs,  Autoscale in Heat, and Software Defined Networking in relation to third-party  plug-ins.

 

Meanwhile, Intel and Rackspace have launched a developer  cloud hosted by OSIC to empower the OpenStack community, ultimately comprised  of 2,000 nodes. To date, the first 1,000-node cluster has been brought online  and is being fully utilized to power work by OSIC participants, including such prominent  organizations as Cambridge University, IBM, Red Hat, Mirantis, and PLUMgrid.  Most of the current test cases focus on networking, storage, and provisioning  methods. The second cluster will be brought online and available to the  community in June of this year.

 

Since its launch, the OSIC is already delivering on the  things it set out to do. It is increasing the number of developers contributing  to upstream OpenStack code, enabling the broader ecosystem, and advancing the  scalability, manageability, and reliability of OpenStack by adding new features  and functionality and eliminating bugs.

 

All of this work makes OpenStack a more viable platform for  deployment in enterprise environments across a wide range of industries. In  delivering these gains, the work done by the OSIC is helping to bring the Intel®  Cloud for All vision to life — specifically, to unleash tens of thousands  of new clouds.

 

If you have an OpenStack test case that could benefit from  the resources of a world-class developer cloud, visit OSIC.org to request access.

In a recent blog post, I talked about the excitement that is growing around OpenStack and why you should be thinking about it. OpenStack is supported by tens of thousands of community members and more than 500 companies, and provides the foundation of a flexible private cloud that is gaining support across the business landscape. Cloud-centric companies, such as Netflix and Uber, have shown that fast innovation of digital services is key to surviving in today’s increasingly competitive business environment. But many cloud initiatives fail to deliver value simply because they’re implemented as cost-saving IT projects. Companies should instead view a private-cloud initiative as part of a larger organizational transformation because it can increase revenue and innovation while decreasing operational costs.

 

An OpenStack private cloud can provide an agile, API-accessible infrastructure foundation that lets developers integrate infrastructure directly into their application development, and provides the means to enable automated deployment. Companies can use an OpenStack foundation to transform their entire development and deployment process, which can lead to faster innovation of new digital services.
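As a flavor of that API-driven model, the sketch below uses the openstacksdk Python library to provision a server programmatically. The cloud name, image, flavor, and network are placeholders you would replace with values from your own environment; treat it as a minimal illustration rather than a deployment recipe.

```python
# Minimal sketch of infrastructure-as-code against an OpenStack private cloud
# using the openstacksdk library. All names and IDs below are placeholders.
import openstack

conn = openstack.connect(cloud="my-private-cloud")    # entry defined in clouds.yaml

image = conn.compute.find_image("ubuntu-16.04")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("app-tier-net")

server = conn.compute.create_server(
    name="app-server-01",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print("Provisioned:", server.name, server.status)
```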

 

Mirantis, an Intel partner and a pure-play OpenStack company, suggests that companies follow an executive-driven, multi-phase approach to develop and implement a cloud strategy. These phases start with a small implementation that provides continuous integration and self-service capabilities, and then grows the OpenStack private cloud to a large implementation that additionally provides greater scalability and high availability. Each phase should contain measurable success metrics that validate your project’s return on investment (ROI), such as increased revenue from accelerated software development cycles, or increased developer productivity through deployment automation. Mirantis provides tools that are built on OpenStack itself to help you determine, in real-time, the value that your OpenStack private cloud is generating for your company, such as the revenue generated from accelerated deployments.

 

Remember that the real value in the cloud comes from how it enables fast innovation, not just its cost savings. Read our white paper, “The Business Case for Private Cloud,” to learn how to more effectively structure your private-cloud implementation. And for the latest information on private cloud technologies, be sure to follow me, @TimIntel, and my growing #techtim community on Twitter.

By Mauri Whalen, Vice President in the Software and Services Group and Director of Core System Software in the Open Source Technology Center at Intel

 

 

The message at Intel’s recent Cloud  Day event was clear:  we’re serious about making cloud deployments easier and faster through new products,  programs, and collaborations. OpenStack is critical to our cloud strategy, and we’re  excited about the OpenStack Mitaka release. I want to share some of the work  Intel is driving in the OpenStack community and through contributions to Mitaka.

 

The  OpenStack Innovation Center (OSIC), our collaboration with Rackspace, continues  to show strong momentum. The joint Intel/ Rackspace engineering team working at  OSIC has submitted 174 patches and reviewed almost 1,000 more. Additionally, the  OSIC environment itself is uniquely equipped to allow community testing of the  upstream code base at true enterprise scale. After opening the first of  two 1,000-node clusters to the community in October, we’ve seen great  response with  reservations for bare-metal allocations already at full capacity. Buildout of  the second cluster is nearing completion.

 

Turning  our attention to Mitaka, I look forward to what this release is delivering.  Intel has been very active in Mitaka, contributing tens of thousands of lines  of code targeting high availability for tenants and services, network and  storage support, and ease of deployment among other areas. Our team also  focused on improving the upgrade process, enabling the upgrade of many core  OpenStack components without downtime, and have made significant improvements  to live migration. These enhancements help enterprises deliver stable services,  supporting long-running enterprise workloads capable of withstanding  maintenance to the underlying infrastructure.

 

We  believe containers are critical to cloud computing, and we continue to push the  boundaries of what’s possible with this technology through our Clear Linux for  Intel Architecture project. From Intel Clear Container support in Magnum  to compiler techniques enabling architecture-specific optimizations at runtime  and leveraging the security features of Intel® Architecture, Intel is committed  to improving performance and security of containers in the cloud.

 

Finally, building on two successful hackathons Intel conducted with Huawei last year in  China to address OpenStack bugs, Intel upped the ante this year, joining with  six more corporate sponsors in bringing the worldwide community together for a  Global OpenStack Bug Smash March 7-9 that included new and experienced  developers, mentors, and official code reviewers. The results are impressive:  in 12 cities across 9 countries, 302 contributors authored patches to smash 293  bugs. Thanks to everyone who participated in the first global bug smash. We  look forward to many more!

 

These  efforts underscore Intel’s commitment to accelerating OpenStack adoption. I  look forward to continuing the discussion at OpenStack Summit Austin this week.  Be sure to join Intel in Austin to hear more  about how we’re improving the OpenStack experience for operators, community  members and developers alike.

By Manish Dave, Platform  Architect of Datacenter Security Products Division at Intel

 

The Intel Security Controller is  now Open Security Controller

 

As enterprise IT organizations discover the benefits of OpenStack, they  quickly realize that security can be a limiting factor for the platform’s growth  and adoption. When organizations look into virtualizing their networks and need  to re-architect their data centers to be more dynamic, manageable and adaptable,  previously effective security tools can no longer protect the virtual  infrastructure. This can lead to lack of security visibility, network  inefficiencies and unprotected east-west traffic.

 

In addition to that,  security integration on OpenStack can be resource intensive. Vendors of  security virtual network functions (VNF) and software-defined networks (SDN) have  to strike individual partnerships and do the integration on behalf of the  customer.

 

To address this issue, at OpenStack Summit Tokyo last October, Intel presented a joint solution with Midokura to dynamically insert advanced security services like network IPS and next-gen firewalls into the OpenStack network. The Intel Security Controller was the technology automating the deployment of security VNFs, while Midokura's MidoNet provided the network virtualization layer that made this automation possible.

 

This year, we’re taking a step further.

 

Today we are announcing the broadening of Intel Security Controller’s  scope so it becomes a platform that will support multiple SDN controllers and security  VNFs from multiple vendors. To reflect these changes, we’re renaming our platform  to Open Security Controller.

 

At Intel we realize the importance of open platforms to drive  innovations that significantly change the direction of the industry. Open  Security Controller takes this strategy to heart. It provides automated,  dynamic security provisioning, configuration and policy synchronization for SDI  and a seamless brokering service between SDN controllers and security VNFs,  making security management visible, more effective, agile and scalable.

 

Open Security Controller is part of Intel's initiatives to accelerate the adoption of network function virtualization. By working closely with the user community and with datacenter security and network virtualization vendors, we are building a platform that removes the integration burden of deploying advanced security in OpenStack.

 

For more information or if you are interested in actively contributing  to the future of Open Security Controller, please visit us at http://www.intel.com/osc and contact us via email. We look forward  to hearing from you!

In their efforts to adapt to the demands of the digital  economy, the Internet of Things, and other disruptive changes, data centers are  facing big technical challenges in terms of flexibility and scale. This is all  because of traditional rigid architectures.

 

Today’s hardware infrastructure for data centers typically comes  as preconfigured 1U or 2U servers with individual processors, memory, I/O, and  network interface controller (NIC). To upgrade or add to this infrastructure, a complete system needs to be built and integrated into the rack, and connected  via management and virtual pooling. This system will essentially operate as a  single unit of compute, meaning its internal resources of CPU, memory, and  dedicated storage are accessed solely by that server, locking down resources that  are not always fully utilized.

To complicate the challenges, the conventional server  architecture is in general a vertical deployment model, with many different hardware/software  models present for management. So how can you overcome rigid, expensive,  time-consuming data center build-outs that can’t keep pace with the digital  demands of today? The answers are already here—in the form of disaggregation of  the data center rack.

 

With this new approach to the rack, a logical architecture  disaggregates and pools compute, storage, and network resources and provides a  means to create a shared and automated rack architecture that enables higher  performance, lower cost, and rapid deployment of services. At this point, agility  at hyperscale is no longer a distant dream. Add in analytics-based telemetry  exposed on the disaggregated management controller and you have the foundation  for a new logical architecture—a rack-level system.

 

This new logical architecture is available today in the form  of Intel®  Rack Scale Architecture. This architecture exposes a standard management  framework via REST APIs to discover and retrieve the set of raw components—like  drawers, blades, disks, and pooled resources like processors, memory, and NVMe  disks—in a rack and collectively in a pod. These resources can be provisioned  by a separate management network to compose a compute node or storage node.
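To give a sense of what that REST-based discovery looks like in practice, here is a hedged Python sketch that walks a Redfish-style service root and lists the systems it finds. The endpoint, credentials, and exact resource paths are placeholders; consult the Rack Scale and Redfish documentation for the version you are actually running.

```python
# Illustrative walk of a Redfish-style REST API to enumerate compute systems.
# Base URL, credentials, and schema details are placeholders, not a specification.
import requests

BASE = "https://rack-manager.example.com"
AUTH = ("admin", "password")            # placeholder credentials

def get(path):
    resp = requests.get(BASE + path, auth=AUTH, verify=False)
    resp.raise_for_status()
    return resp.json()

root = get("/redfish/v1")
systems = get(root["Systems"]["@odata.id"])
for member in systems["Members"]:
    system = get(member["@odata.id"])
    print(system.get("Name"),
          system.get("ProcessorSummary", {}),
          system.get("MemorySummary", {}))
```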

 

In addition, a telemetry model is supported that exposes  capacity, capability, and bottlenecks at each component level, thus allowing  the right hosts to be composed for orchestrating a workload. Separation of the  data and virtual management plane from the hardware provisioning and management  plane and telemetry with analytics enables resources such as storage, memory,  and compute to be added as needed, creating flexibility and scalability that  can be fully utilized.

 

Of course, the success of this new logical architecture depends  on the creation of open standards for the configuration and management of the  rack components—such as compute, storage, network, and rack management  controllers. These standards allow IT organizations to connect various hardware  components together to form software-defined systems that more effectively utilize  all the hardware in a disaggregated rack.

Intel pioneered the rack scale concept, working closely with key partners, OEMs, and standards bodies such as the DMTF with its Redfish standard. The players in these collective efforts recognized the importance of working with standards bodies to enable interoperable hardware, firmware, and software.

 

Intel Rack Scale Architecture is the result of an open effort that allows Intel partners and solution providers to innovate and create diverse solutions, giving customers many different choices. At the same time, the open approach establishes a platform for innovation at various levels in the data center market and allows the architecture to evolve over time based on hardware innovation and changing customer use cases.

At a personal level, the evolution of Intel Rack Scale Architecture is particularly gratifying, given that I have been part of the team that worked on this effort from its earliest days. We set out with a focus on reducing TCO and meeting other business-driven objectives, and now we are well on our way to achieving the vision, thanks in large part to a tremendous amount of industry support. Already, many major OEMs are releasing products based on Intel Rack Scale Architecture and are delivering innovative new designs to their customers and end users.

Looking ahead, here is some of what we see on the horizon:

 

  • The ongoing disaggregation of compute, I/O,  memory, and storage, which will give data center operators the ability to  upgrade the different components independently of each other, everywhere in the  data center
  • The evolution to disaggregated NVMe-as-storage  solutions, pooled FPGA, and disaggregated networks delivered as solutions  architected in rack scale
  • The development of an agile orchestration of  hardware layer in open source solutions like OpenStack
  • The use of high-speed interconnections between  components with less copper and more prevalent optical/wireless technologies,  along with more security and telemetry at every level to drive more efficient  use of resources

 

If you happen to be at the OpenStack Summit in Austin this  week, you can catch multiple presentations on Intel Rack Scale Architecture  along with demos at the Intel booth.

 

And if you’re ready for a technical deep dive at this point,  you can explore the details of Intel Rack Scale firmware and software  components on GitHub: https://github.com/01org/IntelRackScaleArchitecture/

In just the last four months, Microsoft announced its HoloLens mixed reality headset, Google launched its VR View SDK that allows users to create interactive experiences from their own content, Facebook expanded its live video offering, Yahoo announced that it will live stream 180 Major League Baseball games, Twitter announced it will live stream 10 NFL games, Amazon acquired image recognition startup Orbeus, and Intel acquired immersive sports video startup Replay Technologies.


Are these events unrelated, or are they part of something bigger? To me, they indicate the next wave of the Visual Cloud. The first wave was characterized by the emergence of Video on Demand (e.g., Netflix), User Generated Video Content (e.g., YouTube), and MMORPGs (e.g., World of Warcraft). The second phase will be characterized by virtual reality, augmented reality, 3D scene understanding and interactivity, and immersive live experiences. To paraphrase William Gibson, the announcements I listed above indicate that the future is already here - it's just not evenly distributed. And it won't take long for it to spread to the mainstream - remember that YouTube itself was founded in 2005 and Netflix only started streaming video in 2007. By 2026, the second wave will seem like old technology. In the technology world, in five years, nothing changes; in ten years, everything changes.

 

But why now? As with any technology, a new wave requires the convergence of two things: compelling end user value and technology capability and maturity.

 

It’s pretty clear that this wave can provide enormous user value. One early example is Google Street View (launched 2007). I’m looking for a new house right now and I can’t tell you how much time I’ve saved not touring houses that are right next to a theater or service station or other unappealing neighbor. While this is a valuable consumer application, the Visual Cloud also unlocks many business and public sector applications like graphics-intensive design and modelling applications and cloud-based medical imaging.

 

But, is the technology ready? The Visual Cloud Second Wave is an integration of several technologies – some are well established, some still emerging. The critical remaining technologies will mature over the next few years – driving widespread adoption of the second wave applications and services. In my opinion, the key technologies are (in decreasing order of maturity):

 

1. Cloud Computing – the Visual Cloud requires capabilities that only cloud computing can deliver. In most ways, the Visual Cloud First Wave proved out this technology. These capabilities include:

 

    • Massive, inexpensive, on-demand computing. Even something as comparatively simple as speech recognition (think Siri, Google Now, Cortana) requires the scale of the cloud to make it practical. Imagine the scale of compute required to support real time global video recognition for something like traffic management.

 

    • Massive data access and storage capacity. Video content is big - a single high-quality 4K video requires 30-50 GB of storage, depending on how it is compressed (see the storage arithmetic sketch after this list).

 

    • Ubiquitous access. Many Visual Cloud applications are about sharing content between one user and another, regardless of where they might be in the world or what devices they are using to create and consume content.

 

    • Quick Start Development. The easy access to application development tools and resources through Infrastructure as a Service (IaaS) offerings like Amazon Web Services and Microsoft Azure makes it much faster for innovative Visual Cloud developers to create new applications and services and get them out to users.

 

2. High Speed Broadband. See above re: video content is big. Even today, moving video data around is a challenge for many service providers. Video already accounts for over 64% of consumer internet traffic and is expected to grow to over 80% by 2019. High quality visual experiences also require relatively predictable bandwidth. Sudden changes in latency and bandwidth wreak havoc on visual experiences, even with compensating technologies like HLS and MPEG-DASH. This is especially true for interactive experiences like cloud gaming or virtual and augmented reality. The deployment of wireless 5G technologies will be critical to enabling the Visual Cloud to grow.

 

3. New End User Devices. Most of these advanced experiences don't rely solely on the cloud. For both content capture and consumption, devices need to evolve and improve. Device technologies like Intel® RealSense Technology's depth images provide innovative visual information to applications that isn't available from traditional devices. Consumption technologies and form factors like VR headsets are necessary to consume some experiences.

 

4. Visual Computing Technologies. While many visual computing technologies like video encoding and decoding, raster and ray traced rendering have been around for many years, they have not been scaled to the cloud in any significant way. This process is just beginning. Other technologies, like the voxel 3D point clouds used by Replay Technologies, are just emerging. Advanced technologies like 3D Scene Reconstruction and Videogrammetry have several years to reach the mainstream.

 

5. Deep Learning. Computer vision, image recognition, and video object identification have long depended on model-based technologies like HOG. While these technologies have had some limited use, in the last couple of years deep learning for image and video recognition, which uses neural networks to classify objects in image and video content, has emerged as one of the most significant new technologies in many years.
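As promised above, here is the storage arithmetic behind the "video content is big" point; the bitrates are illustrative examples of high-quality 4K encodes, not measurements.

```python
# Storage required for a video = bitrate x duration / 8 bits per byte.
# Bitrates below are illustrative examples of "high quality" 4K encodes.
def storage_gb(bitrate_mbps, duration_hours):
    return bitrate_mbps * duration_hours * 3600 / 8 / 1000

for mbps in (25, 40, 60):
    print(f"{mbps} Mbps, 2 hours -> {storage_gb(mbps, 2):.0f} GB")
# 25 Mbps -> ~22 GB, 40 Mbps -> 36 GB, 60 Mbps -> 54 GB
```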

 

If you’re interested in learning more about emerging workloads in the data center that are being made possible by the Visual Cloud, you can watch our latest edition of the Under the Hood video series or check out our Chip Chat podcasts recorded live at the 2016 NAB Show. Much more information about Intel’s role in the Visual Cloud can be found at www.intel.com/visualcloud.


 

According to Gartner's Hype Cycle, machine learning is the hot new trend in the technology industry. Why all the hype and excitement around artificial intelligence, big data, machine learning, and deep learning? As many of us in the industry know, machine learning and neural networks are certainly nothing new. The buzz this time around, however, is being driven by the confluence of multiple factors, including bigger data (and, more importantly, labeled data), advances in scale compute, algorithmic innovation, and, most importantly, killer apps that can take advantage of a data explosion. In a world with billions (and, in the near future, tens of billions) of connected devices, the amount of unstructured data collected by large organizations has quickly become unmanageable by traditional analytical techniques. Machine learning (ML), and its related branch, deep learning (DL), provide excellent approaches to structuring massive data sets to generate insights and enable monetization opportunities.

 

Generally speaking, machine learning is a set of algorithms that learn from data. However, ML these days isn't your father's simple regression technique that might have worked well on smaller data sets. The explosion of unstructured data requires new algorithms to process it, and the ML/DL ecosystem is evolving quickly in support. From a deep learning perspective, a great example of this is the recent Microsoft Research ImageNet winner, the 152-layer residual network (ResNet). This massive neural network has an amazing amount of representational power and actually exceeds human-level performance on many visual recognition tasks.

 

 

These types of algorithms actually perform better the more data they consume, making them a perfect match for the unending amount of data created today, assuming of course we can efficiently annotate it. From an application perspective, ML is not limited to ImageNet and object recognition. It has already changed the way we shop at websites like Amazon and the way we are entertained by services like Netflix. ML is also being leveraged by cyber security applications to adapt quickly to threats and by financial services institutions for highly accurate fraud or insider trading detection.

 

To quote Sundar Pichai at Google: "Machine learning is a core, transformative way by which we're re-thinking how we're doing everything."

 

Because of this, Intel is investing heavily to enable the industry by providing a full stack solution for everything from highly scalable and efficient hardware to tools and libraries that will ease development and deployment of machine learning models into applications.

 

Starting at the lowest level, Intel is optimizing its hardware to target the highest single-node and cluster level performance including compute, memory, networking, and storage. This work builds on the capabilities of the Intel® Xeon® and Intel® Xeon Phi™ processor families, Intel® Solid-State Drives, new 3D XPoint memory technology, and Intel Omni-Path Architecture.  Our Intel Scalable System Framework (Intel SSF) configurations are designed to balance these technologies and efficiently and reliably scale to increasingly larger data sets.

 

Moving up to the next level of the stack, a set of highly tuned and optimized libraries is required to truly extract maximum performance from the hardware. Enhancements and additions are being made to the Intel Math Kernel Library, which provides a set of tuned math primitives, and the Intel Data Analytics Acceleration Library, which optimizes and distributes a broad set of machine learning algorithms. These libraries also abstract the complexity of the underlying hardware and instruction set architecture (ISA), providing a level of programming that is comfortable for most developers while still highly performant. In addition to enhancing the libraries themselves, we are actively integrating with and contributing code back to key open source projects that are influential in machine learning. This includes seminal projects like Caffe from UC Berkeley, the Apache Spark project, Theano from the University of Montreal, Torch7 (used by Facebook and Twitter), and others like Microsoft's CNTK and Google's TensorFlow.
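As a small illustration of the "tuned libraries underneath, familiar code on top" idea: a NumPy build linked against the Intel Math Kernel Library accelerates standard calls such as matrix multiplication without any change to the Python source. The snippet below is a generic example, not an Intel-specific API.

```python
# The same NumPy code runs unchanged whether the underlying BLAS is MKL,
# OpenBLAS, or a reference implementation; only the speed differs.
import time
import numpy as np

np.show_config()                       # reports which BLAS/LAPACK backend is linked

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)

start = time.perf_counter()
c = a @ b                              # dispatched to the optimized BLAS backend
print(f"2000x2000 matmul: {time.perf_counter() - start:.3f} s")
```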

 

On an even broader front, Intel is accelerating enterprises and application developers looking to use machine learning through the open source Trusted Analytics Platform (TAP) project, which provides everything from big data infrastructure and cluster management tools to model development and training and application development and deployment resources. To further reduce friction for developers, TAP works with or is pre-integrated with popular frameworks and toolkits such as Spark MLlib, H2O, DL4J from Skymind, and DataRobot, to name a few.

 

For a deeper dive into Intel's strategy, libraries, and recent customer activities in the machine learning space, you can explore the slides from a machine learning session at the recent 2016 Intel Developer Forum in Shenzhen, China. You can also access the video from a talk I gave in late March at the 2016 Strata + Hadoop World in San Jose.

 

Please stay tuned for more announcements and initiatives throughout 2016 from Intel regarding machine learning!

Machine learning holds the promise not only of structuring vast amounts of data but also of creating true business intelligence.

 

The sheer volume and unstructured nature of the data generated by billions of connected devices and systems presents significant challenges for those in search of turning this data into insight. For many, machine learning holds the promise not only of structuring this vast amount of data but also of creating true business intelligence that can be monetized and leveraged to guide decisions.

 

In the past, it wasn’t possible or practical to implement machine learning at such a large scale for a variety of reasons. Recently, three major advances have enabled more organizations to take advantage of machine learning to enhance business intelligence:

 

1) Bigger data (and more importantly, better labeled data)

 

2) Better hardware throughout datacenters and high performance computing clusters

 

3) Smarter algorithms that can take advantage of data at this scale and learn from it

 

Machine learning, generally speaking, refers to a class of algorithms that learn from data, uncover insights, and predict behavior without being explicitly programmed. Machine learning algorithms vary greatly depending on the goal of the enterprise and can include algorithms targeting classification or anomaly detection, clustering of information, time series prediction for data such as video and speech, and even state-action learning and decision making through reinforcement learning. Ensembling, or combining various types of algorithms, is also common as researchers continue to push the state of the art and attempt to solve new problems. The machine learning arena moves very fast, and algorithmic innovation is happening at a blistering pace.

 

With machine learning, enterprises can generate predictive models in order to accurately make predictions based on data from large, diverse, and dynamic sources such as text and metadata, speech, videos, and sensor information. Machine learning enables the scale, speed, and accuracy needed to uncover never-before identified insights. The promise of accurate, actionable, and predictive models will drive it to play a larger and larger role in business intelligence as data continues to get more and more unmanageable by humans. This enhanced intelligence provides utility in myriad ways across many industries including Health Sciences for medical imaging, financial services for fraud detection, and cloud service providers and social media platforms for services like powering automated “personal assistants,” image detection, and measuring sentiment and trends. There really is no end to the applicability of machine learning.

 

Banks, as an example, are applying machine learning algorithms to predict the likelihood of mortgage defaults and risk profiles. By retrospectively analyzing historical mortgages and labeling them as either acceptable or in default, a lender could leverage a trailing data set to build a more reliable analytical model that delivers direct and measurable value well into the future. By crafting models like this that learn from historical experiences, banks can more accurately represent mortgage risk, thereby reducing defaults and improving loan profitability rates.
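As a hedged sketch of what that mortgage example might look like in code (the features, synthetic data, and use of scikit-learn are illustrative assumptions, not a description of any bank's actual system):

```python
# Toy sketch of training a default-risk classifier on historical, labeled loans.
# Features, synthetic data, and the scikit-learn pipeline are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 5000
# Synthetic historical loans: loan-to-value, debt-to-income, credit score.
X = np.column_stack([rng.uniform(0.3, 1.0, n),        # LTV
                     rng.uniform(0.1, 0.6, n),        # DTI
                     rng.normal(680, 60, n)])         # credit score
# Synthetic label: higher LTV/DTI and lower score make default more likely.
risk = 3 * X[:, 0] + 4 * X[:, 1] - 0.01 * X[:, 2] + rng.normal(0, 0.5, n)
y = (risk > np.median(risk)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
print("New loan default probability:",
      model.predict_proba([[0.95, 0.45, 620]])[0, 1])
```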

 

In order to efficiently develop and deploy machine learning algorithms at scale, enterprises can leverage powerful processors like the Intel® Xeon® processor E7 family to deliver real-time analytics services, and open up new data-driven business opportunities. Organizations can also turn to highly parallel Intel® Xeon Phi™ processors to enable dramatic performance gains for highly data parallel algorithms such as the training of Deep Neural Networks. By maintaining a single source code between Intel Xeon processors and Intel Xeon Phi processors, organizations can optimize once for parallelism but maximize performance across architectures.

 

Of course, taking advantage of the latest hardware parallelism requires updating of the underlying software and general code modernization. This new level of parallelism enables applications to run a given data set in less time, run multiple data sets in a fixed amount of time, or run large-scale data sets that were previously prohibitive. By optimizing code running on Intel Xeon processors we can therefore deliver significantly higher performance for core machine learning functions such as clustering, collaborative filtering, logistic regression, support vector machine training, and deep learning model training and inference resulting in high levels of architectural, cost, and energy efficiencies.

 

Advances in high performance computing (HPC) and big data infrastructure combined with the computing capabilities of cloud infrastructure are fueling a new era of machine learning, and enabling enterprises to discover valuable insights that can improve their bottom line and customer offerings.

 

 

This blog originally appeared on InfoWorld.com.

There isn't always an expert with all the skills you need.

 

For businesses, the days of the renaissance person have passed. Someone like a utility infielder, who has some experience with a lot of functions, used to be quite valuable because you could put them in wherever they were needed. But today's organizations are so complex that they need people with expertise in very specific areas. These days it pays - frequently quite well - to be a specialist. For example, if you're a data scientist, IT security expert, or computer engineer, there are people waiting to meet you in HR departments all across the globe.

 

But expertise can have its limits.

 

While it is always a good idea to have an expert doing what she or he is expert at, it is wrong to think that there is always an expert with all the skills you need. Especially when it comes to dealing with Big Data. This is one of the most complex, evolving, ambiguous, and important areas of business development today, and many companies are seeking a qualified expert to help them rein it in. But is acquiring one Big Data expert really the best path to success?

 

We’re at a time when the renaissance person is being replaced by the renaissance team. If you want to have success with your Big Data project, you need a group with many skill sets. So it’s important to recruit individuals with diverse capabilities which complement each other instead of spending your time searching for a mythical individual who can do it all. Of course the team has to include people who know math, statistics, and science - but those skills alone are not enough. After all, you can’t just point data scientists at your data and say “Go find stuff.”

 

You need to recruit people who can think about data in novel ways. So in addition to people who know about handling data, you also need people who know about your customers: their culture, their psychology, their behavior. To that end, you may want to consider onboarding a sociologist or others from the “soft” sciences.

 

When putting together your team, first figure out the skills you need and then find the people who have them, regardless of the field they’re in. This kind of team will be able to look at your data in a variety of ways. The people who are experts at handling data will give insights to those who understand your business needs and vice versa.

 

Having a variety of backgrounds and experiences is essential because there is no single way to interpret or process data. A collaborative, collective approach gives you greater insight into how your data analytics work. The sum of those unique perspectives is greater than the parts, and will continue to feed your decision-making power well into the future.

 

Of course there are some skills everyone on the team should have. They all need to be creative, able to handle ambiguity, and be effective at communicating. They should ideally possess some familiarity with the other teammates’ primary skill sets. It’s also good if your hard science types are a little unorthodox. A little bit of unconventional thinking can go a long way.

 

Data is like ore: Unless it is properly refined, shaped, and forged, it’s just a lump of rock. Doing everything needed to find the gold hidden inside is more than we can expect of any one person.



This blog originally appeared on InfoWorld.com.

I'm posing that question somewhat rhetorically. The answer happens to be the theme for Percona* Live 2016: "Database Performance Matters!" Databases are ubiquitous - if not also invisible - managing to hide in plain sight. Reading this blog? A database was involved when you signed in, and another one served up the actual contents you are reading. Buy something from Starbucks this morning and use their app to pay? I'm not an expert on their infrastructure, but I will hazard a guess that at least one database was involved.

 

So why the mention of Percona Live 2016?  Well, recently I was offered the opportunity to speak at the conference this year.  The conference takes place April 18-21.  For those able to attend, the session I’m delivering is at 11:30am on April 19th.  The session is titled “Performance of Percona Server for MySQL* on Intel® Server Systems using HDDs, SATA SSDs, and NVMe* SSDs as Different Storage Mediums”, creative and lengthy, I know…  Without revealing the entirety of the session, I’ll go into a fair amount of it below.  I had a framework in mind that involved SSD positioning within MySQL, and set out to do some additional research before putting the proverbial “pen to paper” to see if there was merit.  I happened upon a talk from Percona Live 2015 by Peter Zaitsev, CEO of Percona, coincidentally titled “SSD for MySQL”.  It’s a quick read, eloquent and concise, and got me thinking- just how much does storage impact database performance?  To help understand the answer, I need to offer up a quick definition of storage engines.

 

Database storage engines are an interesting topic (to me anyway). The basic concept behind them is to take a traditional database and make it function as much as possible like an in-memory database. The end goal is to interact with the underlying storage as little as possible, because working in memory is preferable to (and faster than) working with storage. Generally speaking, performance is good and consistent so long as the storage engine doesn't need more memory than it has been allocated. In situations where the allocated memory is insufficient, and these situations do arise, what happens next can make or break an application's Quality of Service (QoS).
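A very rough way to see why that memory-to-working-set ratio matters: if even a small fraction of page requests miss the in-memory buffer and fall through to storage, average access time is dominated by the storage medium. The latency figures in the toy model below are illustrative orders of magnitude, not measurements.

```python
# Toy model: effective page access time as a function of buffer hit rate.
# Latency figures are rough orders of magnitude for illustration only.
LATENCY_US = {"DRAM": 0.1, "NVMe SSD": 100, "SATA SSD": 200, "HDD": 5000}

def effective_latency_us(hit_rate, medium):
    return hit_rate * LATENCY_US["DRAM"] + (1 - hit_rate) * LATENCY_US[medium]

for medium in ("HDD", "SATA SSD", "NVMe SSD"):
    for hit_rate in (0.99, 0.90):
        print(f"{medium:9s} hit rate {hit_rate:.0%}: "
              f"{effective_latency_us(hit_rate, medium):8.1f} us average")
```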

 

Percona Server, with its XtraDB* storage engine, is a drop-in replacement for MySQL.  So, I figured it was time for a quick comparison of different storage solutions behind XtraDB. One aspect I would be looking at is how well XtraDB deals with memory pressure when a database’s working set exceeds the RAM allotted to XtraDB.  This can be greatly influenced by the storage subsystem where the database is ultimately persisted.

 

To simulate these situations, I decided I would run a few benchmarks against Percona Server with its storage engine capped at sizes less than the raw size of the databases used in the benchmarks.  This would create the necessary memory pressure to induce interaction with storage. For the storage side of the equation, I decided to compare a RAID of enterprise-class SAS HDDs against a SATA SSD and also against an NVMe SSD.  My results are presented as relative to those of the HDD solution.  Rather than report raw numbers, the focus here is to highlight the impact storage selection has on performance rather than promote any single configuration as a reference MySQL solution.


I used the following tools to perform the benchmarking:

  • SysBench* 0.5: Open source, cross platform, scriptable, and well-known in the MySQL world.  SysBench provides modules for testing multiple aspects of a server, and for my testing, I used the modules for file I/O performance and database server performance (OLTP).  For SysBench results, well, I have to keep something for the talk at Percona Live, not to mention the brevity of this blog, so those are not recapped below.
  • HammerDB* 2.19: Also open source and cross platform, HammerDB provides a nice wrapper for running workloads based on/similar to TPC-C and TPC-H created by the Transaction Processing Performance Council (TPC*). HammerDB results are illustrated below.


Moving on to the base server platform:

 

And the underlying storage configurations tested:

  • HDD- HW RAID 10 comprised of the following:
    • 6x Enterprise-class 15K RPM, 600 GB, 12Gbps SAS HDDs
      • Raw capacity: 3600 GB
      • Capacity as configured: 1800 GB
  • SATA SSD:
  • NVMe SSD:


Next the software stack:

  • CentOS* 7.2 (64-bit)
  • Percona Server 5.7.11-4 (64-bit)
  • SysBench 0.5
  • HammerDB 2.19 (64-bit)
  • Inbox NVMe and RAID Controller drivers
  • XFS file system

 

Results:

 

The results below recap some of the high level observations from these tests:

HammerDB TPC-C (NOPM), Figure 1:
  • SATA SSD: performance gains of up to 53%
  • NVMe SSD: performance gains of up to 64%

HammerDB TPC-H (Run Time), Figure 2:
  • SATA SSD: run time reduced by up to 23%
  • NVMe SSD: run time reduced by up to 46%

HammerDB TPC-H (QPH), Figure 3:
  • SATA SSD: up to 29% more queries per hour
  • NVMe SSD: up to 84% more queries per hour
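The run-time and queries-per-hour numbers are two views of the same improvement: a fractional reduction r in run time corresponds to a throughput gain of 1/(1 - r) - 1. A quick sanity check against the figures above:

```python
# Relationship between run-time reduction and throughput (queries/hour) gain.
def throughput_gain(runtime_reduction):
    return 1 / (1 - runtime_reduction) - 1

for label, reduction in (("SATA SSD", 0.23), ("NVMe SSD", 0.46)):
    print(f"{label}: {reduction:.0%} less run time -> "
          f"~{throughput_gain(reduction):.0%} more queries per hour")
# SATA: ~30% more QPH (reported: 29%); NVMe: ~85% more QPH (reported: 84%)
```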

 

[Figure 1: HammerDB TPC-C Test - Relative Throughput Compared to HDD HW RAID 10]
[Figure 2: HammerDB TPC-H Test - Relative Run Time Compared to HDD HW RAID 10]
[Figure 3: HammerDB TPC-H Test - Relative Throughput Compared to HDD HW RAID 10]

 

 

Recap:

All in all, this was an interesting (if not fun) exercise. Six HDDs or a single SSD? Relative performance results aside, one should also consider power consumption, reliability, and the opportunity cost savings that derive from performance gains over the lifetime of a hardware platform, as these can often be more substantial than the upfront costs. Speaking of upfront costs, the Percona Live talk itself also addresses the relative upfront cost of each storage configuration, which makes for an interesting conversation when that information is juxtaposed against usable capacity and performance results.

 

Additional configuration details:

 

Additional, non-default, configuration parameters for HammerDB and Percona Server for these tests:

 

For HammerDB with TPC-C Option

  • Within HammerDB
    • Number of warehouses : 1140
    • Number of users: 31
    • Timed Test Driver Script: 2 minute ramp, 5 minute duration
  • The [mysqld] InnoDB settings in my.cnf:
    • innodb_buffer_pool_size=10240M (~1/10th the database size, to induce memory pressure)
    • innodb_log_file_size = 2560M
    • innodb_log_files_in_group = 2
    • innodb_log_buffer_size = 8M
    • innodb_flush_log_at_trx_commit = 0
    • innodb_checksums = 0
    • innodb_flush_method = O_DIRECT

 

For HammerDB with TPC-H Option

  • Within HammerDB
    • Scale Factor: 10
    • Number of users: 24
  • The [mysqld] InnoDB settings in my.cnf:
    • innodb_buffer_pool_size=10240M (~½ the database size, to induce memory pressure; see the sizing sketch after this list)
    • innodb_log_file_size = 2560M
    • innodb_log_files_in_group = 2
    • innodb_log_buffer_size = 8M
    • innodb_flush_log_at_trx_commit = 0
    • innodb_checksums = 0
    • innodb_flush_method = O_DIRECT
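For context on those buffer pool choices, the rough arithmetic below shows how the 10,240M setting relates to the data set sizes; the ~100 MB-per-warehouse figure for TPC-C and the ~20 GB loaded size for TPC-H at scale factor 10 are commonly cited approximations, not measurements from this setup.

```python
# Rough sizing check for the buffer pool settings above (approximations only).
MB_PER_WAREHOUSE = 100          # rule-of-thumb on-disk size of one TPC-C warehouse
warehouses = 1140
tpcc_db_mb = warehouses * MB_PER_WAREHOUSE

buffer_pool_mb = 10240
print(f"TPC-C data set: ~{tpcc_db_mb / 1024:.0f} GB; "
      f"buffer pool is ~1/{tpcc_db_mb / buffer_pool_mb:.0f} of it")

tpch_db_mb = 20 * 1024          # assumed loaded size of the SF=10 TPC-H database
print(f"TPC-H data set: ~{tpch_db_mb / 1024:.0f} GB; "
      f"buffer pool is ~{buffer_pool_mb / tpch_db_mb:.0%} of it")
```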

 

Disclaimers

 

Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as HammerDB, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: Internal Testing

 

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations.

 

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase.  For more complete information about performance and benchmark results, visit http://www.intel.com/performance.   

 

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

 

Copyright © 2016 Intel Corporation.  All rights reserved.

*Other names and brands may be claimed as the property of others.



At Intel we have been working on optimizing the Ceph storage architecture for many key IT applications, including MySQL, cloud, and big data. There have been a number of enhancements around the use of solid state drives (SSDs) for Ceph that have caught the attention of the open source community. This work is important as CIOs see the promise of cloud and rethink how open source can serve them as a cost-effective environment for private cloud deployment.

 

I was excited to hear that Intel will be sharing our learnings at the Percona Live Data Performance Conference, held in Santa Clara April 18-21. Come to Percona Live next week to join others in the open source community and learn from the best minds working on MySQL, NoSQL, cloud, big data, and the Internet of Things (IoT). Whether you are a DBA, developer, or architect, you can learn from your peers about the best methods for running open source IT services on Ceph.

 

At this year’s conference, Intel is sponsoring the Data in the Cloud track, Intel solution architects will be giving three talks in the Big Data track, and Reddy Chagam, Intel’s chief Software Defined Storage Architect, will be on the keynote panel. Here is a brief look at the keynote panel and the Intel breakout sessions you should plan to attend.


We look forward to seeing you at Percona Live.


Keynote Panel with Reddy Chagam, Principal Engineer and Chief SDS Architect, Intel

Wednesday April 20 at 9:25am

Data in the Cloud Keynote Panel: Cloudy with a chance of running out of disk space? Or Sunny times ahead?

 

As larger and larger datasets move to the cloud, new challenges and opportunities emerge in handling such workloads. New technologies, revamped products, and a never-ending stream of ideas follow in the wake of this advance. These aim to improve the performance and manageability of cloud-based data, but are they enough? What issues still need to be worked out, and where are we going as an industry?

 

Accelerating Ceph for Database Workloads with an all PCIe SSD Cluster

Tuesday April 19, 3:50pm in Room 203

Reddy Chagam, Principal Engineer and Chief SDS Architect, Intel

Tushar Gohad, Senior Engineer, Intel

 

PCIe SSDs are becoming increasingly popular for deploying latency-sensitive workloads such as database and big data in enterprise and service provider environments. Customers are exploring low-latency workloads on Ceph using PCIe SSDs to meet their performance needs.  In this session, we will look at a high-IOPS, low-latency workload deployment on Ceph, performance analysis of all-PCIe configurations, and best practices and recommendations.

 

Performance of Percona Server for MySQL on Intel Server Systems using HDD, SATA SSD, and NVMe SSD as Different Storage Mediums

Tuesday April 19, 11:30am in Room 203

Ken LeTourneau, Solutions Architect, Intel

 

This talk looks at the performance of Percona Server for MySQL on Linux running on the same Intel system with three different storage configurations. We will compare the performance of (1) a RAID of HDDs, (2) a RAID of SATA SSDs, and (3) a RAID of NVMe SSDs.  In the talk we’ll cover the hardware and system configuration, then discuss results of TPC-C and TPC-H benchmarks. Finally, we’ll look at the overall system costs, including hardware and software, and the cost per transaction/query based on overall costs and benchmark results.
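
To make the cost-per-transaction idea concrete, here is a small, hypothetical sketch of the arithmetic; the system costs and transactions-per-minute figures below are invented placeholders, not results from the talk.

```python
# Hypothetical illustration of the cost-per-transaction math described
# above. The prices and TPM figures are placeholders, not results from
# the talk or from Intel testing.

def cost_per_million_tx(system_cost_usd: float, tpm: float, lifetime_years: int = 3) -> float:
    """Dollars per million transactions over the platform's assumed lifetime."""
    minutes = lifetime_years * 365 * 24 * 60
    total_tx = tpm * minutes
    return system_cost_usd / (total_tx / 1_000_000)

# Placeholder configurations (costs and TPM are made up for illustration).
configs = {
    "HDD RAID":      {"cost": 8_000,  "tpm": 20_000},
    "SATA SSD RAID": {"cost": 10_000, "tpm": 150_000},
    "NVMe SSD RAID": {"cost": 12_000, "tpm": 400_000},
}

for name, c in configs.items():
    print(f"{name}: ${cost_per_million_tx(c['cost'], c['tpm']):.4f} per million transactions")
```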

 

Increase MySQL Database Performance with Intel® SSD plus Cache Acceleration Software

Wednesday April 20, 1:00pm in Room 203

David Tuhy, Senior Director of Business Development, Non-Volatile Memory Solutions Group, Intel

 

The primary storage for many of today’s databases is an array of rotational media, which quickly becomes the largest bottleneck for database application performance. Replacing slower HDDs with faster solid-state drives (SSDs) and providing a method to cache frequently accessed files can greatly improve application performance and reduce support costs. This talk shows how an Intel® SSD with Intel® CAS can increase database performance immediately, without any modification to the existing applications or storage media back-end.

By: Chad Arimura, Iron.io CEO and Co-founder

 

You may have heard the term serverless computing being tossed around recently. This doesn’t mean we’re getting rid of the data center in any form or fashion; it simply means that we’re entering a world where developers never have to think about provisioning or managing infrastructure resources to run distributed applications at scale. This is done by decoupling backend jobs as independent microservices that run through an automated workflow when a predetermined event occurs. For the developer, it’s a serverless experience.

 

There have been two separate but related innovations that enable this software-defined future. One is at the application layer, where APIs and developer tools abstract away the complex configuration and operations required to distribute and execute workloads at massive scale. The other is at the infrastructure layer, where workloads are profiled for their most optimal hardware conditions. These innovations narrow the gap between developer and chip, leading to more intelligent, workload-aware systems.

 

At Iron.io, we have been leading this serverless computing movement through our container-based job processing platform. Through an event-driven model, developers simply package their job code as a lightweight Docker image and set when they want it to run: on a regular schedule, from a webhook, when a sensor goes off, or from a direct API call. When the event triggers, an automated workflow kicks off behind the scenes to spin up a container and execute the job. All that’s needed are available infrastructure resources to distribute the workloads.
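
The event-driven flow described above can be sketched in a few lines. The Job and JobQueue names below are invented for illustration; this is not the actual Iron.io API, just the shape of the model: register a container image with a trigger, then run it when a matching event fires.

```python
# Minimal sketch of the event-driven job model described above.
# Job/JobQueue are hypothetical names, not Iron.io's real interface.

import time
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Job:
    name: str
    image: str      # lightweight Docker image containing the job code
    trigger: str    # "schedule", "webhook", "sensor", or "api"

class JobQueue:
    def __init__(self) -> None:
        self.jobs: Dict[str, Job] = {}

    def register(self, job: Job) -> None:
        self.jobs[job.name] = job

    def fire(self, event: str) -> List[str]:
        """Start a container for every job whose trigger matches the event."""
        started = []
        for job in self.jobs.values():
            if job.trigger == event:
                # A real platform would launch the container on whatever
                # infrastructure is available; here we just log it.
                print(f"[{time.strftime('%X')}] running {job.image} for {job.name}")
                started.append(job.name)
        return started

queue = JobQueue()
queue.register(Job("nightly-etl", "acme/etl-job:latest", trigger="schedule"))
queue.register(Job("resize-upload", "acme/img-resize:latest", trigger="webhook"))
queue.fire("webhook")   # -> runs acme/img-resize:latest
```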

 

This convergence between the application layer and the infrastructure layer is where Intel and Iron.io intersect. For example, an encryption workload is best run on an Intel platform that uses CPUs that have Intel® AES-NI instructions, but it shouldn’t have to be up to the developer to make that call. The Snap Framework collects the telemetry data that tells Iron.io where best to deliver the job. This is done by including a Snap plugin within the Iron.io runtime that captures the memory, CPU, and block I/O environment for each container process. In addition, Snap can capture advanced characteristics using Intel® RDT (Cache Monitoring Technology and Cache Allocation Technology). The data is then analyzed and published back to Iron.io so the next time the job is triggered to run, it can be routed to the right Intel processor with the right resource allocation.
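
To illustrate the routing idea, here is a hedged sketch of how a telemetry-derived job profile might be matched to a host with the right CPU features (such as Intel® AES-NI). The data structures and matching rule are assumptions made for illustration; they are not Snap’s or Iron.io’s actual interfaces.

```python
# Hypothetical sketch of workload-aware placement: a per-job profile
# built from container telemetry (e.g., collected by a Snap plugin) is
# matched against hosts advertising CPU features and free memory.
# All names here are invented for illustration.

from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Host:
    name: str
    cpu_features: Set[str]   # e.g., {"aes-ni", "avx2"}
    free_memory_mb: int

@dataclass
class JobProfile:
    name: str
    peak_memory_mb: int                       # observed from prior runs
    wants_features: Set[str] = field(default_factory=set)

def place(job: JobProfile, hosts: List[Host]) -> Host:
    """Return the first host satisfying the job's feature and memory needs."""
    for host in hosts:
        if job.wants_features <= host.cpu_features and host.free_memory_mb >= job.peak_memory_mb:
            return host
    raise RuntimeError(f"no suitable host for {job.name}")

hosts = [
    Host("node-a", {"avx2"}, 4096),
    Host("node-b", {"aes-ni", "avx2"}, 8192),
]
encrypt_job = JobProfile("encrypt-batch", peak_memory_mb=2048, wants_features={"aes-ni"})
print(place(encrypt_job, hosts).name)   # node-b
```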

 

This collaboration between Intel and Iron.io represents the future of workload-aware computing environments. When dealing with web scale and beyond, incremental optimizations led by software defined infrastructure can make the difference between success and failure. We’re excited to collaborate with Intel to empower the modern Enterprise with a serverless future.

This year’s Strata + Hadoop World conference, held March 28-31 in San Jose, marked an interesting event – the 10th anniversary of Apache Hadoop*. Though Hadoop’s birthday was celebrated with circus-like festivities on the show’s second evening, I did notice that this year’s conference centered less on Hadoop than in previous years. Instead, the conference focused more on the cluster-computing analytics framework Spark, Internet of Things (IoT) technologies, and the ongoing challenge of deriving analytical insights from big data.

 

Intel had a substantial presence at the show, with a number of keynote speakers, sponsored sessions, an announcement about cloud infrastructure provider Rackspace and the open source Trusted Analytics Platform (TAP) driven by Intel, and other events to showcase its latest advances in big data and Internet of Things (IoT) technologies.

 

Thursday morning’s keynotes kicked off with a bang when writer and entrepreneur Alistair Croll (@acroll) welcomed the audience with a story of his lost jeans. Apparently, Croll’s luggage was misplaced on the way to San Jose, which sent him on a fruitless shopping trip to the mall to buy a replacement pair.

 

This proved to be the perfect set-up for Bob Rogers (@scientistBob), chief data scientist for big data solutions at Intel, and his presentation “Advanced analytics and the mystery of the missing jeans.” Rogers discussed how Intel and Levi’s have been working together to address a major problem in retail: inventory accuracy in brick-and-mortar stores is only 65 percent. This means that 35 percent of the time, merchandise that is supposed to be in stock isn’t. Rogers provided an overview of an analytics retail solution built for Levi’s that links IoT data from RFID inventory tags, video cameras, and sensors and transmits it via Intel® IoT Gateways to cloud-based advanced analytics engines on TAP. Watch this video for a glimpse into how Intel’s IoT and advanced analytics technologies have helped Levi’s make smarter business decisions and better serve its customers.

 

Bridget Karlin, the managing director of Intel’s Internet of Things Group, joined Bob Rogers for another session on IoT and TAP called “Master the Internet of Things with Integrated Analytics.” Intel offers a complete IoT platform that begins with reference architectures and extends to products and technologies from Intel and its partners to create an open, secure, and scalable approach to IoT solutions. Karlin and Rogers discussed how applying analytics platforms such as TAP to rich data streams from IoT networks can deliver enormous value to many industries, including healthcare, energy and utilities, and retail.

 

tap-iot-platform.png

Figure 1. Intel’s IoT Platform extends from edge to advanced analytics in the cloud.

 

These IoT systems are particularly powerful when they offer real-time analytics, not just from the cloud, but also from the network edge. Intel has worked closely with SAP to deliver an end-to-end IoT solution that can deliver actionable business insights on a near real-time basis. Watch this video featuring Karlin and Irfan Khan, CTO for Global Customer Operations at SAP, to learn more about the joint Intel-SAP IoT solution, and read the solution brief Business Intelligence at the Edge to find out how real-time data from the network edge helps protect remote workers and improves customer engagement and sales in retail applications.

 

During the same week as Strata + Hadoop World, Intel also announced the general availability of the new Intel® Xeon® processor E5-2600 v4 product family (https://newsroom.intel.com/news-releases/intel-makes-move-to-the-cloud-faster-easier/), which provides a strategic foundation for building modern, software-defined cloud infrastructures. The new processor family delivers improved performance for cloud workloads, with more than 20 percent more cores and cache than the prior generation, plus enhanced security and faster memory support. According to recent benchmarks, Intel Xeon processor E5-2600 v4 chips delivered up to 1.22x higher performance on procedural workloads (such as MapReduce workloads on Apache Hadoop clusters). The new chips also delivered up to 1.27x higher performance for BigBench queries, a benchmark that measures efficient processing of big data analytics.

 

Included in the release was news that Intel is expanding its popular Intel® Cloud Builders program, which brings together reference architectures, solution blueprints, and leading solution providers to help facilitate the delivery of modern computing infrastructures, to include software-defined infrastructure use cases. The Cloud Builders program is now joined by the Intel® Network Builders and Intel® Storage Builders programs, which aim to accelerate adoption of software-defined network and storage innovations.

 

My last stop at the conference was at the SAP kiosk, where I filmed a Periscope video of our friend Karen Sun introducing SAP HANA Vora. Vora* is an in-memory computing engine for Hadoop that runs on the Apache Spark* execution framework and helps sift massive volumes of unstructured data in hierarchies to simplify big data management in SAP HANA and Hadoop environments. Intel contributed engineering and enablement efforts to Spark, which SAP HANA Vora is based on, to maximize performance and security on Intel® architectures.

 

From lost jeans to Vora, Strata + Hadoop World was a busy, eventful show with engaging, provocative keynotes and events. The highlights are available for viewing on the Strata web site.

 

Follow me at @TimIntel and #TechTim to keep up with the latest with Intel and SAP.
