Next-Generation Sequencing (NGS) technologies are transforming the bioinformatics industry. By sequencing whole human genomes at rates of up to 18,000 per year—an average of one genome every 32 minutes—new sequencers have broken the $1,000 genome barrier[1] and opened the door to population studies and clinical usage models that have not been possible before.


Of course, high-volume sequencing generates an enormous amount of data that must be analyzed as fast as it is produced. According to Illumina, it would take an 85-node high performance computing (HPC) cluster to keep pace with its top-of-the-line HiSeq X™ Ten sequencer operating at full capacity.[2]


Drive Down the Cost of Analyzing Your Genomes

Working together, Qiagen Bioinformatics and Intel have developed a reference architecture for a 35-node cluster based on the Intel® Xeon® processor E5 v3 family that meets these same performance requirements, while reducing total cost of ownership (TCO) by as much as $1.4 million over four years. [3] Depending on sequencing volumes and data center efficiency, this solution could enable full analysis of whole human genomes for as little as $22 each.
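The per-genome figure is straightforward amortization arithmetic. Here is a minimal sketch; the TCO total and annual volume below are illustrative assumptions chosen to land near the cited figures, not values taken from the actual analysis:

```python
# Hypothetical back-of-envelope model: cost per analyzed genome for a
# dedicated cluster. All figures are illustrative assumptions, not
# numbers from the reference architecture's TCO study.

def cost_per_genome(tco_total, years, genomes_per_year):
    """Amortize total cost of ownership across every genome analyzed."""
    return tco_total / (years * genomes_per_year)

# Example: an assumed $1.6M four-year TCO at the HiSeq X Ten's full
# capacity of ~18,000 genomes per year works out to roughly $22 each.
cost = cost_per_genome(tco_total=1_600_000, years=4, genomes_per_year=18_000)
print(round(cost, 2))  # 22.22
```

Plugging in an organization's own sequencing volume and data center costs shows how sensitive the per-genome price is to utilization.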


The Qiagen Bioinformatics and Intel reference architecture uses CLC Genomics Server with the Biomedical Genomics Server extension, which is highly optimized for Intel architecture. The Biomedical Genomics Server solution provides all the advanced tools and capabilities of CLC Genomics Workbench and the Biomedical Genomics Workbench, but is designed specifically for HPC clusters. Geneticists benefit from a powerful workbench, high quality results, and intuitive interfaces that insulate them from the complexities of cluster computing.


Manage Your Data on Massively-Scalable, Centralized Storage

Fast, scalable storage is as important as cluster performance for NGS. The reference architecture includes a 165 TB storage solution based on Intel® Enterprise Edition for Lustre*. Intel packages this open source software with powerful management tools and offers 24/7 support to help organizations manage and protect their data and maintain high reliability and uptime.


This centralized storage system uses high-capacity commodity disk drives to keep costs low, plus a small number of Intel® Solid State Drives (Intel® SSDs) to accelerate the operations that are most critical for fast genome analysis. Like the compute cluster, the storage system is designed to scale on-demand, so you can accommodate rapid growth in a straightforward and cost-effective manner.


Lay the Foundation for Next-Generation Breakthroughs

Today’s powerful NGS technologies will help scientists, labs, and clinics deliver the next wave of scientific and medical innovation. A fast, scalable, and affordable analytics solution can simplify your journey and help keep your costs under control.


Read the story on the Qiagen Blog




[1] Based on the published output capacity of the Illumina HiSeq X Ten next-generation sequencer.

[2] Source: A workflow for variant calling based on BWA+GATK in the HiSeq X™ System Lab Setup and Site Prep Guide (Part # 15050093 Rev. H, July 2015). Current version for September 2015 can be found at:

[3] Based on internal performance tests and a total cost of ownership analysis performed by Qiagen Bioinformatics and Intel. Performance tests were conducted on a 16-node high performance computing (HPC) cluster. Each node was configured with 2 x Intel® Xeon® processor E5-2697 v3 (2.6 GHz, 14 core), 128 GB memory, and a 500 GB storage drive. All nodes shared a 165 TB storage system based on Intel® Enterprise Edition for Lustre, 256 TB of 7.2K RPM NL-SAS disk storage, and 4 x 800 GB Intel Solid State Drive Data Center S3700 Series, supported by an Intel® True Scale™ 12300 36-port QDR InfiniBand switch and 2 x Intel® True Scale™ single-port HCAs (QDR-80 configured by default). The TCO analysis was performed using an internal Intel tool and publicly available product pricing and availability as of October 9, 2015. The TCO for the test cluster was estimated over a 4-year period and compared with the estimated TCO of an 85-node cluster, as described in the Illumina HiSeq X System Lab Setup and Site Prep Guide, Document # 15050093 v01, September 2015. To quantify the TCO comparison, specific products were chosen that would fulfill the general specifications defined within the Illumina guide. Support costs for both systems were estimated as 60 percent of TCO. The performance and TCO results should only be used as a general guide for evaluating the cost/benefit or feasibility of future purchases of systems. Actual performance results and economic benefits will vary, and there may be additional costs related to the use and deployment of the solution that are not accounted for here.

We’re experiencing ever-increasing volumes of data within health and life sciences. If we were to sequence the ~14M new cancer patients worldwide[1] just once (as tumor/normal pairs), it would require more than 5.6 exabytes of storage—and in reality we need to be able to sequence them multiple times during the course of treatment, using a variety of omics and analytics approaches. The technical challenges of big data are many, from managing and storing such large volumes of data to analysing hugely complex datasets. However, we must meet these challenges head-on, as the rewards are very real.
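The exabyte-scale figure can be reproduced with simple arithmetic. The per-genome size used below is an assumption on my part (roughly 200 GB of raw and aligned data per whole genome); actual footprints vary with coverage and file format:

```python
# Rough storage arithmetic behind the ~5.6 EB figure. The per-genome
# size is an assumption (~200 GB per whole genome); real footprints
# depend on sequencing depth and whether FASTQ, BAM, or CRAM is kept.

PATIENTS = 14_000_000          # ~14M new cancer patients worldwide
GENOMES_PER_PATIENT = 2        # tumor/normal (T/N) pair
GB_PER_GENOME = 200            # assumed data volume per genome

total_gb = PATIENTS * GENOMES_PER_PATIENT * GB_PER_GENOME
total_eb = total_gb / 1e9      # 1 EB = 1e9 GB (decimal units)
print(total_eb)  # 5.6
```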


I’m pleased to tell you about a significant project that Intel is supporting to help overcome these types of challenges and assist in the drive to comprehensively analyse cancer genomes. Our HPC solutions are already helping organisations around the world deliver better healthcare and individuals overcome diseases such as cancer. And our relationship with the Pan-Cancer Analysis of Whole Genomes (PCAWG) project is helping scientists to access and share analysis of more than 2,600 matched tumor/normal pairs (5,200 whole human genomes).


Scientific discovery can no longer operate in isolation – there is an imperative to collaborate internationally, working across petabytes of data and statistically significant patient cohorts. The PCAWG project is turning to the cloud to enhance access for all, which will bring significant advances in healthcare through collaborative research.


By working directly with industry experts to accelerate cancer research and treatment, Intel is at the forefront of the emerging field of precision medicine. Advanced biomarkers, predictive analytics and patient stratification, and therapeutic treatments tailored to an individual’s molecular profile: these hallmarks of precision medicine are undergoing rapid translation from research into clinical practice. Intel HPC and big data analytics technologies support high-throughput genomics research while delivering low-latency clinical results, so clinicians and patients together can formulate individualized treatment plans informed by the latest scientific understanding.


For example, Intel HPC technology will accelerate the work of bioinformaticists and biologists at the German Cancer Research Centre (DKFZ) and the European Molecular Biology Laboratory (EMBL), allowing these organisations to share complex datasets more efficiently. Intel, Fujitsu, and SAP are helping to build the infrastructure and provide expertise to turn this complex challenge into reality.


The PCAWG project is in its second phase, which began with the uploading of genomic data to seven academic computer centres, creating what is in essence a super-cloud of genomic information. Currently, this ‘academic community cloud’ is analysing data to identify genetic variants, including cancer-specific mutations. And I’m really excited to see where the next phase takes us, as our technology will help over 700 ICGC scientists worldwide remotely access this huge dataset and perform secondary analysis to gain insight into their own specific cancer research projects.


This is truly ground-breaking work, made possible by great scientists utilising the latest high-performance big data technologies to deliver life-changing results. At Intel it gives us great satisfaction to know that we are playing a part in furthering knowledge both in the wider genomics field and, specifically, in better understanding cancer, which will lead to more effective treatments for everyone.




As Bio-IT World approaches this week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Dr. Pek Lum, Chief Data Scientist and VP of Solutions, Ayasdi.


It seems like every time I turn on the news, I hear another story about someone with a traumatic brain injury, whether they received it on the football field or the battlefield. The fact is that traumatic brain injury (TBI) can affect anyone at any time. A minor car accident, a trip on the stairs, or a tumble on a bike can send you to an emergency room. In fact, up to one-third of people hit their head hard enough to go to the hospital by their mid-twenties, according to UCSF. Thankfully, most of those head injuries are mild and patients feel better in a few days. However, that’s not true for roughly 20 percent of patients, who go on to develop persistent problems—such as depression, memory issues, or headaches—that can last weeks, months, or sometimes even years.


When concussion patients arrive at an emergency room, doctors diagnose the severity of these injuries based on clinical behavior and CT or MRI scans. But those scans don’t tell us everything. They can often miss subtle physical injuries, and they tell us nothing about which patients will be fine in a few days and which will go on to develop lingering adverse effects.


These are the questions that Dr. Adam Ferguson and Dr. Esther Yuh at The Brain and Spinal Injury Center (BASIC) at UCSF wanted Ayasdi’s help to answer.


Advanced brain scan data from Diffusion Tensor Imaging (DTI) can reveal brain abnormalities that would not necessarily show up on an MRI or CT scan. That said, the amount of data generated by a DTI scan is massive and incredibly complex. There are millions of questions that could be asked of the data, and it takes time and money for data scientists to find the right ones, if they do at all. It’s like trying to find a needle in an enormous haystack. Ayasdi, on the other hand, is perfectly suited to this kind of task. The Ayasdi Cure application uses Topological Data Analysis (TDA), combined with an ensemble of machine learning techniques, to enable domain experts and data scientists to discover insights automatically. In fact, Ayasdi Cure now processes data 400 percent faster because the application is optimized for Intel® Xeon® processors, leveraging the Intel® Math Kernel Library (MKL) and Intel® Advanced Vector Extensions (AVX), which dramatically reduces the time to insight.


What is abundantly clear is that the labels of mild, moderate, and severe head injury are too simplistic and don’t tell us the full story. By collaborating closely with UCSF and partnering with Intel, we hope to find more sensitive TBI markers that will define more precise patient subpopulations.


By doing this, doctors can potentially predict whether someone will develop complications at the time of the fall or hit.  Knowing this information ahead of time can help both doctors and patients mitigate complicating factors and potentially avoid them altogether.


This vision of the future is not just a dream. We’re making it happen. Our preliminary work is so promising that Ayasdi and UCSF were awarded the GE/NFL “Head Health” award to explore how to better diagnose and treat mild traumatic brain injuries in professional football players.  We are looking forward to sharing our findings in the coming year.


How are you tackling massive amounts of clinical and genetic data?


Dr. Pek Lum is the Chief Data Scientist and VP of Solutions at Ayasdi.

Keep up with Pek @peklum and Ayasdi @ayasdi

As Bio-IT World approaches next week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Phil Eschallier, director of managed services at RCH Solutions.


In supporting research computing in life and material sciences, it’s clear that most pharmas and biotechs are pushing toward hosting and computing in the cloud. Is this prudent or wise or strategic? Let’s answer this, for now, with a definite “maybe.” Perhaps the cloud is better viewed as a tool in the arsenal, not a panacea.


Let’s face it, the cloud is alluring for many reasons: it offers the utmost in flexibility with essentially no start-up (capital) costs; it provides predefined (already engineered) services; it exposes APIs to facilitate the automation of provisioning and scaling; and one can use a company credit card to facilitate (or perhaps skirt) the procurement process. Lastly, provisioning servers via global IS organizations is often measured in months, while provisioning in the cloud can be measured in days or weeks.


Clearly, where temporary scale is the deciding factor, traditional computing hosted in corporate data centers under the CAPEX procurement model cannot compete with the cloud. But when demand is identified as a need over time, Gartner, US Government News, and others tell us that the cloud is notably more expensive (though the various sources reporting on cloud expense are not aligned). After all, cloud providers aren’t magic—they have to purchase compute, network, and storage capacity, amortize that spending over a defined period of time, and then sell it to others while still making a profit. For a full-time resource, the price tag has to include the cloud provider’s capital and operational costs plus its profit.
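That amortization argument can be made concrete with a toy model. Every number below is a hypothetical placeholder; the point is the structure of the comparison for a sustained, full-time workload, not the specific prices:

```python
# Toy amortization comparison for a workload that runs 24/7. All prices
# here are hypothetical placeholders, not real quotes: a cloud provider's
# hourly rate must cover its own capital + operating costs plus margin.

def on_prem_monthly(server_capex, amort_months, opex_per_month):
    """Capital cost spread over its amortization period, plus operations."""
    return server_capex / amort_months + opex_per_month

def cloud_monthly(hourly_rate, hours_per_month=730):
    """Cost of keeping an equivalent cloud instance running full time."""
    return hourly_rate * hours_per_month

# Assumed figures: a $12,000 server amortized over 48 months plus
# $150/month for power, space, and administration, versus a comparable
# always-on instance billed at $0.80/hour.
print(on_prem_monthly(12_000, 48, 150))  # 400.0
print(cloud_monthly(0.80))               # 584.0
```

Flip the workload to intermittent (say, a few hundred hours a year) and the same arithmetic favors the cloud, which is exactly the trade-off described above.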


Will the move to the cloud always yield faster results? If your data remains within corporate firewalls for security or legal reasons, moving data (especially big data) to cloud compute platforms adds time to analysis runs and complicates the security model. And where is your data? Is it easier to bring your data to the computing or is it easier to bring computing to your data? And what happens to any intellectual property pushed to the cloud after the compute jobs are completed?


“Vendor Lock-in” has to be a factor in defining cloud strategy. Once entrenched in a vendor’s cloud offerings it will not be easy (or cheap) to migrate to another provider.


Finally, are those wielding credit cards in the business positioned to cost-effectively engineer solutions in the cloud? Some applications benefit from more CPU, others from faster storage or more memory, still others from network tuning. Obviously, some applications need performance from all facets of the underlying infrastructure. A common example: an application hamstrung by disk I/O. It’s cheap and we have a credit card, so shall we just spin up another VM? Ultimately, those paying the credit card bills may want confidence that whatever was spun up was used well.


The cloud is a tool in the toolbox. But if a business relies on heavy computational cycles or big data, it may not yet be time to promote the cloud from tool to entire toolbox. The cloud can serve businesses needing a public web presence or web services outside the corporate firewall. It can be a fantastic platform on which to prototype or pilot solutions, and a fiscally responsible option for intermittent compute needs or when the needed scale varies unpredictably. However, if needs are well-defined over time or intellectual property is a concern, computing in the controlled environment of the corporate data center should be more cost-effective and secure than the cloud.


Not sure how to proceed? Consider engaging a subject matter expert before deciding between the cloud and the corporate data center -- a small up-front cost should help ensure that computing budgets are monies well spent.


What questions do you have?

As Bio-IT World approaches next week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Robert Cooper, business unit leader at Ceiba Solutions.


Combining new data integration techniques with mobile platforms compounds productivity gains. In pharmaceutical and other life science companies, where users rely on heterogeneous data sources and applications in the drug discovery process, the problem of interoperability and cross application function integration is of immense importance.


From large pharmaceutical companies to early-stage biotechs, the expense of building systems and integrations, although critical to success, can be prohibitive. New, innovative approaches that address these demands by applying intelligence to data and associated cross-application functions are an exciting new direction.


At Ceiba, we’ve developed a data-typing-based solution that frees organizations from cost-prohibitive integrations, putting a marketplace of functions and services at users’ fingertips – all accessible from common applications like Excel, Spotfire, ELN, LIMS, and CRM. A dynamically adaptable platform supports integration of service-specific functions and empowers users to easily execute multiple functions and relate data in new and interesting ways – in particular, “in context.”


The new framework lets users interact with domain data (biology, chemistry, clinical, sales, marketing, manufacturing, etc.) in a simple, consistent manner, traversing from one data type to another without prior knowledge of where the data live or how to consume them from a technical perspective. The user interaction is therefore driven by the data and its associated functions rather than by the underlying technology. This leads to a consistent familiarity across the organization in the way application functions are leveraged and data are explored, turning complex relationships into actionable drug information with minimal time and effort.
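As an illustration only (the type names, functions, and lookup scheme here are hypothetical sketches, not Ceiba's actual product or API), the data-typing idea can be modeled as a registry of functions keyed by type-to-type edges, so a user asks for a traversal rather than a specific backend:

```python
# Minimal sketch of a data-typing framework: functions register
# themselves against (input type -> output type) edges, and users
# traverse from one data type to another without knowing which
# underlying system answers. All names here are hypothetical.

REGISTRY = {}  # (from_type, to_type) -> function

def register(from_type, to_type):
    """Decorator that records a function under a type-to-type edge."""
    def wrap(fn):
        REGISTRY[(from_type, to_type)] = fn
        return fn
    return wrap

@register("gene", "compound")
def compounds_for_gene(gene_id):
    # In a real deployment this would query an ELN/LIMS service;
    # here a tiny hard-coded table stands in for that backend.
    return {"EGFR": ["gefitinib", "erlotinib"]}.get(gene_id, [])

def traverse(value, from_type, to_type):
    """Resolve and invoke the registered function for the requested edge."""
    return REGISTRY[(from_type, to_type)](value)

print(traverse("EGFR", "gene", "compound"))  # ['gefitinib', 'erlotinib']
```

The user-facing call names only data types, which is what makes the interaction data-driven rather than technology-driven.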


Partnering with Intel, Ceiba Solutions has enhanced its productivity software by combining it with Intel-enabled mobility solutions, exposing even more opportunities for improved scientific effectiveness. The consumerization of software services via a marketplace of applications, on top of new hardware platforms, provides new, exciting, and heretofore unforeseen benefits.


For example, experiments with hands-free communication on tablets, running productivity software that surfaces relationships in experiment data, have shown significant workflow savings and enabled immediate recovery from faults in the manufacturing process. Mobility also increases employees’ desire to share analyses, as well as the number of workers capable of collaborating on projects in near real time.


Mobility-enabled employees are more likely to reach out to research colleagues via social communication within a company, and involving employees from support or adjacent departments in key areas and virtual teams becomes simpler. All of this improves organizational operating performance: profitability can grow, and costs fall as the time required to make important decisions is reduced. The combination of a marketplace of productivity tools and services with mobility speeds decision making, with a significant positive effect on time to market, which is critical in the life sciences industry.


What questions do you have?

The road to personalized medicine is paved with a whole series of big data challenges, as the emphasis shifts from raw sequencing performance to mapping, assembly and analytics. The need to transmit terabytes of genomic information between different sites worldwide is both essential and daunting, including:


Collaboration with research and clinical partners worldwide to establish statistically significant patient cohorts and leverage expertise across different institutions.

Reference Genomes used to assemble sequences, perform quality control, identify and annotate variants, and perform genome-wide association studies (GWAS).

Cloud-based Analytics to address critical shortages in bioinformatics expertise and burst capacity for HPC cluster compute.

Data Management and Resource Utilization across departments in shared research HPC cluster environments, analytics clusters, storage archives, and external partners.

Medical Genomics extends the data management considerations from research to clinical partners, CLIA labs, hospitals and clinics.


Most institutions still rely upon shipping physical disks due to inherent problems with commodity 1 Gigabit Ethernet (GbE) networks and TCP inefficiencies. When the goal is to reduce analytics time from weeks to hours so that a meaningful clinical intervention is possible, spending days just transporting the data is not viable. The transition from 1GbE to 10GbE and beyond has been unusually slow in healthcare and life sciences, likely due to an overemphasis on shared compute resources, considered out of context from the broader usage, system architecture, and scalability requirements.
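To see why shipping disks can win, consider the well-known Mathis et al. model, which bounds a single standard TCP flow's throughput by segment size, round-trip time, and packet loss, independent of the link's raw capacity. The RTT, loss rate, and payload size below are illustrative assumptions for a lossy transcontinental path:

```python
import math

# Single-flow TCP throughput ceiling per the Mathis et al. model:
# throughput <= MSS / (RTT * sqrt(p)). On a long, slightly lossy WAN
# path this ceiling sits far below the nominal link rate, which is
# why terabyte-scale genomics transfers stall on commodity networks.

def tcp_throughput_bps(mss_bytes=1460, rtt_s=0.1, loss=0.001):
    """Approximate single-flow ceiling in bits per second."""
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss))

def transfer_days(size_tb, throughput_bps):
    """Days needed to move size_tb terabytes at the given bit rate."""
    return size_tb * 1e12 * 8 / throughput_bps / 86_400

# Assumed path: 100 ms RTT, 0.1% packet loss, standard 1460-byte MSS.
bps = tcp_throughput_bps()            # only ~3.7 Mbps, regardless of link speed
print(round(transfer_days(10, bps)))  # roughly 250 days for 10 TB on one flow
```

Protocols like Aspera's fasp sidestep this ceiling because their rate control does not back off in proportion to loss and latency the way standard TCP congestion control does.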


Data centers in other industries have been quick to adopt 10GbE and unified networking due to impressive cost savings, performance and manageability considerations. Adopting a balanced compute model – where investments in processor capacity are matched with investments in network and storage – yields significant performance gains while reducing data center footprint, power and cooling costs. Demand for improved server density and shared resource utilization drives the need for virtualization. While I/O optimization historically has addressed jumbo packet transmissions on physical infrastructure, a more realistic test is that of regular packets, comparing physical and virtualized environments over both LAN/WAN traffic conditions. Aspera and Intel are working together to address these critical challenges to big data and personalized medicine.


Aspera develops high-speed data transfer technologies that provide speed, efficiency, and bandwidth control over any file size, transfer distance, network condition, and storage location (i.e., on-premise or cloud). Aspera® fasp™ Transfer Technology has no theoretical throughput limit and is constrained only by the available network bandwidth and the hardware resources at both ends of the transfer. Security is built in, including secure endpoint authentication, on-the-fly data encryption, and integrity verification.


Intel has incorporated a number of I/O optimizations in conjunction with the Intel® Xeon® E5 processor and the Intel® 10Gb Ethernet Server Adapters:


Intel® 10 Gigabit Ethernet (Intel® 10GbE) replaces and consolidates older 1GbE systems, reducing power costs by 45 percent, cabling by 80 percent and infrastructure costs by 15 percent, while doubling the bandwidth.  When deployed in combination with Intel® Xeon® E5 processors, Intel 10GbE can deliver up to 3X more I/O bandwidth compared to the prior generation of Intel processors.

Intel® Data Direct I/O Technology (Intel DDIO) is a key component of Intel® Integrated I/O that increases performance by allowing Intel Ethernet controllers and server adapters to talk directly with cache and maximize throughput.

PCI-SIG* Single Root I/O Virtualization (SR-IOV) provides near-native performance by providing dedicated I/O to virtual machines and completely bypassing the software virtual switch in the hypervisor. It also improves data isolation among virtual machines and provides flexibility and mobility by facilitating live virtual machine migration.


Aspera® fasp™ demonstrated superior transfer performance when tested in conjunction with Intel® Xeon® E5-2600 processor and Intel® 10Gb Ethernet Server Adapter, utilizing both Intel® DDIO and SR-IOV. The real-world test scenarios transmitted regular packet sizes over both physical and virtualized environments, modeling a range of LAN/WAN traffic latency and packet loss:


• 300 percent throughput improvement versus a baseline system that did not contain support for Intel® DDIO and SR-IOV, showing the clear advantages of Intel’s innovative Intel® Xeon® E5 processor family.

• Similar results across both LAN and WAN transfers, confirming that Aspera® fasp™ transfer performance is independent of network latency and robust to packet loss on the network.

• Approximately the same throughput for both physical and virtualized computing environments, demonstrating that the combined I/O optimizations effectively overcome the performance penalty of virtualization.


International collaboration, cloud-based analytics, and data management issues with terabytes of genomic information will continue to pose challenges to life science researchers and clinicians alike, but working with I/O solutions driven by Aspera and Intel, we will get there faster.


Read the joint Intel-Aspera whitepaper, Big Data Technologies for Ultra-High-Speed Data Transfer in Life Sciences, for details of the I/O optimization results. Explore Aspera case studies with life science customers. Watch videos about the benefits of Intel DDIO and Intel Virtualization for Connectivity with PCI-SIG* SR-IOV.


How do you manage transport of your large medical genomics payloads?  What big data challenges are you working to overcome?