Intel Healthcare IT

3 Posts authored by: ABHI BASU

Genome resequencing allows us to understand how genetic differences affect health and cause disease. It is an important step in detecting anomalies associated with many genetically inherited conditions, such as heart disorders, Down syndrome, cystic fibrosis and chromosomal abnormalities.

 

Next Generation Sequencing (NGS) technologies running on High Performance Computing (HPC) architectures have enabled the sequencing of DNA at groundbreaking speeds. However, storing, analyzing and managing the massive DNA sequence datasets that NGS research produces is a new challenge. Hadoop and MapReduce technologies come into play here: they allow parallel read-mapping algorithms to scale effectively, resulting in shorter execution times and lower costs (for both software execution and hardware).
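
To make the MapReduce idea concrete, here is a minimal single-machine sketch in Python. It assumes exact string matching against a made-up toy reference, which is far simpler than what real read mappers such as Bowtie do, and the function names are hypothetical. The point is only the shape of the computation: each read is mapped independently (so the map step can be spread across nodes), and the results are grouped by position and summed.

```python
from collections import defaultdict

# Toy reference sequence, invented purely for illustration.
REFERENCE = "ACGTACGTTAGC"


def map_read(read, reference=REFERENCE):
    """Map step: emit (position, 1) for every exact match of the read."""
    pairs = []
    start = reference.find(read)
    while start != -1:
        pairs.append((start, 1))
        start = reference.find(read, start + 1)
    return pairs


def shuffle(pairs):
    """Shuffle step: group emitted values by key (alignment position)."""
    groups = defaultdict(list)
    for position, count in pairs:
        groups[position].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: total the number of reads starting at each position."""
    return {position: sum(counts) for position, counts in groups.items()}


def run(reads):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = [pair for read in reads for pair in map_read(read)]
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(run(["ACGT", "TAGC", "ACGT", "GGGG"]))
```

Because `map_read` looks at one read at a time and shares no state, a framework like Hadoop can hand different reads to different nodes and only bring the results together in the reduce step.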

 

Hadoop technologies may also be useful for data storage, data management, statistical analysis and statistical association between various data sources. Organizations can now store large datasets in the Hadoop Distributed File System (HDFS) and use real-time analytics software to access data directly from HDFS, bypassing any data migration headaches. Myrna, developed by Ben Langmead, Kasper Hansen and Jeff Leek (Johns Hopkins University), is one such tool: it calculates differential gene expression in RNA-seq datasets in the cloud (Amazon Elastic MapReduce) or on Hadoop clusters.

 

Innovative companies like Intel Corporation are interested in collaborating with key partners in the life sciences to accelerate such work. Intel wants to provide businesses with an open enterprise Hadoop platform alternative for next-generation analytics and life sciences: the Intel® Distribution for Apache Hadoop Software, which offers better manageability and performance and is optimized for Intel Xeon processors.

 

In this paper, we demonstrate how to install and configure Myrna and its required components – Bowtie, R/Bioconductor and SRA toolkit within the Intel® Hadoop Distribution. Read the paper.

 

What is your experience with big data and Hadoop in life sciences? Do you think Hadoop is ready to become the life sciences research and analytics platform of the future?

Genome resequencing in patients is an important step in detecting mutations linked to congenital diseases. Traditionally, genomics software has been run on High Performance Computing (HPC) architectures.

 

Hadoop and MapReduce technologies are slowly transforming the life sciences arena: they allow parallel read-mapping algorithms to scale effectively, resulting in shorter execution times and lower costs (for both software execution and hardware). Michael Schatz (University of Maryland) and Ben Langmead (Johns Hopkins University) have introduced software applications such as Crossbow into the Hadoop ecosystem, enabling genome resequencing to run on Hadoop clusters as well as in the cloud (Amazon Web Services, via the Elastic MapReduce service). Crossbow provides a scalable software pipeline that can analyze over 35x coverage of the human genome on a 10-node Hadoop cluster in about one day.
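
To put the 35x figure in perspective, coverage is commonly estimated as the number of reads times the read length, divided by the genome size. The short sketch below inverts that relation to estimate how many reads such a run involves. The genome size (about 3.1 billion bases) and the 100-base read length are illustrative assumptions, not figures from the Crossbow work, and the function name is hypothetical.

```python
def reads_needed(coverage, genome_size, read_length):
    """Estimate the reads required for a target average coverage.

    Uses coverage = reads * read_length / genome_size, rounded up
    to a whole number of reads with integer ceiling division.
    """
    total_bases = coverage * genome_size
    return -(-total_bases // read_length)


if __name__ == "__main__":
    # Assumed values: ~3.1 Gbp genome and 100-base reads.
    print(reads_needed(35, 3_100_000_000, 100))
```

Under these assumptions the dataset is on the order of a billion reads, which is why spreading the mapping work across a cluster matters.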

 

However, being open source, Hadoop seems less polished in some areas and can be difficult to manage in others. Companies like Intel Corporation have started with the Apache Hadoop distribution and added components for better manageability and performance, optimized for Intel Xeon processors. The result, the Intel® Distribution for Apache Hadoop Software, gives businesses an open enterprise Hadoop platform for next-generation analytics and life sciences.

 

In this new paper, we demonstrate how to install and configure Crossbow and its required components – Bowtie, SOAPsnp and SRA toolkit within Intel® Hadoop Distribution. Read the full paper here.

 

Technological advancement is far outpacing our knowledge and ability to interpret genomic information. At the same time, technology is opening new opportunities to interpret more and more information about humans and other animals.

 

What are your thoughts on the usage, benefits and side-effects of these technologies?

The advancement of technology has enabled us to work untethered from our traditional office environments. A growing mobile workforce also necessitates security solutions that protect devices (laptops, tablets, USB sticks, etc.) and the data at rest on them while they physically travel. Industry surveys show that almost 86 percent of organizations have had laptops lost or stolen, and that 56 percent of those suffered a data breach as a result [7]. Add to this the increased vigilance required for medical data and Protected Health Information (PHI), and we quickly understand the need for solutions like full disk encryption to prevent unauthorized access to data.


In the healthcare sector, acts like the Health Insurance Portability and Accountability Act (HIPAA) mandate the encryption of PHI at rest and in motion [see HIPAA Security Rule, 164.312(a)(2)(iv) and 164.312(e)(2)(ii): “Implement a mechanism to encrypt and decrypt EPHI.”]. However, end users and organizations sometimes circumvent such security solutions, even though they are mandatory, because disk encryption is not transparent enough and slows down the host system significantly.


Companies like Intel Corporation hope to mitigate this slowdown with technologies like Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI), which accelerate encryption and decryption in hardware and may provide enough of a performance gain to offset the degradation caused by disk encryption solutions.
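
Before relying on hardware acceleration, it helps to confirm that the processor actually exposes the instructions. As a minimal sketch, assuming an x86 Linux system where `/proc/cpuinfo` lists CPU features on a `flags` line, the following Python checks for the `aes` flag. The helper names are hypothetical, and other operating systems expose this information differently.

```python
def has_aes_flag(cpuinfo_text):
    """Return True if a 'flags' line in cpuinfo-style text lists 'aes'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "aes" in value.split()
    return False


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file, returning an empty string if unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if has_aes_flag(read_cpuinfo()):
        print("AES instructions reported by the CPU")
    else:
        print("No 'aes' flag found (or not an x86 Linux system)")
```

A positive result only shows that the CPU advertises the instructions; whether a given disk encryption product uses them depends on that product.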

By using Intel® AES-NI, we observed consistent and significant performance improvements in AES encryption and decryption over software-based full disk encryption: 74 percent (average) for encryption and 75 percent (average) for decryption, across a wide range of file buffer sizes and the two most common types of disk drive, standard hard drives and SSDs.

Read a new white paper that describes AES-NI and full data encryption.

What questions do you have about hard drive encryption?
