The road to personalized medicine is paved with a whole series of big data challenges, as the emphasis shifts from raw sequencing performance to mapping, assembly and analytics. Transmitting terabytes of genomic information between sites worldwide is both essential and daunting, driven by needs such as:


Collaboration with research and clinical partners worldwide to establish statistically significant patient cohorts and leverage expertise across different institutions.

Reference Genomes used to assemble sequences, perform quality control, identify and annotate variants, and perform genome-wide association studies (GWAS).

Cloud-based Analytics to address critical shortages in bioinformatics expertise and to provide burst capacity for HPC cluster compute.

Data Management and Resource Utilization across departments in shared research HPC cluster environments, analytics clusters, storage archives, and external partners.

Medical Genomics extends the data management considerations from research to clinical partners, CLIA labs, hospitals and clinics.


Most institutions still rely on shipping physical disks because of inherent problems with commodity 1 Gigabit Ethernet (GbE) networks and TCP inefficiencies. When the goal is to reduce analytics time from weeks to hours so that results can drive a meaningful clinical intervention, spending days just to transport the data is not a viable option. The transition from 1GbE to 10GbE and beyond has been unusually slow in healthcare and life sciences, likely due to an overemphasis on shared compute resources, considered out of context from broader usage, system architecture, and scalability requirements.
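To make the transport bottleneck concrete, here is a back-of-the-envelope sketch. The 10 TB payload and the link-efficiency figures (30 percent of line rate on a congested 1GbE path, 90 percent on a tuned 10GbE path) are illustrative assumptions, not measurements from the tests described in this post.

```python
def transfer_time_hours(size_tb: float, link_gbps: float, efficiency: float) -> float:
    """Hours to move size_tb terabytes (10**12 bytes) over a link_gbps link,
    given the fraction of line rate actually achieved (0 < efficiency <= 1)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Illustrative assumptions only: 10 TB payload, 30% vs 90% of line rate achieved.
    slow = transfer_time_hours(10, 1, 0.30)
    fast = transfer_time_hours(10, 10, 0.90)
    print(f"1GbE at 30%:  {slow:.1f} h (~{slow / 24:.1f} days)")
    print(f"10GbE at 90%: {fast:.1f} h")
```

Under these assumptions the 1GbE transfer takes roughly 74 hours (about three days), while the 10GbE transfer takes about 2.5 hours.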


Data centers in other industries have been quick to adopt 10GbE and unified networking because of impressive cost savings, performance and manageability benefits. Adopting a balanced compute model, where investments in processor capacity are matched with investments in network and storage, yields significant performance gains while reducing data center footprint, power and cooling costs. Demand for improved server density and shared resource utilization drives the need for virtualization. While I/O optimization has historically focused on jumbo frame transmissions on physical infrastructure, a more realistic test uses regular packet sizes, comparing physical and virtualized environments under both LAN and WAN traffic conditions. Aspera and Intel are working together to address these critical challenges to big data and personalized medicine.
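One reason regular packets make a tougher test than jumbo frames is the sheer number of frames the host must process per second at line rate. The sketch below uses standard Ethernet per-frame overhead; it assumes every frame is full-size, so real traffic with smaller packets would push the frame rate even higher.

```python
# Bytes each Ethernet frame occupies on the wire beyond its MTU-sized payload:
# 14-byte MAC header + 4-byte FCS + 8-byte preamble/SFD + 12-byte inter-frame gap.
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12  # 38 bytes


def max_frames_per_second(link_bps: float, mtu_bytes: int) -> float:
    """Upper bound on full-size frames per second at line rate for a given MTU."""
    wire_bits = (mtu_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    for label, mtu in (("regular (1500-byte MTU)", 1500), ("jumbo (9000-byte MTU)", 9000)):
        print(f"{label}: {max_frames_per_second(10e9, mtu):,.0f} frames/s at 10 Gbps")
```

At 10 Gbps this works out to roughly 813,000 regular frames per second versus about 138,000 jumbo frames per second, nearly six times as many per-packet interrupts, copies, and (in a virtualized host) software switch traversals.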


Aspera develops high-speed data transfer technologies that provide speed, efficiency, and bandwidth control over any file size, transfer distance, network condition, and storage location (e.g., on-premises or cloud). Aspera® fasp™ Transfer Technology has no theoretical throughput limit; it is constrained only by the available network bandwidth and the hardware resources at both ends of the transfer. Complete security is built in, including secure endpoint authentication, on-the-fly data encryption, and integrity verification.


Intel has incorporated a number of I/O optimizations in conjunction with the Intel® Xeon® E5 processor and the Intel® 10Gb Ethernet Server Adapters:


Intel® 10 Gigabit Ethernet (Intel® 10GbE) replaces and consolidates older 1GbE systems, reducing power costs by 45 percent, cabling by 80 percent and infrastructure costs by 15 percent, while doubling the bandwidth. When deployed in combination with Intel® Xeon® E5 processors, Intel 10GbE can deliver up to 3X more I/O bandwidth compared to the prior generation of Intel processors.

Intel® Data Direct I/O Technology (Intel DDIO) is a key component of Intel® Integrated I/O that increases performance by allowing Intel Ethernet controllers and server adapters to read and write data directly to and from the processor cache, rather than routing it through main memory first, maximizing throughput.

PCI-SIG* Single Root I/O Virtualization (SR-IOV) delivers near-native performance by giving virtual machines dedicated I/O and completely bypassing the software virtual switch in the hypervisor. It also improves data isolation among virtual machines and provides flexibility and mobility by facilitating live virtual machine migration.
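On Linux hosts, SR-IOV virtual functions (VFs) are commonly enabled through the PCI device's sysfs attributes `sriov_totalvfs` and `sriov_numvfs`. The sketch below assumes a Linux host with an SR-IOV-capable adapter and driver and root privileges; the interface name is a placeholder, and the hypothetical `sysfs_root` parameter exists only so the logic can be exercised against a scratch directory. Vendor tools or hypervisor management layers may be the preferred route in practice.

```python
from pathlib import Path


def set_num_vfs(iface: str, num_vfs: int, sysfs_root: str = "/sys") -> None:
    """Enable num_vfs SR-IOV virtual functions on a Linux network interface.

    Assumes root privileges and an SR-IOV-capable adapter and driver.
    """
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{iface} supports at most {total} VFs, requested {num_vfs}")

    numvfs_path = dev / "sriov_numvfs"
    current = int(numvfs_path.read_text().strip())
    if current == num_vfs:
        return
    # The kernel refuses to change a nonzero VF count directly, so reset to 0 first.
    if current != 0 and num_vfs != 0:
        numvfs_path.write_text("0")
    numvfs_path.write_text(str(num_vfs))


# Example (requires root on real hardware; "eth0" is a placeholder):
# set_num_vfs("eth0", 4)
```

Each enabled VF can then be passed through to a virtual machine, which is what lets guest traffic bypass the hypervisor's software switch.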


Aspera® fasp™ demonstrated superior transfer performance when tested with the Intel® Xeon® E5-2600 processor and the Intel® 10Gb Ethernet Server Adapter, using both Intel® DDIO and SR-IOV. The real-world test scenarios transmitted regular packet sizes over both physical and virtualized environments, modeling a range of LAN/WAN latency and packet loss conditions:


• 300 percent throughput improvement versus a baseline system that did not contain support for Intel® DDIO and SR-IOV, showing the clear advantages of Intel’s innovative Intel® Xeon® E5 processor family.

• Similar results across both LAN and WAN transfers, confirming that Aspera® fasp™ transfer performance is independent of network latency and robust to packet loss on the network.

• Approximately the same throughput for both physical and virtualized computing environments, demonstrating that the combined I/O optimizations effectively overcome the performance penalty of virtualization.
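For contrast with the LAN/WAN result above, the widely cited Mathis et al. approximation for steady-state TCP throughput, BW ≈ (MSS / RTT) · (C / √p) with C ≈ √(3/2), shows why standard TCP degrades over long, lossy paths. This is a rough model of Reno-style congestion control that ignores timeouts and newer algorithms; the RTT and loss values below are illustrative assumptions.

```python
import math


def tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, link_bps=None, c=math.sqrt(1.5)):
    """Mathis et al. steady-state TCP throughput estimate, optionally capped at link rate."""
    if rtt_s <= 0 or not 0 < loss_rate < 1:
        raise ValueError("need rtt_s > 0 and 0 < loss_rate < 1")
    estimate = (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))
    return estimate if link_bps is None else min(estimate, link_bps)


if __name__ == "__main__":
    # Same 0.1% loss on a 10 Gbps link; only the round-trip time changes (illustrative values).
    lan = tcp_throughput_bps(1460, rtt_s=0.001, loss_rate=0.001, link_bps=10e9)
    wan = tcp_throughput_bps(1460, rtt_s=0.100, loss_rate=0.001, link_bps=10e9)
    print(f"LAN (1 ms RTT):   {lan / 1e6:,.0f} Mbps")
    print(f"WAN (100 ms RTT): {wan / 1e6:,.1f} Mbps")
```

With identical loss, a 100x increase in RTT cuts the estimate 100x (about 452 Mbps down to about 4.5 Mbps), which is the latency sensitivity a transport whose performance is independent of latency avoids.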


International collaboration, cloud-based analytics, and data management issues with terabytes of genomic information will continue to pose challenges to life science researchers and clinicians alike, but with I/O solutions from Aspera and Intel, we will get there faster.


Read the joint Intel-Aspera whitepaper, Big Data Technologies for Ultra-High-Speed Data Transfer in Life Sciences, for details of the I/O optimization results. Explore Aspera case studies with life science customers. Watch videos about the benefits of Intel DDIO and Intel Virtualization for Connectivity with PCI-SIG* SR-IOV.


How do you manage transport of your large medical genomics payloads?  What big data challenges are you working to overcome?

I frequently talk about the importance of creating shared services across a region in order to support health information exchange. I also advocate the use of a secure healthcare cloud as a cost-effective means to overcome scarcities in clinical and IT expertise. One of the key infrastructure services required is patient identity management. What is at stake with improving the accuracy of patient identity matching?

Patient identity matching is first and foremost a patient safety issue
The Bipartisan Policy Center issued a report in June 2012, "Challenges and Strategies for Accurately Matching Patients to Their Health Data." Care coordination requires assembling a view of current patient health information from a variety of sources across a region. The inability to accurately match patients to their health records can result in either missing information (false negatives) or incorrect information (false positives). Inaccurate matches can lead to suboptimal care or, worse, medical errors and adverse events. Indeed, the College of Healthcare Information Management Executives (CHIME) conducted a May 2012 survey of 128 hospital CIOs, who reported an average patient-matching error rate of 8 percent, with rates as high as 20 percent. In addition, 19 percent of respondents indicated that their institutions had experienced adverse events during the prior year as a result of inaccurate patient matches. More research is required to assess the true impact on patient safety because, understandably, this is not an area most institutions are comfortable disclosing.

Most institutions have neither the resources nor the expertise for ongoing curation of patient identity matching
To illustrate the magnitude of the problem, consider a typical community like Harris County, Texas: out of 3.4 million patients in the hospital database, 249,213 patients share the same first and last name with another patient; 76,354 patients share both names with four others; and 69,807 pairs share both names and birthdates [source: Houston Chronicle, 4/5/2011].
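Statistics like these come from grouping records on demographic keys. The sketch below shows one plausible way to compute two of them on hypothetical toy data; the newspaper's actual methodology is not described here, so the definitions (case-insensitive names, counting every record in a shared-name group) are assumptions. Note that a group of n records with identical name and birthdate yields n(n-1)/2 ambiguous pairs, which is why pair counts can grow quickly.

```python
from collections import Counter
from math import comb


def collision_stats(patients):
    """Given (first, last, birthdate) tuples, return
    (patients sharing a first+last name with someone else,
     pairs of patients sharing first+last name and birthdate)."""
    patients = list(patients)
    by_name = Counter((first.lower(), last.lower()) for first, last, _ in patients)
    by_name_dob = Counter((first.lower(), last.lower(), dob) for first, last, dob in patients)
    shared_name = sum(n for n in by_name.values() if n > 1)
    # A group of n records with identical name and birthdate yields n*(n-1)/2 pairs.
    name_dob_pairs = sum(comb(n, 2) for n in by_name_dob.values())
    return shared_name, name_dob_pairs


if __name__ == "__main__":
    toy = [  # hypothetical records
        ("Maria", "Garcia", "1980-01-02"),
        ("MARIA", "Garcia", "1980-01-02"),
        ("Maria", "Garcia", "1975-06-30"),
        ("John", "Smith", "1990-03-04"),
        ("Ana", "Lopez", "1985-11-11"),
    ]
    print(collision_stats(toy))
```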

Medium-sized healthcare institutions currently operating patient identity services cite annual costs ranging from $500,000 to $1 million in human resources alone, not counting ongoing software and services expenses. Respondents to the CHIME survey reported devoting between 0.5 and 20 full-time equivalents to patient identity reconciliation, with an average of more than three. Smaller practices, clinics, rural hospitals, and independent physician associations lack the expertise and resources to support such a complex endeavor. More advanced institutions are starting to offer cloud-based services across a region in order to recoup some of their infrastructure costs.

Identity matching is not a problem unique to healthcare
The United States already has several regional and national identifiers in widespread use today, including the Social Security number and state driver's licenses. Additional healthcare identifiers typically include health insurance plan numbers and local healthcare institution numbers. Indeed, despite reluctance to use the Social Security number out of a misplaced fear of identity theft, most clinicians in the U.S. now routinely collect further identity information, including an electronic copy of the Social Security card, the state driver's license and a patient photograph, on the misguided theory that this prevents fraud. This likely worsens the risk of a data breach, since even more sensitive data is now collected without a commensurate level of data protection and security practices. The central point is that the problem posed by patient identity matching is not unique to healthcare; it applies to all identities issued by governments and financial institutions.

The same probabilistic matching algorithms are successfully applied and sold to state and federal governments worldwide precisely because the problem is not unique. Each state faces the same matching issue with its driver's licenses and other state-issued identities. What is unique to the United States is the perception that healthcare identity is somehow politicized and taboo. Most countries treat healthcare identity (patient, provider, institution, device) as a required element: for patient safety, data quality, consumer privacy, fraud prevention, claims processing, and benefits entitlement (whether public or private). Some countries simplify governance by issuing a single identity for all government services, including public and private financial institutions (e.g., your ATM card is your identity for all transactions), whereas others issue a separate identity for health transactions. Whatever the approach, most countries have learned (sometimes the hard way) that a single identity should be assigned to each individual at birth.

Data transparency, in addition to data protection, security and privacy, should be paramount in the use and disclosure of all consumer health and financial information
Data protection laws and regulations governing consumer health and financial information need to be applied consistently to any institution, associate, service provider, or service that works with protected health and financial information. Security and privacy best practices, including encryption of sensitive data at rest and in transit, need to be consistently applied and monitored in the healthcare industry. Data transparency (the tracking, monitoring, audit and enforcement of how consumer health and financial information is used and disclosed) needs to be provided at the regional level for governments, institutions and consumers alike. In order for consumers to trust the secure exchange of health information, they need to be able to easily inspect and review disclosures of their information across a region. They also need assurance that regular audit and oversight is maintained at several levels by government, institutional, and independent auditors.

A combination of a single national identifier, improved adherence to data quality standards, and a consensus approach to probabilistic matching algorithms is required
Selecting a single national identifier certainly reduces the cost and complexity of patient identity matching, but it must be done in conjunction with improved adherence to data quality standards across a required minimum set of patient demographics (e.g., HL7 standard data types for name, birthdate, address, etc.). Even when a national identifier is required, there will always be legacy systems that cannot incorporate the new identifier, so a combined approach is always needed. Probabilistic matching algorithms must also improve, taking a consensus approach to defining levels of match quality using weighted confidence scores across a standard set of demographics, distinguishing matches that can be inferred automatically from those that must be referred for human curation and disambiguation.
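The weighted-confidence idea can be sketched with the classic Fellegi-Sunter record-linkage model, one common formulation of probabilistic matching (the post does not prescribe a specific algorithm). Each demographic field contributes a log-likelihood weight, and two thresholds separate automatic matches, automatic non-matches, and pairs referred for human review. The m/u probabilities and thresholds below are hypothetical; real systems estimate them from their own data, and the comparison assumes fields were already normalized to data quality standards.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldModel:
    m: float  # P(field agrees | same patient)
    u: float  # P(field agrees | different patients)


# Hypothetical m/u probabilities for illustration only.
FIELDS = {
    "last_name": FieldModel(m=0.95, u=0.01),
    "first_name": FieldModel(m=0.93, u=0.02),
    "birthdate": FieldModel(m=0.97, u=0.001),
    "zip": FieldModel(m=0.85, u=0.05),
}


def match_weight(a: dict, b: dict) -> float:
    """Sum of log2 likelihood ratios over the demographic fields both records carry."""
    weight = 0.0
    for field, fm in FIELDS.items():
        if a.get(field) is None or b.get(field) is None:
            continue  # missing data adds no evidence either way
        if a[field] == b[field]:
            weight += math.log2(fm.m / fm.u)
        else:
            weight += math.log2((1 - fm.m) / (1 - fm.u))
    return weight


def classify(a: dict, b: dict, upper: float = 10.0, lower: float = 0.0) -> str:
    """Auto-link at or above `upper`, auto-reject at or below `lower`, otherwise refer to a human."""
    w = match_weight(a, b)
    if w >= upper:
        return "match"
    if w <= lower:
        return "non-match"
    return "review"
```

For example, two records agreeing on first and last name but differing on birthdate and ZIP code score about 4.4 here, landing in the "review" band rather than being linked or rejected automatically.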

An honest and open debate at the national level is required to move forward on improving the accuracy of patient identity matching. A national identifier is preferred as more expedient and more cost-effective, but if this proves too politically problematic, innovation is possible through a consumer-directed voluntary identifier.

What are your experiences and concerns with patient identity matching?

I recently had the privilege of delivering the keynote at Duke University's Third Annual Informatics Conference, "Business Transformation Through Informatics," in North Carolina, followed by a congressional staff briefing organized by Health IT Now in Washington, D.C. I thought I would summarize the key takeaways here, since the presentations and the discussions that followed seemed to resonate with the respective audiences.

I should preface this by saying that each of these observations is my own, based on exhaustive research of different models worldwide. I regularly work with regional and national governments around the world to design their national healthcare architectures, establish a shared services strategy, and leverage cloud computing to cost-effectively share essential infrastructure and expertise across a region.

Care coordination realistically models health information exchange as a network of participants rather than as a point-to-point exchange
Health information exchange is better modeled as a complex network of participants than as a simplified point-to-point exchange of information. Each exchange of health information requires numerous supporting utility services: to check authorization, look up clinician and patient registries, normalize terminology, aggregate patient health information across disparate sources, and so on. Healthcare itself more closely follows a document-centric workflow model, such as that defined by HL7 CDA (Health Level 7 Clinical Document Architecture) and embodied in standard healthcare documents such as encounter and discharge summaries and consult requests. Care coordination requires timely and secure access to a shared patient record across a region, inclusive of the patient, caregivers, clinicians and institutions participating in the patient's care.
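The chain of utility services behind a single exchange can be sketched as follows. Everything here is a hypothetical in-memory stand-in (registries, authorization list, local codes, and placeholder concept names); a real regional HIE would call separate networked services for each step. The example shows two sources using different local codes for the same condition being normalized into one shared concept.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for regional utility services.
CLINICIAN_REGISTRY = {"dr-001": {"active": True}}
# Master patient index: regional patient ID -> local IDs at each source institution.
PATIENT_INDEX = {"pt-42": {"hospital-a": "A-9", "clinic-b": "B-7"}}
AUTHORIZATIONS = {("pt-42", "dr-001")}  # (patient, clinician) pairs the patient approved
# Local problem codes mapped to one shared placeholder concept.
TERMINOLOGY_MAP = {"DM-II": "std:type-2-diabetes", "T2DM": "std:type-2-diabetes"}
SOURCE_RECORDS = {
    "hospital-a": {"A-9": ["DM-II"]},
    "clinic-b": {"B-7": ["T2DM", "HTN"]},
}


@dataclass(frozen=True)
class ExchangeRequest:
    clinician_id: str
    patient_id: str


def exchange(req: ExchangeRequest) -> list[str]:
    # 1. Clinician registry lookup
    clinician = CLINICIAN_REGISTRY.get(req.clinician_id)
    if clinician is None or not clinician["active"]:
        raise PermissionError("unknown or inactive clinician")
    # 2. Authorization check
    if (req.patient_id, req.clinician_id) not in AUTHORIZATIONS:
        raise PermissionError("patient has not authorized this clinician")
    # 3. Patient registry lookup, 4. aggregation across sources, 5. terminology normalization
    problems = set()
    for source, local_id in PATIENT_INDEX.get(req.patient_id, {}).items():
        for code in SOURCE_RECORDS.get(source, {}).get(local_id, []):
            problems.add(TERMINOLOGY_MAP.get(code, f"unmapped:{source}:{code}"))
    return sorted(problems)


if __name__ == "__main__":
    print(exchange(ExchangeRequest("dr-001", "pt-42")))
```

Unmapped local codes are surfaced rather than dropped, which makes gaps in terminology normalization visible.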

Care coordination, quality metrics and clinical decision support require a standard informatics model
A key success factor in health reform is establishing a shared summary care record built on a standard informatics model, leveraging HL7 CDA and terminology standards including SNOMED CT, LOINC, ICD-10 and, for medications, RxNorm or ATC. The Consolidated CDA represents a harmonized set of recommendations across HL7 balloted CDA implementation guides, IHE Implementation Guides, the Health Story Project and the S&I Framework. HL7 CDA has been proven worldwide, including its use by epSOS (Smart Open Services for European Patients) for cross-border exchange of summary health records and medication histories. A standard informatics model enables doctors to pose queries like "Tell me which of my patients have a particular condition and are taking a particular medication," perhaps because of a newly identified drug interaction or a change in the recommended treatment protocol. Clinical decision support, population health, quality metrics, and comparative effectiveness research all depend on a standardized informatics model.
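Once summaries share coded terminology, the doctor's question above reduces to a simple set-membership query. This sketch assumes the CDA documents have already been parsed into normalized problem and medication code sets (the hypothetical `SummaryRecord` type); the concept IDs are illustrative and should be confirmed against current SNOMED CT and RxNorm releases.

```python
from dataclasses import dataclass

# Illustrative concept IDs; confirm against current terminology releases.
TYPE_2_DIABETES_SCT = "44054006"  # SNOMED CT: type 2 diabetes mellitus
METFORMIN_RXNORM = "6809"         # RxNorm ingredient: metformin


@dataclass(frozen=True)
class SummaryRecord:
    patient_id: str
    problems: frozenset[str]     # SNOMED CT concept IDs from the problem list
    medications: frozenset[str]  # RxNorm concept IDs from the medication list


def patients_with(records, condition_code: str, medication_code: str) -> list[str]:
    """IDs of patients whose summary lists both the condition and the medication."""
    return sorted(
        r.patient_id
        for r in records
        if condition_code in r.problems and medication_code in r.medications
    )


if __name__ == "__main__":
    records = [  # hypothetical patients
        SummaryRecord("pt-1", frozenset({TYPE_2_DIABETES_SCT}), frozenset({METFORMIN_RXNORM})),
        SummaryRecord("pt-2", frozenset({TYPE_2_DIABETES_SCT}), frozenset()),
        SummaryRecord("pt-3", frozenset(), frozenset({METFORMIN_RXNORM})),
    ]
    print(patients_with(records, TYPE_2_DIABETES_SCT, METFORMIN_RXNORM))
```

Without shared codes, the same query would require matching free-text or institution-specific labels across every source, which is exactly what a standard informatics model avoids.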

Care coordination, quality metrics and clinical decision support require a critical mass of shared patient health information across all participants in the region
Countries that require electronic submission of encounter and discharge summaries within 24-48 hours of care episodes have significantly accelerated their progress toward health reform. This protected health information is then aggregated, normalized and made accessible as a shared patient health record through a regional HIE using web-based service APIs. Patients and clinicians alike are given immediate access that is both secure and transparent. Patients can directly consent to and authorize access by health professionals, as well as audit specific disclosures, thereby establishing trust in the system. Independent audits are conducted to ensure "need to know" and "least privilege" access to protected health information. A critical mass of shared patient health information is established because all healthcare participants in a region are included. Goals for patient safety and improved care delivery at reduced cost are met because patient care can be coordinated across the specialists and institutions in a region.
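Patient-directed consent paired with an auditable disclosure trail can be sketched as a small registry. The `ConsentRegistry` class is hypothetical and in-memory; a production HIE would persist the audit log in tamper-evident storage and support finer-grained consent (by purpose, data type, or time window). The key idea is that every access attempt, granted or denied, is recorded where the patient can review it.

```python
from datetime import datetime, timezone


class ConsentRegistry:
    """Hypothetical patient-directed consent store with a disclosure audit trail."""

    def __init__(self):
        self._grants = set()  # (patient_id, clinician_id) pairs the patient has approved
        self._audit = []      # every access attempt, allowed or denied

    def grant(self, patient_id: str, clinician_id: str) -> None:
        self._grants.add((patient_id, clinician_id))

    def revoke(self, patient_id: str, clinician_id: str) -> None:
        self._grants.discard((patient_id, clinician_id))

    def access(self, patient_id: str, clinician_id: str, purpose: str) -> bool:
        allowed = (patient_id, clinician_id) in self._grants
        self._audit.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "patient": patient_id,
            "clinician": clinician_id,
            "purpose": purpose,
            "allowed": allowed,
        })
        return allowed

    def disclosures_for(self, patient_id: str) -> list:
        """What a patient sees when reviewing who accessed, or tried to access, their record."""
        return [event for event in self._audit if event["patient"] == patient_id]


if __name__ == "__main__":
    registry = ConsentRegistry()
    registry.grant("pt-42", "dr-001")
    registry.access("pt-42", "dr-001", "treatment")
    registry.revoke("pt-42", "dr-001")
    registry.access("pt-42", "dr-001", "treatment")
    for event in registry.disclosures_for("pt-42"):
        print(event["clinician"], event["purpose"], event["allowed"])
```

Logging denied attempts as well as granted ones gives independent auditors the evidence needed to check "need to know" and "least privilege" access.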

Time to Value: health reform must be accelerated
Building out the necessary infrastructure must proceed aggressively so that the collaborative economic model can be established before the stimulus funds are exhausted. The collaborative economic model depends on achieving a critical mass of normalized health information. Once a minimum set of normalized health information is established, local businesses can innovate with value-add services that further drive value in the network. Examples of value-add services include drug interaction checks, clinical trial patient recruitment, clinical decision support, comparative effectiveness of particular treatment protocols, institutions and clinicians, and even patient-focused wellness and behavior modification. Slow time to value is the single biggest cause of failed HIEs worldwide: they took too long to establish a sustainable business model, ran out of funds before completing the necessary infrastructure, and ignored the importance of a standard informatics model.

Regional HIEs form the backbone of a shared services strategy
A Shared Services model is a means to cost-effectively share the necessary infrastructure for health information exchange, while creating a collaborative economic model that drives local innovation and accelerates adoption of advanced healthcare usage models. Regional HIEs become the logical organizing point to collect, host and store normalized health information, to centrally monitor and enforce patient consent and authorization, and to offer value-add services that drive further value in the network. Regional HIEs provide necessary infrastructure that must be organized, monitored and enforced, much like transportation and utilities, to ensure interoperability at both the national and regional levels. Health reform efforts that follow a balanced approach across business drivers and metrics, policy and standards, architecture, and reimbursement and investment models demonstrate the highest levels of maturity and return on investment.

What challenges do you face with accelerating health reform?  What are your key learnings in the journey thus far?
