As Bio-IT World approaches this week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Dr. Pek Lum, Chief Data Scientist and VP of Solutions, Ayasdi.


It seems like every time I turn on the news, I hear another story about someone with a traumatic brain injury, whether they received it on the football field or the battlefield. The fact is that traumatic brain injury (TBI) can affect anyone at any time. A minor car accident, a trip on the stairs, or a tumble on a bike can send you to an emergency room. In fact, up to one-third of people hit their head hard enough to go to the hospital by their mid-twenties, according to UCSF. Thankfully, most of those head injuries are mild and patients feel better in a few days. However, that’s not true for the roughly 20 percent of patients who go on to develop persistent problems, such as depression, memory issues, or headaches, that can last weeks, months, or sometimes even years.


When concussion patients arrive at an emergency room, doctors diagnose the severity of these injuries based on clinical behavior and CT or MRI scans. But those scans don’t tell us everything. They often miss subtle physical injuries, and they tell us nothing about which patients will be fine in a few days and which will go on to develop lingering adverse effects.


These are the questions that Dr. Adam Ferguson and Dr. Esther Yuh at The Brain and Spinal Injury Center (BASIC) at UCSF wanted Ayasdi’s help to answer.


Advanced brain scan data from Diffusion Tensor Imaging (DTI) can reveal brain abnormalities that would not necessarily show up on an MRI or CT scan. That said, the amount of data generated by a DTI scan is massive and incredibly complex. There are millions of questions that could be asked of the data, and it takes time and money for Data Scientists to find the right ones, if they find them at all. It’s like trying to find a needle in an enormous haystack. Ayasdi, on the other hand, is well suited to this kind of task. The Ayasdi Cure application uses Topological Data Analysis (TDA), combined with an ensemble of machine learning techniques, to help domain experts and Data Scientists discover insights automatically. In fact, Ayasdi Cure is now processing 400 percent faster because the application is optimized to run on Intel® Xeon® processors, using the Intel® Math Kernel Library (MKL) and Intel® Advanced Vector Extensions (AVX), which dramatically reduces the time to insight.


What is abundantly clear is that the labels of mild, moderate, and severe head injury are too simplistic and don’t tell us the full story. By collaborating closely with UCSF and partnering with Intel, we hope to find more sensitive TBI markers that will define more precise patient subpopulations.


By doing this, doctors can potentially predict, at the time of the fall or hit, whether someone will develop complications. Knowing this ahead of time can help both doctors and patients mitigate complicating factors and potentially avoid them altogether.


This vision of the future is not just a dream. We’re making it happen. Our preliminary work is so promising that Ayasdi and UCSF were awarded the GE/NFL “Head Health” award to explore how to better diagnose and treat mild traumatic brain injuries in professional football players.  We are looking forward to sharing our findings in the coming year.


How are you tackling massive amounts of clinical and genetic data?


Dr. Pek Lum is the Chief Data Scientist and VP of Solutions at Ayasdi.

Keep up with Pek @peklum and Ayasdi @ayasdi

As Bio-IT World approaches next week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Phil Eschallier, director of managed services at RCH Solutions.


In supporting research computing in life and material sciences, it’s clear that most pharmas and biotechs are pushing toward hosting and computing in the cloud. Is this prudent or wise or strategic? Let’s answer this, for now, with a definite “maybe.” Or perhaps the cloud is better viewed as a tool in the arsenal, not a panacea.


Let’s face it, the cloud is alluring and attractive for many reasons: it offers the utmost in flexibility with essentially no start-up (capital) costs; there are offerings of predefined (already engineered) services; it should have APIs to facilitate the automation of provisioning or scaling, and one can use a company credit card to facilitate (or perhaps skirt) the process. Lastly, provisioning servers via global IS organizations is often measured in “months,” but provisioning in the cloud can be measured in “days” or “weeks.”


Clearly, where temporary scale is the deciding factor, traditional computing hosted in corporate data centers and the CAPEX procurement model cannot compete with the cloud. But when demand is identified as a need over time, Gartner, US Government News, and others tell us that the cloud is notably more expensive (though the various sources reporting on cloud expense are not aligned). After all, cloud providers aren’t magic—they have to purchase compute / network / storage scale, amortize those spends over a defined period of time, and then sell it to others while still making a profit. In paying for a full-time resource, the price-tag has to include the cloud providers’ capital and operational costs plus their profits.
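
The trade-off above comes down to simple break-even arithmetic. The sketch below is only an illustration: the function name and every dollar figure are hypothetical, and a real comparison would also account for staffing, refresh cycles, and discounts.

```python
def break_even_months(capex, onprem_monthly, cloud_monthly):
    """Months of steady use after which on-premises becomes cheaper than cloud.

    Returns None when the cloud's monthly price does not exceed the
    on-premises running cost, i.e. on-premises never recovers its capital.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return capex / monthly_saving


# Hypothetical figures, for illustration only.
months = break_even_months(capex=120_000, onprem_monthly=2_000, cloud_monthly=7_000)
print(f"Break-even after {months:.0f} months of continuous use")
```

With these made-up numbers, a workload that runs for less than two years favors the cloud, while sustained demand beyond that favors the data center.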


Will the move to the cloud always yield faster results? If your data remains within corporate firewalls for security or legal reasons, moving data (especially big data) to cloud compute platforms adds time to analysis runs and complicates the security model. And where is your data? Is it easier to bring your data to the computing or is it easier to bring computing to your data? And what happens to any intellectual property pushed to the cloud after the compute jobs are completed?


“Vendor Lock-in” has to be a factor in defining cloud strategy. Once entrenched in a vendor’s cloud offerings it will not be easy (or cheap) to migrate to another provider.


Finally, are those wielding credit cards in the business positioned to cost-effectively engineer solutions in the cloud? Some applications benefit from more CPU, others from faster storage, more memory, or network tuning. Obviously, there are applications that need performance from all facets of the underlying infrastructure. One common example is an application hamstrung by disk I/O. It’s cheap and we have a credit card, so shall we just spin up another VM? Ultimately, those paying the credit card bills may want confidence that what was “spun up” was used well.


The cloud is a tool in the toolbox. But if a business relies on heavy computational cycles or big data, it may not yet be time to promote the cloud from tool to being the entire toolbox. The cloud can serve businesses needing a public web presence or web services outside the corporate firewall. It can be a fantastic platform on which to prototype or pilot solutions, plus a fiscally responsible option for intermittent compute needs or when the needed scale varies unpredictably. However, if needs are well-defined over time or intellectual property is a concern, computing in the controlled environment of the corporate data center should be more cost-effective and secure than the cloud.


Not sure how to proceed? Consider engaging a subject matter expert before deciding on the cloud vs. corporate data center -- a small up-front cost should help ensure that budgeted computing dollars are well spent.


What questions do you have?

As Bio-IT World approaches next week, we are sharing a pre-show guest blog series from industry experts on trends you can expect to hear about at the event. Below is a guest contribution from Robert Cooper, business unit leader at Ceiba Solutions.


Combining new data integration techniques with mobile platforms compounds productivity gains. In pharmaceutical and other life science companies, where users rely on heterogeneous data sources and applications in the drug discovery process, interoperability and cross-application function integration are of immense importance.


From large pharmaceutical to early stage biotech, the expense of building systems and integrations, although critical to success, can be prohibitive. New, innovative approaches which address these demands by applying intelligence to data and associated cross-application functions are an exciting new direction.


At Ceiba, we’ve developed a data-typing based solution that frees organizations from cost-prohibitive integrations, putting a marketplace of functions and services at users’ fingertips – all accessible from common applications like Excel, Spotfire, ELN, LIMS, CRM, etc. A dynamically adaptable platform supports integration of service-specific functions and empowers users to easily execute multiple functions and relate data in new and interesting ways – in particular “in context.”


The new framework allows users to interact with domain data (biology, chemistry, clinical, sales, marketing, manufacturing, etc.) in a simple and consistent manner, traversing from one data type to another without prior knowledge of where the data live or how to consume the data from a technical perspective. The user interaction is therefore driven by data and associated functions rather than by technology. This creates a consistent familiarity across the organization in the way application functions are leveraged and data are explored, and it turns complex relationships in the data into actionable drug information with minimal time and effort.


Partnering with Intel, Ceiba Solutions has enhanced its productivity software by combining it with Intel-enabled mobility solutions, exposing even more opportunities for improved scientific effectiveness. Consumerization of software services via a marketplace of applications, on top of new hardware platforms, provides new, exciting, and heretofore unforeseen benefits.


For example, experiments with hands-free communication on tablets, enabled with productivity software that surfaces relationships in experiment data, have shown significant workflow savings and, in some cases, immediate recovery from faults in the manufacturing process. Additionally, mobility increases employees’ desire to share analysis, as well as the potential number of workers capable of collaborating in near real time on projects.


Mobility-enabled employees are more likely to reach out to research colleagues via social communication within a company. Involving employees of support or adjacent departments with other key areas and virtual teams is simplified. All of this improves organizational operating performance. As a result, the profitability of the business can grow, and costs fall when the time required to make important decisions is reduced. The combination of a marketplace of productivity tools/services and mobility speeds decision making, leading to a significant positive effect on time to market, which is critical in the life sciences industry.


What questions do you have?

The road to personalized medicine is paved with a whole series of big data challenges, as the emphasis shifts from raw sequencing performance to mapping, assembly and analytics. The need to transmit terabytes of genomic information between different sites worldwide is both essential and daunting, and it arises in several areas:


Collaboration with research and clinical partners worldwide to establish statistically significant patient cohorts and leverage expertise across different institutions.

Reference Genomes used to assemble sequences, perform quality control, identify and annotate variants, and perform genome-wide association studies (GWAS).

Cloud-based Analytics to address critical shortages in bioinformatics expertise and burst capacity for HPC cluster compute.

Data Management and Resource Utilization across departments in shared research HPC cluster environments, analytics clusters, storage archives, and external partners.

Medical Genomics extends the data management considerations from research to clinical partners, CLIA labs, hospitals and clinics.


Most institutions still rely upon shipping physical disks due to inherent problems with commodity 1 Gigabit Ethernet (GbE) networks and TCP inefficiencies. When the goal is to reduce the analytics time from weeks to hours resulting in a meaningful clinical intervention, spending days just to transport the data is not a viable option. The transition from 1GbE to 10GbE and beyond has been unusually slow in healthcare and life sciences, likely due to an overemphasis on shared compute resources, out of context from the broader usage, system architecture, and scalability requirements.
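
To make the transport problem concrete, here is a rough back-of-the-envelope sketch. The function names, the 10 TB payload, the 64 KiB window, and the 100 ms round-trip time are assumptions chosen for illustration; the first function assumes the link runs at full line rate, and the second is the standard one-window-per-round-trip bound on a single TCP stream.

```python
def transfer_hours(size_tb, link_gbps, efficiency=1.0):
    """Hours to move size_tb terabytes (decimal, 10**12 bytes) over a link.

    efficiency is the fraction of the nominal link rate actually achieved.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


def tcp_window_limit_mbps(window_bytes, rtt_ms):
    """Upper bound on one TCP stream: at most one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


# Illustrative values only.
for gbps in (1, 10):
    print(f"{gbps:>2} GbE: {transfer_hours(10, gbps):.1f} h for 10 TB at line rate")
print(f"64 KiB window, 100 ms RTT: {tcp_window_limit_mbps(65536, 100):.2f} Mbit/s")
```

Even at full line rate, 10 TB takes roughly 22 hours on 1GbE versus a little over 2 hours on 10GbE, and a single untuned TCP stream on a long-haul path can fall far below either figure.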


Data centers in other industries have been quick to adopt 10GbE and unified networking due to impressive cost savings, performance and manageability considerations. Adopting a balanced compute model – where investments in processor capacity are matched with investments in network and storage – yields significant performance gains while reducing data center footprint, power and cooling costs. Demand for improved server density and shared resource utilization drives the need for virtualization. While I/O optimization historically has addressed jumbo packet transmissions on physical infrastructure, a more realistic test is that of regular packets, comparing physical and virtualized environments over both LAN/WAN traffic conditions. Aspera and Intel are working together to address these critical challenges to big data and personalized medicine.


Aspera develops high-speed data transfer technologies that provide speed, efficiency, and bandwidth control over any file size, transfer distance, network condition, and storage location (i.e., on-premise or cloud). Aspera® fasp™ Transfer Technology has no theoretical throughput limit and can only be constrained by the available network bandwidth and the hardware resources at both ends of the transfers. Complete security is built in, including secure endpoint authentication, on-the-fly data encryption, and integrity verification.


Intel has incorporated a number of I/O optimizations in conjunction with the Intel® Xeon® E5 processor and the Intel® 10Gb Ethernet Server Adapters:


Intel® 10 Gigabit Ethernet (Intel® 10GbE) replaces and consolidates older 1GbE systems, reducing power costs by 45 percent, cabling by 80 percent and infrastructure costs by 15 percent, while doubling the bandwidth.  When deployed in combination with Intel® Xeon® E5 processors, Intel 10GbE can deliver up to 3X more I/O bandwidth compared to the prior generation of Intel processors.

Intel® Data Direct I/O Technology (Intel DDIO) is a key component of Intel® Integrated I/O that increases performance by allowing Intel Ethernet controllers and server adapters to talk directly with cache and maximize throughput.

PCI-SIG* Single Root I/O Virtualization (SR-IOV) provides near-native performance by providing dedicated I/O to virtual machines and completely bypassing the software virtual switch in the hypervisor. It also improves data isolation among virtual machines and provides flexibility and mobility by facilitating live virtual machine migration.


Aspera® fasp™ demonstrated superior transfer performance when tested in conjunction with Intel® Xeon® E5-2600 processor and Intel® 10Gb Ethernet Server Adapter, utilizing both Intel® DDIO and SR-IOV. The real-world test scenarios transmitted regular packet sizes over both physical and virtualized environments, modeling a range of LAN/WAN traffic latency and packet loss:


• 300 percent throughput improvement versus a baseline system that did not contain support for Intel® DDIO and SR-IOV, showing the clear advantages of Intel’s innovative Intel® Xeon® E5 processor family.

• Similar results across both LAN and WAN transfers, confirming that Aspera® fasp™ transfer performance is independent of network latency and robust to packet loss on the network.

• Approximately the same throughput for both physical and virtualized computing environments, demonstrating that the combined I/O optimizations effectively overcome the performance penalty of virtualization.


International collaboration, cloud-based analytics, and data management issues with terabytes of genomic information will continue to pose challenges to life science researchers and clinicians alike, but working with I/O solutions driven by Aspera and Intel, we will get there faster.


Read the joint Intel-Aspera whitepaper, Big Data Technologies for Ultra-High-Speed Data Transfer in Life Sciences, for details of the I/O optimization results. Explore Aspera case studies with life science customers. Watch videos about the benefits of Intel DDIO and Intel Virtualization for Connectivity with PCI-SIG* SR-IOV.


How do you manage transport of your large medical genomics payloads?  What big data challenges are you working to overcome?

I frequently talk about the importance of creating shared services across a region in order to support health information exchange. I also advocate the use of secure healthcare cloud as a cost-effective means to overcome scarcities in clinical and IT expertise. One of the key infrastructure services required is patient identity management. What is at stake with improving the accuracy of patient identity matching?

Patient identity matching is first and foremost a patient safety issue
The Bipartisan Policy Center issued a report in June 2012, “Challenges and Strategies for Accurately Matching Patients to Their Health Data.” Care coordination requires assembling a view of current patient health information from a variety of sources across a region. The inability to accurately match patients to their health records can result in either missing information (false negatives) or incorrect information (false positives). Inaccurate matches can result in suboptimal care or, worse, the risk of medical errors and adverse events. Indeed, the College of Healthcare Information Management Executives (CHIME) conducted a May 2012 survey of 128 hospital CIOs, who reported an average error rate of 8 percent, ranging as high as 20 percent. Also, 19 percent of those responding indicated their institutions experienced adverse events during the prior year as a result of inaccurate patient matches. More research is required in this area to accurately assess the true impact on patient safety because, understandably, this is not an area that most institutions are comfortable disclosing.

Most institutions do not have the resources or expertise for ongoing curation of patient identity matching
To illustrate the magnitude of the problem, consider a typical community like Harris County, Texas: out of 3.4 million patients in the hospital database, 249,213 patients have the same first and last name; 76,354 patients share both names with four others; 69,807 pairs share both names and birthdates [source:  Houston Chronicle, 4/5/2011].

Medium-sized healthcare institutions currently operating patient identity services cite annual costs ranging from $500,000 to $1 million in human resources alone, not factoring the ongoing software and services expense. Respondents to the CHIME survey indicated a range of 0.5 to 20 full-time equivalents with more than three on average devoted to patient identity reconciliation. Smaller practices, clinics, rural hospitals, and independent physician associations do not have the expertise or the resources to support such a complex endeavor. More advanced institutions are starting to offer cloud-based services across a region in order to recoup some of their infrastructure costs.

Identity matching is not a problem unique to healthcare
The United States already has several regional and national identifiers in widespread use today, including social security number and state driver’s license. Additional healthcare identifiers typically include health insurance plan and local healthcare institution numbers. Indeed, despite the reluctance to use social security number out of a misplaced fear of identity theft, most clinicians in the U.S. now routinely collect further identity information, including an electronic copy of the social security card, state driver’s license and patient photograph, on the misguided theory of fraud prevention. This likely worsens the risk of data breach, given that even more sensitive data is now being collected without the commensurate level of data protection and security practices applied. The central point here is that the problem posed by patient identity matching is not unique to healthcare, but applies to all identities issued by governments and financial institutions.

The same probabilistic matching algorithms are successfully applied and sold to state and federal governments worldwide because the problem is not unique. Each state has the same matching issue with their state driver’s license and other state-issued identities. What is unique is, in the United States, the perception that the healthcare identity is somehow politicized and taboo. Most countries treat healthcare identity (patient, provider, institution, device) as an important element which is required – for patient safety, for data quality, for consumer privacy, for fraud prevention, for claims processing, for benefits entitlement (whether public or private). Some countries simplify the governance by issuing a single identity for all government services including public and private financial institutions (e.g., your ATM card is your identity for all transactions), whereas some issue an identity separate for health transactions. Whatever the approach, most countries have learned (sometimes the hard way) that a single identity should be assigned to each individual at birth.

Data transparency, in addition to data protection, security and privacy, should be paramount in the use and disclosure of all consumer health and financial information
Data protection laws and regulations governing consumer health and financial information need to be applied consistently to any institution, associate, or service provider that works with protected health and financial information. Security and privacy best practices, including encryption of sensitive data at rest and in transit, need to be consistently applied and monitored in the healthcare industry. Data transparency – the tracking, monitoring, audit and enforcement of how consumer health and financial information is used and disclosed – needs to be provided at the regional level for governments, institutions and consumers alike. In order for consumers to develop trust in the secure exchange of health information, they need to be able to easily inspect and review disclosures of their information across a region. They need to be assured that regular audit and oversight is maintained by several different levels of government, institutional, and independent auditors.

A combination of a single national identifier, improved adherence to data quality standards, and a consensus approach to probabilistic matching algorithms is required
Selecting a single national identifier certainly reduces the cost and complexity associated with patient identity matching but must be done in conjunction with improved adherence to data quality standards across a required minimum set of patient demographics (e.g., HL7 standard data types for name, birthdate, address, etc.). Even when a national identifier is required, there will always be legacy systems which are unable to incorporate the new identifier, so a combination approach is always required. Improvements must also be made to probabilistic matching algorithms, taking a consensus approach to define levels of quality matches using weighted confidence scores across a standard set of demographics – those matches which can be automatically inferred vs. those which need to be referred for human curation and disambiguation.
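
The idea of weighted confidence scores with separate thresholds for automatic matches and human review can be sketched as follows. This is a deliberately simplified stand-in for a real probabilistic matcher: the field names, weights, and thresholds are hypothetical and would need to be agreed upon and calibrated against real data.

```python
# Hypothetical weights and thresholds, for illustration only.
WEIGHTS = {"last_name": 0.30, "first_name": 0.20, "birthdate": 0.30, "postal_code": 0.20}
AUTO_MATCH = 0.90    # at or above: link the records automatically
NEEDS_REVIEW = 0.60  # at or above (but below AUTO_MATCH): refer to a human


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(str(value).lower().split())


def match_score(a, b):
    """Sum the weights of fields that are present in both records and agree."""
    score = 0.0
    for field, weight in WEIGHTS.items():
        left, right = a.get(field), b.get(field)
        if left and right and normalize(left) == normalize(right):
            score += weight
    return score


def classify(a, b):
    """Return 'match', 'review', or 'no-match' for a pair of records."""
    score = match_score(a, b)
    if score >= AUTO_MATCH:
        return "match"
    if score >= NEEDS_REVIEW:
        return "review"
    return "no-match"


# Same name and birthdate but different postal codes: sent for human review.
rec_a = {"last_name": "Garcia", "first_name": "Maria", "birthdate": "1980-02-14", "postal_code": "77002"}
rec_b = {"last_name": "GARCIA", "first_name": "maria", "birthdate": "1980-02-14", "postal_code": "77494"}
print(classify(rec_a, rec_b))
```

Under these assumed weights, two people who share a name and birthdate are never linked automatically, which is the kind of pair that calls for curation and disambiguation.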

An honest and open debate at the national level is required to move forward on improving the accuracy of patient identity matching. A national identifier is preferred as more expedient and more cost-effective but if this proves too politically problematic, innovations are possible through a consumer-directed voluntary identifier.

What are your experiences and concerns with patient identity matching?

I recently had the privilege of delivering the keynote at Duke University’s Third Annual Informatics Conference, “Business Transformation Through Informatics” in North Carolina, followed by a congressional staff briefing organized by Health IT Now in Washington D.C. I thought I would summarize the key takeaways here since the presentations and discussions which followed seemed to achieve a degree of resonance with the respective audiences.


I should preface this by saying that each of these observations is my own, based on exhaustive research of different models worldwide. I regularly work with regional and national governments around the world to design their national healthcare architectures, establish a shared services strategy, and leverage cloud computing to cost-effectively share essential infrastructure and expertise across a region.

Care coordination realistically models health information exchange as a network of participants rather than as a point-to-point exchange
Health information exchange is better modeled as a complex network of participants rather than a simplified point-to-point exchange of information. Each exchange of health information requires numerous supporting utility services – to check authorization, lookup clinician and patient registries, normalize terminology, aggregate patient health information across disparate sources, etc. Healthcare itself more closely follows a document-centric model of workflows such as that modeled by HL7 CDA (Health Level 7 Clinical Document Architecture), embodied as standard healthcare documents such as encounter and discharge summaries, request for consult, etc. Care coordination requires timely and secure access to a shared patient record across a region, inclusive of the patient, the caretakers, the clinicians and the institutions all participating in the patient’s care.

Care coordination, quality metrics and clinical decision support require a standard informatics model
A key success factor in health reform is the establishment of a shared summary care record built upon a standard informatics model, leveraging HL7 CDA and terminology standards including SNOMED CT, LOINC, ICD10, and for medications, RxNorm or ATC. The Consolidated CDA represents a harmonized set of recommendations across HL7 balloted CDA implementation guides, IHE Implementation Guides, Health Story Project and S&I Framework. HL7 CDA has been proven worldwide, including the use of HL7 CDA for epSOS (Smart Open Systems for European Patients) transborder exchange of summary health records and medication histories. A standard informatics model enables doctors to pose queries like “Tell me which of my patients have a particular condition and are taking a particular medication” – perhaps there is a new potential drug interaction or a change in recommended treatment protocol. Clinical decision support, population health, quality metrics, and comparative effectiveness research all depend on a standardized informatics model.
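
Once conditions and medications are recorded as standard codes, a query like the one quoted above reduces to a simple filter. The sketch below is only an illustration: the `PatientSummary` structure is hypothetical, and the codes are placeholders rather than real terminology entries.

```python
from dataclasses import dataclass, field


@dataclass
class PatientSummary:
    """Hypothetical, heavily reduced stand-in for a coded summary care record."""
    patient_id: str
    condition_codes: set = field(default_factory=set)   # e.g. SNOMED CT or ICD-10 codes
    medication_codes: set = field(default_factory=set)  # e.g. RxNorm or ATC codes


def patients_with(summaries, condition_code, medication_code):
    """IDs of patients who have the condition and are taking the medication."""
    return [
        s.patient_id
        for s in summaries
        if condition_code in s.condition_codes
        and medication_code in s.medication_codes
    ]


# Placeholder codes, not real terminology entries.
summaries = [
    PatientSummary("p1", {"COND-A"}, {"MED-X"}),
    PatientSummary("p2", {"COND-A"}, {"MED-Y"}),
    PatientSummary("p3", {"COND-B"}, {"MED-X"}),
]
print(patients_with(summaries, "COND-A", "MED-X"))
```

The filter only works because every source uses the same code for the same concept; with free-text or locally coded entries, the same question would require normalization first.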

Care coordination, quality metrics and clinical decision support require a critical mass of shared patient health information across all participants in the region
Countries that require electronic submission of encounter and discharge summaries within 24-48 hours of care episodes have significantly accelerated their progress towards health reform. This protected health information is then aggregated, normalized and made accessible as a shared patient health record through a regional HIE using web-based service APIs. Patients and clinicians alike are given immediate access that is both secure and transparent. Patients are able to directly consent and authorize access to health professionals, as well as audit specific disclosures, thereby establishing trust in the system. Independent audits are conducted to ensure “need to know” and “least privileged” access to protected health information. A critical mass of shared patient health information is established because all healthcare participants in a region are included. Goals for patient safety and improved care delivery at reduced costs are met because patient care can be coordinated across each of the specialists and institutions in a region.

Time to Value:  health reform must be accelerated
The time to build out the necessary infrastructure must proceed aggressively, such that the collaborative economic model can be established before the stimulus funds are exhausted. The collaborative economic model depends on achieving a critical mass of normalized health information. Once a minimum set of normalized health information is established, local business innovation can develop value-add services, which further drive value in the network. Examples of value-add services include drug interaction checks, clinical trial patient recruitment, clinical decision support, and comparative effectiveness of particular treatment protocols, institutions, clinicians, even patient-focused wellness and behavior modifications. Time to Value is the single biggest cause of failed HIEs worldwide – they took too long to establish a sustainable business model, ran out of funds before completing the necessary infrastructure, and ignored the importance of a standard informatics model.

Regional HIEs form the backbone of a shared services strategy
A Shared Services model is a means to cost-effectively share the necessary infrastructure for health information exchange, while creating a collaborative economic model that drives local innovation and accelerates adoption of advanced healthcare usage models. Regional HIEs become the logical organizing point to collect, host and store the normalized health information, to centrally monitor and enforce patient consent and authorization, and to offer value-add services which drive further value in the network. Regional HIEs provide necessary infrastructure which must be organized, monitored and enforced similar to transportation and utilities, to ensure interoperability at both national and regional levels. Health reform which follows a balanced approach across business drivers and metrics, policy and standards, architecture, and reimbursement and investment models demonstrates the highest levels of maturity and return on investment.

What challenges do you face with accelerating health reform?  What are your key learnings in the journey thus far?
