By now, many of you have likely heard about the four V’s of Big Data: Volume, Velocity, Variety, and Value. The original three V’s (Volume, Velocity, Variety) were articulated by Gartner more than a decade ago; Value has since been widely adopted as a fourth. In the coming months, you will find a number of blogs, papers, videos and other resources here that discuss Big Data solutions for healthcare and life sciences in greater detail.
These solutions will take advantage of advanced platform capabilities from Intel and ecosystem partners to improve reliability, scalability, and security. As an introduction, I wanted to use this space to set the stage for what Big Data means to healthcare, and why these solutions are needed:
• Volume: The amount of healthcare data that needs to be stored, managed, processed and protected is growing at an ever-increasing rate, a situation exacerbated by strict data retention requirements. Medical imaging is one area where the growing volume of data is especially evident: according to IBM, 30 percent of the data stored on the world’s computers consists of medical images. In the life sciences, advances in cost-effective genomic sequencing are causing storage needs to explode. Many traditional solutions have trouble scaling to accommodate this growth. “Scale-out” solutions, where computing nodes are added to an existing cluster to meet growing demand, have several advantages over traditional “scale-up” solutions, where one big, powerful server is replaced with an even bigger, more powerful one.
• Velocity: Many existing analytics and data warehouse solutions are batch-oriented, meaning all the data is periodically copied to a central location in a batch (for example, every evening). Clinical and administrative end users of this information are therefore not making decisions based on the latest data. Use cases such as clinical decision support really only work if end users have a complete, up-to-date view of the patient. Solutions that make use of in-memory analytics or column-store databases are typically used to improve the velocity of the data, or “time to insight.”
• Variety: Traditional analytics solutions work very well with structured information, such as data in a relational database with a well-formed schema. However, the majority of healthcare data is unstructured, and much of it goes unused today (for example, a doctor’s free-form text notes describing a patient encounter). Sophisticated natural language processing techniques and infrastructure components such as Hadoop MapReduce are being used to normalize a variety of different data formats, in effect unlocking this data for clinical and administrative end users.
• Value: Analysis by the McKinsey Global Institute has identified a potential $300 billion in value per year from Big Data in the U.S. healthcare industry alone, the majority of which would be realized through reduced national healthcare spending. For individual healthcare organizations, Big Data value will be realized through more efficient, more scalable management and processing of a quickly growing volume of data, and by enabling faster, better-informed decisions by clinicians and administrative end users.
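To make the velocity point above more concrete, here is a minimal sketch (plain Python, not any specific database product) of why a column-store layout speeds up analytic scans: a row store must touch every field of every record to aggregate one column, while a column store reads only the column it needs. The patient records and field names are hypothetical examples.

```python
# Hypothetical patient encounter records, stored row-wise (one dict per record).
rows = [
    {"patient_id": 1, "age": 54, "systolic_bp": 141, "notes": "..."},
    {"patient_id": 2, "age": 61, "systolic_bp": 128, "notes": "..."},
    {"patient_id": 3, "age": 47, "systolic_bp": 150, "notes": "..."},
]

def avg_bp_row_store(records):
    # Row store: every record (including unrelated fields) is visited.
    return sum(r["systolic_bp"] for r in records) / len(records)

# Column-store layout: one contiguous list per field.
columns = {key: [r[key] for r in rows] for key in rows[0]}

def avg_bp_column_store(cols):
    bp = cols["systolic_bp"]  # only this one column is touched
    return sum(bp) / len(bp)

print(avg_bp_row_store(rows))
print(avg_bp_column_store(columns))
```

Real column stores add compression and vectorized execution on top of this layout, but the core advantage is the same: an aggregate over one field reads a small fraction of the total data.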
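And to illustrate the variety point, here is a toy MapReduce-style sketch (plain Python, not Hadoop itself) of how unstructured clinical notes might be turned into structured term counts. The notes are invented examples, and real clinical NLP involves far more than word counting, but the map/reduce shape is the same: mappers emit key-value pairs from raw text, and reducers aggregate per key.

```python
from collections import defaultdict

# Hypothetical free-form clinical notes.
notes = [
    "Patient reports chest pain and shortness of breath.",
    "Follow-up: chest pain resolved; mild hypertension noted.",
    "Hypertension managed with diet; no chest pain today.",
]

def map_phase(note):
    # Emit a (term, 1) pair for each lowercase word in one note,
    # as a Hadoop mapper would for one input record.
    for word in note.lower().replace(".", " ").replace(";", " ").replace(":", " ").split():
        yield (word, 1)

def reduce_phase(pairs):
    # Sum counts per term, as a reducer would for each key's group.
    counts = defaultdict(int)
    for term, n in pairs:
        counts[term] += n
    return dict(counts)

pairs = [pair for note in notes for pair in map_phase(note)]
term_counts = reduce_phase(pairs)
print(term_counts["chest"], term_counts["hypertension"])  # 3 2
```

Because the map and reduce steps operate independently per record and per key, the same logic can be spread across a scale-out cluster, which is exactly why Hadoop pairs well with the volume and variety challenges described above.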
If you would like more information on the role Intel plays in Big Data for healthcare, visit this site: Big Data and Analytics in Healthcare and Life Sciences.
What questions do you have about Big Data in healthcare? What challenges is your organization facing with regard to the four V’s? Leave a comment or follow me on Twitter @CGoughPDX.