Architectural patterns for building real-time applications with Apache HBase

Version 1

    Author introduction:

    Andrew is a Principal Architect at Intel in the Big Data Platform Engineering Group, and a Committer and PMC member for the Apache HBase project. His current focus includes security, the next generation of commodity hardware, and the constraints of the Java sandbox on the Hadoop ecosystem. Previously, Andrew worked at Trend Micro, Sparta, and McAfee.

    Document Introduction:

    Real-time analytics is often what business analysts and executives ask for, but what they usually want is something else: interactive exploration of data sets. Decisions happen on human timescales, with the analyst's query leading to an action. The real-time advantage usually lies in how quickly results are available from a query, and sometimes in how fresh the data is when presented. Apache HBase is increasingly used as the persistence option for interactive dataset exploration because it is fast, capable of quickly ingesting very large amounts of data, and optimized for on-demand analytics. For the subset of applications that require truly real-time decisions, HBase is used as well because of its predictable latency. In this presentation, we distill the commonalities of Apache HBase use cases observed in practice into a set of nine end-to-end architectural patterns for data ingest, processing, and query: a set of blueprints for fast Big Data analysis.