Hadoop Summit, held June 26-27 in San Jose, presented Apache Hadoop at a tipping point: The big data framework has quickly moved from niche to mainstream, and many at the summit warned about losing Hadoop’s scrappy, open-source heritage as the technology gains in ubiquity.
Wednesday’s keynote provided an overview of the journey that Hadoop has traveled since it was first conceptualized seven years ago. Gartner analyst Merv Adrian (@merv) gave a fascinating speech about the maturing of Hadoop from its early days of experimentation and open development to today, when Hadoop is a buzzword for big data and nearly all the big players in tech have their own distribution or offer services built on Hadoop.
However, in the mainstreaming of Hadoop, we need to make sure that smaller boutique companies (the Innovators) aren’t displaced by the mega tech companies (the Suits). There’s room for both groups, says Adrian, because we’re nowhere near the end of the innovation cycle with Hadoop. There are lots of ways to compose new Hadoop-based solutions, depending on your requirements and business model, and we need both Innovators and Suits in the Hadoop ecosystem.
Meanwhile, Hadoop is getting close to being used by real people for real purposes, says Adrian, which marks a pivot point for the whole big data movement. Adrian praised the cooperative spirit surrounding Hadoop as open, community-developed software, and urged companies of differing sizes and focus to keep moving forward jointly to further Hadoop’s future.
Speaking of innovation and evolution surrounding Hadoop, the summit presented a number of important launches and announcements that point the way forward for Hadoop. The biggest topic of conversation was YARN, a major restructuring of MapReduce, the batch-processing framework that parallelizes compute across nodes in a Hadoop cluster. Due for release with Hadoop 2.0, YARN is in alpha testing and promises to deliver real-time processing to Hadoop and help transform it into a more flexible and faster data management and analytics platform.
At the Intel booth, we had our own announcement: Project Gryphon, which allows for full SQL92 compliance on top of Hadoop. One of Hadoop’s limits has been the inability to run standard SQL applications, as its data warehouse system recognizes only a small subset of SQL queries as valid. Project Gryphon is the natural evolution of the Phoenix query engine and Project Panthera, both open source efforts that sought to improve support for standard SQL features on Hadoop. Project Gryphon will offer 100 percent TPC-H compatibility with SQL92, which allows you to run queries in your Hadoop cluster just as if it were an OLTP database.
Project Gryphon also promises big performance gains by using the HBase coprocessor framework to push queries to the server faster, and by leveraging the Intel AVX instruction set in Java (Hadoop is written in Java) to bring more power to compute-heavy workloads.
As Merv Adrian said in his keynote, Hadoop is at a pivot point where it’s moving from niche to mainstream. We’ve arrived here through open development and bottom-up innovation, and we need to make sure that, as Hadoop matures, Innovators and Suits are both still invited to the party.
Follow Tim and the Hadoop community at Intel at @TimIntel.