There’s more to the Intel® Distribution for Apache Hadoop* software (IDH) than meets the eye.


Intel is building a partner ecosystem around IDH that adds optimized hardware, data storage, and analytics support, helping IT organizations get the value and intelligence they need from big data.


IDH helps organizations store and analyze big data by providing an open source data management platform with the hardware-level security, manageability and performance acceleration features of Intel® Xeon® processors. It also comes with technical support, training, and professional services from Intel. IDH is the distribution of choice for enterprises seeking to deploy open source Hadoop for processing big data at multi-petabyte scale.


However, the compute-intensive processes of Hadoop require a combination of hardware and software optimizations, along with specialized analytics and visualization tools, to deliver the insights, scale, and ROI demanded of big data. In addition to server architectures and IDH, Intel provides a number of tools to help manage Hadoop, including Intel® Graphbuilder, which enables distributed graph analytics on top of Hadoop. Beyond that, Intel turns to its partners to create a larger vendor ecosystem of optimized and co-engineered solutions and to build a more complete IDH computing environment.


RainStor Takes Complexity, Cost out of Big Data Querying

I estimate that up to 90 percent of organizations that have deployed Hadoop clusters today are just using them as an ETL offload. In other words, they are using Hadoop to store big content, but haven’t gotten around to taking advantage of Hadoop’s benefits as an engine for big data analysis. That’s where RainStor comes in. RainStor helps organizations achieve business insights at lower costs than other data stores, and uses familiar query and BI tools to reduce the complexity of big data analytics.


RainStor for Hadoop* is a big data infrastructure that runs natively on the Intel Distribution for Hadoop. It’s built not just to handle the velocity and growth of today’s data, but also to tackle the changing nature of data itself: log files, web clickstreams, Twitter content, machine-generated data, and more, all in great volume.


But most IT admins don’t want to become query specialists or data scientists just to analyze their big data. Using RainStor, they can run real SQL queries on Hadoop stores, taking much of the complexity out of big data analysis. RainStor is standards-based, and it uses specialized JDBC and ODBC drivers to access persistent Hadoop data. Queries then run against that data in the familiar SQL environment, so DBAs don’t have to be stuck with HiveQL. To some, being handed HiveQL could be like telling a C++ developer that JavaScript coding requires no further training.
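To give a sense of what a "real SQL" query over clickstream-style data looks like, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so the example runs anywhere; it does not use RainStor's drivers, and the table, column names, and sample rows are hypothetical. In practice, a query like this would be submitted through a JDBC or ODBC connection from a BI tool or application.

```python
import sqlite3

# Stand-in database: sqlite3 is used only so this sketch is runnable.
# It is NOT RainStor; the table and columns below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clickstream (user_id TEXT, page TEXT, bytes_sent INTEGER)"
)

# A handful of made-up clickstream rows for illustration.
rows = [
    ("u1", "/home", 1200),
    ("u1", "/products", 3400),
    ("u2", "/home", 1100),
    ("u3", "/home", 1300),
    ("u3", "/checkout", 2200),
]
conn.executemany("INSERT INTO clickstream VALUES (?, ?, ?)", rows)


def page_summary(connection):
    """Return (page, hits, total_bytes) per page using standard SQL."""
    query = """
        SELECT page, COUNT(*) AS hits, SUM(bytes_sent) AS total_bytes
        FROM clickstream
        GROUP BY page
        ORDER BY hits DESC, page ASC
    """
    return connection.execute(query).fetchall()


summary = page_summary(conn)
for page, hits, total_bytes in summary:
    print(page, hits, total_bytes)
```

The point of the sketch is the query itself: a plain GROUP BY aggregate of the kind DBAs already write, with nothing Hadoop-specific in it.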


RainStor also offers data compression and de-duplication capabilities that can reduce the storage footprint by a factor of as much as 20 to 40. These compression features not only reduce the hardware needs and costs of big data, they also speed up querying. RainStor is a great example of how partners are building out the Intel Hadoop ecosystem with innovative technologies. Learn more about RainStor for Hadoop, and follow Tim and the growing #Intel #BigData Hadoop community at @TimIntel.