Intel has released its own Hadoop distribution in a move intended to accelerate adoption of the big data platform while ensuring more of those workloads run on Intel's own Xeon processors.
The Intel Distribution for Apache Hadoop includes core pieces of the data analysis platform, which Intel is releasing as open-source software, as well as deployment and tuning tools that Intel developed itself and is not releasing as open source.
Organizations will be more willing to expand their investments in Hadoop if they know there's a consistent distribution backed by a big, stable vendor like Intel, said Boyd Davis, general manager of Intel's data center software division, at a launch event in San Francisco Tuesday.
Intel has been upping its investments in software for several years, to help ensure its processors are widely used beyond their traditional stronghold in client/server computing. It said it has worked with customers over the past few years to develop its Hadoop distribution, and that this is actually its third release of the software.
Still, it's a significant announcement that moves Intel deeper into the software industry. Like many other open-source providers, Intel will now sell support and maintenance services for its distribution, Davis said.
Hadoop includes a dozen or so open-source projects that work together to make it easier for users to store, manage and analyze large amounts of data. It's become the go-to software platform for companies mining Web logs, transaction histories and other data in search of added value.
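To give a rough sense of the kind of Web-log mining described above, here is a minimal sketch of the map, shuffle and reduce pattern that Hadoop's processing model is built around. It is plain Python that runs on a single machine, not Hadoop's actual API, and the log format, sample lines and function names (`map_phase`, `shuffle`, `reduce_phase`, `count_hits`) are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample of Web-server log lines (assumed format: ip method url status).
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /cart 200",
    "10.0.0.1 GET /cart 500",
    "10.0.0.3 GET /index.html 200",
]


def map_phase(line):
    """Emit a (url, 1) pair for one log line, the way a mapper emits key/value pairs."""
    _ip, _method, url, _status = line.split()
    yield url, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle-and-sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for a single key."""
    return key, sum(values)


def count_hits(lines):
    """Run the three phases in sequence and return hits per URL."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_hits(LOG_LINES))
```

On a real cluster the map and reduce steps would run in parallel across many machines, with the input read from a distributed file system rather than an in-memory list; the sketch only shows the shape of the computation.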
Intel's distribution includes versions of the Hadoop Distributed File System, the Hadoop Processing Framework, Hive and HBase. Intel has tweaked those programs to take advantage of capabilities in its own Xeon chips, such as the processor instructions that accelerate AES encryption.
"By incorporating silicon-based encryption support of the Hadoop Distributed File System, organizations can now more securely analyze their data sets without compromising performance," Intel said in a news release.
But Intel says the core components of its distribution remain open and compatible with other implementations of Hadoop. If customers choose Intel's distribution, "they're not getting locked into a technology," Davis said.
At the same time, Intel has developed some of its own tools that will not be released as open source. They include a deployment and configuration tool called Intel Manager for Apache Hadoop, and a tool for tuning cluster performance, called Active Tuner for Apache Hadoop.
Customers who run Intel's Hadoop distribution on servers loaded with Intel hardware, including its processors, solid-state drives and 10 Gigabit Ethernet cards, will see a 40 percent performance boost over those who don't go with an all-Intel platform, according to Davis.
Intel is working with about 20 partners to make their products run well on its distribution, including SAP, Red Hat, Cisco Systems, Infosys, Simba Technologies, Teradata and Wipro.
Davis was joined by a representative from SAP who said the company is keen to make its Hana in-memory database work well with Hadoop. A Savvis executive said customers of the cloud hosting company are asking for big data services.
"We're not looking to provide vertical applications," Davis said. "Instead, we think we're a good bet for a horizontal layer that partners can build on top of."