Big data processing with Hadoop

Data storage has become cheap. Consequently, we’re storing tons of it:

  • In less than 10 years since launching its image-search feature, Google has indexed over 10 billion images.
  • Thirty-five hours of content are uploaded to YouTube every minute.
  • Twitter is said to handle, on average, 55 million tweets per day.
  • In early 2010, Twitter’s search feature was logging 600 million queries daily.

In lockstep with the explosive growth of data are tools designed to facilitate data processing, and one such tool is Apache Hadoop. Hadoop is essentially a mechanism for analyzing huge datasets, which don’t necessarily need to be housed in a datastore. It implements the MapReduce programming model and abstracts away the plumbing of distributed data analysis, making that massive engine far more accessible to developers. Hadoop scales out to myriad nodes and handles all of the coordination involved in sorting and shuffling data among them. Yahoo! and countless other organizations have found it an efficient mechanism for analyzing mountains of bits and bytes. Hadoop is also fairly easy to get working on a single node; all you need is some data to analyze and familiarity with Java code, including generics.
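To make the programming model concrete, here is a minimal sketch of the canonical word-count job, written against Hadoop’s org.apache.hadoop.mapreduce API. The class names and the command-line input/output paths are illustrative assumptions, not anything prescribed by the article:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// A minimal MapReduce word count: the "hello world" of Hadoop.
// Class and path names here are illustrative.
public class WordCount {

  // Mapper: for each line of input, emit a (word, 1) pair per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts that Hadoop has grouped by word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : values) {
        sum += count.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the job together; input and output paths come
  // from the command line.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, a job like this runs with the same command on a single node or a full cluster, for example: hadoop jar wordcount.jar WordCount /input /output. That symmetry is much of Hadoop’s appeal: the framework, not your code, worries about where the data lives and which nodes do the work.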

In IBM developerWorks’ article “Big data analysis with Hadoop MapReduce,” you’ll get started with Hadoop’s MapReduce programming model and learn how to use it to analyze data for both big- and small-business information needs. You’ll find that analyzing data with Hadoop is easy and efficient!
