Data Storage & Management

Data Storage & Management news, information, and how-to advice

Review: MongoDB 3.0 reaches for the enterprise

MongoDB zeroes in on operations with pluggable storage engines and revamped management tools

9 big data pain points

Do enough Hadoop and NoSQL deployments, and the same problems crop up again and again. It's time for the industry to nail them sooner rather than later.

Why streaming analytics is such a big deal

Analytics drive decisions, but some decisions shouldn't wait until batch processes complete -- which is why, eventually, we'll all analyze data as it streams in.

Which freaking Hadoop engine should I use?

These four truths will help you determine which Hadoop technology to use for the types of workloads you anticipate.

Big data, big challenges: Hadoop in the enterprise

Fresh from the front lines: Common problems encountered when putting Hadoop to work -- and the best tools to make Hadoop less burdensome.

Spark 1.4 adds support for R, Python 3, cluster management

Spark data processing framework adds languages used by many data crunchers, as well as container-based cluster management features.

A better mousetrap: A JSON data warehouse takes on Hadoop

Sure, a NoSQL or JSON data warehouse sounds faddish, but SonarW is a better solution for many.

LinkedIn fills another SQL-on-Hadoop niche

LinkedIn's open source, home-brew OLAP project is a new way for Hadoop users (and others) to query both real-time and historical data.

Spark and Storm face new competition for real-time Hadoop processing

DataTorrent is releasing its real-time data processing engine for Hadoop and beyond as the open source Project Apex.

Review: Storm’s real-time processing comes at a price

Storm may be the only real-time processing framework that has been proven to process millions of messages per second, but there's a steep learning curve ahead.

Spark, big data's brightest star, needs to grow up

Spark is the hottest project in big data -- but Databricks, the company behind it, needs to ensure its implementation has a plausible path to maturity.

What you need to know about Hadoop right now

Andrew updates his cheat sheet for developers navigating the ever-expanding Hadoop ecosystem. Storm and Spark still top the list, but don't miss new additions like Phoenix, Kafka, and Falcon.

First look: MongoDB 3.0, for mature audiences

The new MongoDB features document-level locking, better write performance, big memory support, and more. At last, MongoDB is all grown up.

Google hitches cloud data analysis to Java SDK

Google Cloud Dataflow is based on FlumeJava but can be extended to other languages and environments.

Hands on: Build a Storm analytics solution

Storm lets you create real-time analytics for every conceivable need. Here's a tasty example using Twitter data and source code hosted on GitHub.

Comparing JVM libraries for MongoDB

Get a quick look at how four leading Java-based libraries for MongoDB handle a common REST-services use case.

MongoDB gets its first native analytics tool

A new open source analytics tool, SlamData adds extensions to SQL that enable analysts to query MongoDB directly, without conversion to an RDBMS.

11 open source tools to make the most of machine learning

Tap the predictive power of machine learning with these diverse, easy-to-implement libraries and frameworks
