Big data: There's signal in that noise

To get real value from big data, take the blinders off and look for the gems in correlation

Fearmongers warn that capturing data means capturing noise and that well-tended gardens are the only way to manage data. Well, guess what? Sometimes noise is the point.

Read any article by any not-very-technical journalist parroting those who sell fear -- or any comment on one of my posts from a scared cube dweller who was hoping to ride PL/SQL to retirement -- and you'll hear dire warnings that capturing all this data before you can interpret it will spell doom and disaster. Caution! Much of the data is noise! Noise is bad and you risk terrible error!

With any new approach you take risks. What if the noise turns out to be more valuable than the data you're trying to capture?

That's exactly what happened recently with Jawbone's Up, a popular activity-tracking wristband. Last year, Jawbone hired a vice president of data, Monica Rogati, to start mining the gobs of data accumulated so far. After the earthquake in Napa last weekend, Jawbone published a graph showing a large percentage of Up wearers awaking, with their wake time and the percentage of those awoken directly related to their proximity to the quake. Those in San Francisco, for example, woke up slightly later and in fewer numbers than those in Napa.

Not long ago the Wall Street Journal noted that Twitter found quakes faster than seismometers -- so imagine how quickly Up might work in detecting disasters compared to Twitter. There are obviously limitations and problems. (Heck, I still wish Jawbone could tell me how much more reliable the Up24 is compared to the Up Gen 2, but it refuses to say.) But imagine the potential!

Consider wildlife tracking. For years people have been radio tagging catch-and-release animals. Wouldn't they run from the epicenter? Perhaps that noise is exactly what we need. I realize they might not be tagged in sufficient numbers or have sufficient range to make that feasible, but it's a thought.

We've seen other instances of noise being more interesting than the data. Recently, researchers watching GPS recordings for plate movement found that the West Coast is rising. Why? Because all those people moved into the desert and planted palm trees and started drinking more water than could be piped in. Meanwhile, everyone was putting more carbon in the skies, which warmed the air and depleted the water further (not a shocker that there's a drought). The GPS readings indicate how water weighs down the land -- and the degree to which land rises when the water leaves. Now we have a new measure of true water depletion across the land.

Pure science revels in these side-effect numbers -- and curious scientists or people paid to rationalize the data figure out why. This "noise" ... what does it mean? Is it significant?

Now imagine the data you're capturing for your business. What useful noise might it contain? Therein lie key opportunities for business development, loss prevention, cost reduction, and especially supply chain planning. These opportunities span business units, demand data liberalization, and require pooling data beyond what we can plan for.

They also require people dedicated to mining the data, except that conventional data miners tend to put on blinders and do what they're told. You need domain experts to get involved and look for stuff you don't realize is important yet. This requires people to be curious, ask questions, and seek serendipity. Sure, you might find that pirates fight global warming and have to remind yourself that correlation is not causation, but you might also find that animals run from the epicenter faster than your fancy devices detect an earthquake.
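To see how easily the pirate trap springs, here is a rough sketch in Python. The numbers are invented purely for illustration (they are not real pirate counts or temperature records), and the `pearson` helper is a hypothetical name for a hand-rolled Pearson correlation. Any two series that merely trend in opposite directions over the same period will score as strongly correlated, whether or not one has anything to do with the other.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and each series' root sum of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Invented, illustrative values only: a falling series and a rising series.
pirates = [5000, 4000, 3000, 2000, 1000, 500]
temperature_c = [14.0, 14.2, 14.4, 14.6, 14.8, 15.0]

r = pearson(pirates, temperature_c)
print(f"correlation between pirates and temperature: {r:.3f}")
```

The coefficient comes out strongly negative, which says only that the two lines move in opposite directions. Deciding whether such a pattern is a pirate story or an earthquake story is the part that still needs a curious domain expert.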

You may also find out that your customers or coworkers are unconsciously communicating something to you. To some degree, this is a game of Data Katamari: You need to gather enough mass to create a star. There are whispers of wisdom in that noise. It's our job to listen hard and discover them.

This story, "Big data: There's signal in that noise," was originally published by InfoWorld.