The surging popularity of Hadoop has paved the way to storing and processing gobs of semistructured data. Batch processing data is a great way to study the past in high definition, but it's constrained by the simple phrase "next time." As in: "Looks like our customers didn't like the way the checkout process went, let's change that for next time."
Constructing a "this time" solution can be approached in several ways. One angle of attack is to combine batch and real-time analytics: Set up MapReduce jobs to run every night, for example, and pipe the results into a NoSQL database to be queried throughout the day. MapReduce distills and condenses the data set, allowing it to be accessed quickly as needed.
Another approach is to focus on the state of the user at the current moment. In other words, context -- not what will be, but what is, right now. And particularly for mobile applications, what information could be more important about the user's state than his or her geolocation?
Location awareness is an idea that's spreading both inside the tech world and out (just think about the locavore or farm-to-table trends in the foodie community). GPS-enabled mobile devices have been a boon to developers and technology providers paying attention to these trends. One such provider is our new favorite billion-dollar NoSQL company, MongoDB.
MongoDB's new geospatial query in action
MongoDB has had native geospatial queries for a while (famously used by Foursquare), which make finding documents that are near a given point or documents that lie within a given polygon a breeze.
The latest production release introduced a new query operator: $geoIntersects. The operator packs quite a punch, filling in functionality lacking in previous versions. For example, if you wanted to supply a point and find all the documents enclosing that point, you had to do it in your application layer. If you wanted to supply a polygon and see which documents overlap, you had to do it in your application layer. If you wanted to supply a line and see which documents could be found on that line, you did it in your application layer.
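To give a sense of what "doing it in your application layer" involved, here is a minimal sketch of a point-in-polygon check using the classic ray-casting technique. It is not taken from the app described in this article: the function names and the toy square "neighborhoods" are hypothetical, and it treats coordinates as flat planar values, which is only a rough approximation of the spherical geometry a database would use.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the closed ring."""
    x, y = point
    inside = False
    j = len(ring) - 1  # j trails i by one vertex, so (ring[j], ring[i]) is an edge
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Only edges that straddle the point's horizontal line can be crossed.
        if (yi > y) != (yj > y):
            # x coordinate where the edge meets that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
        j = i
    return inside


def shapes_containing(point: Point, shapes: Dict[str, Sequence[Point]]) -> List[str]:
    """Return the names of all shapes whose ring encloses the point."""
    return [name for name, ring in shapes.items() if point_in_ring(point, ring)]


# Hypothetical toy data: two unit-free squares standing in for real boundaries.
toy_shapes = {
    "Square A": [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    "Square B": [(2, 0), (4, 0), (4, 2), (2, 2), (2, 0)],
}

print(shapes_containing((1, 1), toy_shapes))
print(shapes_containing((3, 1), toy_shapes))
```

Every document has to be pulled out of the database and tested one by one, which is exactly the work a server-side operator takes off your hands.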
$geoIntersects does all that and more. On top of the flexibility, it's incredibly easy to use. The general idea is that you can supply a point, line, or polygon, and any document that intersects with the supplied geometry will be returned. To demonstrate, I've built a little Web app that relies on the query.
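As a rough illustration of the shape such a query takes, the sketch below builds $geoIntersects filter documents as plain Python dictionaries. The helper names and the field name "geometry" are assumptions made for this example, not details from the app, and the coordinates are illustrative values near downtown Chicago. Note that GeoJSON lists longitude before latitude.

```python
from typing import Any, Dict, Iterable, Tuple


def point(lng: float, lat: float) -> Dict[str, Any]:
    """GeoJSON Point; coordinates are [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lng, lat]}


def line_string(coords: Iterable[Tuple[float, float]]) -> Dict[str, Any]:
    """GeoJSON LineString built from (longitude, latitude) pairs."""
    return {"type": "LineString", "coordinates": [list(c) for c in coords]}


def geo_intersects_filter(field: str, geometry: Dict[str, Any]) -> Dict[str, Any]:
    """Filter matching documents whose `field` intersects the given geometry."""
    return {field: {"$geoIntersects": {"$geometry": geometry}}}


# "geometry" is an assumed field name holding each document's GeoJSON shape.
here_filter = geo_intersects_filter("geometry", point(-87.6298, 41.8781))
route_filter = geo_intersects_filter(
    "geometry", line_string([(-87.65, 41.90), (-87.63, 41.88)])
)

print(here_filter)
print(route_filter)

# With a running server and the pymongo driver installed, a filter like this
# could be passed to find(); the database name "chicago" is hypothetical:
#   from pymongo import MongoClient
#   for doc in MongoClient()["chicago"]["neighborhoods"].find(here_filter):
#       print(doc)
```

The same filter shape works for all three cases: swap the point for a line or a polygon and the rest of the query stays put.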
The app takes a starting and ending address, finds a bike route for the supplied points, and tells you all the Chicago neighborhoods you'd pass through. It will also tell you which Chicago neighborhood you're in at the moment. The app is simple, but shows two possible uses of the new query operator. Here's what it looks like:
To start, I pulled all the neighborhood boundary data from the city of Chicago's website. After a quick conversion to GeoJSON, I imported the data into a collection in MongoDB named neighborhoods. Each neighborhood document looks roughly like this: