Take a moment to think through how this same entity would be represented in an RDBMS. In this case, a business card table would need columns for both fax and Twitter, even though many rows would not have any data in those fields. Furthermore, altering a table's definition after the fact can be problematic, especially when that table contains a large amount of data. Thus, in some cases, a document store's freedom of data definition permits a high degree of variance in rapidly evolving data collections. In essence, a document store can permit data agility.
Developers can access Mongo via its own shell, which uses a JavaScript query language, while applications can talk to Mongo via a large selection of drivers that implement its communication protocol. Thus, a wide variety of applications can leverage Mongo from Java to Ruby to PHP, just to name a few. What's more, the community around Mongo has created higher-level, ORM-like libraries, which leverage core platform drivers, thus providing a closer mapping of objects in code to documents.
MongoDB: RDBMS-style queries
Although many document stores (and NoSQL implementations in general) eschew SQL in favor of custom query languages and data access schemes, Mongo's query language is SQL-like: at a high level, it operates much as SQL does and is rather easy to pick up.
For instance, using Mongo's shell, inserting a document representing a business card is done as follows:
db.business_cards.insert({name:"Andrew Glover", cell: "703-555-5555", fax: "703-555-0555", address: "29210 Corporate Dr, Suite 100, Anywhere USA"})
In this case, I've inserted a JSON document into a collection named business_cards by issuing an insert command on that collection directly. I can search the business_cards collection via the find command like so:
db.business_cards.find({name:"Andrew Glover"})
This would return the document I just inserted. Plus, that document would contain Mongo's _id field, which was automatically generated upon insert. Note that Mongo's query language is closely analogous to SQL -- the same query in SQL would be something like:
select * from business_cards where name = 'Andrew Glover'
Mongo's query language supports a wide variety of searches, including boolean expressions:
db.business_cards.find({$or: [{cell:"703-555-5555"}, {cell:"301-555-5555"}]})
This leverages a boolean OR and returns:
{ "_id" : ObjectId("4efb731168ee6a18692d86cd"), "name" : "Andrew Glover", "cell" : "703-555-5555", "fax" : "703-555-0555", "address" : "29210 Corporate Dr, Suite 100, Anywhere USA" }
{ "_id" : ObjectId("4efb73a868ee6a18692d86ce"), "name" : "Mark Smith", "cell" : "301-555-5555", "address" : "23 Corporation Way, Anywhere USA", "twitter" : "msmith" }
It also covers regular expressions:
db.business_cards.find({address: {$regex: ' corporat*', $options: 'i'}})
In return, you get two documents:
{ "_id" : ObjectId("4efb731168ee6a18692d86cd"), "name" : "Andrew Glover", "cell" : "703-555-5555", "fax" : "703-555-0555", "address" : "29210 Corporate Dr, Suite 100, Anywhere USA" }
{ "_id" : ObjectId("4efb73a868ee6a18692d86ce"), "name" : "Mark Smith", "cell" : "301-555-5555", "address" : "23 Corporation Way, Anywhere USA", "twitter" : "msmith" }
Naturally, Mongo supports updating documents and removing them. Updates are quite powerful, as there are a number of operations available for updating various aspects of a document. For instance, updating the cell number in a document containing the Twitter handle "msmith" would be performed as follows:
db.business_cards.update({twitter:"msmith"}, {$set: {cell:"202-555-5555"}})
In this case, the $set modifier changes a particular value in the first document matching the query. If all documents matching a query should be updated, additional parameters to the update command can be added.
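For example, here is a minimal sketch of that second form, assuming a shell version whose update command accepts a multi option as a third argument; it repeats the update above but applies the new cell number to every document whose Twitter handle is "msmith" rather than just the first match:
// the multi option (an assumption about your shell version) updates every matching document
db.business_cards.update({twitter:"msmith"}, {$set: {cell:"202-555-5555"}}, {multi: true})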
Removing documents is similarly straightforward. Simply use the remove command and issue a query:
db.business_cards.remove({twitter:"msmith"})
Finally, Mongo offers MapReduce, a powerful mechanism for batch processing and aggregation that is somewhat similar to SQL's group by. At a high level, the MapReduce algorithm breaks a big task into two smaller steps. The map function takes a large input and divides it into smaller pieces, then hands that data off to a reduce function, which distills the individual answers from the map function into one final output. For instance, MapReduce could be used on the business_cards collection to determine how many documents contain a Twitter attribute.
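As a rough sketch of that idea -- assuming the shell's mapReduce command and its inline output option are available in your version of Mongo -- the map function emits a value only when a document has a twitter attribute, and the reduce function tallies those emissions:
var map = function() { if (this.twitter) { emit("has_twitter", 1); } }  // emit once per document carrying a twitter field
var reduce = function(key, values) { var total = 0; values.forEach(function(v) { total += v; }); return total; }  // sum the emitted 1s
db.business_cards.mapReduce(map, reduce, {out: {inline: 1}})
Run against the two documents shown earlier, this would report a count of 1, since only Mark Smith's card carries a twitter field.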
MongoDB: New paradigm, new challenges
Although MongoDB itself is open source, the code base is actively maintained and sponsored by a commercial entity known as 10gen, which provides support, consulting, monitoring, and training for Mongo. 10gen supports a number of well-known companies using Mongo, including Disney, Intuit, and MTV Networks. In addition to strong community backing and commercial support, Mongo benefits from excellent documentation, and a number of published books are available.
Working with MongoDB is not without challenges. For starters, Mongo requires a lot of memory, preferring to put as much data as possible into working memory for fast access. In fact, data isn't immediately written to disk upon an insert (although you can optionally require this via a flag) -- a background process eventually writes unsaved data to disk. This makes writes extremely fast, but corresponding reads can occasionally be inconsistent. As a result, running Mongo in a nonreplicated environment courts the possibility of data loss. Furthermore, Mongo doesn't support the notion of ACID transactions, which is a touchstone of the RDBMS world.
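As a hedged illustration of the disk-write flag mentioned above -- the exact mechanism varies by driver and Mongo version, the getLastError command with its fsync option is one shell-level approach, and the document below is purely hypothetical:
db.business_cards.insert({name: "Jane Doe", cell: "555-555-1234"})  // hypothetical card, for illustration only
db.runCommand({getLastError: 1, fsync: true})  // block until the preceding write has been flushed to disk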
As with traditional databases, indexing in Mongo must be thought through carefully. Improperly indexed collections will result in degraded read performance. Moreover, while the freedom to define documents at will does provide a high degree of agility, it has repercussions for data maintenance over the long term: when documents in a collection vary widely in structure, queries become harder to write and indexes harder to design.
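For instance, a minimal sketch of indexing the name field, using the ensureIndex helper that Mongo shells of this era provide, looks like this:
db.business_cards.ensureIndex({name: 1})  // build an ascending index on name
With that index in place, the find({name: ...}) query shown earlier can walk the index rather than scan the entire collection.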