Tonight I'm going to a sold-out MongoDB event hosted by my company at our local startup incubator. Already, 250 people have signed up. That's a pretty good crowd for Durham, NC, but not surprising considering the focus is MongoDB schema design.
Schema design with MongoDB is almost too easy, but if you've been flattening things out into tables for 20 years, it may seem hard. If you create Mongo entities that are a 1:1 port of equivalent items in your RDBMS, you'll be sorely disappointed by MongoDB's performance, consistency, and so on.
On the Web, you'll find plenty of negative opinions about the quality of MongoDB. You'll also come across rave reviews and glowing recommendations. I'm willing to bet that the difference between thumbs-up and thumbs-down in 99 percent of those cases is whether the schema was or was not appropriate for a document database.
Identifying documents is the key
If you've used Hibernate or another JPA or OR-Mapping tool, then you're familiar with the concept of a "dependent object" or "composite component."
The classic example is a street address that often consists of two lines, a city, a state or province, and a postal code. In your database table you may just have columns, but because your object-oriented system needs to validate the simple types, you have a type hierarchy and a separate class from your entity. However, entries frequently have more than one or two addresses, so you may one day break that into a separate table.
We don't truly care about duplicates (meaning two people with the same address might result in rows of the same address in the table that differ only in their key) -- we only care that we can add your beach house to your personnel record so that we can find you. For a person, addresses, phone numbers, and IM accounts are all examples of things that do not usually require another document and are embedded in the parent object in MongoDB. In the event a person is deleted, you'd delete the addresses, phone numbers, and IM accounts with them or cascade the delete. You wouldn't have those other items without the parent object.
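As a rough sketch of what that embedding looks like, the Python below folds rows from hypothetical person, address, and phone tables into a single document. The table layouts, the field names, and the embed_person helper are assumptions made up for illustration; they don't come from any particular schema.

```python
def embed_person(person_row, address_rows, phone_rows):
    """Fold child rows into the parent so one document holds the whole person."""
    person_id = person_row["id"]
    return {
        "_id": person_id,
        "name": person_row["name"],
        # Child rows drop their surrogate and foreign keys: once embedded,
        # they exist only as part of the parent document.
        "addresses": [
            {k: v for k, v in row.items() if k not in ("id", "person_id")}
            for row in address_rows
            if row["person_id"] == person_id
        ],
        "phones": [
            row["number"] for row in phone_rows if row["person_id"] == person_id
        ],
    }


# Hypothetical rows as they might come back from an RDBMS.
person = {"id": 1, "name": "Pat Example"}
addresses = [
    {"id": 10, "person_id": 1, "line1": "1 Main St", "city": "Durham",
     "state": "NC", "postal_code": "27701"},
    {"id": 11, "person_id": 1, "line1": "2 Ocean Dr", "city": "Nags Head",
     "state": "NC", "postal_code": "27959"},
    {"id": 12, "person_id": 2, "line1": "3 Other Rd", "city": "Raleigh",
     "state": "NC", "postal_code": "27601"},
]
phones = [
    {"id": 20, "person_id": 1, "number": "919-555-0100"},
    {"id": 21, "person_id": 2, "number": "919-555-0199"},
]

doc = embed_person(person, addresses, phones)
```

Deleting this one document takes the addresses and phone numbers with it, which is the cascade behavior described above without any extra work.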
In addition to cascading deletes, other key signs that your objects belong in the same document are where a foreign key is part of a primary key, most 1:1 relationships, and nearly anywhere that you would never read from a "child" table row without first reading from the "parent" table row. There are exceptions, but this is a good place to start. In short, it probably matches your object model more closely than your average RDBMS schema.
While these more structural design principles are good rules of thumb, there is always an and/if/or/but. Remember that while document databases like MongoDB are usually not transactional in a manner similar to that of a relational database, they allow you to have atomic writes within a single document. There are times you may make a compromise from your object model in order to achieve atomicity. This means moving something that would otherwise live in another document into the same document as the data it must be modified together with. You may also use subdocuments in order to achieve this.
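To make the atomicity point concrete, here is a minimal sketch. It assumes a hypothetical order document whose line items are embedded and whose running total lives right next to them; the field names and the add_item_update helper are illustrative only. The helper just builds the filter and update documents, the two arguments that a driver call such as PyMongo's update_one takes. Because the $push and the $inc both target the same document, they are applied together or not at all.

```python
def add_item_update(order_id, sku, qty, unit_price):
    """Build one update that appends a line item and bumps the order total."""
    filter_doc = {"_id": order_id}
    update_doc = {
        # Append the new line item to the embedded array...
        "$push": {"items": {"sku": sku, "qty": qty, "unit_price": unit_price}},
        # ...and adjust the total in the same single-document write.
        "$inc": {"total": qty * unit_price},
    }
    return filter_doc, update_doc


filter_doc, update_doc = add_item_update(42, "ABC-1", 2, 5)
# With a live connection and a hypothetical "orders" collection, this would be:
#   orders.update_one(filter_doc, update_doc)
```

Had the line items lived in their own collection, the insert and the total adjustment would be two separate writes, and a failure between them could leave the total out of step.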
On the other hand, you may divide documents in order to achieve better distribution -- that is, shard more effectively. Sharding, if you recall, is where the collection of documents is distributed among server nodes according to a shard key, for example by hashing it. If you are likely to use sets of subdocuments or document elements concurrently, you may find it's faster to divide them into separate documents that land on different shards, so they are distributed among more nodes and you achieve more parallelism on your read operations. In other words, sometimes you may need to break up something that more naturally fits in one document or subdocument in order to achieve better shard performance.
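The hashing idea can be shown with a toy example. The sketch below is not MongoDB's actual placement logic, which is more involved; it only illustrates how hashing a key spreads documents across nodes. The shard_for function, the key format, and the shard count of four are all assumptions for illustration.

```python
import hashlib
from collections import Counter


def shard_for(key, num_shards):
    """Map a key to a shard number by hashing it (toy illustration only)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how 1,000 hypothetical document keys land on 4 shards.
placement = Counter(shard_for(f"user-{i}", 4) for i in range(1000))
```

The same key always maps to the same shard, while different keys scatter. One large document is one key on one node; split into several documents with their own keys, the pieces can land on different nodes and be read in parallel.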
What if you don't do that?
If you do a 1:1 table-to-document port of your RDBMS, you can expect the following:
- Miss joins in all the places where embedded documents would have made them unnecessary
- Lose atomicity
- Do more operations
- Gain little in terms of parallelism
- Risk writing a ranty hate blog that makes you look ignorant or worse, like a crusty ol' PL/SQL developer or a DBA fearing for his cushy job maintaining triggers
- Look like one of those hipster hackers who embrace every new technology but fail to use it correctly
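The "do more operations" item is easy to see with a toy in-memory store that counts reads. Everything here is hypothetical: CountingStore stands in for a database, and the collection and field names are made up. It sketches the access pattern only and measures nothing about MongoDB itself.

```python
class CountingStore:
    """A stand-in for a database that counts how many reads it serves."""

    def __init__(self, collections):
        self.collections = collections
        self.reads = 0

    def find(self, name, **criteria):
        self.reads += 1
        return [
            doc for doc in self.collections[name]
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


# 1:1 port of the tables: loading one person takes a read per "table".
ported = CountingStore({
    "people": [{"_id": 1, "name": "Pat Example"}],
    "addresses": [{"person_id": 1, "city": "Durham"}],
    "phones": [{"person_id": 1, "number": "919-555-0100"}],
})
ported.find("people", _id=1)
ported.find("addresses", person_id=1)
ported.find("phones", person_id=1)

# Embedded design: the same information comes back in a single read.
embedded = CountingStore({
    "people": [{
        "_id": 1,
        "name": "Pat Example",
        "addresses": [{"city": "Durham"}],
        "phones": ["919-555-0100"],
    }],
})
result = embedded.find("people", _id=1)
```

Three round trips versus one for the same person, and the gap widens with every child table you port straight across.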
To sum up, holy crap, a lot of people love Mongo! Schema design is the critical path to using MongoDB efficiently and effectively. Just as with your RDBMS, you may have to make compromises in order to achieve better atomicity or scale. Before you hate on MongoDB, you should at least make sure you're holding it right.
This article, "How to screw up your MongoDB schema design," was originally published at InfoWorld.com. Keep up on the latest developments in application development, and read more of Andrew Oliver's Strategic Developer blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.