If you have multiple nodes in your cluster, Couchbase spreads the documents across all of them. Say you have a three-node Couchbase cluster with one bucket containing three documents; each node might hold one document. Distribution is handled by a hashing algorithm built into the Couchbase client. You can also keep more than one replica of this data: you specify the number of replicas when creating your bucket, and once the bucket is created that number cannot be changed, so be sure to choose the number you really want.
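As a minimal sketch of what this looks like from the application side, assuming the classic 1.x Java client of that era (host names, bucket name, and keys here are placeholders, not from the original project): the client takes a list of bootstrap URIs, discovers the cluster topology, and hashes each key to the node that owns it.

import com.couchbase.client.CouchbaseClient;
import java.net.URI;
import java.util.Arrays;
import java.util.List;

public class ClusterConnect {
    public static void main(String[] args) throws Exception {
        // Bootstrap list; the client discovers the rest of the cluster
        // and hashes each key to the node that owns it (hosts are placeholders).
        List<URI> nodes = Arrays.asList(
                URI.create("http://node1:8091/pools"),
                URI.create("http://node2:8091/pools"),
                URI.create("http://node3:8091/pools"));

        // The bucket's replica count was fixed when the bucket was created;
        // it cannot be changed from the client side.
        CouchbaseClient client = new CouchbaseClient(nodes, "petstore", "");

        client.set("customer::1001", 0, "{\"name\":\"Pat\"}").get();
        System.out.println(client.get("customer::1001"));

        client.shutdown();
    }
}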
Another consideration is that replica data is also stored in memory, so each replica consumes memory on the node that holds it, just as the active copy does on its own node. This is not necessarily a bad thing: in the event of a failure, the data is available almost instantaneously. The catch is that you have to turn on auto-failover and specify the interval after which a node is considered down; the data won't be available for reads until that failover takes place. The default is 30 seconds, during which your application has to deal with certain documents in a bucket being unavailable. In our configuration, we had a single testing server, so we had only one node.
Couchbase multinode setup is fairly easy and requires next to nothing to maintain. It doesn't require anything complicated like ZooKeeper or extra configuration nodes. If a node fails, the Couchbase server can be configured to fail over automatically, which removes the failed node from the cluster while read-write access remains available through the other nodes.
Figure 1. Couchbase needs two servers to provide failover
For performance reasons, Couchbase manages the application's working set in memory, up to the amount of memory specified for the bucket. If the data exceeds that amount, the oldest documents are evicted from memory, though they remain on disk. This makes for a speedy system. It also means you don't have to deal with a separate caching layer: the cache is built in.
The data is stored in the bucket as key-value pairs, and thanks to the flexibility of JSON you can store entire sets of objects this way. Everything we stored for the application was stored as JSON, which requires marshaling (and unmarshaling) the JSON data from and to the objects in the application. We used the Jackson mapper for this; it is widely used and gave us more flexibility for circular references and the like.
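Here is a minimal sketch of that marshaling step, assuming a plain Customer POJO (a hypothetical class, not one of the original listings) and the Jackson 2.x ObjectMapper (the project's version may have used the older org.codehaus.jackson package):

import com.couchbase.client.CouchbaseClient;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CustomerDao {
    private final ObjectMapper mapper = new ObjectMapper();
    private final CouchbaseClient client; // connected as shown earlier

    public CustomerDao(CouchbaseClient client) {
        this.client = client;
    }

    // Marshal the object to a JSON string and store it under its key.
    public void save(String key, Customer customer) throws Exception {
        client.set(key, 0, mapper.writeValueAsString(customer)).get();
    }

    // Fetch the JSON string and unmarshal it back into a Customer.
    public Customer find(String key) throws Exception {
        String json = (String) client.get(key);
        return json == null ? null : mapper.readValue(json, Customer.class);
    }
}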
The data is also schema-less, which means that if you want to add another field to a document, you just add it. You don't have to worry about the documents that already exist; they are more than happy to live on without the new field, and changes to the schema are painless and quick.
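For example, an older and a newer customer document can coexist in the same bucket (the loyaltyPoints field here is hypothetical, not from the original schema):

{ "type": "customer", "name": "Pat", "email": "pat@example.com" }

{ "type": "customer", "name": "Sam", "email": "sam@example.com", "loyaltyPoints": 120 }

Application code simply treats the missing field as null (or a default) when it reads the older document.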
Couchbase has additional features that set it apart from other document databases -- and other NoSQL databases, for that matter.
It is a distributed key-value store; the data manager is written in C/C++, and the cluster manager is written in Erlang. Because many mapreduce functions are built in, plenty of simple operations are easy to implement, and the built-ins also serve as a great reference for writing your own mapreduce functions.
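As a sketch of what querying such a view looks like with the classic 1.x Java client (the design document and view names here are hypothetical; the map function itself is JavaScript stored on the server):

import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.Query;
import com.couchbase.client.protocol.views.View;
import com.couchbase.client.protocol.views.ViewResponse;
import com.couchbase.client.protocol.views.ViewRow;

public class OrdersByCustomer {
    // The server-side map function (JavaScript) might look like:
    //   function (doc, meta) {
    //     if (doc.type == "order") { emit(doc.customerId, null); }
    //   }
    public static void printOrders(CouchbaseClient client, String customerId) {
        View view = client.getView("orders", "by_customer");
        Query query = new Query();
        query.setKey(customerId);    // only rows emitted for this customer
        query.setIncludeDocs(true);  // also fetch the full order documents

        ViewResponse response = client.query(view, query);
        for (ViewRow row : response) {
            System.out.println(row.getId() + " -> " + row.getDocument());
        }
    }
}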
Couchbase has B-tree-based indexes, and you can index anything from entire views to embedded documents. However, it lacks geospatial indexes (currently available in experimental mode only), although this is an issue only if you are working with location data. Couchbase also does not do in-place updates. This is not a huge sticking point, because the working set remains in memory all the time, so updates are superfast.
Couchbase has no concept of capped collections, which is an issue only if you are working mainly with log data analysis. Couchbase keeps the working set in memory, but you can store far more data than you have memory; Couchbase 2.0 has a relatively high cache miss rate in that situation, though its developers are working to optimize this in the next release.
The data models for the Pet Store in NoSQL
The Java Pet Store application was originally deployed on Apache Derby using Hibernate and JPA. Because Derby is an embedded implementation, we switched the configuration to use MySQL, which gave us an in-depth look at the relational schema design.
Figure 2. The relational schema design for the Java EE Pet Store
The application is driven primarily by two events: a customer registering with the site and a customer placing an order.
We built the Couchbase documents around these two events: Customer and Order. These documents were designed to contain related entities as embedded documents. We also created a third document type, Category, to store inventory information: categories, products, and items. This design decision enabled the creation of separate indexes (or views) so that each type can be fetched quickly. It also provides examples of both linked and embedded documents.
The original listings showed JSON examples for the Customer, Order, and Category documents.
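As a rough sketch of the shape each document type might take (all field names and values below are illustrative, not the article's actual listings):

JSON example for Customer (illustrative):

{
  "type": "customer",
  "firstName": "Pat",
  "lastName": "Doe",
  "email": "pat@example.com",
  "address": { "street": "1 Main St", "city": "Springfield", "zip": "01101" }
}

JSON example for Order (illustrative):

{
  "type": "order",
  "customerId": "customer::1001",
  "orderDate": "2013-02-14",
  "items": [
    { "itemId": "item::2002", "quantity": 1, "unitPrice": 58.50 }
  ],
  "total": 58.50
}

JSON example for Category (illustrative):

{
  "type": "category",
  "name": "Cats",
  "products": [
    {
      "productId": "product::3003",
      "name": "Manx",
      "items": [ { "itemId": "item::2002", "listPrice": 58.50 } ]
    }
  ]
}

Note how the order links to its customer by key (customerId) while embedding its line items directly, and how the category embeds its products and their items.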
When the application is deployed, the Categories, Products, and Items are generated in the database by the database populator class. When a new customer goes to the home page, he or she has the option to sign in or register; at that point, a new customer document is created. The customer can then browse the existing categories, create an order, and save it. When the order is saved, a new order document is created with the order details, and the associated items are added to it as embedded documents.
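The order-save step, then, reduces to building the object graph and writing a single document. A sketch under the same assumptions as the earlier examples (Order and OrderItem are hypothetical POJOs; mapper and client are the Jackson ObjectMapper and CouchbaseClient shown above):

// Build the order with its line items embedded (hypothetical classes).
Order order = new Order("order::4004", "customer::1001");
order.addItem(new OrderItem("item::2002", 1));

String json = mapper.writeValueAsString(order); // embeds the items inline
client.set(order.getKey(), 0, json).get();      // one new order document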