How to teach a Java EE app new NoSQL tricks

Lessons learned from porting Pet Shop to NoSQL via Couchbase 2.0


If you have multiple nodes in your cluster, Couchbase spreads the documents over all of them. Say you have a three-node Couchbase cluster with one bucket that holds three documents; each node might hold one document. The Couchbase client decides which node holds a given document by applying a hashing algorithm to the document's key. Each document can also have replicas stored on other nodes. You specify the number of replicas when you create the bucket. After the bucket is created, you cannot change the number of replicas, so be sure to choose the number you really want.
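The key-to-node mapping can be sketched in Java. This is a minimal sketch that assumes the scheme Couchbase clients are generally described as using: a CRC32 hash of the key selects one of 1,024 "vBuckets", and a cluster map assigns each vBucket to a node. The class name `KeyDistribution`, the exact bit manipulation, and the round-robin vBucket map are assumptions for illustration; a real client receives its vBucket map from the server.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/**
 * Hypothetical sketch of client-side key distribution:
 * key -> CRC32 -> vBucket id -> node (via an assumed round-robin map).
 */
final class KeyDistribution {
    // Assumed vBucket count; must be a power of two for the mask below.
    static final int NUM_VBUCKETS = 1024;

    /** Hashes a document key to a vBucket id in [0, NUM_VBUCKETS). */
    static int vbucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        long digest = (crc.getValue() >> 16) & 0x7fff; // keep 15 bits of the CRC
        return (int) (digest & (NUM_VBUCKETS - 1));
    }

    /** Assumed vBucket map: vBucket i is owned by node (i mod nodeCount). */
    static String nodeFor(String key, List<String> nodes) {
        return nodes.get(vbucketFor(key) % nodes.size());
    }
}
```

Because every client computes the same hash, any client can locate a document without asking a central coordinator; only the vBucket-to-node map changes when nodes join or fail.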

Another consideration for replicas is that replica data is also stored in memory. With one replica, for example, each document uses memory on two nodes: the node that holds the active copy and the node that holds the replica. This is not necessarily a bad thing; in the event of a failure, the data is available almost instantaneously. The catch is that you have to turn on auto-failover and specify how long a node must be unresponsive before it is considered down; the data won't be available for reads until auto-failover takes place. The default is 30 seconds, during which your application has to cope with some documents in the bucket being unavailable. In our configuration, we had a single test server, so we had only one node.
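One way an application might ride out that failover window is to retry reads with a bounded, growing delay. The sketch below is generic and does not use the Couchbase client: the `Supplier` stands in for a hypothetical client read that returns `Optional.empty()` while a document's node cannot serve it, and `RetryingReader` and its parameters are illustrative names.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper: retries a read until it succeeds or maxWait elapses. */
final class RetryingReader {
    static <T> Optional<T> readWithRetry(Supplier<Optional<T>> read,
                                         Duration maxWait,
                                         Duration initialBackoff) {
        long deadline = System.nanoTime() + maxWait.toNanos();
        long backoffMillis = Math.max(1, initialBackoff.toMillis());
        while (true) {
            Optional<T> result = read.get();
            if (result.isPresent()) {
                return result;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return Optional.empty(); // gave up: caller decides how to degrade
            }
            try {
                Thread.sleep(Math.min(backoffMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return Optional.empty();
            }
            backoffMillis = Math.min(backoffMillis * 2, 1_000); // double, capped at 1 s
        }
    }
}
```

Setting `maxWait` a little above the auto-failover interval (30 seconds by default) lets a read survive a single failover, at the cost of a slow response for the affected request.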

Couchbase multinode setup is fairly easy and requires next to nothing to maintain. It doesn't require anything complicated like ZooKeeper or extra configuration nodes. If a node fails, Couchbase Server can be configured to fail it over automatically: the failed node is removed from the cluster, and read-write access remains available through the remaining nodes.

Figure 1. Couchbase needs two servers to provide failover

For performance reasons, Couchbase keeps the application's working set in memory, up to the amount of memory allocated to the bucket. If the data exceeds that amount, documents that have not been used recently are evicted from memory, though they remain on disk. This makes for a speedy system, and it means you don't have to deal with a separate caching layer: the cache is built in.
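The memory-first behavior can be modeled with a toy store in which every write goes to "disk" while only a bounded working set stays in memory. This is an illustration of the idea, not Couchbase's implementation: `WorkingSetStore` is a hypothetical class, and it evicts the least recently used entry via `LinkedHashMap` access order, which only approximates Couchbase's own not-recently-used policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical toy model: all documents on "disk", at most `capacity` in memory. */
final class WorkingSetStore {
    private final Map<String, String> disk = new HashMap<>();
    private final LinkedHashMap<String, String> memory;
    private int diskReads = 0;

    WorkingSetStore(int capacity) {
        // accessOrder = true: iteration order is least-recently-accessed first
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evicted from memory only; still on disk
            }
        };
    }

    void put(String key, String json) {
        disk.put(key, json);   // persisted
        memory.put(key, json); // and cached
    }

    String get(String key) {
        String cached = memory.get(key);
        if (cached != null) {
            return cached; // served from memory
        }
        String fromDisk = disk.get(key); // cache miss: read from disk
        if (fromDisk != null) {
            diskReads++;
            memory.put(key, fromDisk); // bring it back into the working set
        }
        return fromDisk;
    }

    boolean inMemory(String key) { return memory.containsKey(key); }

    int diskReads() { return diskReads; }
}
```

With a capacity of two, writing `a`, `b`, and `c` pushes `a` out of memory; reading `a` again costs one disk read and pushes out `b` instead.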

The data is stored in the bucket as key-value pairs. Because JSON is so flexible, you can store entire sets of related objects in a single value this way. Everything we stored for the application was stored as JSON. This requires marshaling (and unmarshaling) the JSON data from and to the objects in the application. We used the Jackson ObjectMapper for this; Jackson is widely used and gave us more flexibility in handling circular references and the like.

The data is also schema-less: if you want to add another field to a document, you just add it. You don't have to migrate the documents that already exist; they continue to work without the new field, so schema changes are painless and quick.
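In application code, schema-less storage mostly means readers must tolerate a missing field. The sketch below uses plain maps as stand-ins for parsed JSON documents; `SchemaLessExample` and the `loyaltyPoints` field are hypothetical, chosen to show a field added after some customer documents already exist.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical example: maps stand in for parsed JSON customer documents. */
final class SchemaLessExample {
    /** Builds a minimal customer document using the "customer_" key prefix. */
    static Map<String, Object> customer(String login) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "customer_" + login);
        doc.put("type", "customer");
        doc.put("login", login);
        return doc;
    }

    /** Reads the new field, treating documents written before it existed as 0. */
    static int loyaltyPoints(Map<String, Object> customerDoc) {
        Object value = customerDoc.get("loyaltyPoints");
        return value instanceof Number ? ((Number) value).intValue() : 0;
    }
}
```

Old and new documents coexist in the same bucket; only the code that reads the new field needs to supply a default.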

Couchbase has additional features that set it apart from other document databases, and from other NoSQL databases for that matter. It is a distributed key-value store; the data manager is written in C/C++, and the cluster manager is written in Erlang. Its built-in reduce functions (such as _count and _sum) make many simple operations very easy to implement, and they also provide a good reference for writing our own mapreduce functions.
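To make the map/reduce idea concrete, the Java sketch below computes in memory what a view would return if its map function emitted each document's `type` and the built-in _count reduce grouped the rows by key. Couchbase runs view map functions on the server, written in JavaScript; `ViewEmulation` is a hypothetical class that only mirrors the computation, not how Couchbase executes it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory emulation of a "count documents by type" view. */
final class ViewEmulation {
    static Map<String, Integer> countByType(List<? extends Map<String, ?>> docs) {
        Map<String, Integer> counts = new TreeMap<>(); // view rows come back sorted by key
        for (Map<String, ?> doc : docs) {
            Object type = doc.get("type");
            if (type != null) {                                  // map: emit(doc.type, null)
                counts.merge(type.toString(), 1, Integer::sum);  // reduce: _count per key
            }
        }
        return counts;
    }
}
```

Documents without a `type` field emit nothing, so they simply do not appear in the result, just as they would be skipped by a map function that checks for the field.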

Couchbase has B-tree-based indexes. You can index anything from entire views to embedded documents. However, it lacks production-ready geospatial indexes (they are currently available only in experimental mode), although this matters only if you are working with location data. Couchbase also does not have in-place updates: to change one field, you replace the whole document. This is not a huge sticking point, because the working set stays in memory, so updates are still very fast.
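Without in-place updates, changing a field is a read-modify-write of the whole document. The sketch below pairs that with compare-and-swap (CAS) versioning so two writers cannot silently overwrite each other. `CasBucket` is a hypothetical in-memory stand-in written for illustration, not the Couchbase Java client API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory bucket with whole-document writes and CAS versions. */
final class CasBucket {
    private static final class Entry {
        final String json;
        final long cas;
        Entry(String json, long cas) { this.json = json; this.cas = cas; }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private long nextCas = 1;

    synchronized void set(String key, String json) {
        data.put(key, new Entry(json, nextCas++)); // unconditional full replace
    }

    synchronized String get(String key) {
        Entry e = data.get(key);
        return e == null ? null : e.json;
    }

    synchronized long casOf(String key) {
        Entry e = data.get(key);
        return e == null ? 0 : e.cas;
    }

    /** Replaces the whole document only if it is unchanged since expectedCas was read. */
    synchronized boolean replace(String key, String json, long expectedCas) {
        Entry current = data.get(key);
        if (current == null || current.cas != expectedCas) {
            return false;
        }
        data.put(key, new Entry(json, nextCas++));
        return true;
    }

    /** Read-modify-write loop: retries until the full-document replace wins. */
    void update(String key, UnaryOperator<String> change) {
        while (true) {
            String json;
            long cas;
            synchronized (this) { // read document and version together
                json = get(key);
                cas = casOf(key);
            }
            if (json == null) {
                throw new IllegalArgumentException("no document " + key);
            }
            if (replace(key, change.apply(json), cas)) {
                return;
            }
        }
    }
}
```

A replace carrying a stale version fails instead of discarding another writer's change, and the `update` loop simply rereads and tries again.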

Couchbase has no concept of capped collections, which matters mainly if you are analyzing log data. Couchbase keeps the working set in memory, but the total amount of data stored can far exceed the available memory. Couchbase 2.0 has a relatively high cache miss rate, but its developers are working to optimize this in the next release.

The data models for the Pet Store in NoSQL
The Java Pet Store application was originally deployed on Apache Derby using Hibernate and JPA. Because Derby is an embedded database, we switched the configuration to use MySQL. This let us take an in-depth look at the relational schema design.

Figure 2: The relational schema design for the Java EE Pet Store

The application is driven primarily by two events:

  1. When a new customer registers
  2. When a new order is created by a customer

We built the Couchbase documents around these two events: Customer and Order. These documents were designed to contain related entities as embedded documents. We also created a third document type, Category, to store inventory information: categories, products, and items. Keeping these as separate document types let us create a separate index (or view) for each, so that each type can be fetched quickly. The design also illustrates both linked and embedded documents.
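On the Java side, an embedded document maps naturally onto nested objects. The sketch below mirrors the Category document, where products and items live inside the category, so one read of a key such as `category_Birds` returns the whole subtree. `CatalogModel` and its members are hypothetical names, not the Pet Store's own classes; the key prefixes follow the JSON examples. Because item ids such as `item_Male Adult` repeat across products in the Category example, the lookup goes through the product first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical classes mirroring the embedded Category -> Product -> Item structure. */
final class CatalogModel {
    static final class Item {
        final String id;
        final String name;
        final double unitCost;
        Item(String name, double unitCost) {
            this.id = "item_" + name; // key prefix as in the JSON example
            this.name = name;
            this.unitCost = unitCost;
        }
    }

    static final class Product {
        final String id;
        final String name;
        final List<Item> items = new ArrayList<>(); // embedded, not linked
        Product(String name) {
            this.id = "product_" + name;
            this.name = name;
        }
    }

    static final class Category {
        final String id;
        final String name;
        final List<Product> products = new ArrayList<>(); // embedded, not linked
        Category(String name) {
            this.id = "category_" + name;
            this.name = name;
        }

        /** Finds an embedded item without another database round trip. */
        Optional<Item> findItem(String productName, String itemName) {
            return products.stream()
                    .filter(p -> p.name.equals(productName))
                    .flatMap(p -> p.items.stream())
                    .filter(i -> i.name.equals(itemName))
                    .findFirst();
        }
    }
}
```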

JSON example for Customer:

{
  "id":"customer_marc",
  "type":"customer",
  "login":"marc",
  "password":"marc",
  "firstname":"Marc",
  "lastname":"Fleury",
  "telephone":null,
  "email":"marc@jboss.org",
  "homeAddress":{
    "street1":"65 Ritherdon Road",
    "street2":null,
    "city":"Los Angeles",
    "state":null,
    "zipcode":"56421",
    "country":"USA"
  },
  "dateOfBirth":1363794557891,
  "age":null
}

JSON example for Order:

{
  "id":"Marc",
  "type":"order",
  "orderDate":null,
  "customer":{
    "id":1,
    "login":"marc",
    "password":"marc",
    "firstname":"Marc",
    "lastname":"Fleury",
    "telephone":null,
    "email":"marc@jboss.org",
    "homeAddress":{
      "street1":"65 Ritherdon Road",
      "street2":"",
      "city":"Los Angeles",
      "state":"",
      "zipcode":"56421",
      "country":"USA"
    },
    "dateOfBirth":1363722361660,
    "age":0
  },
  "orderLines":[
    {
      "id":null,
      "quantity":1,
      "item":{
        "id":"item_Goldfish_Male Puppy",
        "type":"item",
        "name":"Male Puppy",
        "description":"Lorem ...",
        "unitCost":12,
        "imagePath":"fish2.jpg"
      }
    },
    {
      "id":null,
      "quantity":1,
      "item":{
        "id":"item_Angelfish_Large",
        "type":"item",
        "name":"Large",
        "description":"Lorem ...",
        "unitCost":10,
        "imagePath":"fish1.jpg"
      }
    }
  ],
  "deliveryAddress":{
    "street1":"65 Ritherdon Road",
    "street2":"",
    "city":"Los Angeles",
    "state":"",
    "zipcode":"56421",
    "country":"USA"
  },
  "creditCard":{
    "creditCardNumber":"1234",
    "creditCardType":"VISA",
    "creditCardExpDate":"03/15"
  }
}

JSON example for Category:

{
  "id":"category_Birds",
  "type":"category",
  "name":"Birds",
  "description":"Any of ...",
  "products":[
    {
      "id":"product_Amazon Parrot",
      "type":"product",
      "name":"Amazon Parrot",
      "description":"Great companion for up to 75 years",
      "items":[
        {
          "id":"item_Male Adult",
          "type":"item",
          "name":"Male Adult",
          "description":"Lorem ...",
          "unitCost":120,
          "imagePath":"bird2.jpg"
        },
        {
          "id":"item_Female Adult",
          "type":"item",
          "name":"Female Adult",
          "description":"Lorem ...",
          "unitCost":120,
          "imagePath":"bird2.jpg"
        }
      ]
    },
    {
      "id":"product_Finch",
      "type":"product",
      "name":"Finch",
      "description":"Great stress reliever",
      "items":[
        {
          "id":"item_Male Adult",
          "type":"item",
          "name":"Male Adult",
          "description":"Lorem...",
          "unitCost":75,
          "imagePath":"bird1.jpg"
        },
        {
          "id":"item_Female Adult",
          "type":"item",
          "name":"Female Adult",
          "description":"Lorem ...",
          "unitCost":80,
          "imagePath":"bird1.jpg"
        }
      ]
    }
  ]
}

When the application is deployed, a database populator class generates the categories, products, and items in the database. A new customer arriving at the home page can sign in or register; registering creates a new customer document. The customer can then browse the existing categories, create an order, and save it. Saving the order creates a new order document with the order details, and the associated items are embedded in the order.
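The two driving events can be sketched against an in-memory map standing in for the bucket. `PetStoreFlow`, its method names, and the order-key scheme (`"order_" + login + "_" + sequence`) are hypothetical; the customer key prefix and the embedded `customer` and `orderLines` fields follow the JSON examples. Each order line holds a copy of the item's fields, so later catalog changes do not rewrite past orders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the register and save-order events over an in-memory "bucket". */
final class PetStoreFlow {
    final Map<String, Map<String, Object>> bucket = new HashMap<>();
    private int orderSeq = 0;

    /** Event 1: registering creates a customer document. */
    String register(String login, String email) {
        String key = "customer_" + login;
        if (bucket.containsKey(key)) {
            throw new IllegalStateException("login taken: " + login);
        }
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", key);
        doc.put("type", "customer");
        doc.put("login", login);
        doc.put("email", email);
        bucket.put(key, doc);
        return key;
    }

    /** Event 2: saving an order creates an order document embedding customer and items. */
    String saveOrder(String customerKey, List<Map<String, Object>> items) {
        Map<String, Object> customer = bucket.get(customerKey);
        if (customer == null) {
            throw new IllegalArgumentException("unknown customer " + customerKey);
        }
        List<Map<String, Object>> orderLines = new ArrayList<>();
        for (Map<String, Object> item : items) {
            Map<String, Object> line = new HashMap<>();
            line.put("quantity", 1);
            line.put("item", new HashMap<>(item)); // shallow copy of the item's fields
            orderLines.add(line);
        }
        String key = "order_" + customer.get("login") + "_" + (++orderSeq);
        Map<String, Object> order = new HashMap<>();
        order.put("id", key);
        order.put("type", "order");
        order.put("customer", new HashMap<>(customer)); // embedded customer snapshot
        order.put("orderLines", orderLines);
        bucket.put(key, order);
        return key;
    }
}
```

Embedding trades storage (the customer and item data are duplicated in every order) for a single-read order history, which suits an order that should not change once placed.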

