How to teach a Java EE app new NoSQL tricks

Lessons learned from porting Pet Shop to NoSQL via Couchbase 2.0

Java Blueprints were developed to show you design patterns in enterprise Java. The Java Pet Store was designed to demonstrate the quintessential Java 2 Enterprise Edition (J2EE) application. This was mainly in the heady days of EJB 1.1-2.1, which had many failed and defective technologies, including the now-dumped Container Managed Persistence.

Around the same time, Puff Daddy became P. Diddy, then dropped the "P" with the explanation that it was getting in between him and his fans. Likewise, J2EE dropped the "2" and became Java EE possibly for the same reason. Meanwhile, Sun abandoned the Pet Store business in 2007. But Antonio Goncalves recently picked the application back up and modernized it as the Pet Shop. It uses CDI and all Java EE's latest fixings and versions to demonstrate the "right way" to do a Java EE application. You can find the code for Goncalves's Pet Store on GitHub.


The opportunity: Port Pet Store to NoSQL

As frequently noted, I'm skeptical about the Java EE programming model's continued relevance in the modern era. CDI is mainly a codification or "standardization" of Spring, assuming Oracle's blessing means "standardization" to you. The Spring Framework more or less won the programming model. Meanwhile, the traditional "there is only the RDBMS" thought pattern is a little less of a given now with the big data and NoSQL revolution in place. Moreover, in the era of consumerization of computing and the Web, the traditional transaction manager is less relevant. It is unlikely that a device (à la database) transaction is sufficient any longer to support consistency.

With Couchbase 2.0's recent release, it seemed to me the perfect time for my colleagues and me to try to port the Pet Store as a NoSQL application. What may surprise you is how little work it required. See for yourself in our extensive guide; you can find our code for this project on GitHub as well.

The example exposes a weakness of Java EE related to NoSQL: we had to write a lot of code directly against the Couchbase API to make it work. Ideally, we'd have used something more like Spring Data because Spring supports CDI. However, Spring Data does not yet support Couchbase 2.0 for more than caching. (Full support is in beta.)

This gave us a good chance to test drive Couchbase 2.0 with the Java Pet Store.

Couchbase 101: How it differs from what you're used to

At a high level, Couchbase is a combination of a back-end data store and a built-in, document-level cache. It provides auto-failover, auto-sharding, and automatic load balancing. Couchbase does not have the concepts of databases and tables. Instead, data isolation is achieved through buckets. Buckets are like a database, where users store documents with different schemas. Thus, in our new Pet Store application, we created one bucket called petstore where all the documents for the application reside.

If you have multiple nodes in your cluster, Couchbase spreads the documents over all the nodes in the cluster. Say you have a three-node Couchbase cluster with one bucket that has three documents in it; each node might have one document. This is done via a hashing algorithm that's part of the Couchbase client -- but there can be more than one replica of this data. You specify the number of replicas when creating your bucket. After the bucket is created, you cannot change the number of replicas, so be sure to choose the number you really want.
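To make the idea of client-side hashing concrete, here is a minimal sketch in plain Java. It is not the Couchbase client's actual code: the class name KeyPlacementSketch, the partition count, and the modulo mapping from partition to node are all assumptions made for illustration. A real client consults a partition map supplied by the cluster rather than computing the node directly.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Simplified illustration of hashing a document key to a node. Hypothetical, not the real client. */
class KeyPlacementSketch {
    // Assumption: a fixed number of logical partitions, chosen here for illustration only.
    static final int PARTITIONS = 1024;

    /** Hash the document key to a partition; the same key always lands in the same partition. */
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    /** Assumption: partitions are spread over nodes by simple modulo instead of a cluster map. */
    static int nodeFor(String key, int nodeCount) {
        return partitionFor(key) % nodeCount;
    }

    public static void main(String[] args) {
        String[] keys = {"customer_marc", "category_Birds", "product_Finch"};
        for (String key : keys) {
            System.out.println(key + " -> node " + nodeFor(key, 3));
        }
    }
}
```

Because the placement depends only on the key, any client can work out where a document lives without asking a central coordinator first.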

Another consideration for replicas is that the replica data is also stored in memory. This means you use the memory on the first and second nodes. This is not necessarily a bad thing -- in the event of a failure, the data is available almost instantaneously. The catch is that you have to turn on auto-failover and specify an interval for a node to be considered down; the data won't be available for reads until this auto-failover takes place. The default is 30 seconds, during which time your application has to deal with having certain documents from a bucket be unavailable. In our configuration, we had a single testing server, so we had merely one node.

Couchbase multinode setup is fairly easy and requires next to nothing to maintain. It doesn't require anything complicated like Zookeeper or extra configuration nodes. In case of a failed node, the Couchbase server can be auto-configured to initiate a failover, which means the failed node is removed from the cluster and read-write access is still available for other nodes.

Figure 1. Couchbase needs two servers to provide failover

For performance reasons, Couchbase manages the application's working set in memory up to the amount of memory specified for the bucket. If the amount of data exceeds the amount of memory, the oldest documents are evicted from memory, though they're still on disk. This makes for a speedy system. It also means you don't have to deal with a separate caching layer -- the cache is built in.
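The behavior described above can be imitated in a few lines of plain Java. This is only a sketch under stated assumptions: WorkingSetSketch is a hypothetical class, a HashMap stands in for the disk, and "oldest" is interpreted as least recently used, which may differ from the policy Couchbase actually applies.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a bounded in-memory working set backed by slower storage. Hypothetical. */
class WorkingSetSketch {
    // Stand-in for persistent storage; every document is always kept here.
    private final Map<String, String> disk = new HashMap<>();
    private final LinkedHashMap<String, String> memory;

    WorkingSetSketch(final int maxInMemory) {
        // Access-ordered map: the eldest entry is the least recently used one.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxInMemory; // evict from memory only, never from disk
            }
        };
    }

    void set(String key, String json) {
        disk.put(key, json);
        memory.put(key, json);
    }

    String get(String key) {
        String json = memory.get(key);
        if (json == null) {
            json = disk.get(key); // cache miss: fall back to disk
            if (json != null) {
                memory.put(key, json); // bring the document back into the working set
            }
        }
        return json;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }
}
```

With a limit of two documents, storing a third pushes the least recently used one out of memory, yet a later get still returns it from the stand-in disk and pulls it back into the working set.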

The data is stored in the bucket as a key-value pair. You can store entire sets of objects this way because of the flexibility of JSON. Everything we stored for the application was stored as JSON. This requires marshaling (and unmarshaling) the JSON data from and to the objects in the application. We used Jackson mapper for this; it is widely used and allowed us more flexibility for circular references and the like.

The data is also schema-less. What this means is that if you want to add another field to a document, you just add it. You don't have to worry about all the documents that already exist. They are more than happy to exist without the new field, and changes to the schema are painless and quick.

Couchbase has additional features that set it apart from other document databases -- and other NoSQL databases, for that matter. It is a distributed key-value store, the data manager is written in C/C++, and the cluster manager is written in Erlang. With a large number of built-in mapreduce functions, many simple operations become very easy to implement. These functions also provide a great reference for writing our own mapreduce functions.

Couchbase has B-tree-based indexes. You can index anything from entire views to embedded documents. However, it lacks geospatial indexes (currently available in experimental mode only), although this becomes an issue only if you are working with location data. Couchbase also does not have in-place updates. This is not a huge sticking point because the working set remains in memory all the time, so the updates are superfast.

Couchbase does not have any concept of capped collections. This is only an issue if you are working mainly with log data analysis. Couchbase maintains the working set in memory, but you can have much more data than the amount of memory. Although Couchbase 2.0 has a relatively high cache miss rate, its developers are working to optimize this in the next release.

The data models for the Pet Store in NoSQL

The Java Pet Store application was originally deployed in Apache Derby using Hibernate and JPA. Because Derby is an embedded implementation, we switched the configuration to use MySQL. This enabled us to have an in-depth look at the relational schema design.

Figure 2: The relational schema design for the Java EE Pet Store

The application is being driven primarily by two events:

  1. When a new customer registers
  2. When a new order is created by a customer

We built the Couchbase documents around these two events: Customer and Order. These documents were designed to contain related entities as embedded documents. We also created a third document type, Category, to store inventory information: categories, products, and items. This design decision enabled the creation of separate indexes (or views) so that they can be fetched quickly. This also provides examples of both linked and embedded documents.

JSON example for Customer

{
  "id":"customer_marc",
  "type":"customer",
  "login":"marc",
  "password":"marc",
  "firstname":"Marc",
  "lastname":"Fleury",
  "telephone":null,
  "email":"marc@jboss.org",
  "homeAddress":{
    "street1":"65 Ritherdon Road",
    "street2":null,
    "city":"Los Angeles",
    "state":null,
    "zipcode":"56421",
    "country":"USA"
  },
  "dateOfBirth":1363794557891,
  "age":null
}

JSON example for Order

{
  "id":"Marc",
  "type":"order",
  "orderDate":null,
  "customer":{
    "id":1,
    "login":"marc",
    "password":"marc",
    "firstname":"Marc",
    "lastname":"Fleury",
    "telephone":null,
    "email":"marc@jboss.org",
    "homeAddress":{
      "street1":"65 Ritherdon Road",
      "street2":"",
      "city":"Los Angeles",
      "state":"",
      "zipcode":"56421",
      "country":"USA"
    },
    "dateOfBirth":1363722361660,
    "age":0
  },
  "orderLines":[
    {
      "id":null,
      "quantity":1,
      "item":{
        "id":"item_Goldfish_Male Puppy",
        "type":"item",
        "name":"Male Puppy",
        "description":"Lorem ...",
        "unitCost":12,
        "imagePath":"fish2.jpg"
      }
    },
    {
      "id":null,
      "quantity":1,
      "item":{
        "id":"item_Angelfish_Large",
        "type":"item",
        "name":"Large",
        "description":"Lorem ...",
        "unitCost":10,
        "imagePath":"fish1.jpg"
      }
    }
  ],
  "deliveryAddress":{
    "street1":"65 Ritherdon Road",
    "street2":"",
    "city":"Los Angeles",
    "state":"",
    "zipcode":"56421",
    "country":"USA"
  },
  "creditCard":{
    "creditCardNumber":"1234",
    "creditCardType":"VISA",
    "creditCardExpDate":"03/15"
  }
}

JSON example for Category

{
  "id":"category_Birds",
  "type":"category",
  "name":"Birds",
  "description":"Any of ...",
  "products":[
    {
      "id":"product_Amazon Parrot",
      "type":"product",
      "name":"Amazon Parrot",
      "description":"Great companion for up to 75 years",
      "items":[
        {
          "id":"item_Male Adult",
          "type":"item",
          "name":"Male Adult",
          "description":"Lorem ...",
          "unitCost":120,
          "imagePath":"bird2.jpg"
        },
        {
          "id":"item_Female Adult",
          "type":"item",
          "name":"Female Adult",
          "description":"Lorem ...",
          "unitCost":120,
          "imagePath":"bird2.jpg"
        }
      ]
    },
    {
      "id":"product_Finch",
      "type":"product",
      "name":"Finch",
      "description":"Great stress reliever",
      "items":[
        {
          "id":"item_Male Adult",
          "type":"item",
          "name":"Male Adult",
          "description":"Lorem...",
          "unitCost":75,
          "imagePath":"bird1.jpg"
        },
        {
          "id":"item_Female Adult",
          "type":"item",
          "name":"Female Adult",
          "description":"Lorem ...",
          "unitCost":80,
          "imagePath":"bird1.jpg"
        }
      ]
    }
  ]
}

As the application is deployed, the Categories, Products, and Items are generated in the database by the database populator class. When a new customer goes to the home page, he or she has the option to sign in or register. At that point, a new customer document is created. The customer can then browse the existing categories, create an order, and save it. When the order is saved, a new order document is created with order details. The associated items are added as embedded documents in the order.

Using Couchbase views for data access

Couchbase uses views to simplify access to the data. They are generally a cross between a view and a stored procedure in the relational database world. In our application, we added a type field to the entity objects and set it to the type of object. For example, in the Order entity, type="order". Then we created a view to pull back any document where the type was whatever we were looking for.

Figure 3: Our view code in Couchbase

You can also drill into a document to have the view return anything in the document. An example of this is an item, which is embedded in the order line in the order document. We set up a view that queries for an item by its ID. That is how the application pulls back an item relating to a specific product, which is in a specific category. This JavaScript code is very flexible.
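The two kinds of lookup described above, filtering documents by their type field and drilling into an order to reach an embedded item, can be mimicked in plain Java to show the logic. This is not view code: real Couchbase views are written in JavaScript and run on the server. The ViewSketch class and its method names are hypothetical, and documents are modeled as nested maps purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;

/** Plain-Java imitation of what a type-based view and a drill-down view select. Hypothetical. */
class ViewSketch {
    /** Keep only the documents whose "type" field matches, as a view filtering on type would. */
    static List<Map<String, Object>> byType(Collection<Map<String, Object>> docs, String type) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> doc : docs) {
            if (type.equals(doc.get("type"))) {
                matches.add(doc);
            }
        }
        return matches;
    }

    /** Walk order -> orderLines -> item and return the embedded item with the given id, or null. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> findItem(Collection<Map<String, Object>> docs, String itemId) {
        for (Map<String, Object> order : byType(docs, "order")) {
            Object lines = order.get("orderLines");
            if (!(lines instanceof List)) {
                continue; // order without lines: nothing to drill into
            }
            for (Object line : (List<?>) lines) {
                if (!(line instanceof Map)) {
                    continue;
                }
                Object item = ((Map<String, Object>) line).get("item");
                if (item instanceof Map && itemId.equals(((Map<String, Object>) item).get("id"))) {
                    return (Map<String, Object>) item;
                }
            }
        }
        return null;
    }
}
```

The field names (type, orderLines, item, id) follow the JSON examples earlier in the article.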

One important note on views: When you first put a document into the database, it does not immediately become available to the view. It must first be saved to disk.

What we changed in the code

To change this application to work in Couchbase, a few files needed to change, primarily around the services. This is where, in the RDBMS version, there was a reference to the EntityManager. We decided to use the DbPopulator class to initialize the Couchbase connection because it is a singleton that starts at application start. We then use this connection much as we used the EntityManager. The following is an example of some of the code changes that were necessary.

Derby repository calls

@Inject
EntityManager em;

// find
TypedQuery<Category> typedQuery = em.createNamedQuery(Category.FIND_BY_NAME, Category.class);
typedQuery.setParameter("pname", categoryName);

// save
em.persist(category);

// update
em.merge(category);

// delete
em.remove(em.merge(category));

Couchbase repository calls

// find
client.get(categoryName);

// save
client.set(category.getName(), EXP_TIME, mapper.writeValueAsString(category));

// update
client.replace(category.getName(), EXP_TIME, mapper.writeValueAsString(category));

// delete
client.delete(category.getName());
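The save and update calls above differ in one way that is easy to miss: in memcached-style APIs, set stores a value whether or not the key exists, while replace succeeds only if the key is already present. The sketch below imitates that contract with an in-memory map. InMemoryBucketSketch is a hypothetical stand-in, not the Couchbase client; it leaves out the expiry argument and returns plain booleans, which is an assumption made to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a key-value bucket, imitating get/set/replace/delete. Hypothetical. */
class InMemoryBucketSketch {
    private final Map<String, String> docs = new ConcurrentHashMap<>();

    /** Returns the stored JSON, or null if the key is absent. */
    String get(String key) {
        return docs.get(key);
    }

    /** Stores unconditionally: creates the document or overwrites an existing one. */
    boolean set(String key, String json) {
        docs.put(key, json);
        return true;
    }

    /** Overwrites only if the key already exists; otherwise stores nothing. */
    boolean replace(String key, String json) {
        return docs.replace(key, json) != null;
    }

    /** Removes the document; reports whether anything was removed. */
    boolean delete(String key) {
        return docs.remove(key) != null;
    }
}
```

In this model, an update issued against a category that was never saved stores nothing, so code that cannot be sure a document exists might prefer set.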

Code changes were required for each of the CRUD operations for Customers, Orders, and Categories. Below is an example of the changes made to create a new customer.

The original code for customer creation

public Customer createCustomer(final Customer customer) {
    if (customer == null) throw new ValidationException("Customer object is null");
    em.persist(customer);
    return customer;
}

The Couchbase code for customer creation
