REST for Java developers, Part 3: NetKernel

A next-generation environment where SOA meets multicore

NetKernel is a URI-based microkernel environment that combines the best concepts from REST, Unix, and service-oriented architecture to build extremely productive, scalable, and maintainable systems. NetKernel not only makes it easy to build RESTful systems in Java and other languages that run on the JVM; it also changes the way you think about tying components together.

So far this series has given you an introduction to REST and a quick walk-through of Restlet, an API that makes it easier to build and consume RESTful interfaces in Java. You have seen that REST is only partially about URLs. At a deeper level, it is about logically connected components, late-binding resolution of representation, and linkage between related information resources. With all of this, you ought to feel that you know enough to successfully combine REST and Java; in fact, you do. But the story doesn't end there.

In this article, you will learn about NetKernel, a next-generation software environment that mixes what people like about REST, Unix pipes and filters, and service-oriented architecture (SOA). Not only can NetKernel make you more productive, but the abstractions also allow you to fully use modern, multicore, multi-CPU systems with little to no effort. At a minimum, if you understand NetKernel, you will understand REST at a deeper level.

Clashing abstractions and excessive coupling

The software industry has made great progress over the years, but there's still work to do. The clash of abstractions and excessive coupling between components remain some of the biggest problems needing to be solved. Countless efforts have been made in the object-oriented world to address the issue with interfaces, the Law of Demeter, separation of concerns, dependency injection, design patterns, remoting abstractions, interface definition languages, SOAs, and so on. At the end of the day, however, objects end up feeling like the wrong level of abstraction to bridge the worlds of software components, data, services, and concepts. They are fragile and break easily in the face of inevitable change. Modern languages like Ruby, Groovy, Scala, and Clojure can help, but at some point it feels like any language or object binding is too specific for all needs.

It is certainly possible to build good, running systems with the object-oriented mindset, but as requirements and technologies change, these systems become crufty and hard to maintain. Organizations end up committing to a combination of technologies and sticking with them. They settle on a particular language, a particular data model, a particular database schema, and a particular platform. Sure, you can change any one or more of these commitments over time, but it's still painful to do. This strategy of picking a small set of technologies can help minimize variance as well as maintenance and training costs, but it also often prevents you from using the right tool for the job.

Many people are starting to realize that REST can fit into this space to provide an information-based solution to a wide variety of integration needs. Information references can be converted into specific forms on demand. With tools like the Restlet API, you have the flexibility of logical bindings but the convenience of local object references. The next article in this series will tie together many of these ideas into a larger vision; in this article I'll focus on the benefits that you might accrue from pushing the ideas of REST into the inner environment, not just the boundaries between systems.

Object-oriented abstractions

Today, we are used to dealing with URLs for documents. http://javaworld.com, for example, is a logical name that resolves to the JavaWorld homepage. You do not (necessarily) know anything about what's involved with producing the page, nor do you care what technologies are used; you simply want to consume the content on the page. By default, when you input the URL to your browser, you get a nicely formatted HTML page back. As long as that contract is maintained, the folks behind the scenes at JavaWorld are free to change their underlying technology without breaking the site for you.

The industry is becoming comfortable with the same idea for information in general. The URL http://someserver.com/customer/012345 might produce an XML document such as:

<customer id="012345">
  <name>Sue Jones</name>
  <address>
    <street>12345 Main St.</street>
    <city>Cleveland</city>
    <state>OH</state>
    <zipcode>44114</zipcode>
  </address>
</customer>

As long as you get back what you expect from such a server, the information producers are free to support alternate representations (JSON, for example) for other clients, change their underlying technology, and so on.

In the object-oriented world, we approximate this behavior by using interfaces. But without some extraordinary gymnastics, we lack the freedom to return completely arbitrary object representations. Anything that satisfies the interface's language binding can stand in, but it is difficult to substitute arbitrary resources (scripts, an alternate language implementation, raw data, and so on). We also can't easily identify the results of issuing arbitrary requests of our interface implementations. Because we can't identify a particular invocation explicitly, we can't determine whether a request has already been processed or whether it needs to be done again. Imagine a complicated XSL Transformation (XSLT) being applied to a document, a database query for a particular customer, or an invocation of a SOAP Web service. Sure, our implementations can cache at each level, but that becomes a nontrivial amount of work, and we lack the visibility behind the scenes to cache what is most useful. There is no opportunity to optimize across these caches based on how the system is being used.
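To make the caching point concrete, here is a hypothetical sketch of what a uniform request identity would buy: a single cache in front of heterogeneous operations, keyed by the full request URI. Nothing here is NetKernel API; the class and URI string are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: if every invocation had a canonical URI-style
// identity, one cache could front heterogeneous operations
// (XSLT transforms, database queries, and SOAP calls alike).
public class RequestCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Compute the result for a request identity once; reuse it afterward.
    public Object resolve(String requestUri, Function<String, Object> handler) {
        return cache.computeIfAbsent(requestUri, handler);
    }

    public static void main(String[] args) {
        RequestCache cache = new RequestCache();
        // First call invokes the handler; the second is served from the cache.
        Object first = cache.resolve("active:fetch-customer+id@cust:012345",
                uri -> "<customer id=\"012345\"/>");
        Object second = cache.resolve("active:fetch-customer+id@cust:012345",
                uri -> { throw new IllegalStateException("should not run"); });
        System.out.println(first.equals(second));
    }
}
```

Because the identity is just a string, the cache never needs to know which technology produced the result.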

Objects do have their place in our software strategies, but there is room to consider other ways of building modern systems.

Other approaches to scaling software

NetKernel solves several of these problems. In many ways, it came to the solutions indirectly. The project, which began at HP Labs in the UK, originally aimed to help solve the impending mismatch between software and hardware. Shared-state object-based systems and other conventional techniques do not scale to take advantage of extra CPUs and cores. The NetKernel work was eventually spun off into a company called 1060 Research, Ltd. NetKernel is available through a dual (open source and commercial) license.

Others have tried to solve the hardware/software mismatch in various ways. Ericsson's Erlang is a language and runtime environment that forces developers to write functional software that scales. The ejabberd XMPP server from ProcessOne is an example of a modern, scalable system built in this manner. Clojure, Haskell, Scala, and F# are other increasingly popular languages that try to solve this (and other) problems with cool, modern features.

Another approach to taking better advantage of spare computational horsepower is to create an infrastructure that virtualizes the request and schedules it elsewhere. This tactic is cutely known as cloud computing these days, but the ideas have been around for decades as old-school time-sharing systems, SETI@Home, the Great Internet Mersenne Prime Search (GIMPS), distributed.net, Parabon Computation, Popular Power, grid computing, and so on.

Although each of these approaches works, and can work well, you often have to commit to the vision prematurely. To embrace Erlang, you must embrace its syntax and idiosyncrasies. There are certainly ways to integrate, say, Erlang and Java, but the bindings are brittle and often point-to-point. The cloud approach is also a useful and usable way of building flexible, scalable systems, but the infrastructure can easily overshadow smaller uses. It is the kind of thing you want to use if it is a good fit, but that may not always be obvious right away. Deciding after the fact could be a costly, disruptive change.

Neither the language approach nor the cloud approach solves the caching issues, because they don't let you uniquely identify both a behavior and the result of invoking that behavior. Some modern languages support a technique called memoization, but it is usually specific to a particular calculation, not a general strategy.
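For contrast, this is roughly what memoization looks like in practice: a cache hand-wired to one specific calculation (Fibonacci here), rather than a general, system-wide strategy. The sketch is illustrative, not tied to any particular library.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: memoization is typically wired to one particular
// calculation, not offered as a general request-level caching strategy.
public class Memoized {
    private static final Map<Integer, Long> cache = new HashMap<>();

    // Fibonacci with results remembered per argument.
    static long fib(int n) {
        if (n < 2) return n;
        Long hit = cache.get(n);
        if (hit != null) return hit;            // reuse an earlier result
        long value = fib(n - 1) + fib(n - 2);   // otherwise compute once
        cache.put(n, value);
        return value;
    }

    public static void main(String[] args) {
        // Fast because every intermediate result is computed only once.
        System.out.println(fib(40));
    }
}
```

The cache knows nothing but this one function; a second memoized operation would need its own, separate cache.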

URLs for arbitrary behavior

NetKernel combines many of these ideas by supporting a URI-based, functional model that might resolve locally or might involve distributed requests across multiple servers, completely transparently to the client. Internally it uses a microkernel and runs on the JVM, so it is easy to build solutions in languages you are already comfortable with. You can adopt new languages as needed (as long as they run on the JVM) and take full advantage of any extra cores or CPUs your hardware has available. This all sounds too good to be true, but it really isn't.

Modern object-oriented-based systems might abstract over a persistence layer by using an interface like this one:

public interface PersistenceLayer {
    Customer getCustomer(String id);
    List<Customer> getCustomers(String status);
    boolean storeCustomer(Customer a);
}

You could then imagine specific implementations for storing to flat files, an RDBMS via JDBC, an RDBMS via Hibernate, a remote fetch via a SOAP service -- anything really.
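To see both the appeal and the limits of this interface-based substitution, consider a minimal client coded purely against the interface. The Customer class and the in-memory implementation here are hypothetical stand-ins; a JDBC or Hibernate version would swap in the same way, but only if it is Java code compiled against this exact binding.

```java
import java.util.List;

// Hypothetical stand-in for a domain object.
class Customer {
    final String id;
    Customer(String id) { this.id = id; }
}

interface PersistenceLayer {
    Customer getCustomer(String id);
    List<Customer> getCustomers(String status);
    boolean storeCustomer(Customer a);
}

// An in-memory implementation; flat-file, JDBC, Hibernate, or SOAP-backed
// versions could be substituted without touching the client below.
class InMemoryPersistence implements PersistenceLayer {
    public Customer getCustomer(String id) { return new Customer(id); }
    public List<Customer> getCustomers(String status) { return List.of(); }
    public boolean storeCustomer(Customer a) { return true; }
}

public class Client {
    public static void main(String[] args) {
        // The client sees only the interface, never the concrete class.
        PersistenceLayer layer = new InMemoryPersistence();
        System.out.println(layer.getCustomer("012345").id);
    }
}
```

Anything that implements the interface works, but only within the Java type system; a script or a remote resource cannot stand in without extra plumbing.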

In NetKernel, you give this concept of a persistence layer a logical name and bind it to a specific instance. Internally, NetKernel uses an active URI model to address functionality, so you might name these services active:fetch-customer, active:fetch-customers, active:store-customer, and so on. This is how you'd refer to the functionality that fetches and updates customer records. The active URI scheme takes key-value pairs as a series of +key@value arguments.

For reasons that are probably not yet clear, you want the referenced values to be URIs as well if possible. Retrieval of customer accounts might be active:fetch-customer+id@cust:012345 or something similar. NetKernel would interpret this URI based on a certain module configuration and map the request to the behavior to perform the request. Just as with a URL and a Web site, you have no idea what kind of technology is used to retrieve the information. As a client who needs the information, you probably do not care. Documentation will tell you what form it will come back in. As long as it stays in that form, the module definition can change and your code will not break. This is the benefit of late-binding, logically connected components to help improve maintainability. You are treating the result of fetching a customer record as an information resource. Weakly typed and dynamic languages help get you started in this direction; this approach is just the next step.
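As a small illustration of the grammar just described, flattening a service name and its arguments into an active URI might look like the following. This helper is hypothetical, written only to show the +key@value convention; it is not part of NetKernel's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the active URI grammar: a logical service
// name followed by a series of +key@value argument pairs.
public class ActiveUri {
    static String flatten(String service, Map<String, String> args) {
        StringBuilder uri = new StringBuilder(service);
        // Each argument is appended as +key@value; ideally values are URIs too.
        args.forEach((key, value) ->
                uri.append('+').append(key).append('@').append(value));
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("id", "cust:012345");
        System.out.println(flatten("active:fetch-customer", params));
    }
}
```

The resulting string names both the behavior and its full state, which is what makes the invocation addressable and cacheable.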

NetKernel allows many ways to invoke this kind of behavior. For this article, we'll use BeanShell for basic scripting:

main() {
    // Build a sub-request addressed by its logical service name
    req = context.createSubRequest("active:fetch-customer");
    // Pass the customer identifier as a URI-valued argument
    req.addArgument("id", "cust:012345");
    // Issue the request; the kernel resolves and schedules it
    resp = context.issueSubRequest(req);
    // Return the sub-request's result as this script's response
    context.createResponseFrom(resp);
}

Behind the scenes, NetKernel flattens this request to the full active:fetch-customer+id@cust:012345 URI. This represents a stateless, RESTful request; all of the information necessary to satisfy the request is passed in as part of the request. It identifies both the handler (active:fetch-customer) and the full state (+id@cust:012345). It treats the invocation of this behavior with this parameter as an information resource. This is an example of bringing REST inside. The behavior is invoked through a logical name, just as you might invoke a REST service at http://someserver/customer/012345.

The request is scheduled asynchronously on one of the microkernel threads. If you are running on a multicore, multi-CPU box, this will automatically scale up to take advantage of those resources without you (or the module implementor) having to think much about it. This is possible because what comes back is an immutable view of the result set, not unlike an HTML page or some JSON that might be returned as the result of issuing an HTTP request.
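As a rough sketch of why this scales (this is a plain Java thread pool, not NetKernel's actual scheduler), immutable responses mean independent requests can be resolved on any available core with no locking. The URIs and "resolution" below are placeholders.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the scheduling idea: because each response is an immutable
// value, independent requests can be fanned out across cores safely.
public class ParallelRequests {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<String> uris = List.of(
                "active:fetch-customer+id@cust:012345",
                "active:fetch-customer+id@cust:067890");
        // Each "request" resolves to an immutable String result.
        List<Future<String>> results = uris.stream()
                .map(uri -> pool.submit(() -> "resolved " + uri))
                .toList();
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

Neither the client nor the handlers coordinate with each other; immutability makes the concurrency invisible, which is the property the kernel exploits.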

With a loosely typed language like BeanShell (or Groovy, also supported by NetKernel), you often do not know or care in what form information is returned. If you know that it came back as some form of XML by default, you might add the line:

resp.setMimeType("text/xml");

Using NetKernel: A short demo

It's time to give NetKernel a try, using a sample application. Download the application source and install it according to the directions in the included README.txt file. The configuration files and these examples lean heavily on XML, but NetKernel fundamentally has nothing to do with XML per se. It would be just as easy to write custom modules that deal with your own domain objects if you wanted to.

The sample module is configured via the module.xml file to accept requests that match the following patterns (as well as others):

<export>
  <uri>
    <match>active:fetch-customer.*</match>
    <match>ffcpl:/customer.*</match>
    ...
  </uri>
</export>