The notion of a fabric of resources that are individually described, queried, and resolved may seem unmanageable or like science fiction. For organizations that are used to large, manual, centralized efforts to standardize on everything, it may seem anarchic to allow resources to grow organically and be described by anyone. The same people would probably not believe the Web possible in the first place if there were not already ample proof of its success.
To underscore that organic development of distributed data and services is possible, you need look no further than the Linking Open Data project. Begun approximately a year ago, it has become the poster child of Web and Semantic Web architecture sexiness and feasibility. A small group of loosely affiliated professionals around the world has successfully described and linked billions of resources through billions of relationships, at minimal cost. This is not to undervalue their efforts; they simply required none of the large, centralized data-model planning that most organizations go through to deal with much less complicated models.
Figure 2 shows a representation of the datasets involved as of March 2009. Each individual collection of data existed previously on the Web, often for quite some time. As with anything on the Web, you have very little knowledge about what is going on behind the scenes. You are simply able to ask for the content and subcontent by navigating RESTful links. What is different is that these silos of useful data are now connected to one another. The terms in one set are mapped into the terms of another via RDF and other Semantic Web technologies. This makes it possible to take facts about a resource in one set and mix them with information from another. Clearly you have to have at least some trust in the sources as well as in those making the connections, but the technology allows you to decide whom to trust and when. The bigger hurdle is allowing the fluid connectivity in the first place.
The data sets that have been integrated come from a wide set of domains, including metadata about music, Wikipedia entries, geographic locations, and census information. The kind of dynamic system that is possible with this web of linked data, accessible through RESTful services, is highlighted by the Flickr Wrapper Web service developed at the Freie Universität Berlin. This site takes a reference to a concept and resolves it via DBpedia (metadata extracted automatically from Wikipedia) through a RESTful interface. It analyzes the content that is returned for alternate terms by which the resource might be known and for geographic location references. This information is then used to parameterize a Flickr query for images tagged with those words and constrained to a particular geographic region. With precious little code, it becomes easy to find high-quality relevant pictures of Los Angeles (see Figure 3) or Yokohama, Japan.
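The flow described above (resolve a concept, collect its alternate names and coordinates, then parameterize an image search) can be sketched roughly as follows. This is not the Flickr Wrapper's actual code: the `resource` dictionary is a hand-written stand-in for a resolved description, and `build_photo_query` and its parameter names are assumptions chosen for illustration. No network calls are made.

```python
from urllib.parse import urlencode

# Hypothetical, hand-written stand-in for the description a linked-data
# service might return for one resource; the keys are illustrative only.
resource = {
    "label": "Los Angeles",
    "alternate_labels": ["L.A.", "City of Angels"],
    "latitude": 34.05,
    "longitude": -118.25,
}


def build_photo_query(description, radius_km=10):
    """Turn a resource description into parameters for a photo search."""
    # Every name the resource is known by becomes a candidate tag.
    tags = [description["label"]] + list(description.get("alternate_labels", []))
    params = {"tags": ",".join(tags)}
    # Constrain by location only when coordinates are present.
    if "latitude" in description and "longitude" in description:
        params["lat"] = description["latitude"]
        params["lon"] = description["longitude"]
        params["radius"] = radius_km
    return params


params = build_photo_query(resource)
# URL-encoded form, ready to append to a (hypothetical) search endpoint.
query_string = urlencode(params)
```

The point of the sketch is that the search service never needs to know where the names and coordinates came from; the linked description supplies them.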
You can also start to query Twitter or CrunchBase for information about people or companies and convert it to RDF on the fly for use in this model. Give it a try. Go query and explore the Linked Data cloud directly -- try a text search for information about JavaWorld, for instance.
As organizations move from linking public data to linking private data, security becomes a serious issue quickly. Not only does REST enable technical freedom and new business functionality; it also can make things more secure in the process. A frequent early reaction to REST is nervousness about leaving cookie-crumb trails to all of your sensitive information. In a world of hackers, do you really want to give them a path to follow to get to the data? Somehow, it seems much safer in a database, locked up and inaccessible to the world. The problem is that this same safety prevents the data from being as useful as it could be.
The trick is to realize that with REST, giving something a name and asking for it are two separate activities. There are things in this world that you know exist, but that does not mean you have access to them. You may somehow figure out how to contact a celebrity, but that does not mean you will get to talk to her. You may know that there are codes to nuclear weapons arsenals, but (thankfully), you do not get to see them. The distinction between knowing about something's existence and getting to it applies to RESTful URLs too.
Consider what alternate strategies induce in your systems. If you want to orchestrate a handful of (non-RESTful) Web services, you must pass actual data between them. A first service might query a database and return some results. A second service might sort and filter the results. A third service might initiate a new business process based on the data, and so on. The model usually followed is to lock down these services with some manner of authentication and authorization system ("Bob can call this service"). The data itself is often left unprotected as it is transferred from service to service. Sure, you can restrict the transfer to protected channels with SSL, but the data still sits in an unprotected state within each service. If those services have no need to access sensitive information, they should not be given it. Imagine if the data were medical results moving between a physician's office, an insurance company, and your place of employment. Suddenly these patterns seem a lot more relevant.
Cryptography fans will quickly insist that the point of standards like XML Encryption is to allow sensitive information to be shared through potentially untrusted intermediaries. This is true, but it's exactly the kind of thinking that has made WS-*-based systems heavy, cumbersome, and expensive. First, this model introduces the problem of key management, a complex issue that is rarely given the thought needed to be successful and secure. Second, it usually locks you into a specific format. Think instead of a model in which the results of queries are not the data itself, but references to the data through RESTful URLs. Now, only those steps in an orchestration that need access to the information will get it.
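A minimal sketch of that reference-passing model, under stated assumptions: the URLs, the in-memory `RECORDS` and `MAY_READ` tables, and the step names are all hypothetical stand-ins for real resources and a real authorization system. The intermediate step handles only names, so it never holds the sensitive data.

```python
# Hypothetical in-memory stand-ins: a store of sensitive records keyed by
# URL, and a table of which orchestration steps may dereference each URL.
RECORDS = {
    "https://example.org/results/1001": {"patient": "A", "result": "negative"},
    "https://example.org/results/1002": {"patient": "B", "result": "positive"},
}
MAY_READ = {
    "https://example.org/results/1001": {"physician"},
    "https://example.org/results/1002": {"physician"},
}


class AccessDenied(Exception):
    """Raised when a step asks for a resource it is not allowed to see."""


def dereference(url, step):
    """Return the data behind a URL only if this step is authorized."""
    if step not in MAY_READ.get(url, set()):
        raise AccessDenied(f"{step} may not read {url}")
    return RECORDS[url]


def query_step():
    """First step: return references to the results, not the results."""
    return sorted(RECORDS)


def routing_step(urls):
    """Intermediate step: forwards references without ever reading them."""
    return list(urls)


def physician_step(urls):
    """Final step: the only one authorized to resolve the references."""
    return [dereference(url, "physician") for url in urls]


refs = routing_step(query_step())
results = physician_step(refs)
```

A step such as a hypothetical "employer" service could receive the same list of URLs, but a call like `dereference(refs[0], "employer")` would raise `AccessDenied`: knowing the name is not the same as being able to resolve it.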
A huge part of minimizing the risk of internal fraud and the expense of external audits is to keep access to sensitive information very narrowly scoped. What you would like to be able to say is "Bob is allowed to call this service with this data in this context." The military has always had this model (with concepts like "eyes only" and "need to know"), but the corporate world is still learning it. Barings Bank, one of the United Kingdom's oldest investment banks, went under because Nick Leeson was able to make trades against markets that he should not have been given permission to use. Effective security is not just about granting access to services; it is about doing so within a context for particular data sets. RESTful APIs make this level of sophistication much easier by adopting global, logical, resolvable names for data. These are safer to pass around and easier to protect than the data itself.
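One way to picture a rule of the form "Bob is allowed to call this service with this data in this context" is as a four-part grant checked against each request. The sketch below is only an illustration: the `Grant` type, the `GRANTS` table, the service and context names, and the example.org URLs are all assumptions, not part of any particular product or standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One rule: who may call which service on which data, in what context."""
    principal: str
    service: str
    resource_prefix: str
    context: str


# Hypothetical policy table; names and URLs are illustrative only.
GRANTS = [
    Grant("bob", "place-trade",
          "https://example.org/markets/equities/", "client-order"),
]


def is_allowed(principal, service, resource_url, context):
    """True only if some grant matches all four parts of the request."""
    return any(
        g.principal == principal
        and g.service == service
        and resource_url.startswith(g.resource_prefix)
        and g.context == context
        for g in GRANTS
    )
```

Because the data is identified by a URL, the rule can be scoped to a family of resources with a simple prefix match. Here Bob may call `place-trade` for resources under the equities prefix in the `client-order` context, but the same call against a URL under a different market, or in a different context, is refused.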